.. fcsArchiver.py command line archival documentation

fcsArchiver.py (CLI)
=============================================

.. automodule:: fcsArchiver

Final Cut Server is deeply integrated with PresStore archival software to 
provide seamless access to media assets while preserving available disk space. 
There are two distinct components involved in the actual archive and restore 
process: Final Cut Server, and Archiware PresStore archival software. The
integration between these two systems is provided mainly by the Python script 
``/usr/local/bin/fcsArchiver.py``. This script takes a number of options, 
though the primary usage will be through the ``-p`` flag, for process queues:

>>> /usr/local/bin/fcsArchiver.py -p

When run in this form, ``fcsArchiver.py`` first processes restore queues, 
followed by archive queues. These queues are stored in a SQLite3 database 
named ``backupHistory.db``, found at the path designated in our settings
file by the attribute ``supportPath``. For any previously submitted jobs, 
``fcsArchiver.py`` checks their status with PresStore. 
For any jobs that have completed, ``fcsArchiver.py`` removes their entry 
from the queue and creates a record of the completed job in the 
``archiveHistory`` table. For any failed or cancelled jobs, ``fcsArchiver.py`` 
resubmits the job to PresStore for reprocessing. For more information on the 
``backupHistory.db`` database, see `The fcsArchiver Database`_.

Files are added to the queue via two command line scripts, 
``addToArchiveQueue.sh`` and ``addToRestoreQueue.sh``, each located by 
default at ``/usr/local/bin``. Both commands take a single argument: the path 
to a file to add to the archive or restore queue, respectively. Each script
has a corresponding queue in the form of a plain text file, ``filesToArchive``
or ``filesToRestore``, located in the support folder designated by
the configuration parameter ``supportPath``.

When either ``addToArchiveQueue.sh`` or ``addToRestoreQueue.sh`` is called, 
it consults the corresponding plain text queue file and, if the provided file 
path is not already listed there, appends the new path. When 
``fcsArchiver.py`` is run with the ``-p`` flag, it first processes the queues 
loaded into the SQLite3 database. After this is done, it consults each of 
these flat file queues and adds the new file paths into the system. 
For the archive queue, this is when MD5 checksums are computed and compared 
against the archive history: if the file has never been archived in its 
current form, it is sent to PresStore for archiving. For the restore queue, 
``fcsArchiver.py`` first ensures that the file does not already exist on 
disk, either in its online location or at its designated location on the 
archive device. If the file does not exist in either location, ``fcsArchiver.py``
restores the asset from tape. However, a restore job is only submitted
to PresStore once the tape is available in the library; if the tape is not 
available, ``fcsArchiver.py`` instead sends an offline media notification,
and the file is reprocessed during the next execution. This continues 
until the appropriate restore media is placed inside the library.
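
The append-if-absent behaviour of the flat file queues described above can be
sketched in Python as follows. This is only an illustration: the real queueing 
is done by the shell scripts ``addToArchiveQueue.sh`` and 
``addToRestoreQueue.sh``, whose internals are not shown in this document, and 
the function name ``add_to_queue`` and the temporary support folder are 
hypothetical.

.. code-block:: python

  import os
  import tempfile


  def add_to_queue(queue_file, file_path):
      """Append file_path to a plain text queue unless it is already listed.

      Returns True if the path was appended, False if it was already queued.
      """
      queued = set()
      if os.path.exists(queue_file):
          with open(queue_file, "r") as handle:
              queued = {line.rstrip("\n") for line in handle}
      if file_path in queued:
          return False
      with open(queue_file, "a") as handle:
          handle.write(file_path + "\n")
      return True


  if __name__ == "__main__":
      # Hypothetical stand-in for the folder configured as ``supportPath``.
      support_path = tempfile.mkdtemp()
      queue = os.path.join(support_path, "filesToArchive")
      # The first call appends the path; the second finds it already queued.
      print(add_to_queue(queue, "/my/archive/device/myfile.mov"))
      print(add_to_queue(queue, "/my/archive/device/myfile.mov"))

Because the queue is a plain text file with one path per line, a duplicate 
request for the same file leaves the queue unchanged.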


Syntax
++++++++++++++++++++++++++++++++++++++++++

The fcsArchiver.py script has the following usage: ::
   
  FCS Archiver
    Version: 1.0b Build: 2011040702
    Framework Version: 1.0b Build: 2011041301

  Copyright (C) 2009-2011 Beau Hunter, 318 Inc.

  Usage: 
  
  fcsArchiver.py [option]
    
  Options: 
    -h, --help                   Displays this help message
    -v, --version                Display version number
    -f configfilepath,           Use specified config file
      --configFile=configfilepath
    -p, --processQueue           Process archive and restore queues
        --processRestoreQueue    Process restore queues
        --processArchiveQueue    Process archive queues
        
    --getVolumeBarcode           Lists volume barcode for the requested file
    --getVolumeLabel             (must be used with --file option)
    --file='/path/to/file'
    
    --getVolumeBarcodeForFile=   Outputs barcode for specified file
    --getVolumeLabelForFile=     Outputs label for specified file
    --getVolumeBarcodeForLabel=  Outputs the barcode for the specified label

  Examples:
    fcsArchiver.py --processArchiveQueue
    fcsArchiver.py --getVolumeBarcode --file='/myfile.txt'
    fcsArchiver.py --getVolumeBarcodeForFile='/myfile.txt'
    fcsArchiver.py --getVolumeBarcodeForLabel=10001
  


  


Configuration
++++++++++++++++++++++++++++++++++++++++++  
By default, the ``fcsArchiver.py`` script uses the configuration file found at 
``/usr/local/etc/fcsArchiver.conf``. In this file, fcsArchiver keys off of a 
number of parameters configured under the ``[GLOBAL]``, ``[BACKUP]``, and 
``[NOTIFICATIONS]`` sections. 

The following shows an example fcsArchiver configuration: ::

  [GLOBAL]
  archivePath=/Users/Shared/FCSStore/Archive
  supportPath=/Users/Shared/FCSStore/Support/Archive
  debug=False


  [BACKUP]
  useOffsitePlan=True
  archivePlan=10001
  offsiteArchivePlan=10002
  backupSystem=PresStore
  nsdchatpath=/usr/local/aw/bin/nsdchat
  nsdchatUseSSL=True
  nsdchatUseSudo=False
  remoteSSLHost=hax.lbc
  remoteSSLUserName=root
  trustRestoreChecksumMismatch=True
  preventArchiveDuplicates=True

  [NOTIFICATIONS]
  SMTPServer=hax.lbc
  SMTPPort=25
  SMTPUser=''
  SMTPPassword=''
  emailToNotify=hunterbj@hax.lbc
  emailFromAddress=fcs@hax.lbc

As shown above, the settings read in from this file are divided into three
sections:

[GLOBAL]
  **archivePath**
    The full path to the Final Cut Server archive device root
  
  **supportPath**
    The full path to a support folder which contains our sqlite3 database and 
    queue files
    
  **debug**
    Specify whether we run in debug mode

[BACKUP]
  **archivePlan**
    (*str*) -- The name of the archive plan to utilize (*e.g. 'FCSOnsitePlan'*)
    
  **useOffsitePlan**
    (*bool*) -- Specifies whether to duplicate archive jobs to a separate 
    offsite archive plan
  
  **offsiteArchivePlan**
    (*str*) -- The name of the offsite archive plan to use if 
    ``useOffsitePlan`` is True
  
  **backupSystem**
    (*str*) -- The name of the backup system
    
  **nsdchatpath**
    (*str*) -- The filesystem path to nsdchat binary
  
  **nsdchatUseSSL**
    (*bool*) -- Specifies whether we use SSH to a remote host for nsdchat 
    calls. If True, we will reference ``remoteSSLHost`` and 
    ``remoteSSLUserName`` for connection information.
      
  **remoteSSLHost**
    (*str*) -- The IP or DNS name of remote host to call for nsdchat
  
  **remoteSSLUserName**
    (*str*) -- The Username of remote host to call for nsdchat
  
  .. note::
    In order to utilize nsdchatUseSSL, you will need to set up key-based SSH
    authentication.   
  
  **trustRestoreChecksumMismatch**
    (*bool*) -- Specifies whether we trust checksum mismatches for restores. 
    If an asset exists on disk with a differing checksum and this option is 
    set to ``False``, we will replace it when restoring. If set to ``True``, 
    we will forego the restore from tape.
      
  **preventArchiveDuplicates**
    (*bool*) -- Specifies whether we trust checksums to skip tape archives. 
    If this option is set to ``True`` and an asset being archived already 
    has an entry in our ``archiveHistory`` table with an identical
    checksum, we will forego the archive to tape and remove the asset from
    disk. If set to ``False``, we will re-archive the asset.
  

[NOTIFICATIONS]
  **SMTPServer**
    (*str*) -- The IP or DNS name of remote host to utilize for email notifications.
  
  **SMTPUser**
    (*str*) -- The username to utilize for authenticated SMTP email notifications.
    This value should be omitted if unauthenticated SMTP is desired. 
  
  **SMTPPassword**
    (*str*) -- The password to utilize for authenticated SMTP email notifications.
    This value should be omitted if unauthenticated SMTP is desired.
  
  **emailToNotify**
    (*str*) -- The email address that notifications are sent to.
  
  **emailFromAddress**
    (*str*) -- The From address used by email notifications

  
.. note::
  If desired, an alternate configuration file can be used through 
  the ``--configFile=`` parameter.
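
Since the configuration file uses an INI-style layout, it can be inspected with
Python's standard ``configparser`` module. The following is a minimal sketch 
that assumes the section and option names from the example above; it is not 
how ``fcsArchiver.py`` itself loads its settings, and the names ``SAMPLE`` and 
``load_settings`` are hypothetical.

.. code-block:: python

  import configparser

  # Abbreviated copy of the example configuration shown earlier.
  SAMPLE = """
  [GLOBAL]
  archivePath=/Users/Shared/FCSStore/Archive
  supportPath=/Users/Shared/FCSStore/Support/Archive
  debug=False

  [BACKUP]
  useOffsitePlan=True
  archivePlan=10001
  offsiteArchivePlan=10002
  """.replace("\n  ", "\n")  # tolerate the indentation used in this listing


  def load_settings(text):
      """Parse configuration text and return a few typed values."""
      parser = configparser.ConfigParser()
      parser.read_string(text)
      settings = {
          "archivePath": parser.get("GLOBAL", "archivePath"),
          "supportPath": parser.get("GLOBAL", "supportPath"),
          "debug": parser.getboolean("GLOBAL", "debug"),
          "archivePlan": parser.get("BACKUP", "archivePlan"),
          "useOffsitePlan": parser.getboolean("BACKUP", "useOffsitePlan"),
      }
      # The offsite plan is only relevant when useOffsitePlan is True.
      if settings["useOffsitePlan"]:
          settings["offsiteArchivePlan"] = parser.get(
              "BACKUP", "offsiteArchivePlan"
          )
      return settings


  if __name__ == "__main__":
      for key, value in sorted(load_settings(SAMPLE).items()):
          print(key, "=", value)

Boolean options such as ``debug`` and ``useOffsitePlan`` are read with 
``getboolean`` so that ``True`` and ``False`` become real booleans rather 
than strings.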
  

Example Usage
++++++++++++++++++++++++++++++++++++++++++
The ``fcsArchiver.py`` script has a fairly limited set of command line 
options. In its typical usage, we simply process both restore and archive 
queues (in that order). To accomplish this, we use the ``-p`` flag: ::

  >>> fcsArchiver.py -p
  Apr 15 02:56:11:  INFO   :   Processing Restore Queues...
  Apr 15 02:56:11:  INFO   :     Found 0 running restore jobs.
  Apr 15 02:56:11:  INFO   :   Checking for new restore files...
  Apr 15 02:56:11:  INFO   :     Restore Queue is empty.
  Apr 15 02:56:11:  INFO   :   Finished processing all restore queues.
  Apr 15 02:56:11:  INFO   :   Processing Archive Queues...
  Apr 15 02:56:11:  INFO   :     Found 0 running archive jobs.
  Apr 15 02:56:11:  INFO   :   Checking for new archive files...
  Apr 15 02:56:11:  INFO   :     Archive Queue is empty.
  Apr 15 02:56:11:  INFO   :   Finished processing all archive queues..




We can also process solely archive or restore queues by using the flags
``--processArchiveQueue`` or ``--processRestoreQueue``, respectively: ::

  >>> fcsArchiver.py --processRestoreQueue
  Apr 15 02:58:23:  INFO   :   Processing Restore Queues...
  Apr 15 02:58:23:  INFO   :     Found 0 running restore jobs.
  Apr 15 02:58:23:  INFO   :   Checking for new restore files...
  Apr 15 02:58:23:  INFO   :     Restore Queue is empty.
  Apr 15 02:58:23:  INFO   :   Finished processing all restore queues.

  
The ``fcsArchiver.py`` script can also be utilized to query PresStore to 
determine the tape that a particular file has been archived to: ::

  >>> fcsArchiver.py --getVolumeLabelForFile='/my/archive/device/myfile.mov' --tapeSet=onsite
  LABEL: 10001
  
  >>> fcsArchiver.py --getVolumeBarcodeForFile='/my/archive/device/myfile.mov' --tapeSet=onsite
  BARCODE: A00001
  
  >>> fcsArchiver.py --getVolumeBarcodeForFile='/my/archive/device/myfile.mov' --tapeSet=offsite
  BARCODE: B00001


The Scheduler
++++++++++++++++++++++++++++++++++++++++++
``fcsArchiver.py`` is routinely fired via a launchd plist, located at 
``/Library/LaunchDaemons/com.318.fcsarchiver.plist``. This plist has a few 
notable declarations. First and foremost, it executes the ``fcsArchiver.py``
script every 15 minutes with the following syntax:

  >>> /usr/local/bin/fcsArchiver.py -p

This launchd plist also redirects stdout and stderr to the file
located at ``/var/log/transmogrifier/fcsArchiver.log``. This log can be 
consulted to see what the script is currently doing.

.. note::
  In some configurations it is desirable to break this launchd plist 
  into two separate plists, ``com.318.fcsarchiver.archive.plist`` and
  ``com.318.fcsarchiver.restore.plist``, so that archives and restores 
  are processed independently. Unfortunately, this creates problems, as 
  Archiware's ``nsdchat`` command line interface seems to deal poorly with 
  multiple concurrent sessions, resulting in the command erratically 
  returning bad data. Thus it is recommended to only segregate the two 
  automations if utilizing a disk-based workflow using ``fcsDiskArchiver.py``.
  
Starting or stopping the automatic schedule for ``fcsArchiver`` is achieved
using the standard ``launchctl`` CLI tool. To start running ``fcsArchiver`` 
every 15 minutes, the following command can be used:

  >>> sudo launchctl load -w /Library/LaunchDaemons/com.318.fcsarchiver.plist


To stop the automation, we simply substitute ``unload``:

  >>> sudo launchctl unload -w /Library/LaunchDaemons/com.318.fcsarchiver.plist

It’s a good idea to stop ``fcsArchiver.py`` in the event that the backup
server is taken down, to prevent unnecessary processing cycles.

.. warning::
  If ``fcsArchiver.py`` is terminated while reading in new queue items from the 
  ``filesToArchive`` or ``filesToRestore`` flat files, any unprocessed entries
  present in the flat file will be lost.
  
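The following shows the contents of the ``com.318.fcsarchiver.plist`` launch 
daemon. Note that ``StartInterval`` is expressed in seconds, so the value of 
``600`` shown here corresponds to a 10-minute interval; a 15-minute interval 
requires ``900``: ::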

  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
  <plist version="1.0">
  <dict>
          <key>Label</key>
          <string>com.318.fcsArchiver</string>
          <key>UserName</key>
          <string>admin</string>
          <key>ProgramArguments</key>
          <array>
                  <string>/usr/local/bin/fcsArchiver.py</string>
                  <string>-p</string>
          </array>
          <key>StartInterval</key>
          <integer>600</integer>
          <key>StandardOutPath</key>
          <string>/var/log/transmogrifier/fcsArchiver.log</string>
          <key>StandardErrorPath</key>
          <string>/var/log/transmogrifier/fcsArchiver.log</string>
          <key>RunAtLoad</key>
          <true/>
  </dict>
  </plist>

When configuring, it is important to ensure that the value of the key 
``UserName`` is set to the user under which Final Cut Server was installed. 
It is also important to ensure that the log file at 
``/var/log/transmogrifier/fcsArchiver.log`` is writable by that user.
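
When reviewing a launch daemon like the one above, it can help to read the
values back programmatically. The sketch below uses Python's standard 
``plistlib`` module on an abbreviated copy of the plist; the names ``PLIST`` 
and ``summarize_launchd_plist`` are hypothetical and are not part of 
fcsArchiver.

.. code-block:: python

  import plistlib

  # Abbreviated copy of the launch daemon shown above.
  PLIST = b"""<?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
  <plist version="1.0">
  <dict>
      <key>Label</key>
      <string>com.318.fcsArchiver</string>
      <key>UserName</key>
      <string>admin</string>
      <key>StartInterval</key>
      <integer>600</integer>
      <key>StandardOutPath</key>
      <string>/var/log/transmogrifier/fcsArchiver.log</string>
  </dict>
  </plist>
  """.replace(b"\n  ", b"\n")  # tolerate the indentation used in this listing


  def summarize_launchd_plist(data):
      """Return the user, run interval in minutes, and log path of a job."""
      job = plistlib.loads(data)
      return {
          "user": job.get("UserName"),
          # StartInterval is stored in seconds.
          "interval_minutes": job["StartInterval"] / 60,
          "log": job.get("StandardOutPath"),
      }


  if __name__ == "__main__":
      print(summarize_launchd_plist(PLIST))

The reported user and log path are the two values to check against the Final 
Cut Server installation, and the interval shows how often launchd will fire 
the script.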

The fcsArchiver Database
++++++++++++++++++++++++++++++++++++++++++

Upon each operation, ``fcsArchiver.py`` consults its queue database to keep 
track of file archive operations and requests. These queues are stored in a 
SQLite3 database, located at the root of the fcsArchiver support path in a 
file named ``backupHistory.db``.
This database file contains three tables: ``archiveQueue``, ``restoreQueue``,
and ``archiveHistory``. The first table, ``archiveQueue``, holds all records 
for active archive requests and has the following schema: ::

  CREATE TABLE archiveQueue (fcsID,filePath,checksum,archiveSet,tapeSet,jobID,jobSubmitDate,retryCount,status);

There are a few notable fields. The ``fcsID`` field is used for 
communication back with the Final Cut Server asset. The ``filePath`` field is
the full path to the asset as it resides on the FCS Archive Device. The
``checksum`` field is an MD5 checksum of the file, used to detect changes and
prevent duplication for assets which have been restored and re-archived without 
changes. The ``archiveSet`` field designates the batch name in which the file 
was processed; this will typically be a date and time stamped identifier. The 
``jobID`` field stores the PresStore job ID for the job in which the file was 
submitted to PresStore. The ``tapeSet`` field helps us keep track of the asset 
should it need to pass through multiple tape sets (e.g. onsite, followed by 
offsite). The ``status`` field specifies where in the process the file is, 
e.g. ``archiveQueued``, ``archiveRunning``, ``archiveFailed``, etc.
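
As an illustration of how this table can be inspected, the following sketch
builds an in-memory copy of ``archiveQueue`` using the schema above and lists 
the paths with a given ``status``. The sample rows, the function names, and 
the query are hypothetical; against a real installation one would open 
``backupHistory.db`` from the support path instead of ``:memory:``.

.. code-block:: python

  import sqlite3

  SCHEMA = (
      "CREATE TABLE archiveQueue (fcsID,filePath,checksum,archiveSet,"
      "tapeSet,jobID,jobSubmitDate,retryCount,status)"
  )


  def build_demo_queue():
      """Create an in-memory archiveQueue table with made-up sample rows."""
      conn = sqlite3.connect(":memory:")
      conn.execute(SCHEMA)
      conn.executemany(
          "INSERT INTO archiveQueue "
          "(fcsID, filePath, tapeSet, retryCount, status) "
          "VALUES (?, ?, ?, ?, ?)",
          [
              ("1001", "/archive/a.mov", "onsite", 0, "archiveQueued"),
              ("1002", "/archive/b.mov", "onsite", 0, "archiveRunning"),
              ("1003", "/archive/c.mov", "offsite", 1, "archiveQueued"),
          ],
      )
      return conn


  def queued_paths(conn, status):
      """Return the file paths of archiveQueue rows with the given status."""
      rows = conn.execute(
          "SELECT filePath FROM archiveQueue WHERE status = ? "
          "ORDER BY filePath",
          (status,),
      )
      return [row[0] for row in rows]


  if __name__ == "__main__":
      print(queued_paths(build_demo_queue(), "archiveQueued"))

Columns that are not supplied in the sample rows, such as ``checksum`` and 
``jobID``, are simply left as NULL here.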

The ``restoreQueue`` table is structured almost identically to the 
``archiveQueue`` table, and functions in the same way. The only major 
difference is the removal of the ``checksum`` field and the addition of a 
``barcode`` field, which is used to designate the tape from which the asset 
will be restored. ::

  CREATE TABLE restoreQueue (fcsID,filePath,archiveSet,tapeSet,barcode,jobID,jobSubmitDate,retryCount,status);

The ``archiveHistory`` table represents both archive and restore jobs which have 
either completed successfully or terminated fatally. It has the following schema: ::

  CREATE TABLE archiveHistory (fcsID,filePath,checksum,barcode,tapeSet,archiveSet,jobID,completionDate,status);

This table is structured a bit differently than the archive and restore queue 
tables: it records both the ``checksum`` and ``barcode`` for finished jobs. 
When new assets are archived, their checksums are compared to the checksum 
of any previous record in the ``archiveHistory`` table. If a previous record 
with an identical checksum is found (and ``preventArchiveDuplicates`` is set 
to ``True``), the asset is simply removed from storage rather than 
re-archived.
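
The duplicate check described above can be sketched as follows. This is a
simplified illustration under stated assumptions, not the script's actual 
logic: the function names are hypothetical, and the lookup matches on 
``filePath`` and ``checksum`` only, ignoring the ``status`` column.

.. code-block:: python

  import hashlib
  import sqlite3

  HISTORY_SCHEMA = (
      "CREATE TABLE archiveHistory (fcsID,filePath,checksum,barcode,"
      "tapeSet,archiveSet,jobID,completionDate,status)"
  )


  def md5_checksum(path, chunk_size=1024 * 1024):
      """Compute the MD5 hex digest of a file, reading it in chunks."""
      digest = hashlib.md5()
      with open(path, "rb") as handle:
          for chunk in iter(lambda: handle.read(chunk_size), b""):
              digest.update(chunk)
      return digest.hexdigest()


  def already_archived(conn, file_path, checksum):
      """Return True if archiveHistory holds this path with this checksum."""
      row = conn.execute(
          "SELECT 1 FROM archiveHistory "
          "WHERE filePath = ? AND checksum = ? LIMIT 1",
          (file_path, checksum),
      ).fetchone()
      return row is not None

A caller would compute ``md5_checksum`` for each newly queued file and skip 
the tape archive when ``already_archived`` returns ``True``. Reading the file 
in chunks keeps memory use flat for large media assets.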