.. fcsArchiver.py command line archival documentation

fcsArchiver.py (CLI)
=============================================

.. automodule:: fcsArchiver

Final Cut Server is deeply integrated with PresStore archival software to provide seamless access to media assets while preserving available disk space. Two distinct components are involved in the actual archive and restore process: Final Cut Server and Archiware PresStore archival software. The integration between these two systems is provided mainly by the Python script ``/usr/local/bin/fcsArchiver.py``. This script takes a number of options, though the primary usage will be through the ``-p`` flag, which processes the queues:

>>> /usr/local/bin/fcsArchiver.py -p

When run in this form, ``fcsArchiver.py`` processes restore queues first, followed by archive queues. These queues are stored in a SQLite3 database named ``backupHistory.db``, found at the path designated in our settings file by the attribute ``supportPath``. For any previously submitted jobs, ``fcsArchiver.py`` will check with PresStore on their status. For any jobs that have completed, ``fcsArchiver.py`` will remove their entry from the queue and create a record of the completed job in the ``archiveHistory`` table. For any failed or cancelled jobs, ``fcsArchiver.py`` will resubmit the job to PresStore for reprocessing. For more information on the ``backupHistory.db`` database, see `The fcsArchiver Database`_.

Files are added to the queue via two command line scripts, ``addToArchiveQueue.sh`` and ``addToRestoreQueue.sh``, each located by default in ``/usr/local/bin``. Both commands take a single argument: the path to a file to add to the archive or restore queue, respectively. Each script has a corresponding queue in the form of a plain text file, ``filesToArchive`` or ``filesToRestore``, located in the support folder designated by our configuration parameter ``supportPath``.
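The flat-file queues lend themselves to a simple append-if-absent pattern, since a path is only added when it is not already listed in the queue file. The following is a minimal Python sketch of that idea, not the shipped shell scripts: the function name ``add_to_queue`` and the temporary directory standing in for ``supportPath`` are assumptions made for illustration.

.. code-block:: python

    import os
    import tempfile


    def add_to_queue(queue_file, file_path):
        """Append file_path to queue_file unless it is already listed.

        Returns True if the path was appended, False if it was already queued.
        (Hypothetical helper; the real tools are shell scripts.)
        """
        existing = set()
        if os.path.exists(queue_file):
            with open(queue_file) as handle:
                existing = {line.rstrip("\n") for line in handle}
        if file_path in existing:
            return False
        with open(queue_file, "a") as handle:
            handle.write(file_path + "\n")
        return True


    if __name__ == "__main__":
        # A temporary directory stands in for the configured supportPath.
        support_path = tempfile.mkdtemp()
        queue = os.path.join(support_path, "filesToArchive")
        print(add_to_queue(queue, "/my/archive/device/myfile.mov"))  # appended
        print(add_to_queue(queue, "/my/archive/device/myfile.mov"))  # already queued

Keeping one path per line means the queue file stays readable and can be inspected or edited by hand if needed.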
When either ``addToArchiveQueue.sh`` or ``addToRestoreQueue.sh`` is called, it consults its plain text queue file (``filesToArchive`` or ``filesToRestore``) and appends the provided file path if it is not already cataloged there. When ``fcsArchiver.py`` is run with the ``-p`` flag, it first processes the queues loaded into the SQLite3 database. After this is done, it consults each of the flat file queues and adds the new file paths into the system. This is when MD5 checksums are computed and compared against the archive history; if the file has never been archived in its current form, it is sent to PresStore for archiving.

When restore queues are processed at this point, ``fcsArchiver.py`` first ensures that the file does not already exist on disk, either in its online location or at its designated location on the archive device. If the file does not exist in either location, ``fcsArchiver.py`` restores the asset from tape. However, a restore job is only submitted to PresStore once the tape is available in the library; if the tape is not available, ``fcsArchiver.py`` instead sends an offline media notification, and the file is reprocessed during the next execution. This continues until the appropriate restore media is placed inside the library.

Syntax
++++++++++++++++++++++++++++++++++++++++++

The ``fcsArchiver.py`` script has the following usage:

::

    FCS Archiver Version: 1.0b Build: 2011040702
    Framework Version: 1.0b Build: 2011041301
    Copyright (C) 2009-2011 Beau Hunter, 318 Inc.
    Usage:  fcsArchiver.py [option]

    Options:
        -h, --help                      Displays this help message
        -v, --version                   Display version number
        -f configfilepath,              Use specified config file
          --configFile=configfilepath
        -p, --processQueue              Process archive and restore queues
        --processRestoreQueue           Process restore queues
        --processArchiveQueue           Process archive queues
        --getVolumeBarcode              Lists volume barcode for the requested file
        --getVolumeLabel                  (must be used with --file option)
          --file='/path/to/file'
        --getVolumeBarcodeForFile=      Outputs barcode for specified file
        --getVolumeLabelForFile=        Outputs label for specified file
        --getVolumeBarcodeForLabel=     Outputs the barcode for the specified label

    Examples:
        fcsArchiver.py --processArchiveQueue
        fcsArchiver.py --getVolumeBarcode --file='/myfile.txt'
        fcsArchiver.py --getVolumeBarcodeForFile='/myfile.txt'
        fcsArchiver.py --getVolumeBarcodeForLabel=10001

Configuration
++++++++++++++++++++++++++++++++++++++++++

By default, the ``fcsArchiver.py`` script uses the configuration file found at ``/usr/local/etc/fcsArchiver.conf``. From this file, ``fcsArchiver.py`` reads a number of parameters configured under the ``[GLOBAL]``, ``[BACKUP]``, and ``[NOTIFICATIONS]`` sections.
The following shows an example fcsArchiver configuration:

::

    [GLOBAL]
    archivePath=/Users/Shared/FCSStore/Archive
    supportPath=/Users/Shared/FCSStore/Support/Archive
    debug=False

    [BACKUP]
    useOffsitePlan=True
    archivePlan=10001
    offsiteArchivePlan=10002
    backupSystem=PresStore
    nsdchatpath=/usr/local/aw/bin/nsdchat
    nsdchatUseSSL=True
    nsdchatUseSudo=False
    remoteSSLHost=hax.lbc
    remoteSSLUserName=root
    trustRestoreChecksumMismatch=True
    preventArchiveDuplicates=True

    [NOTIFICATIONS]
    SMTPServer=hax.lbc
    SMTPPort=25
    SMTPUser=''
    SMTPPassword=''
    emailToNotify=hunterbj@hax.lbc
    emailFromAddress=fcs@hax.lbc

As shown above, the settings we read in from this file are divided into three sections.

[GLOBAL]

**archivePath** -- The full path to the Final Cut Server archive device root.

**supportPath** -- The full path to a support folder which contains our SQLite3 database and queue files.

**debug** -- Specifies whether we run in debug mode.

[BACKUP]

**archivePlan** (*str*) -- The name of the archive plan to use (*e.g. 'FCSOnsitePlan'*).

**useOffsitePlan** (*bool*) -- Specifies whether to duplicate archive jobs to a separate offsite archive plan.

**offsiteArchivePlan** (*str*) -- The name of the offsite archive plan to use if ``useOffsitePlan`` is ``True``.

**backupSystem** (*str*) -- The name of the backup system.

**nsdchatpath** (*str*) -- The filesystem path to the ``nsdchat`` binary.

**nsdchatUseSSL** (*bool*) -- Specifies whether we use SSH to a remote host for ``nsdchat`` calls. If ``True``, we will reference ``remoteSSLHost`` and ``remoteSSLUserName`` for connection information.

**remoteSSLHost** (*str*) -- The IP or DNS name of the remote host to call for ``nsdchat``.

**remoteSSLUserName** (*str*) -- The username on the remote host to use for ``nsdchat`` calls.

.. note:: In order to use ``nsdchatUseSSL``, you will need to set up key-based SSH authentication.
**trustRestoreChecksumMismatch** (*bool*) -- Specifies how checksum mismatches are handled for restores. If an asset already exists on disk with a differing checksum and this option is set to ``False``, we will replace the on-disk asset when restoring. If set to ``True``, we will forego the restore from tape.

**preventArchiveDuplicates** (*bool*) -- Specifies whether we trust checksums to skip tape archives. If this option is set to ``True`` and an asset submitted for archive already has an entry in our ``archiveHistory`` table with an identical checksum, we will forego the archive to tape and remove the asset from disk. If set to ``False``, we will re-archive the asset.

[NOTIFICATIONS]

**SMTPServer** (*str*) -- The IP or DNS name of the remote host to use for email notifications.

**SMTPUser** (*str*) -- The username to use for authenticated SMTP email notifications. This value should be omitted if unauthenticated SMTP is desired.

**SMTPPassword** (*str*) -- The password to use for authenticated SMTP email notifications. This value should be omitted if unauthenticated SMTP is desired.

**emailToNotify** (*str*) -- The email address that notifications are sent to.

**emailFromAddress** (*str*) -- The From address used by email notifications.

.. note:: If desired, an alternate configuration file can be used through the ``--configFile=`` parameter.

Example Usage
++++++++++++++++++++++++++++++++++++++++++

The ``fcsArchiver.py`` script has a fairly limited scope with regard to command line options. In its typical usage, we simply have it process both restore and archive queues (in that order). To accomplish this, we use the ``-p`` flag:

::

    >>> fcsArchiver.py -p
    Apr 15 02:56:11: INFO : Processing Restore Queues...
    Apr 15 02:56:11: INFO : Found 0 running restore jobs.
    Apr 15 02:56:11: INFO : Checking for new restore files...
    Apr 15 02:56:11: INFO : Restore Queue is empty.
    Apr 15 02:56:11: INFO : Finished processing all restore queues.
    Apr 15 02:56:11: INFO : Processing Archive Queues...
    Apr 15 02:56:11: INFO : Found 0 running archive jobs.
    Apr 15 02:56:11: INFO : Checking for new archive files...
    Apr 15 02:56:11: INFO : Archive Queue is empty.
    Apr 15 02:56:11: INFO : Finished processing all archive queues..

As described at the top of this document, this form processes restore queues first, followed by archive queues: jobs already submitted are checked against PresStore, completed jobs are moved from the queue into the ``archiveHistory`` table, and failed or cancelled jobs are resubmitted. For more information on the ``backupHistory.db`` database that holds these queues, see `The fcsArchiver Database`_.

We can also process solely archive or restore queues by using the flags ``--processArchiveQueue`` or ``--processRestoreQueue``, respectively:

::

    >>> fcsArchiver.py --processRestoreQueue
    Apr 15 02:58:23: INFO : Processing Restore Queues...
    Apr 15 02:58:23: INFO : Found 0 running restore jobs.
    Apr 15 02:58:23: INFO : Checking for new restore files...
    Apr 15 02:58:23: INFO : Restore Queue is empty.
    Apr 15 02:58:23: INFO : Finished processing all restore queues.
The ``fcsArchiver.py`` script can also be used to query PresStore to determine the tape to which a particular file has been archived:

::

    >>> fcsArchiver.py --getVolumeLabelForFile='/my/archive/device/myfile.mov' --tapeSet=onsite
    LABEL: 10001
    >>> fcsArchiver.py --getVolumeBarcodeForFile='/my/archive/device/myfile.mov' --tapeSet=onsite
    BARCODE: A00001
    >>> fcsArchiver.py --getVolumeBarcodeForFile='/my/archive/device/myfile.mov' --tapeSet=offsite
    BARCODE: B00001

The Scheduler
++++++++++++++++++++++++++++++++++++++++++

``fcsArchiver.py`` is routinely fired via a launchd plist, located at ``/Library/LaunchDaemons/com.318.fcsarchiver.plist``. This plist has a few notable declarations. First and foremost, it executes the ``fcsArchiver.py`` script at a regular interval, set by the ``StartInterval`` key (600 seconds in the plist shown below), with the following syntax:

>>> /usr/local/bin/fcsArchiver.py -p

This launchd plist also redirects stdout and stderr to the file located at ``/var/log/transmogrifier/fcsArchiver.log``. This log can be consulted to determine any current activity being pursued by the script.

.. note:: In some configurations it is desirable to break this launchd plist into two separate plists, ``com.318.fcsarchiver.archive.plist`` and ``com.318.fcsarchiver.restore.plist``, so that archives and restores are processed independently. Unfortunately, this creates problems, as Archiware’s ``nsdchat`` command line interface seems to deal poorly with multiple concurrent sessions, resulting in the command erratically returning bad data. Thus it is recommended to separate the two automations only if using a disk-based workflow with ``fcsDiskArchiver.py``.

Starting or stopping the automatic schedule for ``fcsArchiver`` is achieved using the standard ``launchctl`` CLI tool.
To start ``fcsArchiver`` running on its schedule (the ``StartInterval`` in the plist shown below), the following command can be used:

>>> sudo launchctl load -w /Library/LaunchDaemons/com.318.fcsarchiver.plist

To stop the automation, we simply substitute ``unload``:

>>> sudo launchctl unload -w /Library/LaunchDaemons/com.318.fcsarchiver.plist

It’s a good idea to stop ``fcsArchiver.py`` in the event that the backup server is taken down, to prevent unnecessary processing cycles.

.. warning:: If ``fcsArchiver.py`` is terminated while reading in new queue items from the ``filesToArchive`` or ``filesToRestore`` flat files, any unprocessed entries present in the flat file will be lost.

The following shows the contents of the ``com.318.fcsarchiver.plist`` launch daemon:

::

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
        <key>Label</key>
        <string>com.318.fcsArchiver</string>
        <key>UserName</key>
        <string>admin</string>
        <key>ProgramArguments</key>
        <array>
            <string>/usr/local/bin/fcsArchiver.py</string>
            <string>-p</string>
        </array>
        <key>StartInterval</key>
        <integer>600</integer>
        <key>StandardOutPath</key>
        <string>/var/log/transmogrifier/fcsArchiver.log</string>
        <key>StandardErrorPath</key>
        <string>/var/log/transmogrifier/fcsArchiver.log</string>
        <key>RunAtLoad</key>
        <true/>
    </dict>
    </plist>

When configuring, it is important to ensure that the value of the key ``UserName`` is set to the user under which Final Cut Server was installed. It is also important to ensure that the log file at ``/var/log/transmogrifier/fcsArchiver.log`` is writable by that user.

The fcsArchiver Database
++++++++++++++++++++++++++++++++++++++++++

Upon each operation, ``fcsArchiver.py`` consults its queue database to keep track of file archive operations and requests. These queues are stored in a SQLite3 database, located at the root of the fcsArchiver support path in a file named ``backupHistory.db``.
This database file contains three tables: ``archiveQueue``, ``restoreQueue``, and ``archiveHistory``. The first table, ``archiveQueue``, holds all records for active archive requests and has the following schema:

::

    CREATE TABLE archiveQueue (fcsID,filePath,checksum,archiveSet,tapeSet,jobID,jobSubmitDate,retryCount,status);

There are a few notable fields. The ``fcsID`` field is used for communication back with the Final Cut Server asset. The ``filePath`` field is the full path to the asset as it resides on the FCS archive device. The ``checksum`` field is an MD5 checksum of the file, used to detect changes and prevent duplication for assets which have been restored and re-archived without changes. The ``archiveSet`` field designates the batch name under which the file was processed; this will typically be a date and time stamped identifier. The ``jobID`` field stores the PresStore job ID for the job in which the file was submitted to PresStore. The ``tapeSet`` field helps us keep track of the asset should it need to pass through multiple tape sets (e.g. onsite, followed by offsite). The ``status`` field specifies where in the process the file is, e.g. ‘archiveQueued’, ‘archiveRunning’, ‘archiveFailed’, etc.

The ``restoreQueue`` table is structured almost identically to the ``archiveQueue`` table and functions in the same way. The only major difference is the removal of the ``checksum`` field and the addition of a ``barcode`` field, which is used to designate the tape from which the asset will be restored.

::

    CREATE TABLE restoreQueue (fcsID,filePath,archiveSet,tapeSet,barcode,jobID,jobSubmitDate,retryCount,status);

The ``archiveHistory`` table represents both archive and restore jobs which have either completed successfully or terminated fatally.
It has the following schema:

::

    CREATE TABLE archiveHistory (fcsID,filePath,checksum,barcode,tapeSet,archiveSet,jobID,completionDate,status);

This table is structured a bit differently than either the archive or restore queue tables, and records both the ``checksum`` and ``barcode`` for finished jobs. When new assets are archived, their checksums are compared to the checksum of any previous record in the ``archiveHistory`` table; if a previous record with an identical checksum is found (and ``preventArchiveDuplicates`` is ``True``), the asset is simply removed from storage rather than re-archived.
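The checksum comparison described above can be illustrated with Python's standard ``hashlib`` and ``sqlite3`` modules. This is only a sketch under stated assumptions, not the code used by ``fcsArchiver.py``: the helper names ``md5_of_file`` and ``already_archived`` are hypothetical, the history record is matched on ``filePath``, the status value ``'archiveCompleted'`` is invented for the example, and an in-memory database stands in for ``backupHistory.db``.

.. code-block:: python

    import hashlib
    import os
    import sqlite3
    import tempfile

    # Schema copied from the archiveHistory definition in this document.
    HISTORY_SCHEMA = (
        "CREATE TABLE archiveHistory (fcsID,filePath,checksum,barcode,tapeSet,"
        "archiveSet,jobID,completionDate,status)"
    )


    def md5_of_file(path, chunk_size=1024 * 1024):
        """Return the hex MD5 digest of a file, reading it in chunks."""
        digest = hashlib.md5()
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()


    def already_archived(conn, file_path, checksum, done_status="archiveCompleted"):
        """Return True if a finished history record has this path and checksum.

        done_status is a hypothetical status value used only in this sketch.
        """
        row = conn.execute(
            "SELECT 1 FROM archiveHistory "
            "WHERE filePath = ? AND checksum = ? AND status = ? LIMIT 1",
            (file_path, checksum, done_status),
        ).fetchone()
        return row is not None


    if __name__ == "__main__":
        conn = sqlite3.connect(":memory:")  # stand-in for backupHistory.db
        conn.execute(HISTORY_SCHEMA)

        # Create a small throwaway file to play the role of an asset.
        fd, asset = tempfile.mkstemp(suffix=".mov")
        os.write(fd, b"example media bytes")
        os.close(fd)

        checksum = md5_of_file(asset)
        print(already_archived(conn, asset, checksum))  # no history record yet

        conn.execute(
            "INSERT INTO archiveHistory (fcsID, filePath, checksum, status) "
            "VALUES (?, ?, ?, ?)",
            (1, asset, checksum, "archiveCompleted"),
        )
        print(already_archived(conn, asset, checksum))  # matching record exists
        os.remove(asset)

Filtering on a completed status matters because ``archiveHistory`` also records jobs that terminated fatally; a failed job with a matching checksum should not cause an asset to be removed without a tape copy.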