Data Monitoring Procedures
Saving Online Monitoring Data
The procedure for writing the data out is given in, e.g., Raid-to-Silo Transfer Strategy.
Once the DAQ writes out the data to the raid disk, cron jobs will copy the file to tape, and within ~20 min., we will have access to the file on tape at /mss/halld/$RUN_PERIOD/rawdata/RunXXXXXX.
All online monitoring plugins will be run as data is taken. They will be accessible within the counting house via RootSpy, and for each file of each run, a ROOT file containing the histograms will be saved within a subdirectory for that run.
For immediate access to these files, the raid disk files may be accessed directly from the counting house, or the tape files will be available within ~20 min. of the file being written out.
Running Over Archived Data
Once the files are written to tape, we can run the online plugins on these files to confirm what we were seeing in the online monitoring. Manual scripts, and eventually cron jobs, can be set up to look for new run numbers and run the plugins over a sample of files.
Details of Offline Monitoring
Below are the procedures to
- run a single offline plugin job manually
- run a cron job to automate the process for new files
In principle these scripts should work as they are, but they should be modified if the directory structure for the rawdata files changes, or if the memory or disk space needed by the jobs increases significantly.
Generating an offline plugin job
Within /home/gluex/halld/monitoring/batch/ there will be scripts to run the online monitoring plugins over tape files. The main script is generatejobs_plugins_rawdata.sh, which can be used as
where XXX is the run #.
This will generate a script run_rawdata_XXXXXX.sh, where the run # has now been formatted to be 6 digits. Executing this script will send the monitoring plugins job to the Auger batch system.
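The six-digit formatting described above can be sketched as follows. This is a minimal illustration, not the contents of generatejobs_plugins_rawdata.sh: the helper names format_run, job_script_name, and rawdata_dir are hypothetical, and the directory layout is taken from the tape path quoted earlier on this page.

```python
def format_run(run):
    """Zero-pad a run number to six digits, e.g. 1234 -> '001234'."""
    return "%06d" % int(run)


def job_script_name(run):
    """Name of the generated job script for a run (pattern from the text above)."""
    return "run_rawdata_%s.sh" % format_run(run)


def rawdata_dir(run, run_period):
    """Tape directory for a run, assuming /mss/halld/$RUN_PERIOD/rawdata/RunXXXXXX."""
    return "/mss/halld/%s/rawdata/Run%s" % (run_period, format_run(run))


if __name__ == "__main__":
    print(job_script_name(1234))
    print(rawdata_dir(1234, "RunPeriod-2014-10"))
```

Accepting the run number as either an integer or a string keeps the helpers usable from both command-line arguments and directory listings.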
There is also a script clean.sh which can be used as
This will remove all files created for run XXX.
Internally, the XML file used to submit the job will be created, and the job to run will be given within script.sh. All run parameters should be specified at the beginning of generatejobs_plugins_rawdata.sh. Since we are running over tape files, each tape file will first be copied to the cache disk, and the job will run over this cached copy.
Using cron to run automatically
Within /home/gluex/halld/monitoring/cron/ there is a file cron_plugins that can be executed via
This will set up a cron job to call the script scan_for_jobs.sh, which will check in the rawdata directory and call generatejobs_plugins_rawdata.sh for any run that is more than 5 min old. The cron job is set up to run every 10 min.
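The age check can be illustrated with a short sketch. It is not the actual scan_for_jobs.sh: the function name runs_ready is hypothetical, and it assumes that run directories are named Run followed by the run number and that a run's age is measured from the directory's last modification time. The text above does not say how age is measured or how already-submitted runs are skipped.

```python
import os
import time

# Assumption: the "more than 5 min old" rule is applied to the directory mtime.
MIN_AGE_SECONDS = 5 * 60


def runs_ready(rawdata_dir, now=None, min_age=MIN_AGE_SECONDS):
    """Return sorted names of Run* subdirectories last modified more than min_age seconds ago."""
    if now is None:
        now = time.time()
    ready = []
    for name in os.listdir(rawdata_dir):
        path = os.path.join(rawdata_dir, name)
        # Skip anything that is not a run directory.
        if not (name.startswith("Run") and os.path.isdir(path)):
            continue
        if now - os.path.getmtime(path) > min_age:
            ready.append(name)
    return sorted(ready)
```

A cron-driven wrapper would then call the job generation script once for each name returned, after removing any runs it has already handled.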
Extracting Summary Data
For high-level monitoring, we save images of selected histograms and store time series of selected quantities in a database, which are then displayed on a web page. This section describes how to generate the monitoring images and database information.
The scripts used to generate this summary data are currently kept in /u/home/gluex/halld/monitoring/process. Note that these scripts currently have some parameters which must be periodically set by hand.
The default python version on most JLab machines does not have the modules that allow these scripts to connect to the MySQL database. To run these scripts, load the environment with the following command
There are two scripts for running over the monitoring data generated by the online system and offline reconstruction. The online script is run with either of the following commands:
./check_new_runs.py OR ./check_new_runs.csh
The shell script sets up the environment properly to run the python script. To connect to the monitoring database on the JLab CUE, modules contained in the installation of python >= 2.7 are needed. The shell script is appropriate to use in a cron job.
The online monitoring system copies a ROOT file containing the results of the online monitoring, along with other configuration files, into a directory accessible outside the counting house. The python script automatically checks for new ROOT files and processes any that it finds. It contains several configuration variables that must be set correctly, such as the locations of the input and output directories.
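The check for new ROOT files can be sketched as a set difference between what is on disk and what has already been handled. This is only an illustration of the idea, not the logic of check_new_runs.py: the function name find_new_root_files and the way processed files are recorded are assumptions.

```python
import os


def find_new_root_files(input_dir, processed):
    """Return sorted .root file names in input_dir that are not in the processed collection."""
    found = [f for f in os.listdir(input_dir) if f.endswith(".root")]
    return sorted(set(found) - set(processed))
```

In practice the record of processed files would have to persist between invocations, for example in the monitoring database or in a text file, since a cron job starts from scratch each time.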
The processing of offline monitoring data should be run after a new reconstruction pass is done. The data is processed using the following script:
./process_new_offline_data.py <input directory> <output directory>
EXAMPLE:
./process_new_offline_data.py /u/scratch/gluex/offline_monitoring /w/halld-scifs1a/data_monitoring/RunPeriod-2014-10/ver01
Every time a new reconstruction pass is performed, a new version number must be generated. To do this, prepare a version file as described below. Then run the register_new_version.py script to store the information in the database. The script will return a version number, which should then be set by hand in process_new_offline_data.py. Future versions of the script will streamline this part of the procedure. An example of how to generate a new version is:
./register_new_version.py add /u/home/gluex/halld/monitoring/process/versions/vers_RunPeriod-2014-10_pass1.txt
To document the conditions under which the monitoring data is created, for the sake of reproducibility and further analysis, we save several pieces of information. The format is intended to be comprehensive enough to document not just monitoring data, but also versions of raw and reconstructed data, so that this database table can be used for the event database as well.
We store one record per pass through one run period, with the following structure:
|data_type||The level of data we are processing. For the purposes of monitoring, "rawdata" is the online monitoring, "recon" is the offline monitoring|
|run_period||The run period of the data|
|revision||An integer specifying which pass through the run period this data corresponds to|
|software_version||The name of the XML file that specifies the different software versions used|
|jana_config||The name of the text file that specifies which JANA options were passed to the reconstruction program|
|ccdb_context||The value of JANA_CALIB_CONTEXT, which specifies the version of calibration constants that were used|
|production_time||The date at which monitoring/reconstruction began|
|dataVersionString||A convenient string for identifying this version of the data|
An example file used as input to ./register_new_version.py is:
data_type = recon
run_period = RunPeriod-2014-10
revision = 1
software_version = soft_comm_2014_11_06.xml
jana_config = jana_rawdata_comm_2014_11_06.conf
ccdb_context = calibtime=2014-11-10
production_time = 2014-11-10
dataVersionString = recon_RunPeriod-2014-10_20141110_ver01
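A version file of this kind can be read with a few lines of Python. The sketch below is not taken from register_new_version.py. It assumes that the file holds one key = value pair per line, and the function name parse_version_file and the skipping of blank lines and # comments are assumptions. Splitting on the first = only matters because a value such as calibtime=2014-11-10 itself contains an = sign.

```python
def parse_version_file(lines):
    """Parse 'key = value' lines into a dict, splitting on the first '=' only."""
    record = {}
    for line in lines:
        line = line.strip()
        # Assumption: blank lines and '#' comment lines are ignored.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected 'key = value', got: %r" % line)
        record[key.strip()] = value.strip()
    return record


if __name__ == "__main__":
    example = [
        "data_type = recon",
        "run_period = RunPeriod-2014-10",
        "ccdb_context = calibtime=2014-11-10",
    ]
    print(parse_version_file(example))
```

All values are kept as strings, so a field such as revision would need an explicit int() conversion before being stored as an integer.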