Difference between revisions of "Data Monitoring Procedures"

From GlueXWiki
(Running Over Archived Data)
(Procedures: Details)
 
(191 intermediate revisions by 7 users not shown)
Line 1: Line 1:
 
__TOC__
 
__TOC__
  
==Saving Online Monitoring Data==
+
== Master List of File / Database / Webpage Locations ==
 +
=== Run Conditions ===
 +
* Online Run-by-run condition files (B-field, current, etc.): /work/halld/online_monitoring/conditions/
 +
* Offline monitoring run conditions (software versions, jana config): /group/halld/data_monitoring/run_conditions/
 +
*[http://www.jlab.org/Hall-D/test/RunInfo/ Run Info vers. 1]
 +
*[https://halldweb.jlab.org/cgi-bin/data_monitoring/run_conditions.pl Run Info vers. 2]
 +
*[https://halldweb.jlab.org/rcdb RCDB]
  
The procedure for writing the data out is given in, e.g.,
+
=== Monitoring Output Files ===
[https://halldweb1.jlab.org/wiki/index.php/Raid-to-Silo_Transfer_Strategy Raid-to-Silo Transfer Strategy].
+
* In the paths below, 201Y-MM stands for the run period (e.g. 2015-03) and verVV for the launch version (e.g. ver15).
 +
* Online monitoring histograms: /work/halld/online_monitoring/root/
 +
* Offline monitoring histogram ROOT files (merged): /work/halld/data_monitoring/RunPeriod-201Y-MM/verVV/rootfiles
 +
* individual files for each job (ROOT, REST, log, etc.): /volatile/halld/offline_monitoring/RunPeriod-201Y-MM/verVV/
  
Once the DAQ writes out the data to the raid disk, cron jobs will copy the file to tape,
+
=== Monitoring Database ===
and within ~20 min., we will have access to the file on tape at
+
* Accessing monitoring database (on ifarm): mysql -u datmon -h hallddb.jlab.org data_monitoring
/mss/halld/$RUN_PERIOD/rawdata/RunXXXXXX.
+
  
All online monitoring plugins will be run as data is taken.
+
=== Monitoring Webpages ===
They will be accessible within the counting house via RootSpy, and
+
*[https://halldweb.jlab.org/wiki/index.php/Monitoring_webpage_help Help]
for each run and file, a ROOT file containing the histograms will be saved
+
*[https://halldweb.jlab.org/data_monitoring/Plot_Browser.html Plot Browser]
within a subdirectory for each run.
+
*[https://halldweb.jlab.org/cgi-bin/data_monitoring/monitoring/runBrowser.py Run Browser]
 +
*[https://halldweb.jlab.org/cgi-bin/data_monitoring/monitoring/versionBrowser.py Version Browser]
 +
*[https://halldweb.jlab.org/cgi-bin/data_monitoring/monitoring/timeSeries.py Time Series]
 +
*[https://halldweb.jlab.org/data_monitoring/launch_analysis/ Launch Analysis]
 +
*[https://halldweb.jlab.org/cgi-bin/data_monitoring/monitoring/recontestBrowser.py Recon Tests]
  
For immediate access to these files, the raid disk files may be accessed directly
+
== SciComp Job Links ==
from the counting house, or the tape files will be available within ~20 min. of the
+
=== Main ===
file being written out.
+
* [https://scicomp.jlab.org/scicomp/ Scientific Computing Home Page]
 +
* [https://scicomp.jlab.org/scicomp/#/auger/jobs Auger Job Status Page]
 +
* [https://scicomp.jlab.org/scicomp/#/jasmine/jobs JasMine Tape Job Status Page]
  
== Launching and Tracking Jobs ==
+
=== Documentation ===
 +
* [https://scicomp.jlab.org/docs/batch Batch System]
 +
* [https://scicomp.jlab.org/docs/storage Mass Storage System]
 +
* [https://scicomp.jlab.org/docs/write-through-cache Write-Through Cache]
 +
* [https://scicomp.jlab.org/docs/swif SWIF]
 +
* [https://scicomp.jlab.org/docs/swif-cli SWIF Command Line]
  
* This section details instructions on how to create and launch a set of jobs using the Hall-D job management system developed by Mark Ito. These instructions are generic: this system can be used for the weekly monitoring jobs, but can also be used for other sets of job launches as well.
+
=== Job Tracking ===
 +
* [http://scicomp.jlab.org/farm2/job.html Completed Job History]
 +
* [http://scicomp.jlab.org/farm2/project.html Job Stats By Project]
 +
* [http://scicomp.jlab.org/farm2/trackOrg.html Job Stats By Track]
 +
* [http://scicomp.jlab.org/farm2/report.html Cluster Report]
 +
* [http://scicomp.jlab.org/farm2/walltime.html Walltime Distribution]
  
=== Database Table Overview ===
+
== Procedures: Overview ==
  
* Job management database table (<project_name>): For each input file, keeps track of whether or not a job for it has been submitted, along with other optional fields.
+
=== Online Monitoring: During Experimental Running ===
  
* Job status database table (<project_name>Job (no space)): For each job, keeps track of the job-id, the job status, memory used, cpu & wall time, time taken to complete various stages (e.g. pending, dependency, active), and others.  
+
After every run is finished, a ROOT file containing histograms from the online monitoring system and a file containing some run conditions are copied to directories under /work/halld/online_monitoring . A cronjob running in the counting house performs this function.
  
=== Initialize Project Management ===
+
This ROOT file is processed similarly to the offline monitoring results, and is made available on the same webpages as "ver00" of the relevant run period.
  
* Log into the ifarm machine with one of the gxproj accounts
+
For more details on the online monitoring system, see [https://halldweb.jlab.org/hdops/wiki/index.php/Online_Monitoring_Shift  this page].
<pre>
+
ssh gxproj1@ifarm -Y
+
</pre>
+
  
* Go to the project scripts folder and add the perl script directory to the current $PATH environment variable:
+
=== Offline Monitoring and Reconstruction: During Experimental Running ===
<pre>
+
cd ~/halld/jproj/scripts/
+
source setup.csh
+
</pre>
+
  
* Come up with a name for your job submission project.  It will be a unique identifier for the current set of job submissions.  For example, for the 10th pass over the 10/2014 data for the offline monitoring:  
+
During experimental running, the following offline monitoring procedures should be performed, each with a different gxprojN account, so that they don't interfere with each other:  
<pre>
+
offmon_rp2014m10_v10
+
</pre>
+
  
* However, the output file name format changed during the 10/2014 commissioning run (hd_raw_* --> hd_rawdata_*). Since these scripts assume a fixed file name format, for these runs an additional identifier should be used, e.g.:
+
# '''Incoming:''' Monitor the first <span style="color:red">5</span> files of each newly-recorded run as soon as it hits the tape.  
<pre>
+
# '''Monitoring Launches:''' Every <span style="color:red">two</span> weeks, do a monitoring launch over the first <span style="color:red">5</span> files of all runs currently available on the tape.
offmon_rp2014m10_v10_type1, offmon_rp2014m10_v10_type2
+
# '''Initial Reconstruction Launch:''' As soon as a new group (e.g. <span style="color:red">~100</span> runs) of data is initially semi-well calibrated, do a preliminary full reconstruction launch over all files in that group.
</pre>
+
#* We can add user analysis plugins to this launch, including those with ROOT TTree output, provided that they work and don't take much memory.
  
* Copy and rename an existing set of project files to create new project files for your project(s).  For example:
+
Note that the monitoring is limited to the first <span style="color:red">5</span> files of each run, because data is being recorded to tape at a faster rate than the monitoring can keep up with.  Also, during the experimental run, each run will only be fully-reconstructed once, because it will be difficult enough to keep up with the incoming data.
<pre>
+
cd ~/halld/jproj/projects/
+
cp -r offmon_rp2014m10_v10_type1 offmon_rp2014m10_v11_type1
+
cp -r offmon_rp2014m10_v10_type2 offmon_rp2014m10_v11_type2
+
</pre>
+
  
* For each project, descend into the new directory, and make changes to each file so that it will work for your project.  These changes typically include:
+
=== Offline Monitoring and Reconstruction: After Experimental Running ===
** Changing the project name (e.g. offmon_rp2014m10_v10_type1 --> offmon_rp2014m10_v11_type1) in both the .jproj and .jsub file names, and in the contents of each file.
+
** If the project version number has changed, update it in the contents of the .jsub file.
+
** If the run period has changed, update it in the contents of each file (e.g. RunPeriod-2014-10 --> RunPeriod-2015-01).
+
** If the path or file name format for the input files has changed, update it in the .jproj and .jsub files.
+
** Any other changes to the execution script, environment variables, or job submission instructions can be made in the appropriate files.
+
  
=== Project File Overview ===
+
After experimental running, the following offline monitoring procedures should be performed, each with a different gxprojN account, so that they don't interfere with each other:
  
An overview of each project file:
+
# '''Monitoring Launches:''' Every two weeks, do a monitoring launch over the first <span style="color:red">5</span> files of all runs currently available on the tape.  
* '''clear.sh:''' For the current project, deletes the job status and management database tables (if any), and creates new, empty ones.
+
# '''Initial Reconstruction Launch:''' As soon as a new group (e.g. <span style="color:red">~100</span> runs) of data is initially semi-well calibrated, do a preliminary full-reconstruction launch over all files in that group.
* '''<project_name>.jproj:''' Contains the path and file name format for the input files for the jobs.  
+
#* We can add user analysis plugins to this launch, including those with ROOT TTree output, provided that they work and don't take much memory.
* '''<project_name>.jsub:''' The xml job submission script. The run number and file number variables are set during job submission for each input file.  
+
# '''Further Reconstruction Launches:''' Every <span style="color:red">~3</span> months, if there have been significant improvements to the reconstruction / calibrations, do a new full-reconstruction launch over all of the data.
* '''script.sh:''' The script that is executed during the job. If output job directories are not pre-created manually, they should be created in this script with the proper permissions:
+
#* We can add user analysis plugins to this launch, including those with ROOT TTree output, provided that they work and don't take much memory.  
<pre>
+
mkdir -p -m 775 my_directory
+
</pre>
+
* '''setup_jlab.csh:''' The environment that is sourced at the beginning of the job execution.  
+
* '''status.sh:''' Updates the job status database table, and prints some of its columns to screen.
+
  
=== Project Management ===
+
Note that the monitoring is limited to the first <span style="color:red">5</span> files of each run, since there will be a significant amount of data.
  
* Delete (if any) and create the database table(s) for the current set of job submissions:
+
=== Saving to Tape (Write-through Cache): Monitoring Launches ===
<pre>
+
All job output will be written directly to the write-through cache. However, only the following will be saved to tape:  
./clear.sh
+
* REST files: All files.
</pre>
+
* ROOT files: One merged file per run.  
 +
** After merge, the individual files are deleted (so they won't be saved).  
 +
* Job stdout/stderr: One tarball per run
 +
** After launch analysis, the log files are deleted (so they won't be saved).
 +
* Browser png's: One tarball per launch
  
* Search for input files matching the string in the .jproj file, and create a row for each in the job management database table (called <project_name>).  You can test by adding an optional argument at the end, which only selects files with a specific file number:
+
=== Saving to Tape (Write-through Cache): Full Reconstruction Launches ===
<pre>
+
* REST files: All files.
jproj.pl <project_name> update <optional_file_number>
+
* ROOT files: All files, <span style="color:blue">AND</span> one merged file per run.
</pre>
+
* Job stdout/stderr: One tarball per run
 +
** After launch analysis, a tarball is created and the individual log files are deleted (so they won't be saved).
 +
* Browser png's: One tarball per launch
  
* Confirm that the job management database is accurate by printing its contents to screen:
+
== Procedures: Details ==
<pre>
+
mysql -hhalldweb1 -ufarmer farming -e "select * from <project_name>"
+
</pre>
+
  
* ONLY if a mistake was made, to delete the tables from the database and recreate new, empty ones, run:  
+
* [[Offline_Monitoring_Incoming_Data | Offline Monitoring: Running Over Incoming Data]]
<pre>
+
* [[Offline_Monitoring_Archived_Data | Offline Monitoring: Running Over Archived Data]]
./clear.sh
+
* [[Offline_Monitoring_Post_Processing | Offline Monitoring: Post-Processing]]
</pre>
+
* [[Offline_Monitoring_Data_Validation | Offline Monitoring: Data Validation]]
 +
** [[Online_Monitoring_Data_Validation | Online Monitoring: Data Validation]]
 +
* [[DEPRECATED_Offline_Monitoring_Archived_Data | DEPRECATED (Except plots): Offline Monitoring: Running Over Archived Data]]
 +
* [[DSelector_SWIF_Jobs | DSelector SWIF Jobs]]
 +
* [[Merging_Analysis_Trees | Analysis Launch: Merging Trees]]
  
* Submit the unsubmitted jobs in the job management database, and add their job ids to the job status database:
+
== Software Tests ==
<pre>
+
* [[Software_Test_Data_Recon | Software Test: Experimental Data Reconstruction]]
jproj.pl <project_name> submit
+
** [https://halldweb.jlab.org/recon_test/ Test Results]
</pre>
+
 
+
* To look at the status of the submitted jobs, first query auger and update the job status database:
+
<pre>
+
fill_in_job_details.pl <project_name>
+
</pre>
+
 
+
* The job status can then be viewed by submitting a query to the job status database (called <project_name>Job (no space in between)):
+
<pre>
+
mysql -hhalldweb1 -ufarmer farming -e "select id,run,file,jobId,hostname,status,timeSubmitted,timeActive,walltime,cput,timeComplete,result,error from <project_name>Job"
+
</pre>
+
 
+
* These last two commands can instead be executed simultaneously by running:
+
<pre>
+
./status.sh
+
</pre>
+
 
+
=== Handy mysql Instructions ===
+
 
+
* Handy mysql instructions:
+
<pre>
+
mysql -hhalldweb1 -ufarmer farming # Enter the "farming" mysql database on "halldweb1" as user "farmer"
+
quit; # Exit mysql
+
show tables; # Show a list of the tables in the current database
+
show columns from <project_name>; # show all of the columns for the given table
+
select * from <project_name>; # show the contents of all rows from the given table
+
</pre>
+
 
+
== Running Over Data As It Comes In (DEPRECATED) ==
+
 
+
A special user gxproj1 will have a cron job set up to run the plugins as new data appears on /mss.
+
During the week, gxproj1 will submit offline plugin jobs with the same setup as the weekly jobs
+
run the previous Friday. The procedure for this is shown below.
+
 
+
=== Setting up the environment ===
+
The file
+
/home/gxproj1/setup_jlab.csh
+
is sourced through .tcshrc.
+
This file is the same as what is linked to by
+
/home/gluex/setup_jlab_commissioning.csh,
+
except HALLD_HOME, HDDS_HOME, and JANA_CALIB_URL are set separately so that this
+
user can have a separate build.
+
 
+
To obtain the builds from the previous Friday's runs,
+
execute
+
/home/gxproj1/halld/monitoring/newruns/setup_previous.sh [year] [month] [day]
+
The build revisions from the previous Friday are archived in files
+
/work/halld/data_monitoring/run_conditions/soft_comm_[year]_[month]_[day].xml
+
and the script will build libraries based on those stored revision numbers.
+
 
+
=== Running the cron job ===
+
 
+
To run the cron job go to
+
/u/home/gxproj1/halld/monitoring/newruns
+
and do
+
crontab cron_plugins
+
To check whether the cron job is running, do
+
crontab -l
+
To remove the cron job do
+
crontab -r
+
 
+
The cron job will run the script scan_for_jobs.sh,
+
which runs generatejobs_plugins_rawdata.sh for any
+
new runs that it had not seen before. All previous
+
runs are recorded in the file filelists/files_current.txt
+
so clear this file to run over those runs again, or set the parameters
+
MINRUN and MAXRUN which will set the range of runs submitted.
+
 
+
==Extracting Summary Data==
+
 
+
For high-level monitoring, we save images of selected histograms and store time series of selected quantities in a database, which are then displayed on a web page.  This section describes how to generate the monitoring images and database information.
+
 
+
The scripts used to generate this summary data are currently kept in /u/home/gluex/halld/monitoring/process
+
Note that these scripts currently have some parameters which must be periodically set by hand.
+
 
+
The default python version on most JLab machines does not have the modules needed for these scripts to connect to the MySQL database.  To run these scripts, load the environment with the following command:
+
<syntaxhighlight>
+
source /u/home/gluex/halld/monitoring/process/monitoring_env.sh
+
</syntaxhighlight>
+
 
+
===Online Monitoring===
+
 
+
There are two scripts for running over the monitoring data generated by the online system and offline reconstruction.  The online script can be run with either of the following commands:
+
<syntaxhighlight>
+
./check_new_runs.py
+
 
+
OR
+
 
+
./check_new_runs.csh
+
</syntaxhighlight>
+
The shell script sets up the environment properly to run the python script.  To connect to the monitoring database on the JLab CUE, modules contained in an installation of python >= 2.7 are needed.  The shell script is appropriate to use in a cron job.
+
 
+
The online monitoring system copies a ROOT file containing the results of the online monitoring, along with other configuration files, into a directory accessible outside the counting house.  This python script checks for new ROOT files and processes them automatically.  It contains several configuration variables that must be set correctly, such as the locations of the input/output directories.
+
 
+
Note that while this script is currently run as a cronjob, the processing of online ROOT files is currently disabled, so its only function is to update the run_info database.
+
 
+
===Offline Monitoring===
+
 
+
After the data is run over, the results should be processed, so that summary data is entered into the monitoring database and plots are made for the monitoring webpages.  Currently, this processing is controlled by a cronjob that runs the following script:
+
<syntaxhighlight>
+
/home/gluex/halld/monitoring/process/check_monitoring_data.csh 
+
</syntaxhighlight>
+
This script checks for new ROOT files, and only runs over those it hasn't processed yet.  Since one monitoring ROOT file is produced for each EVIO file, whenever a new file is produced, the plots for the corresponding run are recreated and all the ROOT files for a run are combined into one file.  Information is stored in the database on a per-file basis. 
+
 
+
Plots for the monitoring web page can be made from single histograms or multiple histograms using RootSpy macros.  If you want to change the list of plots made, you must modify one of the following files:
+
* histograms_to_monitor - specify either the name of the histogram or its full ROOT path
+
* macros_to_monitor - specify the full path to the RootSpy macro .C file
+
 
+
When a new monitoring run is started, or the conditions are changed, the following steps should be taken to process the new files:
+
# Add a new data version, as described below:
+
# Change the following parameters in check_monitoring_data.csh:
+
## JOBDATE should correspond to the output date used by the job submission script
+
## OUTPUTDIR should be the directory for the run period and revision of the new version you just submitted.  Presumably, this directory will be empty at the beginning.
+
## Once you create a new data version as defined below, you should pass the needed information as a command line option.  Currently this is done by the ARGS variable.  For example, the argument "-v RunPeriod-2014-10,8" tells the monitoring scripts to look up the version corresponding to revision 8 of RunPeriod-2014-10 in the monitoring DB and to use it to store the results.
+
 
+
<syntaxhighlight>
+
Example configuration parameters:
+
set JOBDATE=2015-01-09
+
set INPUTDIR=/volatile/halld/RunPeriod-2014-10/offline_monitoring
+
set OUTPUTDIR=/w/halld-scifs1a/data_monitoring/RunPeriod-2014-10/ver08
+
set ARGS=" -v RunPeriod-2014-10,8 "
+
</syntaxhighlight>
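The "-v" value in the example above packs the run period and the revision into one comma-separated string. As a minimal sketch of how such a value could be split and checked against the output directory name, the helper names parse_version_arg and version_dir_name below are hypothetical and are not part of the monitoring scripts. The two-digit "verNN" padding is an assumption inferred from the ver08 directory shown above.

```python
def parse_version_arg(value):
    """Split a 'RunPeriod-YYYY-MM,N' string into (run_period, revision)."""
    run_period, sep, revision = value.strip().rpartition(",")
    if not sep or not run_period:
        raise ValueError(f"expected '<run period>,<revision>', got {value!r}")
    return run_period, int(revision)


def version_dir_name(revision):
    """Directory name for a revision; zero padding to two digits is assumed."""
    return f"ver{revision:02d}"


run_period, revision = parse_version_arg(" RunPeriod-2014-10,8 ")
print(run_period, revision, version_dir_name(revision))
```

A check like this can catch a mismatch between the ARGS value and the OUTPUTDIR path before the cron job starts writing results.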
+
If you want to process the results manually, the data is processed using the following script:
+
<syntaxhighlight>
+
./process_new_offline_data.py <job date> <input directory> <output directory>
+
 
+
EXAMPLE:
+
 
+
./process_new_offline_data.py 2014-11-14 /volatile/halld/RunPeriod-2014-10/offline_monitoring/ /w/halld-scifs1a/data_monitoring/RunPeriod-2014-10/ver02
+
</syntaxhighlight>
+
The python script takes several options to enable/disable various steps in the processing.  Of interest is the "--force" option, which will run over all monitoring ROOT files, whether or not they've been previously identified.
+
 
+
Every time a new reconstruction pass is performed, a new version number must be generated.  To do this, prepare a version file as described below.  Then run the register_new_version.py script to store the information in the database.  The script will return a version number, which then should be set by hand in process_new_offline_data.py - future versions of the script will streamline this part of the procedure.  An example of how to generate a new version is:
+
<syntaxhighlight>
+
./register_new_version.py add /u/home/gluex/halld/monitoring/process/versions/vers_RunPeriod-2014-10_pass1.txt
+
</syntaxhighlight>
+
 
+
===Run Conditions===
+
 
+
Currently the run_info database is being updated by Sean by hand.  Note that this must be done inside the counting house.
+
If you want to do this yourself, check out the monitoring scripts on a gluon machine
+
<syntaxhighlight>
+
svn co https://halldsvn.jlab.org/repos/trunk/scripts/monitoring/process/
+
</syntaxhighlight>
+
In the process/get_conds directory, run the process_runlog_files.py script with the minimum and maximum run numbers that you want to process, e.g.
+
<syntaxhighlight>
+
./process_runlog_files.py -b 2200 -e 2260
+
</syntaxhighlight>
+
 
+
==Data Versions==
+
 
+
To document the conditions of the monitoring data that is created, for the sake of reproducibility and further analysis, we save several pieces of information.  The format is intended to be comprehensive enough to document not just monitoring data, but versions of raw and reconstructed data, so that this database table can be used for the event database as well.
+
 
+
We store one record per pass through one run period, with the following structure:
+
 
+
{| class="wikitable"
+
! Field !! Description
+
|-
+
| data_type || The level of data we are processing.  For the purposes of monitoring, "rawdata" is the online monitoring, "recon" is the offline monitoring
+
|-
+
| run_period || The run period of the data
+
|-
+
| revision || An integer specifying which pass through the run period this data corresponds to
+
|-
+
| software_version || The name of the XML file that specifies the different software versions used
+
|-
+
| jana_config  || The name of the text file that specifies which JANA options were passed to the reconstruction program
+
|-
+
| ccdb_context  || The value of JANA_CALIB_CONTEXT, which specifies the version of calibration constants that were used
+
|-
+
| production_time  || The date on which monitoring/reconstruction began
+
|-
+
| dataVersionString  || A convenient string for identifying this version of the data
+
|}
+
 
+
 
+
An example file used as input to ./register_new_version.py is:
+
<syntaxhighlight>
+
data_type          = recon
+
run_period          = RunPeriod-2014-10
+
revision            = 1
+
software_version    = soft_comm_2014_11_06.xml
+
jana_config        = jana_rawdata_comm_2014_11_06.conf
+
ccdb_context        = calibtime=2014-11-10
+
production_time    = 2014-11-10
+
dataVersionString  = recon_RunPeriod-2014-10_20141110_ver01
+
</syntaxhighlight>
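The version file above is a list of "key = value" lines. The following is a minimal sketch of how such a file could be read and checked for the fields listed in the table. It is not the parser used by register_new_version.py, and the function name parse_version_file is hypothetical. Each line is split on the first "=" only, so values such as "calibtime=2014-11-10" survive intact.

```python
EXPECTED_FIELDS = (
    "data_type", "run_period", "revision", "software_version",
    "jana_config", "ccdb_context", "production_time", "dataVersionString",
)


def parse_version_file(text):
    """Parse 'key = value' lines into a dict and check that no field is missing."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, sep, value = line.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"not a 'key = value' line: {line!r}")
        record[key.strip()] = value.strip()
    missing = [name for name in EXPECTED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return record


sample = """
data_type          = recon
run_period         = RunPeriod-2014-10
revision           = 1
software_version   = soft_comm_2014_11_06.xml
jana_config        = jana_rawdata_comm_2014_11_06.conf
ccdb_context       = calibtime=2014-11-10
production_time    = 2014-11-10
dataVersionString  = recon_RunPeriod-2014-10_20141110_ver01
"""
record = parse_version_file(sample)
print(record["ccdb_context"])
```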
+

Latest revision as of 09:22, 17 March 2020

Master List of File / Database / Webpage Locations

Run Conditions

  • Online Run-by-run condition files (B-field, current, etc.): /work/halld/online_monitoring/conditions/
  • Offline monitoring run conditions (software versions, jana config): /group/halld/data_monitoring/run_conditions/
  • Run Info vers. 1
  • Run Info vers. 2
  • RCDB

Monitoring Output Files

  • In the paths below, 201Y-MM stands for the run period (e.g. 2015-03) and verVV for the launch version (e.g. ver15).
  • Online monitoring histograms: /work/halld/online_monitoring/root/
  • Offline monitoring histogram ROOT files (merged): /work/halld/data_monitoring/RunPeriod-201Y-MM/verVV/rootfiles
  • individual files for each job (ROOT, REST, log, etc.): /volatile/halld/offline_monitoring/RunPeriod-201Y-MM/verVV/
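
The path templates above can be filled in programmatically. This is a minimal sketch: the function name launch_paths is hypothetical, and the two-digit zero padding of the version number is an assumption based on the "ver00" and "ver15" examples on this page.

```python
from pathlib import PurePosixPath


def launch_paths(run_period, version):
    """Return the merged-ROOT and per-job output directories for one launch.

    run_period is the 'YYYY-MM' part (e.g. '2015-03'); version is an integer.
    """
    period_dir = f"RunPeriod-{run_period}"
    ver_dir = f"ver{version:02d}"  # two-digit padding is assumed
    return {
        "merged_rootfiles": PurePosixPath("/work/halld/data_monitoring")
        / period_dir / ver_dir / "rootfiles",
        "job_output": PurePosixPath("/volatile/halld/offline_monitoring")
        / period_dir / ver_dir,
    }


paths = launch_paths("2015-03", 15)
print(paths["merged_rootfiles"])
print(paths["job_output"])
```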

Monitoring Database

  • Accessing monitoring database (on ifarm): mysql -u datmon -h hallddb.jlab.org data_monitoring
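
For scripted queries, the same client invocation can be assembled as an argument list. This is only a sketch: the function name monitoring_db_command is hypothetical, and it relies on the standard "-e" option of the mysql client to run one statement and exit. The example builds and prints the command but does not execute it, since that requires access to ifarm.

```python
import shlex


def monitoring_db_command(query):
    """Build the mysql client argument list for a single query."""
    return [
        "mysql", "-u", "datmon", "-h", "hallddb.jlab.org",
        "data_monitoring", "-e", query,
    ]


cmd = monitoring_db_command("show tables")
print(shlex.join(cmd))
```

On ifarm, the list could be passed to subprocess.run(cmd, check=True) to execute the query.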

Monitoring Webpages

SciComp Job Links

Main

Documentation

Job Tracking

Procedures: Overview

Online Monitoring: During Experimental Running

After every run is finished, a ROOT file containing histograms from the online monitoring system and a file containing some run conditions are copied to directories under /work/halld/online_monitoring . A cronjob running in the counting house performs this function.

This ROOT file is processed similarly to the offline monitoring results, and is made available on the same webpages as "ver00" of the relevant run period.

For more details on the online monitoring system, see this page.

Offline Monitoring and Reconstruction: During Experimental Running

During experimental running, the following offline monitoring procedures should be performed, each with a different gxprojN account, so that they don't interfere with each other:

  1. Incoming: Monitor the first 5 files of each newly-recorded run as soon as it hits the tape.
  2. Monitoring Launches: Every two weeks, do a monitoring launch over the first 5 files of all runs currently available on the tape.
  3. Initial Reconstruction Launch: As soon as a new group (e.g. ~100 runs) of data is initially semi-well calibrated, do a preliminary full reconstruction launch over all files in that group.
    • We can add user analysis plugins to this launch, including those with ROOT TTree output, provided that they work and don't take much memory.

Note that the monitoring is limited to the first 5 files of each run, because data is being recorded to tape at a faster rate than the monitoring can keep up with. Also, during the experimental run, each run will only be fully-reconstructed once, because it will be difficult enough to keep up with the incoming data.
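
The "first 5 files of each run" rule can be sketched as a simple selection over file names. The naming pattern hd_rawdata_<run>_<file>.evio used here is an assumption for illustration and should be adjusted to the actual raw data file format. Interpreting "first" as the lowest file numbers is also an assumption, and the function name first_files_per_run is hypothetical.

```python
import re
from collections import defaultdict

# Assumed naming pattern: hd_rawdata_<run>_<file>.evio (hypothetical).
RAW_NAME = re.compile(r"hd_rawdata_(\d+)_(\d+)\.evio$")


def first_files_per_run(names, max_files=5):
    """Group file names by run and keep the lowest-numbered files of each run."""
    by_run = defaultdict(list)
    for name in names:
        match = RAW_NAME.search(name)
        if match is None:
            continue  # ignore names that do not follow the assumed pattern
        run, file_number = int(match.group(1)), int(match.group(2))
        by_run[run].append((file_number, name))
    selected = {}
    for run, entries in by_run.items():
        entries.sort()
        selected[run] = [name for _, name in entries[:max_files]]
    return selected


names = [f"hd_rawdata_003180_{i:03d}.evio" for i in range(8)]
names.append("hd_rawdata_003181_000.evio")
picked = first_files_per_run(names)
print(len(picked[3180]), len(picked[3181]))
```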

Offline Monitoring and Reconstruction: After Experimental Running

After experimental running, the following offline monitoring procedures should be performed, each with a different gxprojN account, so that they don't interfere with each other:

  1. Monitoring Launches: Every two weeks, do a monitoring launch over the first 5 files of all runs currently available on the tape.
  2. Initial Reconstruction Launch: As soon as a new group (e.g. ~100 runs) of data is initially semi-well calibrated, do a preliminary full-reconstruction launch over all files in that group.
    • We can add user analysis plugins to this launch, including those with ROOT TTree output, provided that they work and don't take much memory.
  3. Further Reconstruction Launches: Every ~3 months, if there have been significant improvements to the reconstruction / calibrations, do a new full-reconstruction launch over all of the data.
    • We can add user analysis plugins to this launch, including those with ROOT TTree output, provided that they work and don't take much memory.

Note that the monitoring is limited to the first 5 files of each run, since there will be a significant amount of data.

Saving to Tape (Write-through Cache): Monitoring Launches

All job output will be written directly to the write-through cache. However, only the following will be saved to tape:

  • REST files: All files.
  • ROOT files: One merged file per run.
    • After merge, the individual files are deleted (so they won't be saved).
  • Job stdout/stderr: One tarball per run
    • After launch analysis, the log files are deleted (so they won't be saved).
  • Browser png's: One tarball per launch

Saving to Tape (Write-through Cache): Full Reconstruction Launches

  • REST files: All files.
  • ROOT files: All files, AND one merged file per run.
  • Job stdout/stderr: One tarball per run
    • After launch analysis, a tarball is created and the individual log files are deleted (so they won't be saved).
  • Browser png's: One tarball per launch
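
The two retention lists above differ in only one respect, which is easier to see when the rules are written as data. This is a sketch for illustration only: the class, field, and function names are hypothetical and do not correspond to any launch script.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TapePolicy:
    rest_files: bool              # every REST file is saved
    individual_root_files: bool   # per-job ROOT files are saved
    merged_root_per_run: bool     # one merged ROOT file per run is saved
    log_tarball_per_run: bool     # stdout/stderr tarball, one per run
    png_tarball_per_launch: bool  # browser PNG tarball, one per launch


POLICIES = {
    "monitoring": TapePolicy(True, False, True, True, True),
    "full_reconstruction": TapePolicy(True, True, True, True, True),
}


def policy_differences(first, second):
    """List the field names on which two launch types disagree."""
    a, b = POLICIES[first], POLICIES[second]
    return [f.name for f in fields(TapePolicy)
            if getattr(a, f.name) != getattr(b, f.name)]


print(policy_differences("monitoring", "full_reconstruction"))
```

The only difference is that full reconstruction launches keep the individual ROOT files in addition to the merged one.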

Procedures: Details

Software Tests