Offline Monitoring Incoming Data

From GlueXWiki

Saving Online Monitoring Data

The procedure for writing the data out is given in, e.g., Raid-to-Silo Transfer Strategy.

Once the DAQ writes the data to the RAID disk, cron jobs copy each file to tape. Within ~20 min. the file is accessible on tape at /mss/halld/$RUN_PERIOD/rawdata/RunXXXXXX.
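As a small illustration of the tape path pattern above, the sketch below builds the directory for a given run. The function name mss_run_dir is hypothetical, and it assumes that "XXXXXX" stands for the run number zero-padded to six digits.

```shell
# Hypothetical helper: print the tape directory for one run.
# Assumes "RunXXXXXX" means the run number zero-padded to six digits.
# Pass the run number without leading zeros (printf would read them as octal).
mss_run_dir() {
    run_period=$1
    run=$2
    printf '/mss/halld/%s/rawdata/Run%06d\n' "$run_period" "$run"
}

# Example: list the files of one run once they have reached tape
# ls "$(mss_run_dir 2016-10 11366)"
```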

All online monitoring plugins will be run as data is taken. Their output will be accessible within the counting house via RootSpy, and for each file of each run, a ROOT file containing the histograms will be saved in that run's subdirectory.

For immediate access to these files, read them directly from the RAID disk in the counting house. Otherwise, the tape copies become available within ~20 min. of each file being written out.

Preparing the software

1) Replace "monitoring_launch" with "monitoring_incoming."

2) Build the software in a directory with a different name (e.g. "build1") rather than "monitoring_incoming", then create a soft link:

ln -s build1 monitoring_incoming

This way, if the software needs to be updated in the middle of the run, you just create a new build in parallel (e.g. "build2") and then switch the symbolic links when you're ready.
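The link switch described in step 2 could look roughly like the sketch below. The function name switch_build is hypothetical, and the sketch assumes the builds sit side by side in one parent directory. The -n option of ln makes it replace the existing link rather than create a new link inside the old build directory.

```shell
# Hypothetical helper: point the monitoring_incoming link at another build.
#   $1 = parent directory holding the builds, $2 = build directory name
switch_build() {
    base=$1
    build=$2
    if [ ! -d "$base/$build" ]; then
        echo "no such build: $base/$build" >&2
        return 1
    fi
    # -s symbolic, -f replace an existing link, -n do not follow the old link
    ( cd "$base" && ln -sfn "$build" monitoring_incoming )
}

# Example: after finishing a parallel build in build2
# switch_build /path/to/builds build2
```

Jobs that have already started keep using the build they resolved at launch, so the old build directory should stay in place until those jobs finish.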

3) Don't create a CCDB sqlite file. A separate file will be created for each job, so that each job has the most up-to-date calibration constants.

Starting A New Run Period

  • Do the exact same steps as detailed in "Starting a new run period" at Link

Launching for a new run period

1) Download the "monitoring" scripts directory from svn. For the gxprojN accounts, use the directory ~/monitoring/:

cd ~/
svn co
cd monitoring/incoming

2) Update the jobs_incoming.config job config file. Be sure to update RUNPERIOD. Monitoring of the incoming data should always be ver01.

vi ~/monitoring/incoming/jobs_incoming.config

3) Update the jana_incoming.config jana config file. This contains the command line arguments given to JANA. Definitely be sure to update REST:DATAVERSIONSTRING.

vi ~/monitoring/incoming/jana_incoming.config
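After editing the two config files, a quick check that the settings named in steps 2 and 3 are present can catch a forgotten update. The sketch below is only a convenience: the function name show_setting is hypothetical, and it makes no assumption about the config file syntax beyond the key appearing literally in the file.

```shell
# Hypothetical helper: print every line of a config file that mentions a key,
# with line numbers, and warn if the key does not appear at all.
show_setting() {
    key=$1
    file=$2
    # -F treats the key as a literal string, so "REST:DATAVERSIONSTRING" is safe
    if ! grep -n -F -- "$key" "$file"; then
        echo "WARNING: $key not found in $file" >&2
        return 1
    fi
}

# Example: review the values before launching
# show_setting RUNPERIOD ~/monitoring/incoming/jobs_incoming.config
# show_setting REST:DATAVERSIONSTRING ~/monitoring/incoming/jana_incoming.config
```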

4) Create the SWIF workflow. The workflow should have a name like "offmon_2016-10_ver01". It should also match the workflow name in the job config file (e.g. jobs_incoming.config).

swif create -workflow <my_workflow>
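To keep the workflow name consistent between this command and the job config file, it can help to build the name in one place. The sketch below generalizes the single example "offmon_2016-10_ver01" into a pattern, which is an assumption. The function name workflow_name is hypothetical.

```shell
# Hypothetical helper: build a workflow name such as offmon_2016-10_ver01.
#   $1 = run period, $2 = version number (padded to two digits)
workflow_name() {
    printf 'offmon_%s_ver%02d\n' "$1" "$2"
}

# Example: create the workflow for the incoming data (always ver01)
# swif create -workflow "$(workflow_name 2016-10 1)"
```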

5) In ~/monitoring/incoming/cron_exec.csh, modify the script to run for the new run period. E.g. for 2016-02:


6) Before launching the cron job, manually run the script first. If there are already a lot of files on disk, the first execution may take longer than 15 minutes, in which case the next cron execution would start before it finishes and jobs may be double-submitted! So, first execute the python script manually (this submits jobs for the first 5 files (000 -> 004) of every run that are on /mss/ but haven't been submitted yet):

python ~/monitoring/incoming/ 2016-10 ~/monitoring/incoming/jobs_incoming.config 5 >& ~/incoming_log.txt
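The double-submission risk in step 6 arises when one execution is still running as the next one starts. A generic way to guard against such overlap is a lock directory, since mkdir either creates the directory or fails in one step. The sketch below is not part of the monitoring scripts. The function name run_exclusive and the lock path are hypothetical.

```shell
# Hypothetical guard: run a command only if no other instance holds the lock.
#   $1 = lock directory, remaining arguments = command to run
run_exclusive() {
    lockdir=$1
    shift
    if mkdir "$lockdir" 2>/dev/null; then
        "$@"
        status=$?
        rmdir "$lockdir"      # release the lock when the command is done
        return "$status"
    else
        echo "another instance is still running; skipping" >&2
        return 1
    fi
}

# Example (lock path is an assumption):
# run_exclusive "$HOME/.incoming_submit.lock" <submission command>
```

If a guarded process is killed before it releases the lock, the directory stays behind and has to be removed by hand.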

7) Update the script for post-processing for the new run period:


8) Add the incoming data to the data version database

~/monitoring/process/ add ~/monitoring/process/version/incoming_2016-10_ver01

9) Check if the cron daemon is running on that node:

ps aux | grep crond

10) Now that the initial batch of jobs has been submitted, launch the cron job by running:

crontab cron_incoming

11) To check whether the cron job is running (on the same machine where you launched the cron job, i.e. for CentOS7: ifarm1401 or ifarm1402), do

crontab -l

12) The stdout & stderr from the cronjob are piped to a log file located at:




13) Periodically check how the jobs are doing, and modify and resubmit failed jobs as needed (where <problem> can be one of SYSTEM, TIMEOUT, RLIMIT):

swif status <workflow>
~/monitoring/hdswif/ resubmit <workflow> <problem>
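When several problem types have accumulated, the resubmit command of step 13 can be repeated once per type. The sketch below only loops over the three types named above, using the same argument order. The function name resubmit_all is hypothetical, and the resubmit script is passed in as an argument because its file name under ~/monitoring/hdswif/ is not reproduced here. Whether a given problem type needs modified job settings first is left to the operator.

```shell
# Hypothetical helper: run "<cmd> resubmit <workflow> <problem>" per problem type.
#   $1 = resubmit script, $2 = workflow name
resubmit_all() {
    resubmit_cmd=$1
    workflow=$2
    for problem in SYSTEM TIMEOUT RLIMIT; do
        # stop at the first failure so it can be inspected
        "$resubmit_cmd" resubmit "$workflow" "$problem" || return 1
    done
}

# Example: check the workflow first, then resubmit
# swif status <workflow>
# resubmit_all <script under ~/monitoring/hdswif/> <workflow>
```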

14) To remove the cron job (e.g. at the end of the run) do

crontab -r