Difference between revisions of "Offline Monitoring Incoming Data"

Revision as of 15:29, 28 September 2016

Saving Online Monitoring Data

The procedure for writing the data out is given in, e.g., Raid-to-Silo Transfer Strategy.

Once the DAQ writes the data to the RAID disk, cron jobs copy each file to tape. Within ~20 min. the file is accessible on tape at /mss/halld/$RUN_PERIOD/rawdata/RunXXXXXX.

All online monitoring plugins are run as data is taken. Their histograms can be viewed within the counting house via RootSpy, and for each file of each run a ROOT file containing the histograms is saved in that run's subdirectory.

For immediate access, read the files on the RAID disk directly from the counting house. Otherwise, the copies on tape become available within ~20 min. of each file being written out.
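As a small illustration of the tape path pattern quoted above, the bash sketch below builds the directory name for one run. The function name `rawdata_dir`, the example run period string, and the run number are hypothetical, and the sketch assumes that `XXXXXX` stands for the run number zero-padded to six digits.

```shell
#!/bin/bash
# Hypothetical helper: build the tape directory for a run, following the
# /mss/halld/$RUN_PERIOD/rawdata/RunXXXXXX pattern from the text.
# Assumption: XXXXXX is the run number zero-padded to six digits.
rawdata_dir() {
    local run_period="$1"   # assumed form, e.g. RunPeriod-2016-02
    local run_number="$2"   # illustrative run number, e.g. 11366
    # 10# forces base 10 so an already padded number is not read as octal
    printf '/mss/halld/%s/rawdata/Run%06d\n' "$run_period" "$((10#$run_number))"
}

rawdata_dir "RunPeriod-2016-02" 11366
```

Passing an already padded number such as 011366 gives the same path, since the value is converted to base 10 before formatting.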

Preparing the software

Follow the same steps as detailed for the offline monitoring and reconstruction setup, with the following exceptions.

1) Replace "monitoring_launch" with "monitoring_incoming".

2) Build the software in a directory with a different name (e.g. "build1") rather than in "monitoring_incoming", and then create a soft link to it:

ln -s build1 monitoring_incoming

This way, if the software needs to be updated in the middle of the run, you just create a new build in parallel (e.g. "build2") and then switch the symbolic links when you're ready.

3) Don't create a CCDB sqlite file. A separate file will be created for each job, so that each job has the most up-to-date calibration constants.
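The build switch described in step 2 can be sketched in bash as follows. The names "build1", "build2", and "monitoring_incoming" come from the text above; the temporary work directory is a hypothetical stand-in for wherever the builds actually live, and the empty directories stand in for real builds.

```shell
#!/bin/bash
# Sketch of the build-switching scheme from step 2.
# The work directory is a throwaway location used only for illustration.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

mkdir build1
ln -s build1 monitoring_incoming      # initial link, as in step 2

mkdir build2                          # new build, made in parallel
# -f replaces the existing link; -n stops ln from following the old link
# into build1 and creating the new link inside that directory
ln -sfn build2 monitoring_incoming

readlink monitoring_incoming          # shows which build is active
```

Without `-n`, the second `ln` would follow the existing link and place a new link inside build1, leaving "monitoring_incoming" still pointing at the old build.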

Starting A New Run Period

Do the exact same steps as detailed in "Starting a new run period" at Link

Launching for a new run period

1) Download the "monitoring" scripts directory from svn. For the gxprojN accounts, use the directory ~/monitoring/:

cd ~/
svn co https://halldsvn.jlab.org/repos/trunk/scripts/monitoring/
cd monitoring/incoming

2) Update the jobs_incoming.config job config file. Definitely be sure to update RUNPERIOD.

vi ~/monitoring/incoming/jobs_incoming.config

3) Update the jana_incoming.config jana config file. This contains the command line arguments given to JANA. Definitely be sure to update REST:DATAVERSIONSTRING.

vi ~/monitoring/incoming/jana_incoming.config

4) Create a new swif workflow for running all of the incoming data (e.g. <workflow> = offline_monitoring_RunPeriod2016_02_ver01_hd_rawdata):

~/monitoring/hdswif/hdswif.py create <workflow> -c ~/monitoring/incoming/input.config

5) In ~/monitoring/incoming/cron_exec.csh, modify the script to run for the new run period, e.g. for 2016-02:

~/monitoring/incoming/cron_exec.csh

6) Before launching the cron job, manually run the script first. If there are already a lot of files on disk, the first execution may take longer than 15 minutes, and in that case jobs may be double-submitted! So, first execute the python script manually:

python ~/monitoring/incoming/process_incoming.py 2016-02 ~/monitoring/incoming/input.config 20 >& ~/incoming_log.txt

7) Now that the initial batch of jobs has been submitted, launch the cron job by running:

crontab cron_incoming

8) To check whether the cron job is installed, list the current crontab:

crontab -l

9) The stdout and stderr from the cron job are redirected to a log file located at:

~/incoming_log.txt

10) Periodically check how the jobs are doing, and modify and resubmit failed jobs as needed (where <problem> can be one of SYSTEM, TIMEOUT, RLIMIT):

swif status <workflow>
~/monitoring/hdswif/hdswif.py resubmit <workflow> <problem>

11) To remove the cron job (e.g. at the end of the run), do:

crontab -r
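Step 6 warns about double submission when one execution runs longer than 15 minutes, presumably because a second cron invocation starts before the first has finished. One common way to guard against such overlap is a lock, sketched below in bash. This is not part of the monitoring scripts: the function name `run_exclusive` and the lock path are hypothetical, and the `echo` command stands in for the real processing command.

```shell
#!/bin/bash
# Hypothetical wrapper: run a command only if no earlier invocation is
# still active. mkdir serves as the lock because it either creates the
# directory or fails, with nothing in between.
# The lock path and the function name are illustrative assumptions.
LOCKDIR="${LOCKDIR:-${TMPDIR:-/tmp}/process_incoming.lock}"

run_exclusive() {
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "previous run still active, skipping"
        return 0
    fi
    "$@"                       # the real work, e.g. the python command from step 6
    local status=$?            # remember the exit status of that command
    rmdir "$LOCKDIR"           # release the lock for the next invocation
    return "$status"
}

run_exclusive echo "processing incoming files"
```

A lock of this kind is left behind if the wrapped command is killed before `rmdir` runs, so a stale lock directory would have to be removed by hand before the next run can proceed.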

