Offline Monitoring Incoming Data
Revision as of 15:18, 28 September 2016
Saving Online Monitoring Data
The procedure for writing the data out is given in, e.g., Raid-to-Silo Transfer Strategy.
Once the DAQ writes the data to the RAID disk, cron jobs copy the file to tape, and within ~20 min. the file is accessible on tape at /mss/halld/$RUN_PERIOD/rawdata/RunXXXXXX.
All online monitoring plugins are run as data is taken. Their histograms can be viewed within the counting house via RootSpy, and for each file of each run, a ROOT file containing the histograms is saved in a subdirectory for that run.
For immediate access, the files on the RAID disk can be read directly from the counting house; otherwise, the tape copies become available within ~20 min. of each file being written out.
Preparing the software
- Follow the same steps as detailed for the offline monitoring and reconstruction setup, EXCEPT for the following.
1) Replace "monitoring_launch" with "monitoring_incoming".
2) Build the software in a directory with a different name (e.g. "build1") instead of "monitoring_incoming", and then create a soft link:
ln -s build1 monitoring_incoming
This way, if the software needs to be updated in the middle of the run, you just create a new build in parallel (e.g. "build2") and then switch the symbolic links when you're ready.
3) Don't create a CCDB SQLite file. A separate file is created for each job, so that each job has the most up-to-date calibration constants.
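The link switch described in step 2 can be sketched as follows. This is a minimal illustration run in a scratch directory, with empty "build1" and "build2" directories standing in for real builds; the use of ln -sfn (supported by GNU and BSD ln) to repoint the link in a single command is an assumption, not something this page prescribes.

```shell
#!/bin/sh
# Work in a scratch directory so that no real build area is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir build1 build2                 # stand-ins for two completed builds
ln -s build1 monitoring_incoming    # initial link, as in step 2

# Repoint the link at the new build. -f replaces the existing link; -n keeps
# ln from following the old link and creating a new link inside build1.
ln -sfn build2 monitoring_incoming

readlink monitoring_incoming        # prints: build2
```

Without -n, ln would treat the existing link as a directory and place a "build2" link inside build1 instead of replacing monitoring_incoming.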
Starting A New Run Period
- Do the exact same steps as detailed in "Starting a new run period" at Link
Launching for a new run period
1) Download the "monitoring" scripts directory from svn. For the gxprojN accounts, use the directory ~/monitoring/:
cd ~/
svn co https://halldsvn.jlab.org/repos/trunk/scripts/monitoring/
cd monitoring/incoming
chmod 755 script.sh #Fix the permissions!
2) Edit the job config file, ~/monitoring/incoming/input.config, which is used to register jobs in hdswif. The version number should be "01". A typical config file will look like this:
PROJECT gluex
TRACK reconstruction
OS centos65
NCORES 24
DISK 40
RAM 18
TIMELIMIT 4
NTHREADS 24
JOBNAMEBASE offmon
RUNPERIOD 2016-02
VERSION 01
OUTPUT_TOPDIR /cache/halld/offline_monitoring/RunPeriod-[RUNPERIOD]/ver[VERSION] # Example of other variables included in variable
SCRIPTFILE /home/gxproj1/monitoring/incoming/script.sh # Must specify full path
ENVFILE /home/gxproj1/env_monitoring_incoming # Must specify full path
PLUGINS TAGH_online,TAGM_online,BCAL_online,CDC_online,CDC_expert,FCAL_online,FDC_online,ST_online_lowlevel,ST_online_tracking,TOF_online,PS_online,PSC_online,PSPair_online,TPOL_online,TOF_TDC_shift,monitoring_hists,danarest,BCAL_Eff,p2pi_hists,p3pi_hists,HLDetectorTiming,BCAL_inv_mass,trackeff_missing,TRIG_online,CDC_drift,RF_online,BCAL_attenlength_gainratio,BCAL_TDC_Timing
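The comment on OUTPUT_TOPDIR indicates that bracketed names such as [RUNPERIOD] and [VERSION] are filled in from the other variables in the file. The sketch below shows the output path that would result for the example values, assuming a plain textual substitution; it does not reproduce how hdswif itself performs the expansion, and the variable names template and output_topdir are illustrative only.

```shell
#!/bin/sh
# Values copied from the example config; the substitution rule is an assumption.
RUNPERIOD="2016-02"
VERSION="01"
template='/cache/halld/offline_monitoring/RunPeriod-[RUNPERIOD]/ver[VERSION]'

# Replace each bracketed name with the value of the variable of the same name.
output_topdir=$(printf '%s\n' "$template" |
    sed -e "s|\[RUNPERIOD\]|$RUNPERIOD|g" -e "s|\[VERSION\]|$VERSION|g")

echo "$output_topdir"
# prints: /cache/halld/offline_monitoring/RunPeriod-2016-02/ver01
```

Checking the expanded path by hand before creating the workflow is a cheap way to catch a wrong run period or version number.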
3) Create a new swif workflow for running all of the incoming data (e.g. <workflow> = offline_monitoring_RunPeriod2016_02_ver01_hd_rawdata):
~/monitoring/hdswif/hdswif.py create <workflow> -c ~/monitoring/incoming/input.config
4) In ~/monitoring/incoming/cron_exec.csh, modify the script to run for the new run period. E.g. for 2016-02:
~/monitoring/incoming/cron_exec.csh
5) Before launching the cron job, run the script manually once. If there are already a lot of files on disk, the first execution may take longer than 15 minutes, and in that case jobs may be double-submitted! So, first execute the Python script by hand:
python ~/monitoring/incoming/process_incoming.py 2016-02 ~/monitoring/incoming/input.config 20 >& ~/incoming_log.txt
6) Now that the initial batch of jobs has been submitted, launch the cron job by running:
crontab cron_incoming
7) To check whether the cron job is running, do
crontab -l
8) The stdout & stderr from the cronjob are piped to a log file located at:
~/incoming_log.txt
9) Periodically check how the jobs are doing, and modify and resubmit failed jobs as needed (where <problem> can be one of SYSTEM, TIMEOUT, RLIMIT):
swif status <workflow>
~/monitoring/hdswif/hdswif.py resubmit <workflow> <problem>
10) To remove the cron job (e.g. at the end of the run) do
crontab -r
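The status check and resubmission in step 9 can be wrapped in a small loop over the three problem types. This is a sketch only: the run helper and the DRY_RUN switch are hypothetical additions, the workflow name is the example from step 3, and the commands are exactly those given in step 9. By default it only prints what it would execute, since whether a plain resubmit is appropriate for every problem type (rather than first modifying the job) is a judgment this page leaves to the operator.

```shell
#!/bin/sh
# Example workflow name from step 3; replace with the actual workflow.
workflow="offline_monitoring_RunPeriod2016_02_ver01_hd_rawdata"
hdswif="$HOME/monitoring/hdswif/hdswif.py"

# Hypothetical safety switch: with DRY_RUN=1 (the default) commands are
# printed rather than executed. Set DRY_RUN=0 to run them for real.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

run swif status "$workflow"
for problem in SYSTEM TIMEOUT RLIMIT; do
    run "$hdswif" resubmit "$workflow" "$problem"
done
```

Reviewing the printed commands first makes it easy to drop a problem type whose jobs need their resources changed before being resubmitted.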