Offline Monitoring Incoming Data
Revision as of 21:35, 15 April 2016
Saving Online Monitoring Data
The procedure for writing the data out is given in, e.g., Raid-to-Silo Transfer Strategy.
Once the DAQ writes out the data to the raid disk, cron jobs will copy the file to tape, and within ~20 min., we will have access to the file on tape at /mss/halld/$RUN_PERIOD/rawdata/RunXXXXXX.
All online monitoring plugins will be run as data is taken. They will be accessible within the counting house via RootSpy, and for each run and file, a ROOT file containing the histograms will be saved within a subdirectory for each run.
For immediate access to these files, the copies on the raid disk may be read directly from the counting house; alternatively, the copies on tape become available within ~20 min. of each file being written out.
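As a quick sanity check, the tape-stash path for a given run can be assembled from the run period and run number. This is only a sketch: the run period string and run number below are assumptions for illustration, and run directories are zero-padded to six digits (the RunXXXXXX pattern above).

```shell
# Assumed values for illustration only
RUN_PERIOD="RunPeriod-2016-02"
RUN=10000

# Build the /mss path for this run's raw data (zero-padded to 6 digits)
RUNDIR=$(printf "/mss/halld/%s/rawdata/Run%06d" "$RUN_PERIOD" "$RUN")
echo "$RUNDIR"
# ls "$RUNDIR"   # list the raw-data files once they appear on tape
```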
Preparing the software
1. Update the environment, using the latest desired versions of JANA, the CCDB, etc. Also, the launch software will create new tags of the HDDS and sim-recon repositories, so update the version*.xml file referenced in the environment file to use the soon-to-be-created tags. This must be done BEFORE launch project creation. The environment file is at:
~/env_monitoring_incoming
2. Set up the environment. This will override the HDDS and sim-recon versions in the version*.xml file and will instead use the monitoring-launch working-area builds. Call:
source ~/env_monitoring_incoming
3. Updating & building hdds:
cd $HDDS_HOME
git pull             # Get the latest software
scons -c install     # Clean out the old install: EXTREMELY IMPORTANT for clearing out stale headers
scons install -j4    # Rebuild and re-install with 4 threads
4. Updating & building sim-recon:
cd $HALLD_HOME/src
git pull             # Get the latest software
scons -c install     # Clean out the old install: EXTREMELY IMPORTANT for clearing out stale headers
scons install -j4    # Rebuild and re-install with 4 threads
5. Create a new sqlite file containing the very latest calibration constants. The original documentation on creating sqlite files is here.
cd $GLUEX_MYTOP/../sqlite/
$CCDB_HOME/scripts/mysql2sqlite/mysql2sqlite.sh -hhallddb.jlab.org -uccdb_user ccdb | sqlite3 ccdb.sqlite
mv ccdb.sqlite ccdb_monitoring_incoming.sqlite   # Replace the old file
Launching for a new run period
1) Download the "monitoring" scripts directory from svn. For the gxprojN accounts, use the directory ~/monitoring/:
cd ~/
svn co https://halldsvn.jlab.org/repos/trunk/scripts/monitoring/
cd monitoring/incoming
chmod 755 script.sh   # Fix the permissions!
2) Edit the job config file, ~/monitoring/incoming/input.config, which is used to register jobs in hdswif. The version # should be "01". A typical config file will look like this:
PROJECT gluex
TRACK reconstruction
OS centos65
NCORES 24
DISK 40
RAM 18
TIMELIMIT 4
NTHREADS 24
JOBNAMEBASE offmon
RUNPERIOD 2016-02
VERSION 01
OUTPUT_TOPDIR /cache/halld/offline_monitoring/RunPeriod-[RUNPERIOD]/ver[VERSION]   # Example of other variables included in a variable
SCRIPTFILE /home/gxproj1/monitoring/incoming/script.sh   # Must specify full path
ENVFILE /home/gxproj1/env_monitoring_incoming            # Must specify full path
PLUGINS TAGH_online,TAGM_online,BCAL_online,CDC_online,CDC_expert,FCAL_online,FDC_online,ST_online_lowlevel,ST_online_tracking,TOF_online,PS_online,PSC_online,PSPair_online,TPOL_online,TOF_TDC_shift,monitoring_hists,danarest,BCAL_Eff,p2pi_hists,p3pi_hists,HLDetectorTiming,BCAL_inv_mass,trackeff_missing,TRIG_online,CDC_drift,RF_online,BCAL_attenlength_gainratio,BCAL_TDC_Timing
3) Create a new swif workflow for running all of the incoming data (e.g. <workflow> = offline_monitoring_RunPeriod2016_02_ver01_hd_rawdata):
~/monitoring/hdswif/hdswif.py create [workflow] -c ~/monitoring/incoming/input.config
4) In ~/monitoring/incoming/cron_exec.csh, modify the script to run for the new run period E.g. for 2016-02:
~/monitoring/incoming/cron_exec.csh
5) Before launching the cron job, manually run the script first. If there are already many files on disk, the first execution may take longer than 15 minutes, in which case the cron job could double-submit jobs. So, first execute the python script manually:
python ~/monitoring/incoming/process_incoming.py 2016-02 ~/monitoring/incoming/input.config 20 >& ~/incoming_log.txt
6) Now that the initial batch of jobs has been submitted, launch the cron job by running:
crontab cron_incoming
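The contents of the cron_incoming file are not shown on this page. Given the 15-minute cadence implied in step 5, it presumably contains a single entry along the lines of the following sketch; the schedule and log path are assumptions, not the actual file:

```shell
# Hypothetical contents of ~/monitoring/incoming/cron_incoming:
# run the incoming-data script every 15 minutes, appending stdout/stderr to the log
*/15 * * * * /home/gxproj1/monitoring/incoming/cron_exec.csh >> /home/gxproj1/incoming_log.txt 2>&1
```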
7) To check whether the cron job is running, do
crontab -l
8) The stdout & stderr from the cronjob are piped to a log file located at:
~/incoming_log.txt
9) Periodically check how the jobs are doing, and modify and resubmit failed jobs as needed (where <problem> can be one of SYSTEM, TIMEOUT, RLIMIT):
swif status <workflow>
~/monitoring/hdswif/hdswif.py resubmit <workflow> <problem>
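Since a launch can accumulate a mix of failure modes, the resubmit step can be wrapped in a small loop over the three problem types. This is only a sketch: the workflow name is an assumption, and the hdswif command is echoed rather than executed, so drop the echo to actually resubmit.

```shell
# Assumed workflow name for illustration
WORKFLOW="offline_monitoring_RunPeriod2016_02_ver01_hd_rawdata"

# Dry run: print the resubmit command for each failure mode
for PROBLEM in SYSTEM TIMEOUT RLIMIT; do
    echo ~/monitoring/hdswif/hdswif.py resubmit "$WORKFLOW" "$PROBLEM"
done
```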
10) To remove the cron job (e.g. at the end of the run) do
crontab -r