HOWTO Execute a Launch using IU

From GlueXWiki

Introduction

This page describes how to execute a launch at IU. Several steps must be completed before any jobs are submitted, to make sure things are set up correctly at Cori and Globus.

The following is based on steps used to do a RunPeriod-2019-11 recon launch using swif2.

Quick Start (more detailed instructions below)

  1. ssh from gxproj9 account to jlab@cori.nersc.gov to make sure passwordless login works
  2. Log in to Globus and make sure both endpoints are active ("NERSC DTN", "jlab#scidtn1")
  3. Make a directory in gxproj4 for the launch, check out the launch scripts, and modify launch/launch_nerscB.py
    mkdir ~gxproj4/NERSC/2020.10.24.recon_ver01
    cd ~gxproj4/NERSC/2020.10.24.recon_ver01
    svn co https://halldsvn.jlab.org/repos/trunk/scripts/monitoring/launch
    svn co https://halldsvn.jlab.org/repos/trunk/scripts/monitoring/hdswif3
  4. Run launch_nerscB.py in test mode (TESTMODE=True) with VERBOSE=3 for 1 file of 1 run, and check the resulting swif2 command carefully
  5. Commit any changes to launch directory scripts to repository
  6. ssh to jlab@cori.nersc.gov and update launch directory
    cd projectdir_JLab/launch
    svn update
  7. Make sure enough scratch disk space is available on jlab@cori (check with myquota; the quota is 60TB)
  8. Make sure OUTPUTDIR is pointing to a valid location at JLab (should be something like /lustre/expphy/volatile/halld/offsite_prod/RunPeriod-XXX)
  9. Back at ifarm, set TESTMODE=False in launch_nerscB.py and submit test job by running script
  10. If the test job runs successfully:
    • Check that all files were copied back to JLab in the appropriate subdirectory of OUTPUTDIR
    • Set TESTMODE=True and VERBOSE=1 in launch_nerscB.py
    • Modify the numbers so that all files desired for the launch are processed
    • Run launch_nerscB.py and confirm everything looks right
    • Set TESTMODE=False and run the launch_nerscB.py script again to submit the full launch
  11. Set up updating job monitoring plots
    • ssh into gxproj4 on ifarm in terminal that can be left up for long periods (e.g. desktop)
    • cd to hdswif3 in project directory ( cd ~gxproj4/NERSC/2020.10.24.recon_ver01/hdswif3 )
    • Modify auto_run.sh to reflect correct swif2 workflow name
    • Modify start_date in regenerate_plots.csh to current time minus 3 hours (for California time)
    • Run the auto_run.sh script so that it completes one full cycle and creates the appropriate subdirectory in the halldweb pages.
    • ssh to gxproj5@ifarm and:
      • Modify /group/halld/www/halldweb/html/data_monitoring/launch_analysis/index.html to include the new launch campaign, pointing to the newly created directory.
      • Log back out to the previous gxproj4 shell
    • Run: ./auto_run.sh
  12. Move output files to tape as jobs finish
    • Make sure the script launch/move_to_tape_multi.py has srcdir pointing to a directory containing the OUTPUTDIR directory from the launch_nerscB.py script
    • Run the move_to_tape_multi.py script occasionally as jobs complete to move the output files to the /cache disk. (Note that this does not automatically flush them to tape.)
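
The checks in steps 1 and 8 can be scripted so that they fail loudly rather than being skipped. The following is only a minimal sketch and is not part of the launch scripts: the function names are hypothetical, and it assumes the OpenSSH client is on the path. BatchMode=yes makes ssh exit with an error instead of prompting for a password, which is what "passwordless login works" means in practice.

```python
import os
import subprocess


def ssh_batch_command(host):
    """Build an ssh command that fails instead of prompting for a password.

    Hypothetical helper. BatchMode and ConnectTimeout are standard OpenSSH
    client options; the remote command `true` does nothing and exits 0.
    """
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10", host, "true"]


def passwordless_login_ok(host, run=subprocess.run):
    """Return True if ssh to `host` succeeds without any interactive prompt.

    `run` is injectable so the logic can be exercised without a network.
    """
    try:
        result = run(ssh_batch_command(host), capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        # ssh is not installed, or the connection hung.
        return False
    return result.returncode == 0


def outputdir_is_valid(path):
    """Return True if `path` (the OUTPUTDIR value) is an existing directory."""
    return os.path.isdir(path)
```

For example, passwordless_login_ok("jlab@cori.nersc.gov") should return True from the account used in step 1, and outputdir_is_valid should return True for the OUTPUTDIR value set in launch_nerscB.py when run at JLab. The scratch space in step 7 still has to be checked by hand with myquota on Cori.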

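The start_date adjustment in step 11 is simple arithmetic: NERSC is on Pacific time, 3 hours behind JLab's Eastern time, so the value is the current JLab time minus 3 hours. A small sketch of that calculation follows. The function names are hypothetical, and the output format is an assumption; use whatever format regenerate_plots.csh already has for start_date.

```python
from datetime import datetime, timedelta


def plot_start_date(now, offset_hours=3):
    """Return `now` shifted back by `offset_hours` (3 for California time)."""
    return now - timedelta(hours=offset_hours)


def format_start_date(when):
    """Format the timestamp. The format string is an assumption, not taken
    from regenerate_plots.csh."""
    return when.strftime("%Y-%m-%d %H:%M:%S")
```

Calling format_start_date(plot_start_date(datetime.now())) on ifarm gives a candidate value to paste into the script. The date rolls back correctly when the subtraction crosses midnight.
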
NERSC Account