HOWTO Execute a Launch using NERSC

From GlueXWiki
Revision as of 11:05, 5 October 2018 by Davidl

Introduction

This page gives some instructions on executing a launch at NERSC. Note that some steps must be completed to make sure things are set up at Cori and Globus prior to submitting any jobs.

The following is based on steps used to do RunPeriod-2018-01 monitoring launch ver 18 using swif2.



Files and directories on Cori at NERSC

Globus Endpoint Authentication

Submitting jobs to swif2

The offsite jobs at NERSC are managed from the gxproj4 account. This is a group account with access limited to certain users. Your ssh key must be added to the account by an existing member. Contact the software group to request access.

Generally, one would log into an appropriate computer with:

ssh gxproj4@ifarm


The following are some steps needed to create a workflow and submit jobs.


1. Create a new workflow. The workflow name follows a convention based on the type of launch, run period, version, and optional extra qualifiers. Here is the command used to create the workflow for offline monitoring launch ver18 for RunPeriod-2018-01:

swif2 create -workflow offmon_2018-01_ver18 -max-concurrent 2000 -site nersc/cori -site-storage nersc:m3120


The -max-concurrent 2000 option tells swif2 to limit the number of dispatched jobs to no more than 2000. The primary concern here is scratch disk space at NERSC. If each input file is 20GB and produces 7GB of output, then we need 27GB * 2000 = 54 TB of free scratch disk space. If multiple launches are running at the same time using the same account's scratch disk, it is up to you to make sure the sum of their requirements does not exceed the quota. At present we have a quota of 60TB of scratch space, though NERSC has said they will revisit that at the beginning of the year.
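The back-of-envelope space estimate above can be checked from the shell; a small sketch using the numbers from this launch:

```shell
# Scratch space estimate: 20 GB input + 7 GB output per job,
# up to 2000 jobs dispatched concurrently (the -max-concurrent value).
per_job_gb=$(( 20 + 7 ))
needed_gb=$(( per_job_gb * 2000 ))
echo "${needed_gb} GB (~$(( needed_gb / 1000 )) TB) of scratch needed"
```

With the current 60TB quota this leaves only a few TB of headroom, which is why concurrent launches on the same account need to be budgeted together.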

The -site nersc/cori option is required at the moment; nersc/cori is currently the only allowed value for "site".

The -site-storage nersc:m3120 option specifies which NERSC project's assigned disk space to use. swif2 has since been changed to use the scratch disk space of the personal account running the jobs, so I believe this option is now ignored.
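The workflow name passed to swif2 create follows the convention described in step 1 (launch type, run period, version). A small sketch of how the pieces assemble; the variable names here are illustrative, not taken from the launch scripts:

```shell
# Assemble a workflow name: <launch type>_<run period>_ver<version>.
# Variable names are for illustration only; swif2 sees just the final string.
LAUNCHTYPE="offmon"
RUNPERIOD="2018-01"
VER="18"
WORKFLOW="${LAUNCHTYPE}_${RUNPERIOD}_ver${VER}"
echo "${WORKFLOW}"   # offmon_2018-01_ver18
```

Optional extra qualifiers, when needed, are appended to this base name.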


2. Create a working directory in the gxproj4 account and check out the launch scripts. This is done so that the scripts can be modified for the specific launch in case tweaks are needed. Changes should eventually be pushed back into the repository, but having a dedicated directory for each launch helps keep it separate from other launches.

mkdir ~gxproj4/NERSC/2018.10.05.offmon_ver18
cd ~gxproj4/NERSC/2018.10.05.offmon_ver18
svn co https://halldsvn.jlab.org/repos/trunk/scripts/monitoring/launch


3. Edit the file launch/launch_nersc.py to adjust all of the settings at the top to be consistent with the current launch. Make sure TESTMODE is set to "True" so the script can be tested without actually submitting any jobs.
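Since an accidental real submission is costly, it can help to double-check the flag from the shell before a dry run. A self-contained sketch; the settings file below is a stand-in written for illustration, while the real settings live at the top of launch/launch_nersc.py:

```shell
# Stand-in for the settings block edited in step 3 (illustrative only;
# the real values sit at the top of launch/launch_nersc.py).
cat > /tmp/launch_settings_demo.py <<'EOF'
TESTMODE = True   # True: exercise the scripts without submitting any jobs
EOF
# Confirm TESTMODE is still True before running the script.
grep -n 'TESTMODE' /tmp/launch_settings_demo.py
```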

/group/halld/Software/builds/Linux_CentOS7-x86_64-gcc4.8.5-cntr