Difference between revisions of "GlueX Offline Meeting, June 14, 2017"

Talks can be deposited in the directory <code>/group/halld/www/halldweb/html/talks/2017</code> on the JLab CUE. This directory is accessible from the web at https://halldweb.jlab.org/talks/2017/ .
 
 +
 +
== Minutes ==
 +
 +
Present:
 +
* '''CMU''': Naomi Jarvis
 +
* '''JLab''': Thomas Britton, Brad Cannon, Eugene Chudakov, Hovanes Egiyan, Mark Ito (chair), Dmitry Romanov, Beni Zihlmann
 +
* '''NU''':  Sean Dobbs
 +
* '''UConn''': Richard Jones
 +
* '''Yerevan''': Hrach Marukyan
 +
 +
There is a [https://bluejeans.com/s/uEQso/ recording of this meeting] on the BlueJeans site. Use your JLab credential to access it.
 +
 +
=== Announcements ===
 +
 +
# [https://mailman.jlab.org/pipermail/halld-offline/2017-June/002796.html New release of HDDS: version 3.11]. Mark noted that this release contains recent changes to target and start counter geometry from Simon Taylor.
 +
# [https://mailman.jlab.org/pipermail/halld-offline/2017-June/002810.html hdpm 0.7.0]. Nathan went over his announcement. New features include
 +
#* AmpTools' new location at GitHub is handled.
 +
#* New package: PyPWA
 +
#* Revised actions for hdpm sub-commands.
 +
 +
=== Review of minutes from the last meeting ===
 +
 +
We went over the [[GlueX Offline Meeting, May 31, 2017#Minutes|minutes from May 31]].
 +
 +
==== Progress on the OSG ====
 +
 +
Richard gave an update on progress with the OSG. For all the details, please see the [https://bluejeans.com/s/uEQso/ recording]. Some notes:
 +
 +
* scosg16.jlab.org is fully functional as an OSG submit host now.
 +
* Jobs similar to Data Challenge 2 are going out.
 +
* '''Using containers''' to deliver our software to remote nodes and run it there:
 +
** [https://en.wikipedia.org/wiki/Docker_(software) Docker] was the subject of initial focus, but it turns out it was designed to solve network (and other system resource) isolation problems, e.g., for deployment of web services on a foreign OS.
 +
** [http://singularity.lbl.gov/ Singularity] is aimed at mobility of compute, which is the problem we are trying to solve. OSG has embraced it as the on-the-grid-node-at-run-time solution.
 +
** Richard's original solution was to make a straightforward Singularity container with everything we need to run. That came to 7 GB, too large to use under OASIS (OSG's file distribution system).
 +
** With guidance from OSG folks, he has implemented a solution that allows us to run. [The details are many and various and will not be recorded here. Again, see the recording.] The broad features are:
 +
*** Singularity on the grid node runs using system files (glibc, ld, system-provided shared libraries, etc.) stored outside the container on OASIS.
 +
*** Software is distributed in two parts: the system files mentioned in the previous item, and our standard, built-by-us GlueX software stack, distributed via OASIS without any need for containerization.
 +
* '''Scripts for running the system'''
 +
** osg_container.sh: script that runs on the grid node
 +
** my_grid_job.py
 +
*** runs generator, simulation, smearing, reconstruction, and analysis
 +
*** hooks for submitting, no knowledge of Condor required
 +
*** will report on job status
 +
** Richard will send out an email with instructions.
 +
* '''Problem with CentOS 6 nodes'''
 +
** Some grid nodes are hanging on the hd_root step.
 +
** CentOS 7 nodes seem OK. CentOS 6 nodes have the problem. Unfortunately, the majority of nodes out there are CentOS-6-like, including all of the nodes deployed at GlueX collaborating university sites.
 +
** The issue seems to be related to access to the SQLite form of CCDB. The OSG folks are working on a solution. David Lawrence has been consulted. Dmitry thinks he has a way forward that involves deploying an in-memory realization of the database.
 +
 +
==== Event Display ====
 +
 +
Dmitry and Thomas will give an update at the next meeting.
 +
 +
=== Other Items ===
 +
 +
* Brad mentioned that our Doxygen documentation pages are down. Mark will take a look.
 +
* Eugene asked about the manner in which we document details of simulation runs and whether enough information is retained to reproduce the results. Mark showed him [https://halldweb.jlab.org/gluex_simulations/sim1.2.1/ the site for sim 1.2.1] as an example of what we do for the "public" simulation runs.

Revision as of 13:18, 16 June 2017

GlueX Offline Software Meeting
Wednesday, June 14, 2017
11:00 am EDT
JLab: CEBAF Center F326/327

Agenda

  1. Announcements
    1. New release of HDDS: version 3.11 (Mark)
    2. hdpm 0.7.0 (Nathan)
  2. Review of minutes from the last meeting (all)
  3. Review of recent pull requests (all)
  4. Review of recent discussion on the GlueX Software Help List
  5. Action Item Review

Communication Information

Remote Connection

Slides

Talks can be deposited in the directory /group/halld/www/halldweb/html/talks/2017 on the JLab CUE. This directory is accessible from the web at https://halldweb.jlab.org/talks/2017/ .

Minutes

Present:

  • CMU: Naomi Jarvis
  • JLab: Thomas Britton, Brad Cannon, Eugene Chudakov, Hovanes Egiyan, Mark Ito (chair), Dmitry Romanov, Beni Zihlmann
  • NU: Sean Dobbs
  • UConn: Richard Jones
  • Yerevan: Hrach Marukyan

There is a recording of this meeting on the BlueJeans site. Use your JLab credential to access it.

Announcements

  1. New release of HDDS: version 3.11. Mark noted that this release contains recent changes to target and start counter geometry from Simon Taylor.
  2. hdpm 0.7.0. Nathan went over his announcement. New features include
    • AmpTools' new location at GitHub is handled.
    • New package: PyPWA
    • Revised actions for hdpm sub-commands.

Review of minutes from the last meeting

We went over the minutes from May 31.

Progress on the OSG

Richard gave an update on progress with the OSG. For all the details, please see the recording. Some notes:

  • scosg16.jlab.org is fully functional as an OSG submit host now.
  • Jobs similar to Data Challenge 2 are going out.
  • Using containers to deliver our software to remote nodes and run it there:
    • [https://en.wikipedia.org/wiki/Docker_(software) Docker] was the subject of initial focus, but it turns out it was designed to solve network (and other system resource) isolation problems, e.g., for deployment of web services on a foreign OS.
    • [http://singularity.lbl.gov/ Singularity] is aimed at mobility of compute, which is the problem we are trying to solve. OSG has embraced it as the on-the-grid-node-at-run-time solution.
    • Richard's original solution was to make a straightforward Singularity container with everything we need to run. That came to 7 GB, too large to use under OASIS (OSG's file distribution system).
    • With guidance from OSG folks, he has implemented a solution that allows us to run. [The details are many and various and will not be recorded here. Again, see the recording.] The broad features are:
      • Singularity on the grid node runs using system files (glibc, ld, system-provided shared libraries, etc.) stored outside the container on OASIS.
      • Software is distributed in two parts: the system files mentioned in the previous item, and our standard, built-by-us GlueX software stack, distributed via OASIS without any need for containerization.
  • Scripts for running the system
    • osg_container.sh: script that runs on the grid node
    • my_grid_job.py
      • runs generator, simulation, smearing, reconstruction, and analysis
      • hooks for submitting, no knowledge of Condor required
      • will report on job status
    • Richard will send out an email with instructions.
  • Problem with CentOS 6 nodes
    • Some grid nodes are hanging on the hd_root step.
    • CentOS 7 nodes seem OK. CentOS 6 nodes have the problem. Unfortunately, the majority of nodes out there are CentOS-6-like, including all of the nodes deployed at GlueX collaborating university sites.
    • The issue seems to be related to access to the SQLite form of CCDB. The OSG folks are working on a solution. David Lawrence has been consulted. Dmitry thinks he has a way forward that involves deploying an in-memory realization of the database.
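
The in-memory approach mentioned in the last item was not described in detail at the meeting. As a rough illustration of the general idea only (not of how CCDB or hd_root would actually do it), the sketch below copies an on-disk SQLite file into a private in-memory database using Python's standard sqlite3 module, so that later queries no longer touch the shared file system. The function name load_into_memory is hypothetical and is not part of any GlueX or CCDB package.

```python
import sqlite3
from pathlib import Path


def load_into_memory(path):
    """Return an in-memory copy of the SQLite database stored at `path`.

    Hypothetical helper for illustration; not part of CCDB.
    """
    # Open the source read-only through a URI so the file is never modified.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    disk = sqlite3.connect(uri, uri=True)
    mem = sqlite3.connect(":memory:")
    try:
        # Connection.backup copies the whole database into the target.
        disk.backup(mem)
    finally:
        disk.close()
    return mem
```

After the copy, the returned connection is independent of the original file: all reads are served from memory, which sidesteps any file-access or locking behavior of the underlying file system. The cost is that each job holds its own full copy of the database in RAM.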

Event Display

Dmitry and Thomas will give an update at the next meeting.

Other Items

  • Brad mentioned that our Doxygen documentation pages are down. Mark will take a look.
  • Eugene asked about the manner in which we document details of simulation runs and whether enough information is retained to reproduce the results. Mark showed him the site for sim 1.2.1 as an example of what we do for the "public" simulation runs.