GlueX Offline Meeting, February 9, 2018

GlueX Offline Software Meeting
Friday, February 9, 2018
10:00 am EST
JLab: CEBAF Center A110
BlueJeans: 968 592 007

Agenda

  1. Announcements
    1. Reclamation of halld-scratch volume set (Mark)
    2. New top-level directory: /mss/halld/detectors. E. g., /mss/halld/detectors/DIRC (Mark)
    3. New sim-recon release: version 2.23.0 (Mark)
    4. New simple email list: online_calibrations
    5. Launches
  2. Review of minutes from the January 26 meeting (all)
  3. Collaboration Meeting
  4. SciComp Meeting Report (Mark)
    1. Change to fair share allocations...
    2. ENP consumption of disk space under /work
    3. Hall B needs 40 GB for Java virtual machine
    4. New tape drives, playing with four new LTO-7 drives and four new LTO-8 drives
  5. New releases: build_scripts 1.26, rcdb 0.03, sqlitecpp 2.2.0, sim-recon 2.26.0, hdgeant4 1.6.0 (Mark)
  6. AMD benchmark results (Sean)
  7. GlueX + NERSC (David)
  8. Review of recent pull requests (all)
  9. Review of recent discussion on the GlueX Software Help List (all)
  10. Meeting on Containers, 11:30 today, A110. Same BlueJeans number.
  11. Action Item Review (all)

Communication Information

Remote Connection

Slides

Talks can be deposited in the directory /group/halld/www/halldweb/html/talks/2018 on the JLab CUE. This directory is accessible from the web at https://halldweb.jlab.org/talks/2018/ .

Minutes

Present:

  • CMU: Curtis Meyer
  • FSU: Sean Dobbs
  • JLab: Alex Austregesilo, Thomas Britton, Hovanes Egiyan, Mark Ito (chair), David Lawrence, Beni Zihlmann
  • Yerevan: Hrach Marukyan

There is a recording of this meeting on the BlueJeans site. Use your JLab credentials to access it.

Announcements

  1. Reclamation of halld-scratch volume set. 19 of our tapes are about to be erased.
  2. New top-level directory: /mss/halld/detectors. A new tape directory for data related to specific detectors, e. g., /mss/halld/detectors/DIRC.
  3. New sim-recon release: version 2.23.0. This release includes Simon's newly tuned parameters for track matching to calorimeter clusters.
  4. New simple email list: online_calibrations. Sean has created a Simple Email List for those interested in keeping track of the latest calibration results.
  5. Launches. Thomas caught us up.
    • There is a monitoring launch that will start soon.
    • The reconstruction launch with recent software on Spring 2016 data has started.
    • Several people are looking at the anomaly that Beni has identified, visible in some TOF monitoring plots (though the TOF is blameless).

Review of minutes from the January 26 meeting

We went over the minutes.

Optional Packages

In the context of discussing the recent changes in how the RCDB is used in the offline software, we started a more general discussion of whether certain packages, and the RCDB in particular, should be optional. David explained that certain packages (among them the RCDB and the ET system) have been maintained as optional. The mechanism:

  1. If the "home" environment variable (e. g., RCDB_HOME) is not defined, then at build time the build system (i. e., SCons) omits the corresponding include paths and libraries from the build commands. No dependence on the package is built into the resulting code. A warning that this is happening is printed during the build; it does not halt the build.
  2. If any program requires the package in question, it is still built, but in a form that, when run, exits immediately with an error telling the user that the environment variable for the missing package needs to be defined and the package built. The build of the program can then be redone.
    • To implement this behavior, C-preprocessor directives are included in the source code so that the resulting program behaves appropriately depending on whether the home environment variable was set at build time. A minimal sketch of the pattern appears after this list.
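
The sketch below illustrates the pattern; it is not the actual sim-recon source. It assumes that when RCDB_HOME is set, SCons adds a compile-time definition, called HAVE_RCDB here, along with the RCDB include paths and libraries; the real macro name and run-time message may differ.

    #include <cstdlib>
    #include <iostream>

    int main()
    {
    #ifdef HAVE_RCDB
        // Normal path: the program was built with RCDB support, so the code
        // that actually talks to the RCDB (omitted here) can run.
        std::cout << "Built with RCDB support." << std::endl;
        return EXIT_SUCCESS;
    #else
        // Fallback path: the package was absent at build time. Fail fast at
        // run time with instructions, as described in item 2 above.
        std::cerr << "This program requires RCDB. Define RCDB_HOME, build the RCDB\n"
                     "package, and then rebuild this program." << std::endl;
        return EXIT_FAILURE;
    #endif
    }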

There was a lot of discussion, mainly between two of us, on whether this is a useful feature that should be maintained. An incomplete summary:

  • David. Having the ability to exclude packages whose functionality is completely irrelevant to the software builder is very convenient: extraneous packages need not be built and the resulting disk footprint of the code is reduced.
  • Mark. Having this flexibility requires extra coding and creates trap doors that software builders can fall into. If a needed environment variable is missing, the warning at build time can easily be missed, and the error at run time may leave the user (who may not be the builder) at a loss as to how to proceed.

Mark will put the issue on the agenda of a future meeting.

Release Management Thoughts

In the course of discussing Sean's presentation from last time, Thomas brought up the idea of breaking up the sim-recon repository into two repositories. In a moment of inspiration he came up with a concept for the split: one repository for simulation and one for reconstruction. This would simplify the task of "release management" (as defined last time). The technical advantage is that simulation code can be versioned independently of the reconstruction code. Right now, Sean has to maintain a reconstruction-fixed-simulation-changing branch of sim-recon to get the right behavior. If the two functions were versioned separately, this recon-fixed-sim-changing property would be manifest.

We noted that the two sides (sim and recon) are functionally closely tied together. The question is the degree of independence of the development streams on the two sides. E. g., if the drift time in the tracking chambers is improved in simulation, does the tracking reconstruction have to change? Likely not; the sim side can go ahead independently. On the other hand, if there is reconstruction code that does one thing for real data and another for simulation, and the simulation is improved so that unequal treatment is no longer necessary, then both sides have to change together. In the latter case it would be easier if both sides were in the same repository.

Mark will put the issue on the agenda of a future meeting.

Not-the-TOF Anomaly in Monitoring Histograms

Alex noted that the problem first appeared several months ago when the material maps were changed in the CCDB. Beni has reported that the most recent version of the code does not exhibit the problem, so the current mystery is how the problem could have possibly fixed itself.

GlueX + NERSC

David has succeeded in running GlueX reconstruction jobs on two of the NERSC supercomputers.

  • Cori I: Haswell (comparable to the JLab farm)
  • Cori II: Knights Landing (KNL)

He analyzed two runs on both architectures; see the results on his slide. He notes that the KNL jobs run 2.4 times slower than the Haswell jobs even though they use four times the number of threads, i. e., roughly ten times (2.4 × 4 ≈ 10) slower on a per-thread basis.

Collaboration Meeting

Sean has put together a nice little session for us on the Collaboration Meeting agenda.

SciComp Meeting Report

Mark reported items from the Scientific Computing meeting held Thursday, February 1.

  1. Change to fair share allocations.... The change was discussed at the meeting.
  2. ENP consumption of disk space under /work.
    • The second shelf of traditional raid is up and running. Our work disk quota has been increased to 110 TB from 66 TB.
    • Hall B migration to the new work disk is underway. That should free up 85 TB of cache space. Mark emphasized that Hall D needs more cache space.
  3. Hall B needs 40 GB per job for the Java virtual machine. This is causing cores to go idle as nodes are running in a memory-limited mode.
  4. New tape drives. SciComp is playing with four new LTO-7 drives and four new LTO-8 drives and is planning the upgrade path.

AMD benchmark results

Sean purchased a box with new AMD EPYC processors and ran benchmarks of hd_root on it. For comparison he ran the same tests on gluon119, which has Intel Xeon processors. See his slide for details and results. Scaling for the two systems is comparable as the number of threads is increased, though the AMD processors come at one-third the price (for the CPU package itself).

Meeting on Containers

Mark announced a meeting later in the day to discuss use of containers (Docker, Singularity) in various computing contexts (NERSC, OSG, JLab farm, personal laptops). There is a lot of ground to cover here; a series of meetings is likely.