GlueX Offline Meeting, September 18, 2018

GlueX Offline Software Meeting
Tuesday, September 18, 2018
3:00 pm EDT (non-standard time)
JLab: CEBAF Center A110
BlueJeans: 968 592 007

Agenda

  1. Announcements
    1. New releases: AmpTools 0.9.4, halld_sim 3.4.0, MCwrapper 2.0.1.
    2. New version of build_scripts: version 1.39
    3. Status of Launches (Alex A.)
  2. Review of minutes from the September 4 meeting (all)
  3. Review of the HDGeant4 Meeting from September 11
  4. Software Items on the Collaboration Meeting Agenda
  5. NERSC Update
  6. Computing and Software Review
  7. Review of recent pull requests
  8. Review of recent discussion on the GlueX Software Help List (all)
  9. Action Item Review (all)

Communication Information

Remote Connection

Slides

Talks can be deposited in the directory /group/halld/www/halldweb/html/talks/2018 on the JLab CUE. This directory is accessible from the web at https://halldweb.jlab.org/talks/2018/ .
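
As a convenience, here is a minimal sketch (in Python) of depositing a set of slides while logged into a CUE machine, assuming write access to the group disk; the file name used is a placeholder, not an actual talk.

  # Minimal sketch: copy a slide file into the 2018 talks area on the JLab CUE.
  # Assumes this runs on a CUE machine with write access to the group disk.
  import shutil
  from pathlib import Path

  TALKS_DIR = Path("/group/halld/www/halldweb/html/talks/2018")
  slides = Path("my_offline_talk.pdf")  # placeholder file name

  shutil.copy2(slides, TALKS_DIR / slides.name)  # copy, preserving timestamps
  print("Now visible at https://halldweb.jlab.org/talks/2018/" + slides.name)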


Minutes

Present:

  • CMU: Naomi Jarvis, Curtis Meyer
  • FSU: Sean Dobbs
  • JLab: Alex Austregesilo, Thomas Britton, Mark Ito (chair), Justin Stevens, Beni Zihlmann
  • MIT: Cristiano Fanelli
  • Raleigh, NC: David Lawrence
  • Yerevan: Hrach Marukyan

There is a recording of this meeting on the BlueJeans site. Use your JLab credentials to access it.

Announcements

  1. New releases: AmpTools 0.9.4, halld_sim 3.4.0, MCwrapper 2.0.1. The linked email has links to the release notes for each package.
    • AmpTools 0.9.4: Support for weighted events
    • halld_sim 3.4.0: Gets parameters for drift time-to-distance relationship from the CCDB.
    • MCwrapper 2.0.1: Uses generator binary names consistent with their containing directories.
      • This version has a problem with the bggen bash script. See Thomas if you are having difficulties.
  2. New version of build_scripts: version 1.39: Fixes the cernlib build so that it finds the LAPACK and BLAS libraries and successfully creates the PAW binary.
  3. Status of Launches.
    • Alex reports that the reconstruction launch over the 2017 data is 100% complete; it took 30 days. There was no tape I/O bottleneck this time, and up to 1000 jobs were running on the farm simultaneously. Extrapolating, the Spring 2018 data should take about 90 days to reconstruct if similar resources are available.
    • A quick analysis launch was done for Lubomir.
    • A complete analysis launch with 60 reactions was also done on the 2017 data[?]. It took only 20 hours to complete.
    • A problem came up during the merging of the output. The merge jobs seem to place such a high load on the work disk server that other users are crowded out and do not get decent service. The same activity did not cause any problems on the Lustre disks. Alex is working with Dave Rackley of SciComp to track down the problem.

Review of minutes from the September 4 meeting

We went over the minutes.

Review of the HDGeant4 Meeting from September 11

We went over the minutes.

On the problem with forward protons at high momentum (Issue #66), Beni reports that he sees hits from the downstream FDC chambers missing from tracks in both HDG3 and HDG4 at the wire-based tracking stage. In the HDG3 case those hits are recovered during time-based tracking; in HDG4 they remain lost. He is investigating why this is the case.

Thomas reminded us that the simulations we are studying were done with all physics processes turned off.

Beni also occasionally sees events in HDG4[?] with no FDC hits at all, neither simulated nor "truth", when scanning events with hdview2. Justin reported seeing similar pseudo-point multiplicity distributions when comparing HDG3 and HDG4. These observations seem to be in tension with each other.

Software Items on the Collaboration Meeting Agenda

We looked at the agenda put together by Sean. Looks good!

NERSC Update

David gave us an update. For plots see the recording starting at 51:00.

Since his last report, David completed the monitoring launch at NERSC. He ran 5000 jobs and it took 12 days.

Since then he has done a bandwidth test, using Globus Online to transfer three complete runs (over 200 files per run) out to NERSC, outside of the SWIF2 framework. He saw transfer speeds matching the advertised bandwidth of the Lab's link, with peaks at 10 Gb/s (and sometimes a bit more).

After the files were transferred, about 700 jobs were submitted against them. These took over five days to complete, and there were slack periods, lasting up to a day, when no jobs were running.

Computing and Software Review

Mark reviewed the materials we have received so far for the review, which will be held November 27 and 28.

Review of recent pull requests

Alex mentioned that when he tries to reconstruct a full run with the latest version of halld_recon, many of the jobs crash.

Review of recent discussion on the GlueX Software Help List

We went over the list without a lot of comment.

Action Item Review

  1. Fix the work disk.
  2. Find the reason for track loss in HDG4.
    • Find the timing problem.
    • Understand the events with no FDC hits.
  3. Release a new version of gluex_MCwrapper with the bggen bash script fixed.
  4. Find out the schedule for getting more cache disk from SciComp.