GlueX Software Meeting, July 7, 2020
Tuesday, July 7, 2020
3:00 pm EDT
BlueJeans: 968 592 007
- Review of Minutes from the Last Software Meeting (all)
- Report from Recent HDGeant4 Meetings (all)
- Report from the SciComp Meeting (Mark)
- NERSC status (David)
- OSG Jobs and mcsmear (Thomas)
- Compiler upgrade discussion (all)
- Review of recent issues and pull requests:
- Review of recent discussion on the GlueX Software Help List (all)
- Action Item Review (all)
Present: Alex Austregesilo, Thomas Britton, Sean Dobbs, Mark Ito (chair), Igal Jaegle, Richard Jones, Naomi Jarvis, David Lawrence, Keigo Mizutani, Susan Schadmand, Simon Taylor, Nilanga Wickramaarachchi, Beni Zihlmann
There is a recording of this meeting on the BlueJeans site. Use your JLab credentials to authenticate.
- New version set with upgrade to Geant4: version_4.21.3.xml. This version set is ready for testers of an updated version of Geant4.
- Multi-threaded HDGeant4 now working. Richard implemented patches to Geant4 that fixed issues that had prevented multi-threaded running from giving sensible results.
- To address thread safety in the HDGeant4 code proper, he made a change to provide each thread with its own copy of the magnetic field map, at the cost of 300 MB of memory per additional thread. This was done out of an abundance of caution; if it proves unnecessary, he will back out the change to recover the memory used.
- Richard proposed changing the CCDB so that the magnetic field is no longer suppressed inside selected volumes, in particular in the BCAL. The other volumes that would change are the FCAL, TOF, DIRC, and CCAL, where the effect was not manifest because the field is so much lower there. He discovered that the suppression resulted in field discontinuities that caused the crashes in particle propagation reported by Naomi. The suppression had originally been introduced to speed up shower development in GEANT, but Geant4 seems not to be affected by having the field on. We endorsed the proposal.
- New release of Build Scripts: version 1.58. This new release will accommodate our current default version of Geant4 as well as subsequent versions.
- New version set: version_4.22.0.xml. This version set incorporates the fix to multi-threaded running of HDGeant4 mentioned above.
- Checksum changes for files written to tape. The Computer Center is dropping MD5 checksums and will rely on CRC32 sums for tape data validation.
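For anyone who wants to compare a local file against a CRC32 value, a minimal Python sketch follows. It assumes the standard CRC-32 as implemented in zlib; the minutes do not say which CRC32 variant or tool the Computer Center uses, and the function name is hypothetical.

```python
import zlib


def crc32_of_file(path, chunk_size=1 << 20):
    """Return the zlib CRC-32 of a file as an unsigned 32-bit integer.

    The file is read in chunks so that large files need not fit in memory.
    """
    crc = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                break
            # Feed the running value back in to continue the same checksum.
            crc = zlib.crc32(block, crc)
    return crc & 0xFFFFFFFF
```

The result can be printed as eight hex digits with format(crc32_of_file(path), "08x") for comparison with a stored value, assuming the stored value uses the same CRC variant.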
Review of Minutes from the Last Software Meeting
We went over the minutes from June 9.
New Release of JANA: version 0.8.2
David has addressed the request to suppress warnings when geometry paths are requested but not present. This is often a normal situation, e.g., when probing the geometry to see how the data should be analyzed. His fix is in a new release of JANA, version 0.8.2.
Nathan Brei continues work on porting halld_recon to use JANA 2.0. We will get a report at a future meeting on this new major release.
Report from Recent HDGeant4 Meetings
We went over the minutes from the June 16 meeting and those from the June 30 meeting without much comment.
Report from the SciComp Meeting
Please see Mark's slides for Scientific Computing news from the Computer Center.
NERSC Status
David brought us up to date on processing at NERSC. Please see his slides for all of the details. Some broad points:
- We have moved to two-hour jobs rather than the 6 to 8 hour jobs run in past campaigns. This gives us much better access to the backfill mechanism at NERSC.
- The move involved significant development of our workflow to (a) process only selected parts of our 20 GB raw data files in a single job and (b) recombine the resulting output files to correspond to the original 20 GB file.
- One significant challenge has been to run monitoring launches, with their 57 plugins, to give complete ROOT output files for each selection of the raw data file.
- David pointed out that the current version of the Oasis image of our software does not support development (i.e., building new versions of software), only running. He has put up a Docker container that remedies this. Richard has also run into this issue; he has put the missing pieces in an undisclosed location on Oasis.
- Mark remarked that having a developer-friendly version of Oasis would involve very little work, only real estate on Oasis. He will look into this.
- Alex suggested that one way forward is to abandon processing of monitoring launches at NERSC and concentrate on REST file production, which uses a smaller set of plugins with a better track record.
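The split-and-recombine bookkeeping described above can be illustrated with a small Python sketch. It is not David's workflow: the unit being split (events, blocks, or bytes), the function names, and the output naming scheme `<parent>.partNNN.root` are all assumptions made for illustration.

```python
from collections import defaultdict


def split_ranges(total, n_parts):
    """Split range(total) into n_parts contiguous (start, stop) ranges.

    Sizes differ by at most one; earlier ranges take the remainder.
    """
    base, extra = divmod(total, n_parts)
    ranges, start = [], 0
    for i in range(n_parts):
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def group_outputs(output_names):
    """Group per-job outputs by parent raw file, ordered by part index.

    Assumes hypothetical names of the form '<parent>.partNNN.root'.
    """
    groups = defaultdict(list)
    for name in output_names:
        parent, part, _ext = name.rsplit(".", 2)
        groups[parent].append((int(part[len("part"):]), name))
    return {parent: [n for _, n in sorted(parts)]
            for parent, parts in groups.items()}
```

Each (start, stop) range would correspond to one short job, and each ordered list from group_outputs gives the inputs for one merge step that reproduces a per-raw-file output.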
OSG Jobs and mcsmear
Today, Thomas noticed that many jobs were crashing in mcsmear when accessing the SQLite form of the CCDB on the OSG. Two issues:
- What is wrong with the SQLite file?
- Why is it that mcsmear exits with status code = 0 (i.e., success) after bombing?
We were only able to address the first issue. David noticed that today's SQLite file was a bit smaller than usual, making it suspect. Mark promised to recreate the SQLite file and ship it out via Oasis. [Added in press: (a) Mark made good on his promise and regenerated the SQLite file. (b) David had reported this issue via email to Mark early this morning. Suffice it to say that Mark is behind on his email.]
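A suspect SQLite file of this kind could be screened before distribution with a check like the following Python sketch. It uses the standard sqlite3 module and SQLite's built-in PRAGMA integrity_check; the function name and the min_bytes threshold are hypothetical, and nothing here is specific to the CCDB schema.

```python
import os
import pathlib
import sqlite3


def sqlite_file_looks_ok(path, min_bytes=0):
    """Return True if the file exists, is at least min_bytes long,
    and passes SQLite's own integrity check."""
    if not os.path.isfile(path) or os.path.getsize(path) < min_bytes:
        return False
    # Open read-only through a file: URI so the check cannot modify the file.
    uri = pathlib.Path(path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
        try:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()
    except sqlite3.DatabaseError:
        # Raised, for example, when the file is not a database or is malformed.
        return False
    # A healthy database reports a single row containing 'ok'.
    return rows == [("ok",)]
```

Setting min_bytes to a value somewhat below the size of the previous day's file would flag the "smaller than usual" symptom noted above. This check says nothing about the second issue, the zero exit status from mcsmear.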