GlueX Offline Meeting, June 29, 2018
GlueX Offline Software Meeting
Friday, June 29, 2018
10:00 am EDT
JLab: CEBAF Center A110
BlueJeans: 968 592 007
- 1 Agenda
- 2 Communication Information
- 3 Minutes
- 3.1 Announcements
- 3.2 Review of minutes from the June 15 meeting
- 3.3 Coming Computing Resources
- 3.4 Report from the June 28 SciComp Meeting
- 3.5 Missing CDC hits in recent bggen launch
- 3.6 Planning for Launches
- 3.7 Splitting up Sim-Recon
- 3.8 HDGeant4 Meeting
- 3.9 Review of Pull Requests and Software Help Topics
- 3.10 Reproducibility in Tracking Reconstruction
- Review of minutes from the June 15 meeting (all)
- Future Computing Resources: Chip's talk at the June 28 Roundtable (David, Mark)
- Report from the June 28 SciComp Meeting (Mark, David)
- Reconstruction Launch
- Notifying SciComp about upcoming launches (all)
- Splitting up Sim-Recon (Sean, Mark)
- HDGeant4 Meeting (Richard, Sean, Mark)
- Review of recent pull requests (all)
- Review of recent discussion on the GlueX Software Help List (all)
- Action Item Review (all)
- The BlueJeans meeting number is 968 592 007.
- Join the Meeting via BlueJeans
Talks can be deposited in the directory /group/halld/www/halldweb/html/talks/2018 on the JLab CUE. This directory is accessible from the web at https://halldweb.jlab.org/talks/2018/.
Present:
- CMU: Naomi Jarvis
- FIU: Mahmoud Kamel
- FSU: Sean Dobbs
- JLab: Alex Austregesilo, Mark Ito (chair), David Lawrence, Simon Taylor, Beni Zihlmann
- W&M: Justin Stevens
- Yerevan: Hrach Marukyan
There is a recording of this meeting on the BlueJeans site. Use your JLab credentials to access it.
Announcements
- Sim-Recon 2.26.0. A periodic release. See the release notes for changes from the last such release.
- GlueX Root Analysis 0.3. There has not been a new release for about a year. This one was overdue.
Review of minutes from the June 15 meeting
We went over the minutes.
- The Experimental Readiness Review for GlueX II was held last Monday. David noted that Eugene Chudakov has already circulated the preliminary report of the committee.
- David mentioned that we may want to revisit our estimate of the amount of Monte Carlo we need. Monte Carlo accounts for nearly half of our current projections. This is in line with what we have always said, but it bears re-examination.
Coming Computing Resources
Mark flipped through Chip Watson's talk at the June 28 Round Table.
Slide 6 listed the increases to the farm core count coming this summer:
- Current system: 3.5k cores (scaled to Broadwell)
- Major farm upgrade due in July: 88 dual 20-core Skylake compute nodes (farm18), adds 3.5k cores (100% gain)
- Retiring LQCD cluster to be shared for 6 months: 250 dual 8-core (2012 Sandy Bridge) compute nodes, adds 2.4k cores
- Size for the Fall run: 9.4k cores (up 2.7x)
- Note: 2.7k cores go end of life midway through FY2019, and we might add only 1.8k new, dropping onsite capacity to 8.5k cores, still up 150%.
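The slide 6 figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names are ours, and all counts are taken to be Broadwell-scaled as stated for the current system. The FY2019 gain comes out at about 143%, consistent with the rounder "up 150%" on the slide.

```python
# Core counts from slide 6, in Broadwell-scaled cores (variable names are illustrative).
current = 3500        # current system
farm18 = 3500         # 88 dual 20-core Skylake nodes
lqcd_shared = 2400    # 250 dual 8-core Sandy Bridge nodes, shared for 6 months

# Physical cores on the new Skylake nodes: 88 nodes * 2 sockets * 20 cores.
farm18_physical = 88 * 2 * 20
# The Sandy Bridge nodes have 250 * 2 * 8 physical cores; the slide lists fewer,
# presumably because of the Broadwell scaling.
lqcd_physical = 250 * 2 * 8

# Size for the Fall run.
fall_total = current + farm18 + lqcd_shared
fall_ratio = fall_total / current

# Midway through FY2019: 2.7k cores retire and possibly only 1.8k are added.
retired = 2700
added = 1800
fy2019_total = fall_total - retired + added
fy2019_gain_percent = (fy2019_total / current - 1) * 100

print(f"Fall run: {fall_total} cores ({fall_ratio:.1f}x current)")
print(f"FY2019:   {fy2019_total} cores (up {fy2019_gain_percent:.0f}%)")
```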
For all of the talks presented at the Round Tables see the Indico site.
Report from the June 28 SciComp Meeting
Mark and David gave the report.
- SciComp has replaced PBS with Slurm as the underlying batch scheduler for the LQCD cluster and is looking to do the same for the Experimental Nuclear Physics (ENP) cluster. At present Auger sits on top of PBS and SWIF sits on top of Auger. SciComp is looking at eliminating the Auger layer.
- SciComp is planning on allowing LQCD jobs to run on the ENP cluster at low priority. The reverse will not be allowed. Jobs submitted to ENP will be able to pre-empt LQCD jobs (kill them to make room if no job slots are free).
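The one-way pre-emption policy described above can be illustrated with a toy model. This is only a sketch of the rule as reported in the meeting, not Slurm configuration or SciComp's implementation; the function names, the slot list, and the "started"/"preempted"/"queued" outcomes are all hypothetical.

```python
def submit_enp_job(running, capacity):
    """Submit an ENP job to a toy cluster.

    running  -- list of "ENP"/"LQCD" labels, one per occupied job slot
    capacity -- total number of job slots
    Returns (new_running, outcome).
    """
    running = list(running)
    if len(running) < capacity:
        running.append("ENP")
        return running, "started"
    if "LQCD" in running:
        # No free slot: kill one low-priority LQCD job to make room.
        running.remove("LQCD")
        running.append("ENP")
        return running, "preempted"
    return running, "queued"


def submit_lqcd_job(running, capacity):
    """Submit an LQCD job; it may use free slots but never pre-empts."""
    running = list(running)
    if len(running) < capacity:
        running.append("LQCD")
        return running, "started"
    return running, "queued"


# A full two-slot cluster running only LQCD jobs: the ENP job pre-empts one.
print(submit_enp_job(["LQCD", "LQCD"], capacity=2))
# A full cluster running only ENP jobs: the LQCD job has to wait.
print(submit_lqcd_job(["ENP", "ENP"], capacity=2))
```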
Missing CDC hits in recent bggen launch
Alex A. reported on the problem. All CDC hits are apparently missing. The cause of the problem is not known. See his presentation of the evidence starting at 40:40 in the recording.
We will have to re-run.
Planning for Launches
We discussed the idea of doing some semi-formal documentation of plans for launches, mainly as a communication tool for both the collaboration and the computing resource providers. We should at least write down the things that we know are upcoming, acknowledging the usual schedule uncertainties. We agreed that it was a good idea and that we should discuss it further.
Splitting up Sim-Recon
Mark reported that some work remains before we have a production system.
- Environment variable changes. Rather than HALLD_HOME for sim-recon, we will have HALLD_SIM_HOME for the halld_sim repository (originally gluex_sim) and HALLD_RECON_HOME for the halld_recon repository (originally gluex_recon). We decided that the "halld" name reflected the fact that non-GlueX experiments will be using the same software.
- Preserving branches in halld_recon. There were some details to be worked out to have the branches from sim-recon carry over to halld_recon.
- Removing sim pieces from gluex_recon. This had not been done. Removal necessitates changes in the build system.
- Dealing with build_scripts. Build_scripts has not been modified for the new configuration. All work so far has been in getting a working build.
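The environment variable change in the first item can be sketched as follows. This is a minimal illustration, not the build_scripts implementation: the helper name and the install paths are hypothetical, and dropping HALLD_HOME entirely is an assumption, since the minutes only say that the two new variables will be used instead.

```python
def split_environment(env, sim_home, recon_home):
    """Return a copy of env with HALLD_HOME replaced by the two new variables.

    env        -- mapping of environment variables (e.g. dict(os.environ))
    sim_home   -- location of the halld_sim installation
    recon_home -- location of the halld_recon installation
    """
    new_env = dict(env)
    # Assumption: the old single-repository variable is no longer set.
    new_env.pop("HALLD_HOME", None)
    new_env["HALLD_SIM_HOME"] = sim_home
    new_env["HALLD_RECON_HOME"] = recon_home
    return new_env


# Hypothetical paths, for illustration only.
old_env = {"HALLD_HOME": "/opt/gluex/sim-recon"}
new_env = split_environment(
    old_env,
    sim_home="/opt/gluex/halld_sim",
    recon_home="/opt/gluex/halld_recon",
)
for name in sorted(new_env):
    print(f"{name}={new_env[name]}")
```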
HDGeant4 Meeting
At the last collaboration meeting Richard suggested having a meeting dedicated to issues related to the Geant4 version of our simulation. The first meeting will be Friday, July 6 at 10 am. We will likely meet bi-weekly in the slot opposite the Offline Software meetings.
Review of Pull Requests and Software Help Topics
Reproducibility in Tracking Reconstruction
Beni thinks he has discovered the problem: somehow duplicate, but not identical, FDC pseudo-points are being fed into the tracking. Which point gets used can vary from one run of the program to the next. The source of the duplication is the next thing to investigate.
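To illustrate why near-duplicate inputs can make results vary from run to run, here is a small sketch of order-independent selection among duplicates. It is not the halld_recon tracking code: the PseudoPoint fields, the (layer, wire) grouping key, and the smallest-(x, y) tie-break are all hypothetical. It also does not address the source of the duplication, which remains the thing to investigate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PseudoPoint:
    # Hypothetical fields, for illustration only.
    layer: int
    wire: int
    x: float
    y: float


def pick_deterministic(points):
    """Keep one point per (layer, wire), independent of input order.

    Among near-duplicates sharing a key, the point with the smallest
    (x, y) tuple wins, so the choice does not depend on which copy
    happens to arrive first.
    """
    best = {}
    for p in points:
        key = (p.layer, p.wire)
        if key not in best or (p.x, p.y) < (best[key].x, best[key].y):
            best[key] = p
    # Return the survivors in a fixed order as well.
    return [best[k] for k in sorted(best)]


# Two near-identical points on the same hypothetical layer and wire.
a = PseudoPoint(layer=1, wire=10, x=0.50, y=1.20)
b = PseudoPoint(layer=1, wire=10, x=0.51, y=1.20)

# Either input order yields the same selected point.
print(pick_deterministic([a, b]) == pick_deterministic([b, a]))
```

A "first one wins" rule would instead return a for the first ordering and b for the second, which is the kind of run-to-run variation described above.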