GlueX Software Meeting, April 27, 2021
Tuesday, April 27, 2021
3:00 pm EDT
BlueJeans: 968 592 007
Agenda
- Announcements
  - version set 4.37.1 and mcwrapper v2.5.2 (Thomas)
  - New: HOWTO copy a file from the ifarm to home (Mark)
  - To come: HOWTO use AmpTools on the JLab farm GPUs (Alex)
  - work disk full again (Mark)
  - RedHat-6-era builds on group disk slated for deletion (Mark)
  - Bug fix release of halld_recon: restore the REST version number (Mark)
- Review of Minutes from the Last Software Meeting (all)
- Minutes from the Last HDGeant4 Meeting (all)
- Report from the April 20th SciComp Meeting (Mark)
- [Illustrative Slides goes here] (Naomi)
- ROOTWriter_and_DSelectorUpdates2021 (Jon)
- Review of recent issues and pull requests:
- Review of recent discussion on the GlueX Software Help List (all)
- Action Item Review (all)
Minutes
Present: Alexander Austregesilo, Edmundo Barriga, Thomas Britton, Sean Dobbs, Mark Ito (chair), Igal Jaegle, Naomi Jarvis, Simon Taylor, Nilanga Wickramaarachchi, Jon Zarling, Beni Zihlmann
Announcements
- version set 4.37.1 and mcwrapper v2.5.2. Thomas described the changes in the latest version of MCwrapper. Luminosity is now used to normalize the number of events to produce for each run number requested.
- New: HOWTO copy a file from the ifarm to home. Mark pointed us to the new HOWTO. Sean told us that one could do the same thing from ftp.jlab.org without having to set up an ssh tunnel. Mark will make the appropriate adjustments to the documentation.
- To come: HOWTO use AmpTools on the JLab farm GPUs. Alex described his HOWTO (still under construction).
- work disk full again. Mark described the current work disk crunch, including plots of recent usage history. More clean-up will be needed until the arrival of new work disk servers this summer.
- RedHat-6-era builds on group disk slated for deletion. Mark reported that these builds have now been deleted.
- Bug fix release of halld_recon: restore the REST version number. Mark reviewed the reason for the new version sets.
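MCwrapper's own code is not reproduced here. As a minimal sketch of the idea Thomas described, assuming the requested events are simply split in proportion to each run's integrated luminosity, the Python below allocates a total across runs. The function name `events_per_run`, the run numbers, and the luminosity values are all hypothetical.

```python
from typing import Dict


def events_per_run(total_events: int, lumi_by_run: Dict[int, float]) -> Dict[int, int]:
    """Split total_events across runs in proportion to luminosity (hypothetical helper).

    The largest-remainder method keeps the integer counts summing exactly
    to total_events.
    """
    total_lumi = sum(lumi_by_run.values())
    if total_lumi <= 0:
        raise ValueError("total luminosity must be positive")
    # Ideal fractional share for each run.
    shares = {run: total_events * lumi / total_lumi for run, lumi in lumi_by_run.items()}
    # Round every share down first.
    counts = {run: int(share) for run, share in shares.items()}
    leftover = total_events - sum(counts.values())
    # Give the remaining events to the runs with the largest fractional parts.
    ranked = sorted(shares, key=lambda run: shares[run] - counts[run], reverse=True)
    for run in ranked[:leftover]:
        counts[run] += 1
    return counts


# Hypothetical runs: the first has twice the luminosity of the other two.
print(events_per_run(1000, {51000: 2.0, 51001: 1.0, 51002: 1.0}))
# {51000: 500, 51001: 250, 51002: 250}
```

Rounding down and then distributing the remainder avoids the common pitfall where independently rounded per-run counts no longer add up to the number of events requested.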
Review of Minutes from the Last Software Meeting
We went over the minutes from the meeting on March 30th.
- It turns out that there is no pull-request-triggered test for HDGeant4. Mark has volunteered to set one up à la the method Sean instituted for halld_recon and halld_sim.
- Some significant progress has been made on releasing CCDB 2.0.
- The unit tests for CCDB 1.0 have been broken for some time. Mark and Dmitry Romanov found and fixed a problem, related to cache access, in fetching constants of the form map<string, string>. The same problem is likely present in the CCDB 2.0 branch.
- Dmitry has started on reviving the MySQL interface for CCDB 2.0.
- Dmitry has moved us to a new workflow for CCDB pull requests.
- Developers will fork the JeffersonLab/ccdb repository to their personal accounts and work on branches created there as they see fit.
- When a change is ready, they will submit a pull request back to the JeffersonLab/ccdb repository for merging.
- This workflow is common outside Hall D; for example, Hall C uses it, as do many groups outside the Lab. We may consider using it within Hall D as well. It makes it easier to put safeguards in place against spurious errors from inadvertent or faulty commits, and to add any code review mechanism we may want. It also solves the problem of the confusing proliferation of branches in the main repository that we have seen. We could move to it with no structural changes to the repositories themselves.
- Sean pointed out that such a workflow might require minor changes to the automatic-pull-request-triggered tests.
Minutes from the Last HDGeant4 Meeting
We went over the minutes from the HDGeant4 meeting on April 6th. Sean noted that the overall focus of the HDGeant4 group is to compare Monte Carlo with data, to use the two simulation engines at our disposal, G3 and G4, to drill down to where differences arise at a basic physical level in HDGeant4, and then to adjust the model to get agreement with data. This approach is preferred over imposing empirical correction factors as an after-burner on the simulation.
Report from the April 20th SciComp Meeting
Mark presented slides: the first two reproduce Bryan Hess's agenda for the meeting and the third summarizes some of the discussion. Please see his slides for the details.
Sean asked if we could prioritize recovery of certain files over others. Mark will ask.
Handling of Recon Launch Output from Off-site
Alex raised the issue of disk use when bringing the results of reconstruction launches performed off-site back to JLab. All data land on volatile and, after reprocessing, are written to cache and from there to tape. He is worried about this procedure for two reasons:
- Data on volatile is subject to deletion (oldest files get deleted first) and we do not want to lose launch output to the disk cleaner.
- The long-standing problems we have seen with Lustre disks; both volatile and cache are Lustre systems.
Mark showed a plot in which the amount of data we have on volatile has been well under the deletion level for months. His claim was that premature deletion from volatile has not been a problem for quite a while. Alex did not think that the graph was accurate: it showed too little variation in usage level, when Alex knows that there has been significant activity on the disk, an argument that Mark found convincing. Mark will have to check on the source of his data. That aside, disk usage in this context should be reviewed.
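To make Alex's first concern concrete, here is a toy model of an oldest-first cleaner, assuming a single deletion threshold: nothing is removed while usage stays at or below the threshold, and once it is exceeded the oldest files go first until usage drops back under it. This is not the actual JLab volatile policy, whose thresholds and bookkeeping are not described here; the `FileEntry` class, file names, and sizes are hypothetical. It illustrates why launch output that has sat on volatile the longest is the first to go whenever usage crosses the line.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileEntry:
    path: str
    size_tb: float
    age_days: float  # time since the file was written; larger means older


def files_to_delete(files: List[FileEntry], threshold_tb: float) -> List[str]:
    """Paths an oldest-first cleaner would remove (toy model, single threshold)."""
    usage = sum(f.size_tb for f in files)
    doomed: List[str] = []
    # Visit files from oldest to newest; stop once usage is at or below the threshold.
    for f in sorted(files, key=lambda f: f.age_days, reverse=True):
        if usage <= threshold_tb:
            break
        doomed.append(f.path)
        usage -= f.size_tb
    return doomed


files = [
    FileEntry("launch/file_a", 1.0, 30.0),
    FileEntry("launch/file_b", 2.0, 20.0),
    FileEntry("scratch/file_c", 1.0, 1.0),
]
print(files_to_delete(files, threshold_tb=2.5))  # the two oldest files
```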
Consolidation of Skim Files onto Fewer Tapes
Sean has noticed that reprocessing skimmed data can sometimes take a long time because of the time needed to retrieve files from tape. He suspects that this is because the files are scattered across many tapes, so that a large number of tape mounts and file skips are needed to get all of the data. He proposed a project where, for certain skims, we rewrite the data onto a smaller number of tapes.
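A back-of-the-envelope model shows why scattering matters. Assuming a single drive, a fixed overhead per tape mount (load plus positioning), and a constant streaming rate, retrieval time is roughly (number of tapes × mount overhead) + (data volume / read rate). The overhead, rate, and skim size below are hypothetical placeholders, not measured values for the JLab tape library, and the model ignores per-file skips.

```python
def retrieval_seconds(total_gb: float, n_tapes: int,
                      mount_overhead_s: float, read_mb_per_s: float) -> float:
    """Rough time to pull total_gb spread over n_tapes (toy model, one drive)."""
    mount_time = n_tapes * mount_overhead_s          # one mount per tape touched
    stream_time = total_gb * 1000.0 / read_mb_per_s  # GB -> MB, then divide by rate
    return mount_time + stream_time


# Hypothetical 20 TB skim: scattered over 200 tapes vs. packed onto 3.
scattered = retrieval_seconds(20_000.0, 200, mount_overhead_s=120.0, read_mb_per_s=300.0)
packed = retrieval_seconds(20_000.0, 3, mount_overhead_s=120.0, read_mb_per_s=300.0)
print(f"scattered: {scattered / 3600:.1f} h, packed: {packed / 3600:.1f} h")
```

Because the model leaves out the file skips Sean mentioned, which grow with scattering, it probably understates the difference between the two cases.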
Mark had some comments:
- We should only start such a project on skims for which there is some reasonable expectation that retrieval will be done repeatedly in the future. The consolidation step itself involves reading and writing all of the files of interest and so reading those files has to happen at least a couple of times after consolidation before the exercise shows a net gain.
- The way we write data to tape, by putting skim files on the write-through cache over several weeks, guarantees that the files will be scattered across different tapes. With the write-through cache, we would do better to buffer data on disk until a significant fraction of one tape has been accumulated and then manually trigger the write to tape.
- It is possible to set up tape "volume sets" (sets of specific physical tapes) in advance in the tape library and then direct selected data types to specific volume sets. The tapes in those volume sets will then be densely populated with the data types so directed. This is already done for raw data, and there is no structural impediment to doing it for other types of data. This approach has the advantage that there is no need to develop software to make it happen.
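Mark's first point can be written as a break-even condition. Let S be the time to read the scattered files, F the time to read them once consolidated, and W the time to write the consolidated copy. Consolidation costs S + W up front, so after k later reads it shows a net gain only when k·S > S + W + k·F, that is, k > (S + W)/(S - F). The sketch below evaluates that condition; the function name and the hour values are hypothetical.

```python
import math
from typing import Optional


def break_even_reads(scattered_h: float, consolidated_h: float, write_h: float) -> Optional[int]:
    """Smallest number of post-consolidation reads that gives a net gain.

    Returns None if consolidated reads are not faster, since then the
    consolidation never pays for itself.
    """
    saving_per_read = scattered_h - consolidated_h
    if saving_per_read <= 0:
        return None
    upfront = scattered_h + write_h  # one full read of the scattered files plus the rewrite
    # Need k * saving_per_read > upfront, strictly.
    return math.floor(upfront / saving_per_read) + 1


print(break_even_reads(10.0, 5.0, 5.0))  # 4: three reads only break even
```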
Something does have to be done on this front. Sean and Mark will discuss the issue further.
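The disk-buffering idea in Mark's second comment can be sketched as a simple batching rule: accumulate files in arrival order and release a batch for writing only once it reaches a chosen fraction of one tape's capacity. The function name, the tape capacity, and the fill fraction here are hypothetical; in the scheme Mark described, releasing a batch would correspond to manually triggering the write to tape.

```python
from typing import List, Tuple


def plan_tape_batches(sizes_gb: List[float], tape_capacity_gb: float,
                      fill_fraction: float = 0.8) -> Tuple[List[List[int]], List[int]]:
    """Group file indices into write batches (hypothetical buffering rule).

    A batch is closed as soon as its total reaches fill_fraction * tape_capacity_gb.
    Files that have not yet filled a batch stay in the disk buffer and are
    returned separately as still pending.
    """
    threshold = fill_fraction * tape_capacity_gb
    batches: List[List[int]] = []
    current: List[int] = []
    current_gb = 0.0
    for i, size in enumerate(sizes_gb):
        current.append(i)
        current_gb += size
        if current_gb >= threshold:
            batches.append(current)  # ready: trigger one write to tape
            current, current_gb = [], 0.0
    return batches, current


batches, pending = plan_tape_batches([300.0] * 5, tape_capacity_gb=1000.0, fill_fraction=0.5)
print(batches, pending)  # [[0, 1], [2, 3]] [4]
```

The rule does not cap a batch at one tape, so a single very large file could push a batch past capacity; a production version would need to handle that case.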
ROOTWriter and DSelector Updates
Jon presented a list of ideas and improvements for our data analysis software. See his wiki page for the complete list.
The items and subsequent discussion were in two broad classes:
- How we use the ROOT toolkit: Are there more efficient practices? Are there features we don't exploit but should?
- How we analyze the data: Are there new features in the Analysis Library that we should develop? Should the contents of the REST format be expanded? Are there things we do in Analysis that should be done in reconstruction or vice-versa?
One thing that came up was our use of TLorentzVector. Jon has seen others use a smaller (member-data-wise) class. Alex pointed out that the current ROOT documentation has marked this class as deprecated. Yet our use of TLorentzVector is ubiquitous. Several expressed interest in looking into this more closely.
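For context on "smaller (member-data-wise)": in ROOT, TLorentzVector inherits from TObject and holds a TVector3 (itself a TObject) plus the energy, so each instance carries object bookkeeping beyond its four components. ROOT's documentation points to the ROOT::Math::LorentzVector templates (for example PxPyPzEVector) as the replacement; those are C++ and are not shown here. The Python class below is only a language-neutral illustration of the lean alternative, a four-vector that stores exactly four numbers. `FourVector` is a hypothetical name, not a ROOT class.

```python
import math


class FourVector:
    """Lean four-momentum holding only (px, py, pz, E); __slots__ avoids a per-instance dict."""

    __slots__ = ("px", "py", "pz", "e")

    def __init__(self, px: float, py: float, pz: float, e: float) -> None:
        self.px, self.py, self.pz, self.e = px, py, pz, e

    def __add__(self, other: "FourVector") -> "FourVector":
        return FourVector(self.px + other.px, self.py + other.py,
                          self.pz + other.pz, self.e + other.e)

    def mass(self) -> float:
        """Invariant mass; returns -sqrt(-m^2) for spacelike vectors (a common convention)."""
        m2 = self.e ** 2 - (self.px ** 2 + self.py ** 2 + self.pz ** 2)
        return math.sqrt(m2) if m2 >= 0 else -math.sqrt(-m2)


# Two back-to-back 1 GeV photons combine to an invariant mass of 2 GeV.
g1 = FourVector(1.0, 0.0, 0.0, 1.0)
g2 = FourVector(-1.0, 0.0, 0.0, 1.0)
print((g1 + g2).mass())  # 2.0
```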
Jon encouraged us to think about where we might want to expend effort. This will likely come up again at a future meeting.
Production of Missing Random Trigger Files
Sean reported that he and Peter Pauli are very close to filling in all of the gaps in the random trigger file coverage for Fall 2018. Peter may give a presentation on this work at a future meeting.
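Finding such gaps amounts to a set difference between the runs that need random-trigger files and the runs that already have them, compressed into contiguous ranges for bookkeeping. This is a minimal sketch, assuming both lists are already in hand (how they are obtained is not covered here); the function name and run numbers are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def missing_ranges(runs: Iterable[int], have_random: Set[int]) -> List[Tuple[int, int]]:
    """Runs lacking a random-trigger file, grouped into (first, last) ranges."""
    missing = sorted(set(runs) - set(have_random))
    ranges: List[Tuple[int, int]] = []
    for run in missing:
        if ranges and run == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], run)  # extend the current range
        else:
            ranges.append((run, run))  # start a new range
    return ranges


print(missing_ranges(range(50000, 50010), {50000, 50001, 50004, 50005, 50009}))
# [(50002, 50003), (50006, 50008)]
```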
Action Item Review
- Set up pull-request-triggered tests for HDGeant4. (Mark)
- Modify the documentation to feature ftp.jlab.org. (Mark)
- Prioritizing specific tapes to be recovered. (Mark)
- Review disk usage when repatriating recon launch data. (Alex, Mark)
- Check input data for volatile usage plot. (Mark)
- Make a plan for structuring tape writing for efficient file retrieval. (Sean, Mark)
- Look into how we use TLorentzVector. (Alex, Simon, Jon)
- Think about Jon's list of improvements. (all)