GlueX Software Meeting, February 5, 2019

GlueX Software Meeting
Tuesday, February 5, 2019
3:00 pm EST
JLab: CEBAF Center A110
BlueJeans: 968 592 007

Agenda

  1. Announcements
    1. New version set: version_4.1.0.xml (Mark)
    2. Moving version set files to new repository: halld_versions (Mark)
    3. HOW2019 computing workshop: OSG All-Hands + WLCG/HSF: March 18-22 (Indico site)
    4. Slack: go here to join
  2. Review of minutes from the January 22 meeting (all)
  3. Report from the January 29 HDGeant4 Meeting (all)
  4. Report from the SciComp Meeting on January 31
  5. XROOTD and GlueX (Thomas)
  6. ML Monitoring and ML Tracking (Thomas, David)
  7. Review of recent issues and pull requests:
    1. halld_recon
    2. halld_sim
  8. Review of recent discussion on the GlueX Software Help List (all)
  9. Action Item Review (all)

Slides

Talks can be deposited in the directory /group/halld/www/halldweb/html/talks/2019 on the JLab CUE. This directory is accessible from the web at https://halldweb.jlab.org/talks/2019/ .

Minutes

Present:

  • CMU: Curtis Meyer, Naomi Jarvis
  • JLab: Shankar Adhikari, Alexander Austregesilo, Thomas Britton, Sean Dobbs, Ashley Ernst, Stuart Fegan, Mark Ito (chair), David Lawrence, Simon Taylor, Beni Zihlmann
  • UConn: Richard Jones
  • W&M: Justin Stevens

There is a recording of this meeting on the BlueJeans site. Use your JLab credentials to access it.

Announcements

  1. New version set: version_4.1.0.xml. Mark reviewed the new releases in this version set. A discussion followed on the utility of the new dbg and opt builds. Mark pointed out their advantages, as outlined in his talk at the January 8 Software Meeting. The sense of the group was that the combination of debug symbols and optimization (combo) is too valuable to forgo: even in production, where optimization is essential, the added information from debug symbols is important when the code crashes. If combo is built, there is little use for the dbg and opt versions. dbg, although it behaves better under the debugger, uses as much disk space as combo and runs slowly. opt, although it is a factor of 10 smaller than combo, lacks debug symbols. So despite the endorsement of the dbg/opt plan on January 8, we will go back to combo-only builds.
  2. Moving version set files to new repository: halld_versions. Mark will be moving the location of the version.xml files from their current location in the "dist" directory of /group/halld/www/halldweb/html to a new "halld_versions" directory. This is to facilitate export and update of builds outside of the JLab CUE.
    • The default version set has always referred to the latest tagged build of each package. Sean wondered whether it should instead point to the versions last used in reconstruction. Beni and Mark thought that the current practice was best: new versions come out more frequently than reconstruction launches, and the version set corresponding to a launch is available if desired.
  3. HOW2019 computing workshop: OSG All-Hands + WLCG/HSF: March 18-22 (Indico site). HOW refers to "HEP Software Foundation," "Open Science Grid," and "Worldwide LHC Computing Grid." Richard described the upcoming meeting at JLab. We will prepare a contribution to the session at 2:00 on Monday afternoon, "Input from communities/experiments: Input from other experiments."
  4. Slack. Mark reported that he created a new channel, #halld, on the Slack workspace jlab12gev. Slack is a modern chat/messaging application oriented toward the enterprise. Follow this link to join.
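
As an illustration of how a version set file such as version_4.1.0.xml might be read by a script outside the JLab CUE, here is a minimal sketch. The element name `package`, the attributes `name` and `version`, the root element, and the version numbers in the sample are all assumptions made for this example; they are not copied from the actual file.

```python
import xml.etree.ElementTree as ET

# Hypothetical version set content. The layout and the version numbers are
# placeholders for illustration only, not the contents of version_4.1.0.xml.
SAMPLE_VERSION_SET = """<gversions>
  <package name="halld_recon" version="0.0.1"/>
  <package name="halld_sim" version="0.0.2"/>
</gversions>"""


def read_version_set(xml_text):
    """Return a dict mapping package name to version string."""
    root = ET.fromstring(xml_text)
    # Collect every <package> element regardless of nesting depth.
    return {pkg.get("name"): pkg.get("version") for pkg in root.iter("package")}


versions = read_version_set(SAMPLE_VERSION_SET)
for name, version in sorted(versions.items()):
    print(f"{name}: {version}")
```

To read a file on disk instead of a string, `ET.parse(path).getroot()` can replace `ET.fromstring`.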

Review of minutes from the January 22 meeting

We went over the minutes.

  • The confusion over HDDS geometry has been resolved since the last meeting. It remains to create a repository-based build of the pre-DIRC geometry so we can have a version of halld_sim with consistent builds of hdgeant and mcsmear.
  • We looked at the software-related work packages and had a brief discussion of which working group should be responsible for the various Analysis Software packages. No firm conclusions were reached.
  • David reported that NERSC has realized that our typical workflow is not at all like what they normally see. Their projects typically run for much longer (days, not hours) and on more nodes (multiple multi-core nodes versus one). They are preparing suggestions on best practices for us.

Report from the January 29 HDGeant4 Meeting

We went through the minutes.

  • The problem with "100 times slower" execution of hdgeant and hdgeant4, and the simple solution (CKOV=0) was explained at the meeting by Richard. Thomas has incorporated the fix into MCWrapper.
  • Thomas has completed the new set of comparison simulations (HDG3 vs. HDG4). Folks are looking at the result now.
  • Added in press: Peter Pauli has done an extensive high-level comparison of HDG3 vs. HDG4. More on that to come.

Report from the SciComp Meeting on January 31

We reviewed Sandy Philpott's notes from the meeting. Highlights:

  • Two more ifarm nodes will be deployed soon.
  • Theory jobs on the farm will be pre-emptable by production accounts, not by all accounts.
  • Slurm is almost ready to go into production replacing PBS/Maui.
  • More Lustre disk space is coming.
  • More work disk space is not coming.

XROOTD and GlueX

With help from Sean to get going, Thomas has had some success testing XROOTD. He and Kurt Strosahl got a server running on scosg16 (the OSG submit host), and Thomas was able to run Monte Carlo on his desktop, streaming the random trigger data from scosg16. The initial attempt at doing the same on the grid did not work, but note that only one attempt has been made on the grid so far. We recalled that Richard ran a proof-of-principle exercise several months ago using a server he stood up at UConn.
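
For readers unfamiliar with how a remote file is addressed when streaming over XROOTD, the sketch below builds a `root://` URL of the usual form `root://host:port//absolute/path`. The host name and file path are hypothetical placeholders, not the actual scosg16 configuration; 1094 is the conventional default XRootD port.

```python
def xrootd_url(host, path, port=1094):
    """Build an XRootD URL of the form root://host:port//absolute/path.

    The double slash after the port marks the start of an absolute path
    on the server.
    """
    if not path.startswith("/"):
        raise ValueError("path must be absolute (start with '/')")
    return f"root://{host}:{port}/{path}"


# Hypothetical server and file names, for illustration only.
url = xrootd_url("xrootd.example.org", "/random_triggers/run_000000.hddm")
print(url)
```

A client that understands the protocol can then be pointed at the resulting URL in place of a local file name.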

ML Monitoring and ML Tracking

Thomas noted that there is a push from Management to explore machine learning at the Lab. Some weeks ago he and Dmitry Romanov gave talks at the ODU-sponsored Machine Learning Fest. He showed a slide demonstrating his work in classifying BCAL occupancy plots, generated by the online monitoring, as either "good" or "bad." He is using Keras/TensorFlow.

David has been working on using ML to classify tracks based on the raw hits alone[?]. The 39 slides following Thomas's slide in the pdf file linked above describe his work. We ran short of time; he will present this material at the next Track Meeting on Thursday.

Review of recent issues and pull requests

Sean drew our attention to the halld_recon issue he opened today, "Crashes with analysis library when adding multiple similar reactions" (#95). Alex has some ideas about the cause.

Code Readiness for Reconstruction of Spring 2018

Sean reported on some bugs that he and Simon are working on that need fixing before we can go ahead. Simon has also submitted a pull request with several changes to tracking. Sean will report back when we are ready to proceed.

[Added in press: Simon closed his pull request. He will resubmit the bug-fix-like changes as a separate pull request from the more fundamental tracking changes. The former will definitely be included in the reconstruction launch.]

HDGeant on Ubuntu on Windows

Thomas reported that HDGeant[3] crashes when run on a gluex_install build on Ubuntu on Windows. The error looks like

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
LOCB/LOCF: address 0x7ffa129ad700 exceeds the 32 bit address space
or is not in the data segments
This may result in program crash or incorrect results
Therefore we will stop here
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

No investigation of the cause has been started thus far. A candidate for an issue, perhaps?
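
As a quick sanity check on the message itself, the sketch below confirms by simple arithmetic that the reported address lies outside the range representable in 32 bits. It only illustrates what the message claims; it does not diagnose why such an address was handed out on Ubuntu on Windows.

```python
# Address quoted in the LOCB/LOCF error message above.
ADDRESS = 0x7FFA129AD700

# Number of distinct addresses representable in 32 bits.
LIMIT_32BIT = 2**32


def fits_in_32_bits(address):
    """Return True if the address is representable as an unsigned 32-bit value."""
    return 0 <= address < LIMIT_32BIT


print(f"address          = {ADDRESS:#x}")
print(f"bits required    = {ADDRESS.bit_length()}")
print(f"fits in 32 bits? = {fits_in_32_bits(ADDRESS)}")
```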