GlueX Offline Meeting, January 6, 2016

From GlueXWiki

GlueX Offline Software Meeting
Wednesday, January 6, 2016
1:30 pm EST
JLab: CEBAF Center F326/327

Agenda

  1. Announcements
  2. Review of minutes from December 9 (all)
  3. HEP Software Foundation (Amber)
  4. Offline Monitoring (Paul)
  5. Geant4 Update (Richard, David)
  6. Upgrade Xerces C++ from 3.1.1 to 3.1.2 (Mark)
  7. Review of recent pull requests (all)
  8. Data Challenge 3 update (Mark)
  9. Future Commissioning Simulations (all)
  10. Action Item Review

Communication Information

Remote Connection

Slides

Talks can be deposited in the directory /group/halld/www/halldweb/html/talks/2016 on the JLab CUE. This directory is accessible from the web at https://halldweb.jlab.org/talks/2016/ .

Minutes

Present:

  • CMU: Curtis Meyer
  • FIU: Mahmoud Kamel
  • JLab: Amber Boehnlein, Eugene Chudakov, Mark Ito (chair), David Lawrence, Paul Mattione, Dmitry Romanov, Nathan Sparks, Simon Taylor, Beni Zihlmann
  • NU: Sean Dobbs
  • UConn: James McIntyre

Announcements

Review of minutes from December 9

At the last meeting, Sean raised the issue of deleting old pull-request builds. Mark has since implemented automatic deletion of builds older than a month via a cron job.
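The cleanup could look something like the following minimal sketch. The directory layout, function name, and 30-day cutoff are assumptions for illustration only; the actual cron job may differ.

```python
# Hypothetical sketch of the automatic build cleanup; a cron entry would
# invoke a script like this nightly. The build directory path and the
# 30-day cutoff are assumptions, not taken from the actual setup.
import os
import shutil
import time

def clean_old_builds(build_dir, max_age_days=30, now=None):
    """Remove top-level build directories older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    for entry in os.scandir(build_dir):
        # only prune directories whose modification time predates the cutoff
        if entry.is_dir() and entry.stat().st_mtime < cutoff:
            shutil.rmtree(entry.path)
```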

HEP Software Foundation

Amber gave a summary of an effort to collect information and encourage collaboration on software systems among high-energy and nuclear physics collaborations. The approach is community-based. See her slides for details. The Foundation has a website at http://hepsoftwarefoundation.org/ . The "knowledge base" is hosted at http://hepsoftware.org/ and Amber has already added GlueX. More links are planned.

She highlighted the Packaging Working Group, which is collecting information on systems for software distribution within collaborations. She encouraged interested parties to subscribe to the mailing list, hep-sf-packaging-wg@googlegroups.com .

She also told us about Exascale-2015, a project aiming at a machine of hundreds of petaflops. There will be a workshop this summer to discuss nuclear physics applications including experimental tasks.

Offline Monitoring

Paul gave the report. He showed us results from the most recent launches, done just before the holidays, on the Spring 15 and Fall 15 data sets. He pointed us to the Run Browser, Run 4319, as an example. One can clearly see pi zeros in the BCAL.

The transition from the old offline monitoring database to the RCDB is in progress. Some info in the browser is compromised as a result. Dmitry pointed out some of the features of the RCDB data viewer at https://halldweb.jlab.org/rcdb .

Geant4 Update

David has succeeded in building a new version of the CPP simulation using Geant 4.10.2 and Clang 3.7.0. He reports that the problem with multi-threaded processing he had seen with previous Geant versions has been fixed. Event rate scales with the number of threads. He sees some small differences between multi-thread-enabled code run with one thread and single-threaded code.

He has added a new generator to sim-recon. It produces muon pairs with a coherent photon beam, including polarization effects. The pair-production code was adapted from Geant4 and the coherent photon generator was adapted from Richard Jones's code.

Upgrade Xerces C++ from 3.1.1 to 3.1.2

Version 3.1.2 of the Xerces C++ library came out last March. Mark flashed the release notes. Mark has built it on all of the platforms and tested it on RHEL7 with the b1pi test suite. Nathan has been using this version for his work for months now. We will upgrade to this version in the near future.

More generally, we had a discussion on how to handle future version upgrades for the low-level software packages that we rely on. Upgrades can be disruptive since they force everyone to build a new version of the package itself and recompile all packages that depend on it.

During the discussion we spent some time on the plan for going from GCC 4.4 to 4.8. GCC 4.4 is the default on RHEL/CentOS 6, which appears to be common throughout the collaboration. The latest Geant4 and ROOT6 both require 4.8 at least.

Curtis suggested we poll the collaboration about which compilers people are using at their home institutions.

As far as implementing a change, David suggested making decisions at Offline meetings, issuing an announcement in advance of any change to give people time to object.

Review of recent pull requests

We skimmed through the list going back to early December.

Data Challenge 3

Mark has generated the fake raw data using recent software, 5000 files of 10 GB each. This was done just before the break.

Future Commissioning Simulations

Mark generated a hundred jobs using Tegan's new BCAL code at Sean's request. Sean will look over the results.

New HDPM Documentation

Nathan has transitioned to using the GitHub built-in wiki for Hall D Package Manager documentation. Find it at https://github.com/JeffersonLab/hdpm/wiki .

Intentional gaps in the sequence of run numbers

We continued the discussion of encoding some additional information into run numbers (for real data).

There are two possible advantages:

1) Predictability of run numbers for a future run period.

We have had a policy of setting the run number in Monte Carlo data to match that of real data when that Monte Carlo is supposed to emulate conditions of real runs. This is fine after the run is over, but when creating simulated data for a future run, it is hard to guess what the run numbers will be. Since the run number is encoded in each event of the simulated data, it is hard to change after the fact. If we had a regular scheme for run numbers where those numbers indicate the run period, this problem would be solved.

2) Run number as an identifier of past run periods.

If there were a known correspondence between run period and run number, then file identification would be easier. The idea can be extended to have fields for run period, Julian date, and intra-day index for finer-grained identification, as is done by STAR.
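A STAR-style field layout could be packed and unpacked along these lines. This is purely illustrative: the field widths (three digits for day-of-year, three for intra-day index) are assumptions, not a GlueX or STAR convention.

```python
# Illustrative packing of run-period, day-of-year, and intra-day index
# into one run number. Field widths are assumed for the sketch only.
def pack_run_number(period, day_of_year, index):
    """Encode (period, day-of-year, intra-day index) as one integer."""
    return period * 1_000_000 + day_of_year * 1_000 + index

def unpack_run_number(run_number):
    """Recover the three fields from a packed run number."""
    period, rest = divmod(run_number, 1_000_000)
    day_of_year, index = divmod(rest, 1_000)
    return period, day_of_year, index
```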

Traditionally, of course, run numbers increment by one from run to run no matter what, and that scheme has always worked in the past.

We settled on a scheme at the last offline meeting and decided to propose it to the Collaboration, mainly to have something concrete to discuss. The idea is to always start a run period at an increment of 10,000. So far we have used 4,000-plus runs, so this Spring run starts with run 10,000, and when we run in the Fall we start with 20,000. And so on. In the highly unlikely case of using more than 10,000 runs in a period, we just skip to the next increment of 10,000. Our understanding is that this is not hard to implement in the data acquisition system.
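The proposed scheme can be sketched in a few lines. The function names are illustrative, not from any GlueX code; only the increment-of-10,000 rule comes from the proposal above.

```python
# Sketch of the proposed run-numbering scheme: each run period starts
# at the next multiple of 10,000. Names here are illustrative only.
PERIOD_BLOCK = 10_000

def first_run_of_period(period_index):
    """First run number of the Nth run period (1-based)."""
    return period_index * PERIOD_BLOCK

def period_of_run(run_number):
    """Recover the run-period index from a run number."""
    return run_number // PERIOD_BLOCK
```

For example, a run number like 14321 immediately identifies itself as belonging to the first period under this scheme.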