GlueX Data Challenge Meeting, March 14, 2014
GlueX Data Challenge Meeting
Friday, March 14, 2014
11:00 am, EDT
JLab: CEBAF Center, F326
This week's meeting will be held using the BlueJeans video conferencing solution supported by Jefferson Lab. You can find information from Jefferson Lab on this product, as well as some notes that we within GlueX have made, under GlueX Bluejeans Video Conferencing. You should be able to connect seamlessly with virtually any device. If you have a Polycom device, or want to connect through your web browser or tablet, use the Network Connection below. If you want to connect via phone, use the phone information below. The main drawback is that you may not be able to connect via a browser using a Linux computer.
The first time that you connect to a BlueJeans meeting, you will be prompted to install an application for your web browser. Versions of these for Windows and Mac OS have been placed on docDb as document 2435. You can also find apps for iOS and Android.
Connect to the Meeting
Join via Polycom room system
- Network Connection
- IP Address: 126.96.36.199
- URL: bjn.vc
- Enter Meeting ID: 531811405
Join via a Web Browser
Join via Phone
- Dial-in phone number:
- US or Canada: +1 408 740 7256 or
- US or Canada: +1 888 240 2560
- Dial-in international phone numbers
- Enter Conference ID: 531811405
- Review of minutes from last time
- Random Number Seeds Procedure (as of 2014-03-07) (closed?) (Mark/anyone)
- ZFATAL fix (closed?) (Richard)
- Short file issue (all)
- Non-reproducible results (all)
- Running jobs at CMU (Paul)
- Running jobs at NU (Sean)
- Running jobs at MIT (Justin)
- Running jobs at FSU (Aristeidis) 1 2
- Running jobs at JLab (Mark)
- Electromagnetic Backgrounds update (closed?) (Paul/Kei)
- Run number assignments (Mark)
- Proposed Schedule - where do we stand? Should we launch now?
- Launch of Data Challenge Thursday March 20, 2014 (est.).
- Test jobs going successfully by Tuesday March 18.
- Distribution ready by Monday March 17.
- CMU: Paul Mattione, Curtis Meyer
- FSU: Volker Crede, Priyashree Roy, Aristeidis Tsaris
- IU: Kei Moriya
- JLab: Sergey Furletov, Mark Ito (chair), David Lawrence, Sandy Philpott, Dmitry Romanov, Simon Taylor, Beni Zihlmann
- MIT: Justin Stevens
- NU: Sean Dobbs
- UConn: Richard Jones, James McIntyre, Brendan Pratt
Note that we used BlueJeans for telecommunications for this meeting. It was not terrible.
Random Number Seeds Procedure
Curtis described his proposal for a way to handle random number seeds to allow reproduction of results even in a multi-threaded environment.
Mark showed David's message describing the syntax for setting the random number seed for mcsmear on the command line. He also mentioned that David has implemented a switch for mcsmear to have it use the two hdgeant seeds of the incoming event, along with the number 137, as the three random number seeds for each event in mcsmear, roughly consistent with Curtis's proposal. This should be ported to the branch.
Mark reminded us of the seed procedure we used last year.
Mark proposed that we follow the same procedure as last year, only this time use a fixed seed triplet for mcsmear. No further code development would be needed.
Richard thought we should go ahead and fully implement Curtis's scheme. He also volunteered to make the needed modifications to bggen, hdgeant, and the data model to support it. We agreed to have him go ahead and make the changes for this challenge.
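As an illustration of why per-event seeding gives reproducible results regardless of the order in which events are processed, here is a minimal Python sketch. It is not the mcsmear code: the function names, the toy Gaussian smearing, the way the three seeds are combined into one integer, and the event values are all assumptions made for this example. Only the idea of a seed triplet made of the two hdgeant seeds plus the number 137 comes from the discussion above.

```python
import random

# The fixed third seed mentioned in the minutes.
MCSMEAR_THIRD_SEED = 137


def smear_seed_triplet(hdgeant_seed1, hdgeant_seed2):
    """Return the three seeds for one event: the two hdgeant seeds plus 137."""
    return (hdgeant_seed1, hdgeant_seed2, MCSMEAR_THIRD_SEED)


def smear_event(energy, seeds, resolution=0.05):
    """Toy Gaussian smearing (a hypothetical stand-in for mcsmear)."""
    s1, s2, s3 = seeds
    # Assumption: pack the triplet into one integer to seed a private generator.
    rng = random.Random((s1 << 64) ^ (s2 << 32) ^ s3)
    return energy * (1.0 + rng.gauss(0.0, resolution))


def process(events):
    """Smear every event, keyed by its pair of hdgeant seeds."""
    return {
        (s1, s2): smear_event(energy, smear_seed_triplet(s1, s2))
        for energy, s1, s2 in events
    }


# Hypothetical events: (energy, hdgeant seed 1, hdgeant seed 2).
events = [(1.20, 1001, 2001), (0.85, 1002, 2002), (2.40, 1003, 2003)]

in_order = process(events)
reversed_order = process(list(reversed(events)))

# Each event owns its generator, so processing order does not matter.
print(in_order == reversed_order)
```

Because each event builds its own generator from seeds stored with the event, the smeared value does not depend on which thread handles the event or on how many events were processed before it.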
Short File Issue
Mark ran 2000 jobs yesterday, 1000 events each, with compression turned back on, and did not see any short files. Richard thought that that was not enough to see the problem with the current code (100 jobs of 100,000 events each might give 2 short files). In any case, the rate is much less than we saw a couple of weeks ago.
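Richard's point can be checked with a short back-of-the-envelope calculation. The sketch below assumes that short files occur independently at a rate proportional to the number of events processed, and that the count follows a Poisson distribution. Neither assumption was stated at the meeting; only the job counts, event counts, and the estimate of 2 short files come from the minutes.

```python
import math

# Richard's estimate: about 2 short files in 100 jobs of 100,000 events each.
rate_per_event = 2 / (100 * 100_000)

# Mark's test: 2000 jobs of 1000 events each.
mark_events = 2000 * 1000

# Expected number of short files in Mark's test under the assumed rate.
expected_short_files = rate_per_event * mark_events

# Poisson probability of observing zero short files.
p_zero = math.exp(-expected_short_files)

print(f"expected short files: {expected_short_files:.2f}")  # 0.40
print(f"probability of seeing none: {p_zero:.2f}")          # 0.67
```

Under these assumptions, seeing no short files in Mark's test is the more likely outcome even if the problem is still present, so the test cannot rule it out.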
Richard proposed that if a fix is not found soon, we run without compression. It would simplify the job handling procedure, and a short file might be correlated with some other, less obvious type of data corruption. Justin and Mark both reported that running without compression increases the REST file size by about a factor of two. Richard will continue to work on a fix, but we will not hold up the start for it.
Mark promised to copy David's compression on/off switch to the trunk from the branch.
Non-Reproducible Results
There has been a lot of activity around this topic. For details find the email traffic here.
Simon checked in a fix for t0 variation in wire-based tracking yesterday.
Mark reported non-reproducibility even after this fix. Sean has not seen problems but did not do a lot of trials. Paul has seen differences in multiple trials on the same smeared data, and found that the results fall into only two classes, with all results within a class identical. Paul has traced the cause of the difference to a single event; he has been looking at a particular file he distributed some time ago. Beni confirms that this event is the culprit.
Work continues on this issue.
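One way to sort repeated trials into classes of identical results, as Paul did, is to compare a checksum of each output. The sketch below is an assumption about how such a check could be done, not a description of the procedure actually used. The trial labels and byte strings are hypothetical stand-ins for real output files.

```python
import hashlib
from collections import defaultdict


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of one trial's output."""
    return hashlib.sha256(data).hexdigest()


def group_by_content(outputs):
    """Map each distinct digest to the labels of the trials that produced it."""
    classes = defaultdict(list)
    for label, data in outputs.items():
        classes[digest(data)].append(label)
    return dict(classes)


# Hypothetical stand-ins for the output of five trials on the same input.
trials = {
    "trial1": b"event-stream-A",
    "trial2": b"event-stream-B",
    "trial3": b"event-stream-A",
    "trial4": b"event-stream-A",
    "trial5": b"event-stream-B",
}

classes = group_by_content(trials)
print(len(classes))  # 2
for members in classes.values():
    print(members)
```

With real files, the byte strings would be replaced by the file contents read in binary mode. Identical digests indicate byte-identical outputs.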
Running Jobs at FSU
Aristeidis summarized recent running on the cluster at FSU.
He contrasted two sets of running conditions, one with the branch as of last Friday and the other with the 2.4 tag. Memory usage is low and stable with the branch, but the REST files are of two different sizes. With the tag there is about 10% memory variation job to job, but all files are the same size. The files are also a factor of 2 bigger [no compression?].
Nodes at JLab
Mark described Sandy's summary of the current activity and future potential for addition of nodes to the physics batch farm for our data challenge. One of several numbers: we have 911,000 core-hours "in the bank" due to the "loan" of nodes from the physics batch farm to the LQCD cluster over the past few months.
Electromagnetic Background Update
Kei presented slides describing the current state of his studies on using the data challenge software configuration for the pπ+π- and pπ+π-π0 final states. Slides covered:
- file sizes
- CPU and virtual memory usage
- reaction selection
- properties of reconstructed events
- momentum vs. polar angle for protons
- momentum vs. polar angle for π0s
- effects of kinematic-fit-confidence-level cuts on event spectra and yield
Run number assignments
Mark pointed out recent changes to the run conditions page.
He proposed that we reduce the fraction of high EM background events that we aim for from 15% to 5%, and raise the fraction of 1×10^7 events from 70% to 80%. We thought that that was reasonable.
We agreed on noon, Thursday March 20 as the deadline for code and configuration changes. Processing at the various sites can then proceed as local schedules permit. In the meantime:
- email exchanges on relevant work are encouraged
- we will discuss status at Wednesday's offline meeting, following Paul's suggestion
- we will hold open the possibility of an ad hoc meeting if needed
- Curtis asked that sites send CPU resource estimates to Mark by Monday for compilation
- Implement random-number-seed-storage design in bggen, hdgeant, and HDDM. Richard
- Fix the reproducibility issue. All
- Investigate the short REST file issue further. Richard
- Send resource info to Mark. All site managers/honchos/cognoscenti
- Confirm observation of differences in repeated runs with the 2.5 tag. Mark
- Put compression on/off switch on the trunk. Mark
- Bring David's seed-mcsmear-from-hdgeant-seed scheme to branch. Mark
- Update the conditions page to reflect the reduction in high EM event fraction. Mark