GlueX Data Challenge Meeting, February 14, 2014

GlueX Data Challenge Meeting
Friday, February 14, 2014
11:00 am, EST
JLab: CEBAF Center, L207
ESNet: 8542553
ReadyTalk: (866)740-1260, access code: 1833622
ReadyTalk desktop: http://esnet.readytalk.com/ , access code: 1833622

Agenda

  1. Announcements
  2. Status of Preparations
    • Update on phi=0 geometry issues in the CDC: solved?
    • Random number seeds procedure (Mark/anyone)
    • Running jobs at CMU (Paul)
    • Running jobs at NU (Sean)
    • Running jobs at MIT (Justin)
      • 100 jobs with 10,000 events each, no EM background: only 2 failed (both in DTrackFitterKalmanSIMD)
    • 100-job tests at various sites: JLab, Grid, CMU, NWU, ...
    • Electromagnetic Backgrounds update (Paul/Kei): updated studies
    • Check on event genealogy (Kei)
    • Preparations of standard distribution/scripts (Mark/Richard/Sean)
    • Report on data management practices (Sean)
  3. Proposed Schedule
    • Distribution ready by Monday, February 24.
    • Test jobs running successfully by Tuesday, February 25.
    • Launch of the Data Challenge on Thursday, February 27, 2014 (est.).
  4. AOT

Minutes

Present:

  • CMU: Paul Mattione, Curtis Meyer
  • FSU: Volker Crede, Aristeidis Tsaris
  • IU: Kei Moriya
  • JLab: Mark Ito (chair), Chris Larrieu, Simon Taylor
  • NWU: Sean Dobbs

Find a recording of this meeting here.

Update on phi=0 geometry issues in CDC

Richard checked in a change. Simon and Beni confirmed that this fixes the issue.

Random number seeds procedure

Still nothing new to report.

Running jobs at CMU

Paul has continued running jobs. Last time he reported a third of his jobs crashing. Since then he has increased the memory limit from 2 GB to 5 GB, and the crash rate is now down to 5%.

Simon has been running Valgrind, but many of the issues being flagged are inside xerces, which is likely not our problem.
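
One standard way to quiet such third-party reports is a Valgrind suppression file. The sketch below uses the stock Memcheck suppression syntax, but the file name, the xerces library pattern, and the hd_root input file are illustrative assumptions, not something settled at the meeting.

  # xerces.supp -- suppress leak reports whose stack passes through
  # the xerces-c shared library (the object pattern is a guess)
  {
     ignore_xerces
     Memcheck:Leak
     ...
     obj:*libxerces-c*
  }

  # invoked as, for example:
  valgrind --suppressions=xerces.supp hd_root hdgeant_smeared.hddm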

Paul has also been running hd_root within the time command and sees memory usage varying between 3 and 5 GB.
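
For reference, the stand-alone GNU time binary (as opposed to the shell built-in) reports peak memory directly; a minimal sketch, with the input file name as a placeholder:

  # -v prints a full resource report once hd_root exits; peak memory
  # appears on the "Maximum resident set size (kbytes)" line
  /usr/bin/time -v hd_root hdgeant_smeared.hddm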

Running jobs at NU

Sean has been seeing start-up problems with hd_root that seem to be correlated with whether the input and/or output files are on network disk or on the local machine. In particular, he sees crashes when reading in the geometry. This has not been seen at other sites, although the exact configuration giving him trouble may not have been tried much elsewhere. He is tracking down the error.
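
If the network-mounted files do turn out to be the culprit, one conventional workaround is to stage everything to local disk before running. The pattern below is only a sketch; the paths and the JOB_ID variable are hypothetical:

  #!/bin/sh
  # Stage input from network disk to local scratch, run hd_root with
  # purely local I/O, then copy the output back (all paths invented).
  INPUT=/net/dc2/hdgeant_smeared_${JOB_ID}.hddm
  SCRATCH=/scratch/${USER}/dc2_${JOB_ID}
  mkdir -p "$SCRATCH"
  cp "$INPUT" "$SCRATCH"/
  cd "$SCRATCH"
  hd_root "$(basename "$INPUT")"
  cp ./*.root /net/dc2/output/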

Running jobs at MIT

Justin has succeeded in running 100 jobs of 10,000 events each, with no EM background, at MIT on the OpenStack cluster. He saw only 2 failures, both in DTrackFitterKalmanSIMD.

Running jobs at JLab

Mark updated the DC2 tag to bring in Simon's latest changes to tracking, as well as Richard's fix for φ=0 in the CDC. He ran 400 jobs of 50,000 events each with the same configuration as last week, except that the requested memory was doubled, from 1.5 GB to 3.0 GB. He saw a much improved success rate: 84%, versus the 26% reported last week. Most of the improvement came from a reduced rate of breaching the memory limit, but not all of it: 2% of the jobs failed because they exceeded the 15-hour wall-time limit, while at least one successful job took only 9 hours to finish.
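
For concreteness, the limits quoted above would map onto a JLab Auger/jsub submission roughly as follows. This is a sketch under assumptions: the project and track names, the script name, and the exact command-file keywords should be checked against the farm documentation rather than taken from these minutes.

  # Write a flat Auger command file and submit it; MEMORY matches the
  # doubled request and TIME is the 15-hour wall limit in minutes.
  cat > dc2_test.jsub <<'EOF'
  PROJECT: gluex
  TRACK: simulation
  JOBNAME: dc2_50k_test
  COMMAND: run_dc2_job.sh
  MEMORY: 3 GB
  TIME: 900
  EOF
  jsub dc2_test.jsub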

Mark has also started to run very short jobs, to help Simon debug the tracking crashes. He ran 10,000 jobs of 1000 events each last weekend, and 4,000 jobs of 500 events each with the new DC2 version. For these, he is keeping the smeared data that is input to hd_root. Simon was able to use last weekend's jobs to find the errors discovered during the week.

Electromagnetic Backgrounds update

Kei gave us an update on his performance comparisons with and without electromagnetic background. See his slides for details. He compared memory usage and execution time for various combinations of background time gate and beam intensity. He also showed the effect on reconstructed quantities in slides titled as follows:

  • Increase in Showers
  • FCAL Shower time vs. energy
  • Projections of Time and Energy of FCAL Showers
  • FCAL Shower Time vs. Distance from Target
  • FCAL Energy Showers After Timing Cut
  • BCAL
  • Generated E, p
  • Generated p vs. θ
  • Tracks

[Added in press: Kei sent out an email directing us to an update on this report, answering some questions raised at this meeting.]

Check on event genealogy

During his studies of the EM background, Kei noticed that the summed energy and momentum of all primary particles ("Generated E, p" above) are now much more sensible than before. There are still some low-level tails that may have to do with particles re-entering the detector from the calorimeters and getting misclassified.

SRM capability

We started to discuss this issue, but decided to postpone a full discussion until the collaboration meeting.

Next Meeting

We will not meet next week due to the collaboration meeting, but will start up again the week after. The main task for us is to continue tracking down the causes of job crashes.