GlueX Data Challenge Meeting, March 21, 2014

GlueX Data Challenge Meeting
Friday, March 21, 2014
11:00 am, EDT
JLab: CEBAF Center, F326

Agenda

  1. Announcements
    1. We have frozen the code and configuration.
      • Do we have consensus on commands?
  2. Review of minutes from last time
  3. Launch Status
    • CMU
    • FSU
    • JLab
    • MIT
    • OSG
      • NU
      • UConn
  4. Monitoring Discussion
  5. File Distribution Discussion
  6. Collection of Site Characteristics

Connection

BlueJeans Applications

The first time that you connect to a BlueJeans meeting, you will be prompted to install an application for your web browser. Versions of these for Windows and macOS have been placed in DocDB as document 2435. Apps are also available for iOS and Android.

Connect to the Meeting

Join via Polycom room system

    • Network Connection
      • IP Address: 199.48.152.152
      • URL: bjn.vc
    • Enter Meeting ID: 531811405

Join via a Web Browser

    • If you have not registered with BlueJeans through JLab:
    • If you have registered with BlueJeans through JLab:
      • URL: [2]
      • Enter Conference ID: 531811405

Join via Phone

    • Dial-in phone number:
      • US or Canada: +1 408 740 7256 or
      • US or Canada: +1 888 240 2560
      • Dial-in international phone numbers
    • Enter Conference ID: 531811405

Minutes

Present:

  • CMU: Paul Mattione
  • IU: Kei Moriya
  • JLab: Mark Ito, Simon Taylor
  • MIT: Justin Stevens
  • NU: Sean Dobbs
  • UConn: Alex Barnes, Richard Jones

Announcements

Yesterday we froze the code and tagged version dc-2.7. This tag brings in the change that Paul made to solve the reproducibility problem (see below). Richard also explained why the change we had requested, to store the random number seed from bggen in its output for later use by hdgeant, was ill-advised; it was therefore not included in this tag. He wrote an email to the group about this yesterday. He noted that Eugene Chudakov has replaced the original Pythia random number generator with a more modern one. This means that our simple practice of seeding the generator with the file number will not get us into trouble with tightly repeating random number sequences.
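As a minimal illustration of that seeding practice (a sketch only: std::mt19937 stands in for the generator actually used by bggen, and the command-line interface is hypothetical), each file's generator would simply be seeded with the file number:

    // Minimal sketch of seeding a generator with the data-challenge file number.
    // std::mt19937 is a stand-in for the generator used by bggen; the point is
    // only that distinct file numbers give distinct, independent streams.
    #include <iostream>
    #include <random>
    #include <string>

    int main(int argc, char* argv[]) {
        // Hypothetical interface: the file number is passed as the first argument.
        unsigned int file_number = (argc > 1) ? std::stoul(argv[1]) : 1;

        std::mt19937 generator(file_number);  // seed = file number
        std::uniform_real_distribution<double> uniform(0.0, 1.0);

        for (int i = 0; i < 3; ++i)
            std::cout << uniform(generator) << "\n";
        return 0;
    }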

We went over the conditions page to make sure we are all in agreement on details of the data challenge. We agreed that no changes that affect functionality were needed, but Mark should clean up some of the presentation on the page.

Review of Minutes from Last Time

We went through the minutes of the March 14 meeting.

  • Paul commented on his fix to the reproducibility problem. At the root was a dependence on the order of objects stored in an STL map, an order which could change from one run over the data to the next. The problem occurred when matching tracks in the CDC with those in the FDC; a sketch of the general pitfall follows this list.
  • Mark reported that the return of nodes from the LQCD farm to the physics batch farm is complete. SciComp is ready for us to start submitting jobs.
  • We noted that all action items listed at the end of the minutes had in fact been acted on and resolved.
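The kind of non-determinism described in the first item above can be illustrated with a toy example. This is a sketch of the general pitfall only, not the actual CDC/FDC matching code: when an STL map is keyed by pointers, its iteration order follows the objects' memory addresses, which can differ from one run to the next.

    // Sketch of the pitfall: a std::map keyed by object pointers iterates in
    // pointer (address) order, which depends on where the objects happen to be
    // allocated in a given run, not on any physics quantity.
    #include <iostream>
    #include <map>
    #include <vector>

    struct Track { double chisq; };

    int main() {
        std::vector<Track*> tracks;
        for (int i = 0; i < 3; ++i) tracks.push_back(new Track{1.0 + i});

        std::map<Track*, double> match_quality;
        for (Track* t : tracks) match_quality[t] = t->chisq;

        // If downstream matching logic depends on which candidate is visited
        // first, the result can change from run to run even though the input
        // data are identical.
        for (const auto& entry : match_quality)
            std::cout << entry.first << " -> " << entry.second << "\n";

        for (Track* t : tracks) delete t;
        return 0;
    }

Sorting the candidates by a physical quantity (or keying the map by one) before matching is the usual way to restore reproducibility.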

Activities at Production Sites

We reviewed activities at the various sites.

CMU

Paul has started a check-out and build of the latest tag. He reminded us that the various conditions have different processing rates, in terms of events per CPU-second. His spreadsheet can help folks plan their processing. The CMU cluster will contribute 384 cores.
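As a rough illustration of that kind of planning (the rates below are placeholders, not numbers from Paul's spreadsheet), the event count per 24-hour job follows directly from the per-event processing rate:

    // Rough job-planning arithmetic in the spirit of Paul's spreadsheet.
    // The rates are illustrative placeholders, NOT measured values for the
    // actual data-challenge conditions.
    #include <cstdio>

    int main() {
        const double job_length_s = 24.0 * 3600.0;  // one 24-hour job slot

        struct Condition { const char* name; double events_per_cpu_s; };
        const Condition conditions[] = {
            {"no EM background",      2.0},
            {"nominal EM background", 1.0},
            {"high EM background",    0.5},
        };

        for (const auto& c : conditions)
            std::printf("%-22s -> about %.0f events per 24-hour job\n",
                        c.name, c.events_per_cpu_s * job_length_s);
        return 0;
    }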

FSU

FSU had trouble connecting, which was understandable given the evasive action we took to establish communications for this meeting. [Added in press: Aristeidis sent in a report via email.]

JLab

  • Last night a few jobs were run with the dc-2.7 tag; they appear to have completed successfully.
  • The plan is to run 24-hour jobs, with the number of events per job depending on the EM background intensity.
  • JLab will have roughly 1200 cores for DC2.

MIT

Justin led us through his wiki page summarizing status at MIT. He is using OpenStack in two contexts: the MIT Reuse Cluster and FutureGrid (links on his wiki page). He expects to have 300 cores to contribute.

OSG

Richard reported that the next step for OSG running is to set up a CernVM system to provide access to our software stack at all available OSG sites. The system consists of a distributed set of read-only replica servers.

Although MIT has access to an LHC Tier-2 site in principle, we have not built up any credit on that site, so those resources are probably not available for this exercise.

The number of cores that we will get from the OSG is uncertain due to the nature of opportunistic running. Last time we peaked at 10,000 cores, and nothing we know now suggests that this time should be any different.

IU

Kei reported that there are significant computing resources in Bloomington. These will likely be deployed for future large-scale computing efforts.

Monitoring Discussion

We threw around some ideas on how to monitor the quality of the results. We will be creating ROOT histograms for each job; at a minimum we could do spot checks on these histograms. Kei suggested processing the output of hd_dump for each output file to get an event count and prove analyzability. Mark proposed running hddm-xml on each output file and counting the number of event tags with the simple tools grep and wc. Richard volunteered to implement a program to do the event counting at a lower level, which will be more efficient. We all thought that this would be a nice tool to have. [Added in press: this last item turned out to be a bit more complex than anticipated. We will proceed without it; it may be added to the mix later.]
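A minimal sketch of Mark's counting idea is below. It assumes the text produced by hddm-xml marks each event with a tag like <physicsEvent>; the tag name is an assumption, and the program is only the C++ analogue of grep piped to wc, not Richard's lower-level counter.

    // Count per-event tags in the output of hddm-xml, in the spirit of
    // "grep '<physicsEvent' file.xml | wc -l".
    // The tag name is an assumption about the XML produced by hddm-xml.
    #include <fstream>
    #include <iostream>
    #include <string>

    int main(int argc, char* argv[]) {
        if (argc < 2) {
            std::cerr << "usage: count_events <hddm-xml output file>\n";
            return 1;
        }
        const std::string tag = "<physicsEvent";  // assumed per-event opening tag

        std::ifstream input(argv[1]);
        std::string line;
        long count = 0;
        while (std::getline(input, line))
            if (line.find(tag) != std::string::npos) ++count;

        std::cout << count << " events found in " << argv[1] << "\n";
        return 0;
    }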

File Distribution Discussion

We did not arrive at a general plan for how to ship output data around for use in analysis. Paul pointed out that last time the OSG contribution dwarfed everything else, and there the transfer mechanism was SRM. More thought will have to go into this.