Difference between revisions of "GlueX Data Challenge Meeting, February 7, 2014"


Latest revision as of 18:44, 31 March 2015

GlueX Data Challenge Meeting
Friday, February 7, 2014
11:00 pm, EST
JLab: CEBAF Center, F326/7
ESNet: 8542553
ReadyTalk: (866)740-1260, access code: 1833622
ReadyTalk desktop: http://esnet.readytalk.com/ , access code: 1833622

Agenda

  1. Announcements
  2. Agreed upon parameters.
    • Nominal goal is 10 billion events.
    • Pythia/BGGEN events from 7.0 GeV to the endpoint.
    • Normal run will be in REST format.
    • Specialized block (at JLab) that retains HDGEANT output (for tape tests)?
    • Target distribution to be nominal LH2 as handled by HDGEANT.
    • SRM servers at UConn and NWU will receive grid events?
    • JLab events will be written at JLab.
    • CMU events will be written at CMU.
  3. Status of Preparations
    • Update on phi=0 geometry issues in CDC? Simon/Richard
    • Random number seeds procedure? Mark/Anyone
    • Electromagnetic Backgrounds update. Kei, Paul
    • Check on event genealogy. Kei
    • JLab CC is ready, what do we need to tell them. Mark
    • Preparations of standard distribution/scripts Mark/Richard/Sean
    • Report on data management practices - Sean - slides
    • 100-job tests at various sites. JLab/Grid/CMU/NWU, ....
  4. Open Stack Overview
  5. Proposed Schedule
    • Launch of Data Challenge Thursday Feb. 27, 2014 (est.).
    • Test jobs going successfully by Tuesday February 25.
    • Distribution ready by Monday February 24.
  6. AOT

Minutes

Present:

  • CMU: Curtis Meyer, Paul Mattione
  • FSU: Volker Crede, Aristeidis Tsaris
  • IU: Kei Moriya
  • JLab: Eugene Chudakov, Mark Ito (chair), Sandy Philpott, Simon Taylor, Beni Zihlmann
  • MIT: Justin Stevens
  • NWU: Sean Dobbs
  • UConn: Richard Jones

Find a recording of this meeting at https://halldweb.jlab.org/talks/2014-1Q/data_challenge_2014-02-07/index.htm.

Agreed upon Parameters

We reviewed the parameters listed in the agenda above. There was no discussion and no changes were proposed.

Status of Preparations

JLab CC is ready, what do we need to tell them

Sandy gave a report on preparations by JLab Scientific Computing.

  • The current farm is at about 1200 cores. The farm typically has 1400 cores. For the 25% level test we will need 1250 cores. The plan is to bring another 1000 cores over from LQCD to keep the farm generally usable.
  • One option is to end our data challenge at a power outage that is planned for some time during the next 6 to 8 weeks.
  • We estimate that we will need a full complement of nodes for about 2 weeks.
  • Lattice nodes are available to us because Physics has been lending 32 16-core nodes to the LQCD farm all of December and January.
  • It looks like the SRM capability is not a pressing problem for running this challenge at JLab, but does need to be addressed in the medium term. This issue will be discussed at the upcoming collaboration meeting.
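
The core counts quoted in this report can be tallied as a quick sanity check. This is only a sketch: the variable names are illustrative, and the assumption that the 1000 LQCD cores are simply added on top of the current 1200 is ours, not something stated in the report.

```python
# Core counts as quoted in Sandy's report.
current_farm_cores = 1200
typical_farm_cores = 1400
needed_for_challenge = 1250   # 25% level test
from_lqcd = 1000              # cores to be brought over from LQCD

# Cores Physics has been lending to LQCD: 32 nodes with 16 cores each.
lent_to_lqcd = 32 * 16

# Assumption (not from the minutes): the LQCD cores are added to the
# current farm, and the data challenge takes exactly what it needs.
total_with_lqcd = current_farm_cores + from_lqcd
left_for_other_users = total_with_lqcd - needed_for_challenge

print(lent_to_lqcd)           # 512
print(total_with_lqcd)        # 2200
print(left_for_other_users)   # 950
```

Under that assumption, roughly 950 cores would remain for other farm users while the challenge is running.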

Update on phi=0 geometry issues in CDC?

Simon sent a plot to Richard illustrating the problem. Beni has also done a study with lead CDC straws and geantinos in which he observed disappearing straws under certain circumstances. Richard has been able to reproduce the problem; it appears to involve the transition in the wrap-around from 359 going back to 0 degrees. He is studying the problem now.
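
For readers unfamiliar with this class of problem, the sketch below shows how an azimuthal wrap-around can mislead a naive comparison of angles. It is a generic illustration, not the HDGEANT or HDDS geometry code, and the function names are hypothetical. The minutes only say that the problem appears to involve the 359 to 0 degree transition; the actual cause was still under study.

```python
def wrap_degrees(phi):
    """Map any angle in degrees onto the range [0, 360)."""
    return phi % 360.0


def delta_phi(phi_a, phi_b):
    """Smallest signed difference phi_a - phi_b, in the range [-180, 180)."""
    return (phi_a - phi_b + 180.0) % 360.0 - 180.0


# A naive subtraction sees 0 and 359 degrees as almost a full turn apart...
naive = 0.0 - 359.0
# ...while the wrapped difference shows they are 1 degree apart.
wrapped = delta_phi(0.0, 359.0)

print(naive, wrapped)
```

Any lookup that compares phi values without this kind of wrapping can miss neighbours that sit on opposite sides of phi = 0.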

Random number seeds procedure?

No progress to report.

Electromagnetic Backgrounds update

Kei showed slides (https://halldweb.jlab.org/wiki/images/3/30/2014-02-07-DataChallenge2_EMrates.pdf) detailing studies he has done on the computing resources needed to generate EM background under various conditions. He tried two beam rates, corresponding to 10^7 and 5 × 10^7, and two time intervals, 400 ns and 800 ns. In addition to these four combinations he ran with no beam background at all. Comparing no background with 10^7 and an 800 ns gate, the output file size increases by about 10%, but the execution time for the job (including bggen, hdgeant, mcsmear, and hd_root) nearly triples. The increase is mainly in hdgeant, as expected. He also compared the number of neutral showers and the deposited energy in the BCAL and FCAL separately.

Paul was running at CMU with the recently defined versions (see below) and was seeing a third of his 10,000-event[?] jobs crash in reconstruction.

Mark is also seeing crashes with these versions at JLab even though there is no EM background in his jobs. He showed a wiki page (Quick Look at DC 2) where the success rate was only about 20% for 50,000-event runs with parameters unchanged from the first data challenge.

We had believed that segmentation faults in the reconstruction had been largely eliminated in recent weeks, so these crash rates were a surprise to many of us. All aspects of the code and how it is configured need to be examined.
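
As a rough cross-check of the two crash reports, one can ask what success rate a 50,000-event job would have if it failed at the same per-event rate as Paul's jobs. The sketch below rests on assumptions that are ours and not from the meeting: every event has the same, independent chance of crashing the job; Paul's jobs really were 10,000 events (marked uncertain in the minutes); and both sites see the same failure mode, even though Mark's jobs had no EM background. The function names are hypothetical.

```python
def per_event_survival(job_success_rate, events_per_job):
    """Per-event survival probability implied by a job-level success rate."""
    return job_success_rate ** (1.0 / events_per_job)


def expected_job_success(survival, events_per_job):
    """Probability that a job of the given length finishes without a crash."""
    return survival ** events_per_job


# Paul: about a third of 10,000-event jobs crashed, so about 2/3 succeeded.
paul_success = 1.0 - 1.0 / 3.0
survival = per_event_survival(paul_success, 10_000)

# Scale the same per-event rate up to Mark's 50,000-event jobs.
predicted = expected_job_success(survival, 50_000)
print(round(predicted, 3))   # 0.132
```

The scaled figure of roughly 13% is of the same order as the 20% success rate Mark reported, which is at least compatible with a crash rate that grows with job length. It does not establish a common cause.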

Check on event genealogy

No progress to report.

Preparations of standard distribution/scripts

Mark prepared tagged versions of HDDS and sim-recon yesterday and put up a webpage (https://halldweb.jlab.org/data_challenge/02/conditions/data_challenge_2.html) for distribution of all relevant software versions and run-time configuration files. These are the versions mentioned in the discussion above.

Report on data management practices

Sean went over a quick survey he has done of various data management systems, which Richard had asked him to do at the last meeting. See his slides (https://halldweb.jlab.org/wiki/images/6/6a/DataChallenge-20140207.pdf) for details. He described two systems in detail:

  1. a custom database with xrootd/SRM
  2. DIRAC toolkit

In general he found the documentation less than comprehensive. Also, the ease of modifying these systems for our purposes was not clear. We probably should postpone adoption of such a system until after this data challenge.

Open Stack Overview

Justin described an effort at MIT to operate a cluster where users instantiate a virtual machine of their choice with their desired software stack when running on a node. See his wiki page (Openstack at MIT Overview) for details. The development cluster for this system currently stands at 22 blades with 8 cores each. The project is looking for users. Justin asked the group if this was something that we would like to pursue; he volunteered to implement the data challenge jobs as a demo and, if things go well, as an additional production site. We all thought it was a great idea.

Workflow Tools at JLab

Mark reported that he talked to Chris Larrieu of SciComp and they are not ready to deploy a system in time for this data challenge. Curtis suggested that we invite Chris to future data challenge meetings.

Proposed Schedule

We endorsed the schedule that Curtis proposed, reproduced below:

  • Launch of Data Challenge Thursday Feb. 27, 2014 (est.).
  • Test jobs going successfully by Tuesday February 25.
  • Distribution ready by Monday February 24.
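
The weekdays quoted in the schedule can be checked against the calendar with a few lines of Python. This is a small sketch using only the standard library; the dictionary layout and names are illustrative.

```python
from datetime import date

# Fixed English names so the check does not depend on the system locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Milestone -> (date, weekday as stated in the schedule).
milestones = {
    "Distribution ready": (date(2014, 2, 24), "Monday"),
    "Test jobs going successfully": (date(2014, 2, 25), "Tuesday"),
    "Launch of Data Challenge": (date(2014, 2, 27), "Thursday"),
}

results = {}
for name, (day, stated_weekday) in milestones.items():
    actual = WEEKDAYS[day.weekday()]
    results[name] = (actual == stated_weekday)
    print(f"{day.isoformat()}  {actual:<9} {name}")

# Days between this meeting (Friday, February 7, 2014) and the launch.
days_to_launch = (date(2014, 2, 27) - date(2014, 2, 7)).days
print(days_to_launch)   # 20
```

All three stated weekdays agree with the calendar, and the estimated launch falls 20 days after this meeting.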

Action Items

  1. Look at the phi = 0 efficiency hole. -> Richard
  2. Understand random number seed saving and retrieval. -> Mark
  3. Test new genealogy scheme. -> Kei
  4. Settle on a realistic gate time for EM background. -> Kei
  5. Invite Chris Larrieu to future data challenge meetings. -> Mark
  6. Track the source of crashes in the reconstruction. -> All