From GlueXWiki
Latest revision as of 17:00, 24 February 2017

GlueX Data Challenge Meeting
Friday, March 14, 2014
11:00 am, EDT
JLab: CEBAF Center, F326

Connection

Bluejeans Information

The meeting this week will be held using the BlueJeans video conferencing solution supported by Jefferson Lab. Information on this product is available from Jefferson Lab, and notes made within GlueX are on the GlueX Bluejeans Video Conferencing page. You should be able to connect seamlessly with virtually any device. If you have a Polycom device, or want to connect through your web browser or tablet, use the Network Connection below. If you want to connect via phone, use the phone information below. The main drawback is that you may not be able to connect via a browser on a Linux computer.

Bluejeans Applications

The first time that you connect to a BlueJeans meeting, you will be prompted to install an application for your web browser. Versions of this application for Windows and Mac OS have been placed on DocDB as document 2435. You can also find apps for iOS and Android.

Connect to the Meeting

Join via Polycom room system

    • Network Connection
      • IP Address: 199.48.152.152
      • URL: bjn.vc
    • Enter Meeting ID: 531811405

Join via a Web Browser

    • If you have not registered with bluejeans through JLab:
    • If you have registered with bluejeans through JLab:
      • URL: [2]
      • Enter Conference ID: 531811405

Join via Phone

    • Dial-in phone number:
      • US or Canada: +1 408 740 7256 or
      • US or Canada: +1 888 240 2560
      • Dial-in international phone numbers
    • Enter Conference ID: 531811405

Agenda

  1. Announcements
  2. Review of minutes from last time
  3. Random Number Seeds Procedure (as of 2014-03-07) (Closed?)? Mark/Anyone
    1. Proposal (Curtis)
    2. Setting an initial seed in mcsmear (Mark)
    3. What we did last time (Mark)
  4. ZFATAL fix (closed?) (Richard)
  5. Short file issue (all)
  6. Non-reproducible results (all)
  7. Running jobs at CMU (Paul)
  8. Running jobs at NU (Sean)
  9. Running jobs at MIT (Justin)
  10. Running jobs at FSU (Aristeidis) 1 2
  11. Running jobs at JLab (Mark)
    1. Nodes at JLab
  12. Electromagnetic Backgrounds update. (Closed?) Paul/Kei
  13. Run number assignments (Mark)
  14. Proposed Schedule - where do we stand? Should we launch now?
    • Launch of Data Challenge Thursday March 20, 2014 (est.).
    • Test jobs going successfully by Tuesday March 18.
    • Distribution ready by Monday March 17.
  15. AOT

Minutes

Present:

  • CMU: Paul Mattione, Curtis Meyer
  • FSU: Volker Crede, Priyashree Roy, Aristeidis Tsaris
  • IU: Kei Moriya
  • JLab: Sergey Furletov, Mark Ito (chair), David Lawrence, Sandy Philpott, Dmitry Romanov, Simon Taylor, Beni Zihlmann
  • MIT: Justin Stevens
  • NU: Sean Dobbs
  • UConn: Richard Jones, James McIntyre, Brendan Pratt

Note that we used BlueJeans for telecommunications for this meeting. Was not terrible.

Random Number Seeds Procedure

Curtis described his proposal for a way to handle random number seeds to allow reproduction of results even in a multi-threaded environment.

Mark showed David's message describing the syntax for setting the random number seed for mcsmear on the command line. He also mentioned that David has implemented a switch for mcsmear to have it use the two hdgeant seeds of the incoming event, along with the number 137, as the three random number seeds for each event in mcsmear, roughly consistent with Curtis's proposal. This should be ported to the branch.

Mark reminded us of the seed procedure we used last year.

Mark proposed that we follow the same procedure as last year, only this time use a fixed seed triplet for mcsmear. No further code development would be needed.

Richard thought we should go ahead and fully implement Curtis's scheme. He also volunteered to make the needed modifications to bggen, hdgeant, and the data model to support it. We agreed to have him go ahead and make the changes for this challenge.
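
The idea behind seeding each event from values stored with that event can be illustrated with a minimal Python sketch. This is not the mcsmear or hdgeant code: the event layout, the smear function, and the way the three seeds are packed into one integer are all hypothetical, chosen only to show that results stop depending on the order in which events are processed (as happens with multiple threads).

```python
import random

MCSMEAR_CONSTANT = 137  # third seed; the value 137 is the one quoted in the minutes


def event_rng(seed1, seed2, seed3=MCSMEAR_CONSTANT):
    """Return a generator whose state depends only on the event's seed triplet."""
    # Packing the triplet into a single integer is an illustrative choice,
    # not the way the real code combines the seeds.
    combined = (seed1 << 64) | (seed2 << 32) | seed3
    return random.Random(combined)


def smear(event):
    """Hypothetical smearing step: add Gaussian noise drawn from the event's own generator."""
    rng = event_rng(event["seed1"], event["seed2"])
    return event["energy"] + rng.gauss(0.0, 0.05)


# Toy events carrying the two seeds that the simulation step would have stored.
events = [
    {"id": i, "seed1": 1000 + i, "seed2": 2000 + i, "energy": 1.0 + 0.1 * i}
    for i in range(5)
]

# Processing order mimics different thread scheduling; results must not change.
in_order = {e["id"]: smear(e) for e in events}
reversed_order = {e["id"]: smear(e) for e in reversed(events)}
assert in_order == reversed_order
```

Because each event builds its own generator from its stored seeds, no generator state is shared between events, which is what makes the output reproducible regardless of scheduling.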

Short File Issue

Mark ran 2000 jobs yesterday, 1000 events each, with compression turned back on, and did not see any short files. Richard thought that that was not enough to see the problem with the current code (100 jobs of 100,000 events each might give 2 short files). In any case, the rate is much less than we saw a couple of weeks ago.
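
Richard's point can be checked with a rough estimate. The sketch below assumes that short files occur independently at a fixed rate per event processed and that the count follows a Poisson distribution; both are assumptions made for illustration, since the minutes do not say how the rate scales.

```python
import math

# Richard's estimate: about 2 short files in 100 jobs of 100,000 events each.
rate_per_event = 2 / (100 * 100_000)

# Mark's test: 2000 jobs of 1000 events each.
events_in_test = 2000 * 1000

# Expected number of short files in Mark's test under the assumed scaling.
expected_short_files = rate_per_event * events_in_test

# Poisson probability of observing no short files at all.
p_none = math.exp(-expected_short_files)

print(f"expected short files: {expected_short_files:.2f}")
print(f"probability of seeing none: {p_none:.2f}")
```

Under these assumptions the test would be expected to produce fewer than one short file, so seeing none does not contradict the estimated rate.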

Richard proposed that if a fix is not found soon, we run without compression. It would simplify the job handling procedure, and a short file might be correlated with some other less obvious type of data corruption. Justin and Mark both reported that this increases the REST file size by about a factor of two. Richard will continue to work on a fix, but we will not hold up starting for it.

Mark promised to copy David's compression on/off switch to the trunk from the branch.

Non-Reproducible Results

There has been a lot of activity around this topic. For details, see the email traffic in the halld-offline mailing list archive for March 2014.

Simon checked in a fix for t0 variation in wire-based tracking yesterday.

Mark reported non-reproducibility even after this fix. Sean has not seen problems but did not do a lot of trials. Paul has seen differences in multiple trials on the same smeared data, and found that those results fall into only two classes, with all results within a class identical. Paul has traced the cause of the difference to a single event; he has been looking at a particular file he distributed some time ago. Beni confirms that this event is the culprit.

Work continues on this issue.
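
One way to sort repeated trials into classes like those Paul described is to group the output files by checksum. The following Python sketch is an illustration only: the file names and contents are made up, and a byte-for-byte comparison is meaningful only if the output format does not embed run-dependent data such as timestamps, which is an assumption here.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def group_by_digest(paths):
    """Group files by the SHA-256 digest of their contents."""
    classes = defaultdict(list)
    for path in paths:
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        classes[digest].append(str(path))
    return dict(classes)


# Toy demonstration: three "trial outputs", two of which are identical.
with tempfile.TemporaryDirectory() as tmp:
    contents = {
        "trial1.out": b"result A",
        "trial2.out": b"result B",
        "trial3.out": b"result A",
    }
    paths = []
    for name, data in contents.items():
        file_path = Path(tmp) / name
        file_path.write_bytes(data)
        paths.append(file_path)
    classes = group_by_digest(paths)

for digest, members in classes.items():
    print(digest[:12], len(members), "file(s)")
```

Each distinct digest corresponds to one class of results; files that share a digest are byte-identical.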

Running Jobs at FSU

Aristeidis summarized recent running on the cluster at FSU.

He contrasted two sets of running conditions, one with the branch as of last Friday and the other with the 2.4 tag. Memory usage is low and stable with the branch, but the REST files are of two different sizes. With the tag there is about 10% memory variation job to job, but all files are the same size. The files are also a factor of 2 bigger [no compression?].

Nodes at JLab

Mark described Sandy's summary of the current activity and future potential for addition of nodes to the physics batch farm for our data challenge. One of several numbers: we have 911,000 core-hours "in the bank" due to the "loan" of nodes from the physics batch farm to the LQCD cluster over the past few months.

Electromagnetic Background Update

Kei presented slides describing the current state of his studies on using the data challenge software configuration for the pπ⁺π⁻ and pπ⁺π⁻π⁰ final states. Slides covered:

  • file sizes
  • CPU and virtual memory usage
  • reaction selection
  • properties of reconstructed events
  • momentum vs. polar angle for protons
  • momentum vs. polar angle for π⁰s
  • effects of kinematic-fit-confidence-level cuts on event spectra and yield

Run number assignments

Mark pointed out recent changes to the run conditions page.

He proposed that we reduce the fraction of high EM events that we aim for from 15% to 5%, and raise the fraction at 1×10⁷ from 70% to 80%. We thought that was reasonable.

Schedule

We agreed on noon, Thursday March 20 as the deadline for code and configuration changes. Processing at the various sites can then proceed as local schedules permit. In the meantime:

  • email exchanges on relevant work are encouraged
  • we will discuss status at Wednesday's offline meeting, following Paul's suggestion
  • we will hold open the possibility of an ad hoc meeting if needed
  • Curtis asked that sites send CPU resource estimates to Mark by Monday for compilation

Action Items

  1. Implement random-number-seed-storage design in bggen, hdgeant, and HDDM. Richard
  2. Fix the reproducibility issue. All
  3. Investigate the short REST file issue further. Richard
  4. Send resource info to Mark. All site managers/honchos/cognoscenti
  5. Confirm observation of differences in repeated runs with the 2.5 tag. Mark
  6. Put compression on/off switch on the trunk. Mark
  7. Bring David's seed-mcsmear-from-hdgeant-seed scheme to branch. Mark
  8. Update the conditions page to reflect the reduction in high EM event fraction. Mark