GlueX Data Challenge Meeting, April 4, 2014
Friday, April 4, 2014
11:00 am, EDT
JLab: CEBAF Center, F326
Connection Using Bluejeans
- To join via Polycom room system go to the IP Address: 22.214.171.124 (bjn.vc) and enter the meeting ID: 531811405.
- To join via a Web Browser, go to the page https://bluejeans.com/531811405.
- To join via phone, use one of the following numbers and the Conference ID: 531811405
- US or Canada: +1 408 740 7256 or
- US or Canada: +1 888 240 2560
- More information on connecting to bluejeans is available.
- Status reports from sites
Present:
- CMU: Paul Mattione, Curtis Meyer
- FSU: Volker Crede, Priyashree Roy, Aristeidis Tsaris
- IU: Kei Moriya
- JLab: Mark Ito (chair), Sandy Philpott, Simon Taylor
- MIT: Justin Stevens
- NU: Sean Dobbs
- UConn: Richard Jones
- Sean went through his email describing a Python script to compare monitoring_hists with standard distributions. Justin had tried it with success. Justin also noted that a look at the thrown beam photon energy plot gives a quick check that the number of events in the job is correct. The script is now linked from the conditions page.
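For readers without access to the script, the kind of check discussed here can be sketched as follows. This is not Sean's script: the function names, the plain lists of bin contents, and the chi-square-per-bin figure of merit are all assumptions made for illustration. The second function mirrors Justin's remark that the entries in the thrown beam photon energy plot should add up to the number of events in the job.

```python
def chi2_per_bin(test_counts, ref_counts):
    """Compare the shape of a test histogram with a reference histogram.

    Both arguments are lists of bin contents with identical binning
    (hypothetical inputs; the real script reads monitoring_hists output).
    The reference is scaled to the total of the test histogram first.
    """
    if len(test_counts) != len(ref_counts):
        raise ValueError("histograms must have the same binning")
    n_test = sum(test_counts)
    n_ref = sum(ref_counts)
    if n_test == 0 or n_ref == 0:
        raise ValueError("cannot compare an empty histogram")
    scale = n_test / n_ref
    chi2 = 0.0
    used_bins = 0
    for t, r in zip(test_counts, ref_counts):
        expected = r * scale
        # Poisson variance of the test bin plus that of the scaled reference bin
        variance = t + r * scale * scale
        if variance > 0:
            chi2 += (t - expected) ** 2 / variance
            used_bins += 1
    return chi2 / used_bins


def event_count_ok(thrown_energy_counts, expected_events):
    """True if the thrown beam photon energy histogram holds one entry per event."""
    return sum(thrown_energy_counts) == expected_events
```

A value of chi2_per_bin near or below 1 suggests the job output has the same shape as the standard distribution; what threshold to flag on is a choice left to the user.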
Data Challenge 2 Event Tally Board
We took a quick look at the tally board; we are currently up to 1.5 gigaevents.
Richard brought us up to speed on the OSG effort.
We are still in amber mode, as opposed to green, on the OSG, and some throttling of our jobs is being done. Still, we have seen a peak of 10,000 cores devoted to this data challenge, as shown on a recent graph of running jobs for GlueX from the UConn OSG/GlueX status site. Richard also showed us a plot of "idle" jobs, i.e., those queued for running. It shows an effect where jobs are accepted and then quickly fail at some sites where the installed run-time libraries are incompatible with our software stack. Richard is going through and eliminating these types of problems. When we have a configuration consistent with all contributing sites, we will ask to be flipped to green. That should happen over the next few days.
UConn is contributing about 400 cores to the OSG currently. Northwestern is contributing about 250 cores.
A rough estimate puts the event count from the OSG so far at about that of all of the other sites combined.
We agreed that until production turns on fully on the OSG, we will continue in our current mode at the other sites and re-evaluate plans when the situation with the OSG changes.
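The quick failures at sites with incompatible run-time libraries, described above, are the kind of problem a pre-flight check at the start of a job can catch. The following is a minimal sketch of such a check, not what Richard is actually doing: the function name and the idea of passing in a list of library names are assumptions, and the list itself would have to come from the software stack.

```python
import ctypes


def missing_libraries(names, loader=ctypes.CDLL):
    """Return the shared-library names from `names` that cannot be loaded.

    `names` is a hypothetical list of library file names that the job needs.
    `loader` defaults to ctypes.CDLL, which raises OSError when the dynamic
    linker cannot find or load a library.
    """
    missing = []
    for name in names:
        try:
            loader(name)
        except OSError:
            missing.append(name)
    return missing
```

A job wrapper could call this first and exit with a distinctive status if the returned list is not empty, so that an incompatible site is identified before any events are generated.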
We had brief reports on production at the various sites.
Mark showed the latest plot of number of running jobs versus time.
Sandy gave a clarification of the core count on the batch farm. We now have roughly 3400 cores in total. The Hall D share is 2400. The nominal core count for the farm as a whole is 1400. The surplus of about 2000 cores comes from the loan from LQCD.
Justin took us through his wiki page.
The activity at MIT related to this data challenge, both there and on FutureGrid, will be reported at the OSG all-hands meeting.
Paul reports that all is well in Pittsburgh, with 20 million events produced in the last two days. Production continues smoothly.
Aristeidis reports that production continues, with 5 million events since Wednesday. He suspects that progress should be faster and is looking into possible impediments.
Core-Hour Credit at Jefferson Lab
We discussed how to spend our credit with the LQCD farm. Currently we have used about one third of the million core-hours. SciComp has told us that a late-summer data challenge would mesh with a seasonal slack period on the LQCD cluster, and that at that time a fresh loan to us would be easy to arrange. They are fine with us burning all of the current credit for this data challenge. Curtis thought that a tape-resident-data-driven challenge should be started sooner, in which case the credit might come in handy. We will have to revisit this issue as the picture becomes clearer.
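To put the remaining credit in perspective, here is a rough sketch of the arithmetic. The one-million core-hour credit and the one-third used come from the discussion above. The 2000-core rate is an assumption: it is the difference between the roughly 3400 cores now on the farm and the nominal 1400, and it presumes that all of the loaned cores run continuously and are charged against the credit. The function names are made up for illustration.

```python
def remaining_core_hours(total_credit, fraction_used):
    """Core-hours left from a credit of which `fraction_used` has been spent."""
    if not 0.0 <= fraction_used <= 1.0:
        raise ValueError("fraction_used must be between 0 and 1")
    return total_credit * (1.0 - fraction_used)


def days_of_running(core_hours, cores):
    """Days that `core_hours` would last with `cores` cores busy around the clock."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return core_hours / cores / 24.0


# Figures from the minutes: one million core-hours, about one third used.
left = remaining_core_hours(1_000_000, 1.0 / 3.0)
# Assumed burn rate: 3400 total cores minus the nominal 1400.
days = days_of_running(left, 3400 - 1400)
```

Under these assumptions the remaining credit corresponds to roughly two weeks of continuous running on the loaned cores.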