Difference between revisions of "GlueX Data Challenge Meeting, December 17, 2012"

Revision as of 19:58, 17 December 2012

GlueX Data Challenge Meeting
Monday, December 17, 2012
1:30 pm, EST
JLab: CEBAF Center, F326/327

Agenda

  1. Announcements
  2. Minutes from last time
  3. Data Challenge 1 status
    1. JLab
    2. Grid status
    3. CMU status
  4. Shutdown plan (or continuation plan?)
  5. Work list for post DC-1 period
    1. file archiving
    2. file distribution
    3.  ???
  6. Thoughts on DC-2
    1. What?
    2. How much?
    3. When?

Meeting Connections

To connect from the outside:

Videoconferencing

  1. ESNET:
    • Call ESNET Number 8542553 (this is the preferred connection method).
  2. EVO:
    • A conference has been booked under "GlueX" from 1:00pm until 3:30pm (EST).
    • Direct meeting link
    • To phone into an EVO meeting, from the U.S. call (626) 395-2112 and then enter the EVO meeting code, 13 9993
    • Skype Bridge to EVO

Telephone

  1. Phone: (should not be needed)
    • +1-866-740-1260 : US and Canada
    • +1-303-248-0285 : International
    • then use participant code: 3421244# (the # is needed when using the phone)
    • or www.readytalk.com
      • then type access code 3421244 into "join a meeting" (you need java plugin)

Minutes

Present:

  • CMU: Paul Mattione
  • JLab: David Lawrence, Yi Qiang, Elton Smith, Simon Taylor, Beni Zihlmann
  • UConn:

data challenge meeting notes 12/17/12

paul
david, simon, yi, elton, mark, beni, dmitry
richard

3.4 billion events on grid
some time correcting problems
spared hazards with crashes

mcsmear, reproduce hang
take seeds and re-run
on second try files look identical
cause of hangs: deadlock due to exceeding 30 second time-out, holds mutex lock
hangs occur in mcsmear
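
As a rough illustration of the seed check noted above (not the actual mcsmear code): re-running a seeded random smearing step should give identical output, which can be confirmed by comparing digests. The names smear and digest, the Gaussian smearing, and all parameter values are assumptions made for this sketch.

```python
import hashlib
import random


def smear(seed, n=1000, sigma=0.1):
    """Hypothetical stand-in for a smearing pass: n Gaussian offsets from one seed."""
    rng = random.Random(seed)  # private generator, so the seed fully fixes the output
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def digest(values):
    """SHA-256 over the printed values, used to compare two runs cheaply."""
    h = hashlib.sha256()
    for v in values:
        h.update(repr(v).encode("ascii"))
    return h.hexdigest()


first_try = digest(smear(seed=12345))
second_try = digest(smear(seed=12345))
files_identical = first_try == second_try
```

If the digests match while only one of the two runs hung, the hang is unlikely to depend on the random sequence itself, which is consistent with a timing-related deadlock.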

24 hour jobs partial file, no files

jobs finished quickly
2-3% crashing
resubmit on failure
multiple submission, up to 30
changed to allow failed jobs to fail
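
A minimal sketch of the resubmission policy described above, assuming a job is just a Python callable that raises on a crash. The function name run_with_resubmission and its interface are hypothetical and are not the grid submission tooling; only the cap of 30 comes from the notes.

```python
MAX_SUBMISSIONS = 30  # cap quoted in the notes


def run_with_resubmission(job, max_submissions=MAX_SUBMISSIONS):
    """Call job() until it succeeds or the submission cap is reached.

    Returns (succeeded, submissions_used).
    """
    for attempt in range(1, max_submissions + 1):
        try:
            job()
            return True, attempt
        except Exception:
            # Crash: fall through and resubmit, if any submissions remain.
            continue
    return False, max_submissions
```

Setting max_submissions=1 corresponds to letting failed jobs fail: a crashing job is reported once and not resubmitted.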

submission node crashed, replaced with bigger memory machine
peak out at 7k jobs running at once
other host: user scheduler, maintains a daemon for each job, needed more memory
srm that receives the results coming back, 20 TB of disk
robust 100 MB, fills GB pipe

100 million events and go back to debug the code

10% being used right now only one person

archive all files to JLab tape library: logs, histos, rest

distribution:
ship all rest files to UConn, access via srm
have all files spinning at JLab

SURA grid,

skims

srm plug-in

grid certificate, collaboration wide archive

seg faults in hdgeant
jana hangs
relaunch random seed


--end of note--