Difference between revisions of "CMU Data Challenge 2"

From GlueXWiki

Revision as of 11:54, 14 April 2014

  1. At CMU we are using 12 boxes, each with four 8-core AMD Opteron processors (32 cores per box). Each box has 64 GB of physical memory. Data are being written to a local RAID disk. Jobs are managed by PBS (Torque and Maui).
  2. All 384 cores are reserved for the data challenge for three weeks.
  3. Did not switch to optional version.
  4. Start-up Problems
    • All jobs were initially reading from the same copy of sqlite, resources, and hdds, instead of having their own copies.
    • Large-cluster configuration problems slowed our start. Resolved by tuning PBS parameters to control the rate at which pbs_mom talked to the head node.
    • Still battling a scheduler issue. Work-around has been found.
    • Running smoothly since ~Tuesday.
  5. Final Tally: 7000 jobs:
    • 9001 Series - 5600 1E7 with EM Background (25k Events Each) : 139.87 MEvents : 1 failure:
      • 09001_0000136: DMagneticFieldMapFineMesh::GetFieldAndGradient()
    • 9002 Series - 875 5E7 with EM Background (10k Events Each) : 8.75 MEvents : 0 failures
    • 9003 Series - 525 without EM Background (50k Events Each) : 26.15 MEvents : 2 failures:
      • 09003_0000014: lost to the aether (no record of it) (likely pbs fail)
      • 09003_0000392: timed-out ~9-10k events into hdgeant (96 hrs)
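
The figures above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the production scripts: it takes the box, job, and event counts from this page and assumes (our assumption, not stated on the page) that a failed job contributes no events. The function name `nominal_mevents` is hypothetical.

```python
# Hardware figures from item 1: 12 boxes, 4 processors per box, 8 cores per processor.
BOXES, CPUS_PER_BOX, CORES_PER_CPU = 12, 4, 8
total_cores = BOXES * CPUS_PER_BOX * CORES_PER_CPU

# (series, jobs, events per job, failures, reported MEvents), copied from the tally.
SERIES = [
    ("9001", 5600, 25_000, 1, 139.87),
    ("9002", 875, 10_000, 0, 8.75),
    ("9003", 525, 50_000, 2, 26.15),
]


def nominal_mevents(jobs, events_per_job, failures=0):
    """Expected millions of events, ASSUMING failed jobs yield zero events."""
    return (jobs - failures) * events_per_job / 1e6


total_jobs = sum(jobs for _, jobs, _, _, _ in SERIES)

if __name__ == "__main__":
    print(f"cores: {total_cores}, jobs: {total_jobs}")
    for name, jobs, per_job, failures, reported in SERIES:
        expected = nominal_mevents(jobs, per_job, failures)
        print(f"{name}: expected {expected:.3f} MEvents, "
              f"reported {reported:.2f}, diff {expected - reported:+.3f}")
```

Under that assumption the core count (384), the job total (7000), and the 9002 and 9003 event totals reproduce the numbers on this page. The 9001 estimate comes out about 0.1 MEvents above the reported 139.87; the page does not say why, so the sketch only flags the difference.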