Online Data Challenge 2013 Run Plan

From GlueXWiki
Latest revision as of 14:52, 20 August 2013

  1. Start-up/shut-down test
    • Test that the startup scripts properly launch all L3 and monitoring processes on appropriate nodes
    • Verify that shutdown scripts properly kill all L3 and monitoring processes on all nodes
  2. Low-rate system check
    • Set L3 algorithm to "pass-through"
    • Feed events into EB's ET at 100Hz
    • Verify rate through L3 farm to RAID disk maintains 100Hz
    • Verify that the monitoring farm histograms the full 100Hz rate
    • Check CPU loads on Ganglia
  3. 1kHz system check
    • Set L3 algorithm to "pass-through"
    • Feed events into EB's ET at 1kHz
    • Verify rate through L3 farm to RAID disk maintains 1kHz
    • Kill L3 process on 2 nodes and verify system does not crash and other nodes increase rate
    • Kill et2et program on Monitoring server and verify that rate to disk is unchanged
  4. Low-luminosity system check
    • Set L3 algorithm to "pass-through"
    • Feed events into EB's ET at 20kHz
    • If the rate to RAID disk is less than 20kHz, identify the bottleneck
    • If the monitoring rate is less than 20kHz, identify the bottleneck
    • Profile the detector system plugins using the janadot plugin
    • If rate to RAID is able to maintain 20kHz, kill L3 processes until rate drops
  5. L3 algorithm check
    • Set L3 algorithm to "random-reject" with rejection set to 50%
    • Feed events into EB's ET starting at 100Hz
    • Verify rate to disk is 50Hz
    • Increase rate to 20kHz
    • Verify rate to disk is 10kHz
    • Set L3 algorithm to "physics" (Justin's algorithm)
    • Feed events into EB's ET starting at 1kHz
    • Verify rate to RAID and histogram rate
    • Kill L3 process and verify system does not crash
    • Restart all L3 processes and increase rate to 20kHz
    • Verify that the L3 pass-through rate increases as L3 nodes are killed and the remaining nodes can no longer apply the algorithm to every event
  6. RootSpy archiver test
    • Set L3 algorithm to "physics" (Justin's algorithm)
    • Feed events into EB's ET at highest rate monitoring system can keep up with (up to 20kHz)
    • Stop event feed
    • Signal the archiver that the run has ended
    • Feed events into EB's ET again using a different run number
    • Verify that archiver closes first file(s) and opens new one(s) for new run
  7. RAID to tape silo test
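The pass/fail criteria in step 1 (start-up/shut-down test) reduce to a per-node comparison of the processes that should be running against those that actually are. Below is a minimal Python sketch. It assumes the process list for each node has already been collected by some other means (for example `pgrep` over ssh). The node names and process names are hypothetical placeholders, not the actual GlueX L3 or monitoring executables.

```python
def check_processes(expected, observed):
    """Compare expected and observed process names on each node.

    expected, observed: dict mapping node name -> iterable of process names.
    Returns (missing, leftover), each a dict node -> sorted list of names.
    After start-up, `missing` should be empty. After shut-down, pass an empty
    `expected` and `leftover` then lists anything that was not killed.
    """
    missing, leftover = {}, {}
    for node in sorted(set(expected) | set(observed)):
        want = set(expected.get(node, ()))
        have = set(observed.get(node, ()))
        if want - have:
            missing[node] = sorted(want - have)
        if have - want:
            leftover[node] = sorted(have - want)
    return missing, leftover


if __name__ == "__main__":
    # Hypothetical node and process names, for illustration only.
    plan = {"l3node01": ["l3_filter"], "mon01": ["monitor", "et2et"]}
    running = {"l3node01": ["l3_filter"], "mon01": ["monitor"]}
    print(check_processes(plan, running))   # start-up check
    print(check_processes({}, running))     # shut-down check
```

Returning both dictionaries lets the same helper serve the start-up check (look at `missing`) and the shut-down check (look at `leftover`).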
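Steps 2 to 4 each compare a measured rate against the feed rate. At 20kHz, they also ask for the bottleneck to be identified. The Python sketch below shows that bookkeeping. It assumes event counts over a fixed time window are available for each stage, listed in data-flow order. The stage names, the counts, and the 5% tolerance are illustrative assumptions, not values from this plan. Because the monitoring branch (et2et to the monitoring farm) sits off the main path to RAID, it would be checked as its own chain.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class StageSample:
    name: str        # hypothetical stage label, e.g. "L3 farm -> RAID"
    events: int      # events counted at this stage during the window
    seconds: float   # length of the sampling window

    @property
    def rate_hz(self) -> float:
        return self.events / self.seconds


def first_bottleneck(samples: Sequence[StageSample], target_hz: float,
                     tolerance: float = 0.05) -> Optional[str]:
    """Return the first stage, in data-flow order, whose rate is more than
    `tolerance` (fractional) below target_hz, or None if all stages keep up.

    Stages downstream of a bottleneck are starved as well, so only the
    first low stage points at the actual limit.
    """
    floor = target_hz * (1.0 - tolerance)
    for s in samples:
        if s.rate_hz < floor:
            return s.name
    return None


if __name__ == "__main__":
    # Hypothetical 10 s counts for a 20kHz pass-through run (step 4).
    chain = [
        StageSample("EB ET input", 200_000, 10.0),
        StageSample("L3 farm output", 185_000, 10.0),
        StageSample("RAID write", 150_000, 10.0),
    ]
    print(first_bottleneck(chain, 20_000))  # -> L3 farm output
```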
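The expected numbers in step 5 follow from how a random-reject filter works. It keeps each event independently with probability 1 - r, so the mean output rate equals the input rate times (1 - r). At r = 0.5, 100Hz gives 50Hz and 20kHz gives 10kHz. The Python sketch below checks this with a seeded simulation of one second of events. `random_reject` is a hypothetical stand-in, not the actual L3 random-reject implementation.

```python
import random


def random_reject(events, reject_fraction, rng):
    """Keep each event independently with probability 1 - reject_fraction."""
    return [ev for ev in events if rng.random() >= reject_fraction]


def expected_output_rate(input_rate_hz, reject_fraction):
    """Mean rate to disk for a random-reject filter (no pass-through)."""
    return input_rate_hz * (1.0 - reject_fraction)


if __name__ == "__main__":
    rng = random.Random(2013)  # fixed seed so the run is reproducible
    for rate in (100, 20_000):
        kept = random_reject(range(rate), 0.5, rng)  # one second of events
        print(f"{rate} Hz in: expected {expected_output_rate(rate, 0.5):.0f} Hz, "
              f"simulated {len(kept)} Hz")
```

At 100Hz the simulated count fluctuates visibly around 50 (the standard deviation is about 5 events), so a short low-rate test should allow for that spread.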
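The last bullet of step 5 expects the L3 pass-through rate to grow as nodes are killed. The Python sketch below uses a simplified model that is assumed here rather than taken from the plan. In this model, each node can filter at most a fixed rate, and events beyond the farm's total capacity are written out unfiltered. The per-node capacity and the 50% rejection in the example are hypothetical numbers, since the plan does not give the physics algorithm's rejection.

```python
def l3_rates(input_hz, nodes, per_node_capacity_hz, reject_fraction):
    """Return (filtered_out_hz, passthrough_hz, to_disk_hz) for the model.

    Assumption: events the farm cannot process are passed through
    unfiltered rather than dropped.
    """
    capacity_hz = nodes * per_node_capacity_hz
    filtered_in_hz = min(input_hz, capacity_hz)
    passthrough_hz = input_hz - filtered_in_hz
    filtered_out_hz = filtered_in_hz * (1.0 - reject_fraction)
    return filtered_out_hz, passthrough_hz, filtered_out_hz + passthrough_hz


if __name__ == "__main__":
    # Hypothetical: 20kHz feed, 1kHz filtering capacity per node, 50% rejection.
    for n in (20, 15, 10, 5):
        out, thru, disk = l3_rates(20_000, n, 1_000, 0.5)
        print(f"{n:2d} nodes: filtered {out:7.0f} Hz, "
              f"pass-through {thru:7.0f} Hz, to disk {disk:7.0f} Hz")
```

In this model, the total rate to disk rises as nodes are removed, because more events bypass the filter. That gives a second observable alongside the pass-through rate itself.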
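Step 6 checks that the archiver rolls its output over at a run boundary. The Python sketch below shows one way such roll-over logic can work. It assumes one output file per run, closed either on an explicit end-of-run signal or when an item arrives with a new run number. The class, its methods, and the file naming are hypothetical and do not describe the RootSpy archiver's actual interface or output format.

```python
import os
import tempfile


class RunArchiver:
    """Hypothetical per-run archiver: one output file per run number."""

    def __init__(self, outdir):
        self.outdir = outdir
        self.run = None          # run number of the currently open file
        self.fh = None           # currently open file handle, if any
        self.closed_files = []   # paths closed so far, in order

    def end_run(self):
        """Handle an end-of-run signal: close the current file, if any."""
        if self.fh is not None:
            self.fh.close()
            self.closed_files.append(self.fh.name)
            self.fh = None
            self.run = None

    def record(self, run, payload):
        """Archive one item. A new run number closes the old file first."""
        if run != self.run:
            self.end_run()
            path = os.path.join(self.outdir, f"run{run:06d}.txt")
            self.fh = open(path, "w")
            self.run = run
        self.fh.write(payload + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        arch = RunArchiver(d)
        arch.record(1001, "histograms @ t0")
        arch.end_run()                   # signal end of first run
        arch.record(1002, "histograms @ t1")
        arch.end_run()
        print([os.path.basename(p) for p in arch.closed_files])
```

Closing on a run-number change, and not only on the explicit signal, means a missed end-of-run message still cannot mix two runs in one file.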