Online Data Challenge 2013 Run Plan
From GlueXWiki
- Start-up/shut-down test
  - Test that the startup scripts properly launch all L3 and monitoring processes on the appropriate nodes
  - Verify that the shutdown scripts properly kill all L3 and monitoring processes on all nodes
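The start-up and shut-down checks reduce to comparing the processes expected on each node with those actually running. Below is a minimal Python sketch. It assumes passwordless ssh to each node and `pgrep` on the remote hosts. The node names and process names in `EXPECTED` are hypothetical placeholders, not the actual farm configuration.

```python
import subprocess

# Hypothetical node -> process-name map; replace with the real farm layout.
EXPECTED = {
    "l3node01": {"l3_filter"},
    "l3node02": {"l3_filter"},
    "monserver": {"et2et"},
}


def running_processes(node, names):
    """Return the subset of `names` that pgrep reports as running on `node`."""
    found = set()
    for name in names:
        # pgrep -x requires an exact name match and exits with status 0 if
        # at least one process matched. On Linux the name is truncated to
        # 15 characters, so longer executables need `pgrep -f` instead.
        result = subprocess.run(["ssh", node, "pgrep", "-x", name],
                                capture_output=True)
        if result.returncode == 0:
            found.add(name)
    return found


def observe(expected):
    """Poll every node for the processes it is expected to run."""
    return {node: running_processes(node, procs) for node, procs in expected.items()}


def missing_after_startup(expected, observed):
    """Expected processes that are NOT running, keyed by node."""
    missing = {}
    for node, procs in expected.items():
        absent = procs - observed.get(node, set())
        if absent:
            missing[node] = absent
    return missing


def leftover_after_shutdown(expected, observed):
    """Expected processes that are STILL running, keyed by node."""
    leftover = {}
    for node, procs in expected.items():
        alive = procs & observed.get(node, set())
        if alive:
            leftover[node] = alive
    return leftover
```

After the startup script runs, `missing_after_startup(EXPECTED, observe(EXPECTED))` should return an empty dict. After the shutdown script runs, `leftover_after_shutdown(EXPECTED, observe(EXPECTED))` should return an empty dict.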
- Low-rate system check
  - Set the L3 algorithm to "pass-through"
  - Feed events into the EB's ET at 100Hz
  - Verify that the rate through the L3 farm to the RAID disk stays at 100Hz
  - Verify that the monitoring farm histograms the full 100Hz rate
  - Check CPU loads on Ganglia
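Each "verify rate" step amounts to differencing an event counter over a time window and comparing the result with the target. The sketch below assumes counter snapshots can be read from somewhere, for example the ET system statistics or the output files. The 5% tolerance is an arbitrary assumption and does not come from this plan.

```python
def rate_hz(count_start, count_end, t_start, t_end):
    """Average event rate (Hz) between two counter snapshots taken at t_start and t_end (seconds)."""
    if t_end <= t_start:
        raise ValueError("t_end must be later than t_start")
    return (count_end - count_start) / (t_end - t_start)


def keeps_up(measured_hz, target_hz, tolerance=0.05):
    """True if the measured rate is no more than `tolerance` (as a fraction) below the target."""
    return measured_hz >= target_hz * (1.0 - tolerance)


# Example: 6000 events written in a 60 s window corresponds to 100 Hz.
measured = rate_hz(0, 6000, 0.0, 60.0)
print(measured, keeps_up(measured, 100.0))  # prints: 100.0 True
```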
- 1kHz system check
  - Set the L3 algorithm to "pass-through"
  - Feed events into the EB's ET at 1kHz
  - Verify that the rate through the L3 farm to the RAID disk stays at 1kHz
  - Kill the L3 processes on 2 nodes and verify that the system does not crash and that the remaining nodes increase their rates
  - Kill the et2et program on the monitoring server and verify that the rate to disk is unchanged
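In the node-kill step of the 1kHz check, the total rate to disk should stay at 1kHz while each surviving L3 node picks up a larger share. The example below assumes events are spread evenly over the live nodes. This is a simplification, because ET consumers pull events and faster nodes may take more. It also assumes a hypothetical farm of 10 L3 nodes. Under those assumptions, the expected per-node rates are:

```python
def per_node_rate(total_hz, live_nodes):
    """Expected rate per L3 node if events are shared evenly among live nodes."""
    if live_nodes <= 0:
        raise ValueError("need at least one live L3 node")
    return total_hz / live_nodes


N_L3_NODES = 10          # hypothetical farm size
FEED_HZ = 1000.0         # 1kHz feed from the 1kHz system check

before = per_node_rate(FEED_HZ, N_L3_NODES)       # 100.0 Hz per node
after = per_node_rate(FEED_HZ, N_L3_NODES - 2)    # 125.0 Hz per node after killing 2
```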
- Low-luminosity system check
  - Set the L3 algorithm to "pass-through"
  - Feed events into the EB's ET at 20kHz
  - If the rate to the RAID disk is less than 20kHz, identify the bottleneck
  - If the monitoring rate is less than 20kHz, identify the bottleneck
  - Profile the detector system plugins using the janadot plugin
  - If the rate to RAID can be maintained at 20kHz, kill L3 processes until the rate drops
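For the 20kHz check, one simple way to localize a bottleneck is to walk the pipeline from upstream to downstream and report the first stage whose measured rate falls below the target. The stage names and rates below are made-up inputs for illustration. There is one caveat. If a slow consumer fills the ET buffers and throttles the feed, the drop can appear upstream of the real bottleneck, so buffer fill levels are worth checking alongside the rates.

```python
def find_bottleneck(stage_rates, target_hz, tolerance=0.05):
    """Return the first stage (upstream to downstream) whose rate is below target, or None."""
    floor = target_hz * (1.0 - tolerance)
    for stage, rate_hz in stage_rates:
        if rate_hz < floor:
            return stage
    return None


# Made-up measurements for illustration, ordered upstream to downstream.
stages = [
    ("EB ET input", 20000.0),
    ("L3 farm output", 20000.0),
    ("RAID writer", 14000.0),
]
print(find_bottleneck(stages, 20000.0))  # prints: RAID writer
```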
- L3 algorithm check
  - Set the L3 algorithm to "random-reject" with the rejection fraction set to 50%
  - Feed events into the EB's ET starting at 100Hz
  - Verify that the rate to disk is 50Hz
  - Increase the rate to 20kHz
  - Verify that the rate to disk is 10kHz
  - Set the L3 algorithm to "physics" (Justin's algorithm)
  - Feed events into the EB's ET starting at 1kHz
  - Verify the rate to RAID and the histogram rate
  - Kill an L3 process and verify that the system does not crash
  - Restart all L3 processes and increase the rate to 20kHz
  - Verify that the L3 pass-through rate increases as L3 nodes are killed and the remaining nodes become unable to apply the algorithm to all events
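The expected numbers in the L3 algorithm check follow from simple arithmetic. With a rejection fraction r, the rate to disk is the input rate times (1 − r), so 100Hz gives 50Hz and 20kHz gives 10kHz. The sketch below also models the last step, under an assumption not stated in the plan: each live L3 node can run the physics algorithm on at most a fixed number of events per second, and any surplus is passed through unfiltered. The function names and the 2500 Hz per-node capacity are hypothetical.

```python
import random


def random_reject(rejection_fraction, uniform=random.random):
    """Hypothetical random-reject decision: keep an event with probability 1 - r."""
    # uniform() returns a float in [0, 1), so r = 0 keeps everything and r = 1 keeps nothing.
    return uniform() >= rejection_fraction


def expected_disk_rate(input_hz, rejection_fraction):
    """Rate to disk when every event goes through random-reject."""
    return input_hz * (1.0 - rejection_fraction)


def pass_through_rate(input_hz, live_nodes, per_node_capacity_hz):
    """Rate of events passed through without the physics algorithm.

    Assumes each live node can process at most per_node_capacity_hz events
    per second and that any surplus is passed through unfiltered.
    """
    processed_hz = min(input_hz, live_nodes * per_node_capacity_hz)
    return input_hz - processed_hz


print(expected_disk_rate(100.0, 0.5))       # 50.0
print(expected_disk_rate(20000.0, 0.5))     # 10000.0
# Hypothetical capacity of 2500 Hz per node: surplus appears once fewer than 8 nodes remain.
for nodes in (10, 8, 6):
    print(nodes, pass_through_rate(20000.0, nodes, 2500.0))
```

With these assumed numbers, the pass-through rate stays at 0 Hz with 10 or 8 live nodes and rises to 5000 Hz with 6.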
- RootSpy archiver test
  - Set the L3 algorithm to "physics" (Justin's algorithm)
  - Feed events into the EB's ET at the highest rate the monitoring system can keep up with (up to 20kHz)
  - Stop the event feed
  - Signal the end of run to the archiver
  - Feed events into the EB's ET again using a different run number
  - Verify that the archiver closes the first file(s) and opens new one(s) for the new run
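The archiver test checks that output files roll over with the run number. The sketch below shows that behaviour, assuming one output file per run and a caller-supplied `open_file` callable. `RunArchiver` is a hypothetical class, not the RootSpy archiver's actual interface, and it ignores the real output file format. Opening a new file on any run-number change also covers the case where the end-of-run signal is missed.

```python
class RunArchiver:
    """Hypothetical sketch: one output file per run, rolled over on end-of-run or run change."""

    def __init__(self, open_file):
        self._open_file = open_file   # callable: run_number -> writable file-like object
        self._fh = None
        self.current_run = None
        self.closed_runs = []

    def end_of_run(self):
        """Close the current run's file, if one is open."""
        if self._fh is not None:
            self._fh.close()
            self.closed_runs.append(self.current_run)
            self._fh = None
            self.current_run = None

    def write(self, run_number, record):
        """Write a record, opening a new file first if the run number changed."""
        if run_number != self.current_run:
            self.end_of_run()         # no-op if end_of_run() was already signalled
            self._fh = self._open_file(run_number)
            self.current_run = run_number
        self._fh.write(record)
```

For a quick local test, `RunArchiver(lambda run: open(f"run_{run}.txt", "w"))` writes one text file per run. That file naming is purely illustrative.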
- RAID to tape silo test