NU DC2 Tests

From GlueXWiki
Revision as of 01:34, 10 February 2014 by Sdobbs

  • I've been running jobs simulating 10K events using the same package versions (https://halldweb1.jlab.org/data_challenge/02/conditions/data_challenge_2.html) that Mark described in the meeting on Friday.
  • The machines have 0.75-1.5 GB of memory per core.
  • No resource limits are imposed on the jobs.
  • The success rate is now above 50%. The exact number is uncertain because I was staging some of the intermediate files on disks local to the nodes, and those disks sometimes filled up.
  • Nearly all failures happened at the REST stage, usually because a thread took too long and was killed. I've increased the thread timeout to 90 s, which seems to help.
  • The REST processes grow to 1.5-2 GB in size, which is at or above the memory available per core on these machines.
  • The failed jobs seem consistent with either hitting events that take very long to reconstruct or being resource starved. I'm going to see what I can find out about the events on which the jobs die.
  • I'm also running jobs simulating 50K events to more closely reproduce Mark's results.
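
As a rough check on the resource-starvation possibility, the sketch below estimates how many REST processes fit in a node's memory at once, using the per-core memory and process sizes quoted above. The 8-core node, the function name, and the assumption of one REST process per core are all hypothetical and chosen only for illustration; this is not a measurement of the actual machines.

```python
import math


def max_concurrent_jobs(cores: int, mem_per_core_gb: float, job_peak_gb: float) -> int:
    """Return how many jobs fit in node memory at their peak size,
    capped at the number of cores (assumes one job per core)."""
    total_mem_gb = cores * mem_per_core_gb
    return min(cores, math.floor(total_mem_gb / job_peak_gb))


# Hypothetical node size; the note does not state the real core count.
CORES = 8

for mem_per_core in (0.75, 1.5):      # GB/core, range quoted in the note
    for job_peak in (1.5, 2.0):       # GB, REST process size quoted in the note
        n = max_concurrent_jobs(CORES, mem_per_core, job_peak)
        print(f"{mem_per_core} GB/core, {job_peak} GB per job: "
              f"{n} of {CORES} cores usable without oversubscribing memory")
```

Under these assumptions, only the combination of 1.5 GB/core and 1.5 GB processes keeps every core busy without oversubscribing memory. In the other cases, some jobs would have to swap or wait, which would be consistent with threads exceeding the timeout.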

--Sdobbs 00:34, 10 February 2014 (EST)