MantisBT - JANA
View Issue Details
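ID: 0000298
Project: JANA
Category: Feature Request
View Status: public
Date Submitted: 2012-12-20 08:27
Last Update: 2013-01-22 15:39
Reporter: davidl
Assigned To: davidl
Priority: normal
Severity: minor
Reproducibility: N/A
Status: resolved
Resolution: fixed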
0000298: Need list of mutexes to unlock in thread_HUP_sighandler
If a thread happens to hold a mutex lock when it is killed by a HUP signal, then the replacement thread (or any other thread) will deadlock as soon as it tries to acquire that mutex. (The HUP signal is sent by the main thread when a processing thread exceeds the timeout specified for processing a single event.) This occurred occasionally during the GlueX data challenge, apparently because some events took a long time to write to disk (network issues?).

The fix is to have JANA maintain a list of mutex pointers (registered by user code) that can be unlocked in the thread_HUP_sighandler routine. The mutexes will need to be of the error-checking type (PTHREAD_MUTEX_ERRORCHECK in pthreads) so that the unlock attempt fails cleanly with an error code, rather than misbehaving, when the mutex is not locked by the thread attempting to unlock it. Ideally, the mutex type will be checked at the time the mutex is registered so an error can be issued.
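A minimal sketch of the registration idea described above, assuming POSIX threads. All names here (hup_sketch, InitErrorCheckMutex, RegisterMutex, UnlockRegisteredMutexes) are hypothetical and are not JANA's API. Because pthreads offers no portable way to query the type of an already-initialized mutex, this sketch provides a helper that creates the mutex as error-checking instead of verifying the type at registration. Note also that POSIX does not list pthread_mutex_unlock as async-signal-safe, so calling it from a signal handler relies on platform behavior.

```cpp
#include <pthread.h>
#include <cstddef>

// Hypothetical names throughout; this is an illustration, not JANA code.
namespace hup_sketch {

const std::size_t kMaxMutexes = 16;       // fixed capacity: no allocation in the handler
pthread_mutex_t* g_mutexes[kMaxMutexes];  // registered mutex pointers
std::size_t g_count = 0;                  // number of valid entries

// Initialize *m as an error-checking mutex, so that unlocking it from a
// thread that does not hold it returns an error instead of being undefined.
int InitErrorCheckMutex(pthread_mutex_t* m) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0) rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

// Register a mutex. Assumed to be called during setup, before any
// processing threads are launched (no locking is done here).
bool RegisterMutex(pthread_mutex_t* m) {
    if (m == nullptr || g_count >= kMaxMutexes) return false;
    g_mutexes[g_count++] = m;
    return true;
}

// Intended to run in the thread that is about to be killed. Attempts to
// unlock every registered mutex and returns how many were actually released.
// Mutexes this thread does not hold make pthread_mutex_unlock return an
// error, which is simply skipped.
int UnlockRegisteredMutexes() {
    int released = 0;
    for (std::size_t i = 0; i < g_count; ++i) {
        if (pthread_mutex_unlock(g_mutexes[i]) == 0) ++released;
    }
    return released;
}

}  // namespace hup_sketch
```

User code would create its output mutex with InitErrorCheckMutex, register it once at startup, and the HUP handler would call UnlockRegisteredMutexes before the thread exits.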
The original problem manifested in mcsmear because it implemented its own mutex for serializing writes to the output file. The same problem will exist for all output systems, since they will all need to implement a similar mutex. It may be worth considering re-implementing the JEventSink base class to standardize this (??)
No tags attached.
Issue History
2012-12-20 08:27  davidl  New Issue
2013-01-22 15:39  davidl  Note Added: 0000453
2013-01-22 15:39  davidl  Status: new => resolved
2013-01-22 15:39  davidl  Resolution: open => fixed
2013-01-22 15:39  davidl  Assigned To: => davidl

Notes
(0000453)
davidl   
2013-01-22 15:39   
Added data members to hold the registered mutexes and pointers to callback routines. The mutexes and callbacks are used whenever a SIGHUP signal is received in the default signal handler (defined at the top of JEventLoop.cc). Also added an additional test to the thread_relaunch unit test to verify that the callback is working.
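The callback part of this note could look roughly like the following sketch. It is an assumption-laden illustration, not the code in JEventLoop.cc: the names (hup_callbacks, AddCallback, HupHandler, InstallHandler, SetFlag) and the callback signature are hypothetical, and registration is assumed to happen before any signal can arrive.

```cpp
#include <signal.h>

// Hypothetical names throughout; this is an illustration, not JANA code.
namespace hup_callbacks {

typedef void (*Callback)(void* arg);
struct Entry { Callback fn; void* arg; };

const int kMaxCallbacks = 16;             // fixed capacity: no allocation in the handler
Entry g_entries[kMaxCallbacks];
volatile sig_atomic_t g_count = 0;        // number of valid entries

// Register a routine to be called when SIGHUP is handled.
bool AddCallback(Callback fn, void* arg) {
    if (fn == nullptr || g_count >= kMaxCallbacks) return false;
    g_entries[g_count].fn = fn;
    g_entries[g_count].arg = arg;
    g_count = g_count + 1;                // publish only after the entry is filled in
    return true;
}

// Signal handler: invoke every registered callback in registration order.
// Callbacks must restrict themselves to operations that are safe in a handler.
void HupHandler(int /*signum*/) {
    const int n = g_count;
    for (int i = 0; i < n; ++i) g_entries[i].fn(g_entries[i].arg);
}

// Install HupHandler as the process-wide SIGHUP handler.
bool InstallHandler() {
    struct sigaction sa = {};
    sa.sa_handler = HupHandler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGHUP, &sa, nullptr) == 0;
}

// Example callback: sets the sig_atomic_t flag that arg points to.
void SetFlag(void* arg) {
    *static_cast<volatile sig_atomic_t*>(arg) = 1;
}

}  // namespace hup_callbacks
```

A unit test in the spirit of the one mentioned above can register SetFlag with a flag variable, send SIGHUP to the current thread with raise(SIGHUP), and then check that the flag was set.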