Online Monitoring Shift

From Hall D Ops Wiki

Revision as of 07:31, 6 June 2014

The Online Monitoring System

Fig. 1. Online Monitoring Architecture when a Level-3 trigger is inactive. A single "L3" process will still be present and operating in pass-through mode. Note that monitoring is done on the "post-L3" stream to allow the algorithm to set flags in the data stream indicating that pass-through mode was used.
Fig. 2. Online Monitoring and L3 Architecture when a Level-3 trigger is active.


The Online Monitoring System is a software system that works alongside the Data Acquisition (DAQ) System to monitor the quality of the data as it is read in. It is responsible for ensuring that the detector systems produce data of sufficient quality that offline analysis is likely to succeed and yield a physics result.

Events will be transported across the network via the ET (Event Transfer) system developed and used as part of the DAQ architecture. The configuration of the processes and nodes is shown in Figs. 1 and 2. Fig. 1 shows the simpler case in which no L3 algorithm is being applied and only one monitoring farm is needed. Fig. 2 shows the more complicated case in which a Level-3 (L3) trigger algorithm is actively rejecting events. In this case, both the pre-L3 and post-L3 event streams must be monitored to help record what is being discarded by the algorithm.
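The rule described above and in the figure captions can be summarized in a short Python sketch. The function name `monitored_streams` and the string labels are hypothetical, chosen only for illustration; they do not correspond to any actual DAQ or monitoring configuration interface.

```python
def monitored_streams(l3_rejecting_events):
    """Return the event streams to monitor for a given L3 mode.

    Illustrative only: the labels mirror the "pre-L3" and "post-L3"
    streams named in the text, not any real configuration keys.
    """
    if l3_rejecting_events:
        # L3 is discarding events, so the input stream is monitored as well,
        # which helps record what the algorithm is throwing away.
        return ["pre-L3", "post-L3"]
    # Pass-through mode: one monitoring farm on the post-L3 stream suffices.
    return ["post-L3"]
```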

Routine Operation

Starting and stopping the Monitoring System

The monitoring system should be started and stopped automatically by the DAQ system whenever a run is started or ended (see Data Acquisition for details on how to start and end a run). Shift workers may start or stop the monitoring system by hand if needed. This should be done from the hdops account by running either the start_monitoring or the stop_monitoring script. These scripts may be run from any gluon computer, since they automatically launch the required programs on the appropriate computer nodes. To check the status of the monitoring system, run the status_monitoring program. A summary is given in the following table:

Program             Action
start_monitoring    Starts all programs required for the online monitoring system. WARNING: This will kill any existing monitoring processes before restarting them.
stop_monitoring     Stops all monitoring processes
status_monitoring   Gives status of the monitoring system processes
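
As a hedged illustration of the manual workflow, the following Python sketch wraps the three scripts named in the table. It assumes the scripts are on the PATH of the hdops account and take no arguments. The helper names (`monitoring_command`, `run_monitoring`) and the `runner` and `user` parameters are hypothetical and are not part of the monitoring system. Nothing is assumed about the exit codes or output format of the scripts.

```python
import getpass
import subprocess

# Script names are taken from the table; everything else is illustrative.
MONITORING_SCRIPTS = {
    "start": "start_monitoring",    # kills existing monitoring processes first
    "stop": "stop_monitoring",
    "status": "status_monitoring",
}


def monitoring_command(action):
    """Return the argument list for the requested action."""
    if action not in MONITORING_SCRIPTS:
        raise ValueError(f"unknown action {action!r}")
    return [MONITORING_SCRIPTS[action]]


def run_monitoring(action, runner=subprocess.run, user=None):
    """Run a monitoring script, refusing unless the account is hdops.

    `runner` defaults to subprocess.run; a stand-in can be passed for a dry run.
    """
    user = getpass.getuser() if user is None else user
    if user != "hdops":
        raise PermissionError("monitoring scripts should be run from the hdops account")
    return runner(monitoring_command(action))
```

For example, `run_monitoring("status")` would invoke status_monitoring when called from the hdops account, while `run_monitoring("status", runner=print, user="hdops")` only prints the command that would be run.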

Viewing Monitoring Histograms

Live histograms may be viewed using the RootSpy program. Start it from the hdops account on any gluon node. It will communicate with all histogram producer programs on the network and start cycling through a subset of them for shift workers to monitor. Users can turn off the automatic cycling and select different histograms to display using the GUI itself.

Program   Action
RootSpy   Starts RootSpy GUI for viewing live monitoring histograms


Advanced Details of the Monitoring System

The online monitoring consists primarily of generating numerous histograms that can be viewed by shift takers or analyzed automatically by macros to check the data quality. The system therefore consists of histogram producers and consumers.

Producers

The histograms are produced by a set of plugins, each representing a different detector or online system. The plugins are attached to processes running on multiple computers in the counting house. The nodes used will vary depending on whether the DAQ is configured to run an L3 trigger and how many nodes are required by the algorithm being run. The node names will be in the pool specified as "L3" in the list maintained on the HallD Online IP Name And Address Conventions page of the GlueX wiki. The monitoring processes will be started and killed automatically by the DAQ system via scripts attached to state transitions.

The definitions of the histograms are ultimately the responsibility of the detector or online system experts.

Consumers

The primary consumer of the histograms will be the RootSpy system. It has both a GUI for shift takers to monitor histograms and an archiver that can store histograms in files for later viewing. To start the viewer, type "RootSpy" on the command line in the hdops account. The RSArchiver program is a command-line tool that gathers histograms from the RootSpy producers and archives them in a ROOT file. This file will be copied automatically by a DAQ system script to the RAID disk alongside the raw data so that it is stored on tape with the data.


Expert personnel

The individuals responsible for the Online Monitoring are shown in the following table. Problems with normal operation of the Online Monitoring should be referred to those individuals, and any changes to its settings must be approved by them. Additional experts may be trained by the system owner and their names and signatures added to the document residing in the Hall D Counting House.

Table: Expert personnel for the Online Monitoring system
Name             Extension   Date of qualification
David Lawrence   269-5567    May 28, 2014