Online task list for 2011

FY2011 Activity Schedule for Online Computing

The table below contains activities from the 12GeV project schedule in the Online Computing section which have work scheduled for FY2011. Detailed descriptions for the activities are kept at the bottom of the page and can be jumped to by clicking the short description in the table.

A breakdown of each activity into smaller tasks is maintained in an Excel file in the SVN repository:

Activity Line | Activity Name | Man-weeks | Names of people | Comments
1532025 | Plan Front-End Software | 16.5 | D. Lawrence, D. Abbott, B. Moffit |
1532030 | Plan DAQ Software Event Unblocking | 9 | D. Lawrence, D. Abbott, B. Moffit |
1532030a | Plan DAQ Software Scripts | 6 | | GlueX-doc-1876
1532030b | Plan DAQ Software Run Control | 5 | ? + V. Gyurjyan | Mostly Vardan
1532030c | Plan DAQ Software Code Management | 4 | |
1532035 | Plan Monitoring Framework | 6 | Wolin + ? |
1532035a | Plan Monitoring Scalers | 3 | Wolin + H. Egiyan |
1532035b | Plan Monitoring Histograms | 4 | D. Lawrence | GlueX-doc-1721
1532035c | Plan Monitoring Remote | 3 | |
1532035d | Plan Monitoring Hardware Status | 4 | |
1532035f | Plan Monitoring Process Status | 3 | |
1532035g | Plan Monitoring Trigger | 2 | S. Somov |
1532040 | Plan Alarm Sys | 6 | | Universities are expected to contribute more
1532045 | Plan Archiving DAQ Configuration | 3 | |
1532045a | Plan Archiving Run Info | 5 | | GlueX-doc-1894
1532045b | Plan Archiving Controls | 5 | H. Egiyan + N. Gevorgyan |
1532050 | Plan Event Display | 2 | | Universities are expected to contribute more
1532055 | Plan Storage Management | 11.3 | | Computer Center can help; GlueX-doc-1893
1532060 | Plan Experiment Controls Framework | 4 | H. Egiyan + V. Gyurjyan |
1532060a | Plan Experiment Controls Display Management | 3 | H. Egiyan + ? |
1532060b | Plan Experiment Controls Backup/Restore | 3 | H. Egiyan + ? |
1532060c | Plan Experiment Controls Magnet PS | 4 | H. Egiyan + ? |
1532060d | Plan Experiment Controls HV | 3 | H. Egiyan + N. Gevorgyan |
1532060f | Plan Experiment Controls LV | 4 | H. Egiyan + N. Gevorgyan |
1532060g | Plan Experiment Controls Motors | 4 | H. Egiyan + ? |
1532060h | Plan Experiment Controls Gas Systems | 4 | H. Egiyan + ? |
1532060j | Plan Experiment Controls Temperature | 4 | H. Egiyan + ? |
1532060k | Plan Experiment Controls Target | 5 | H. Egiyan + ? |
1532060n | Plan Experiment Controls Interface with DAQ | 3 | H. Egiyan + V. Gyurjyan | Mostly Vardan
1532065 | Trigger Board Initialization | 27 | S. Somov + Electronics Group | Two-year duration
1532035 | Level 1 Verification/debugging | 24 | S. Somov + Electronics Group | Two-year+ duration

Descriptions of Scheduled Activities

Plan Front-End Software

This will plan the Hall-D specific details of configuring and maintaining the software used in the front-end electronics and trigger system in the hall. This includes where the CODA 3 configurations will be kept (disk resident XML files, database, ...?), and how we will revert to previous configurations or implement new ones.

This will also include plans for how the translation table needed for the offline will be interfaced with the online. Specifically, if the DAQ system detects module types automatically, how/where it will record these for use in parsing by both the online monitoring system and the offline systems.
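
As a minimal sketch of the lookup such a translation table provides, the Python below maps a hardware address (crate, slot, channel) to a detector channel. The field names, the module type strings and the example entries are hypothetical illustrations, not part of any Hall-D or CODA 3 design.

```python
from typing import NamedTuple, Optional


class HardwareAddress(NamedTuple):
    crate: int
    slot: int
    channel: int


class DetectorChannel(NamedTuple):
    detector: str      # hypothetical detector label
    element: int       # hypothetical element index within the detector
    module_type: str   # module type as it might be recorded by the DAQ


# Hypothetical entries; a real table would be filled from whatever the DAQ records.
TRANSLATION_TABLE = {
    HardwareAddress(1, 3, 0): DetectorChannel("FCAL", 17, "fadc250"),
    HardwareAddress(1, 3, 1): DetectorChannel("FCAL", 18, "fadc250"),
    HardwareAddress(2, 5, 7): DetectorChannel("CDC", 204, "fadc125"),
}


def translate(crate: int, slot: int, channel: int) -> Optional[DetectorChannel]:
    """Return the detector channel for a hardware address, or None if unmapped."""
    return TRANSLATION_TABLE.get(HardwareAddress(crate, slot, channel))
```

Keeping the table as plain data would let the online monitoring and the offline code parse events from a single shared source, wherever that source ends up being stored.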

Because the online systems can be very sensitive to configuration details, the ability to make changes should probably be limited to certain individuals. This plan should address how access to deployed system configurations will be limited to ensure integrity of the DAQ system.

Finally, physical board distribution should be reviewed/planned using knowledge of hit rates from Monte Carlo. The integrated bandwidth of each crate should be estimated to make sure there are none with high concentrations that exceed (or approach) the limits of the hardware.
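
The crate bandwidth check described above could be sketched as follows. This is only an illustration: the function names, the fixed number of bytes per hit and the 80% warning margin are assumptions, and the per-channel hit rates would come from the Monte Carlo.

```python
def crate_bandwidth_mb_per_s(channel_hit_rates_hz, bytes_per_hit):
    """Integrated data rate of one crate in MB/s (1 MB = 1e6 bytes)."""
    return sum(channel_hit_rates_hz) * bytes_per_hit / 1e6


def crates_near_limit(crates, bytes_per_hit, limit_mb_per_s, margin=0.8):
    """Return names of crates whose data rate reaches margin * limit.

    `crates` maps a crate name to the list of hit rates (Hz) of its channels.
    The default margin of 0.8 is an assumed warning threshold.
    """
    flagged = []
    for name, rates in crates.items():
        rate = crate_bandwidth_mb_per_s(rates, bytes_per_hit)
        if rate >= margin * limit_mb_per_s:
            flagged.append(name)
    return flagged
```

For example, with placeholder numbers, a crate of 100 channels at 50 kHz each and 8 bytes per hit integrates to 40 MB/s, which would then be compared against the hardware limit of that crate.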

Note that this includes software for all front-end boards, including DAQ boards, trigger boards, discriminators, scalers, etc.

Plan DAQ Software Event Unblocking

In production running, the events will arrive entangled, meaning that the fragments of a single event will not appear in a single, contiguous memory section. Rather, the fragments will be mixed with fragments from other events and must be disentangled (or unblocked) to get a single event that may be analyzed. This will have to be done for monitoring as well as for L3 event filtering, where the ability to save or discard a single event will be required.

This activity will provide a plan for how and where the events will be disentangled (EB, L3/monitoring farm, offline code base, ...?). This will include how the single events will be passed on to the CODA 3 Event Recorder for writing to disk/tape.

Estimates of CPU/memory/bandwidth resources required will be included so they may be added into the overall requirements for the Hall-D online computing resources.
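
To make the disentangling step concrete, here is a minimal sketch in Python. The input layout (one block per readout controller, each holding fragments tagged with an event number) is a simplifying assumption for illustration and is not the CODA 3 data format.

```python
from collections import defaultdict


def disentangle(blocks):
    """Regroup fragments by event number.

    `blocks` is an iterable of (roc_id, [(event_number, payload), ...]),
    a hypothetical layout in which each block mixes fragments of many events.
    Returns {event_number: {roc_id: payload}} with events in ascending order.
    """
    events = defaultdict(dict)
    for roc_id, fragments in blocks:
        for event_number, payload in fragments:
            events[event_number][roc_id] = payload
    return {number: events[number] for number in sorted(events)}
```

After this step each entry holds all fragments of one event, so a single event can be analyzed, kept or discarded on its own.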

Plan DAQ Software Scripts

Plan for general organization of scripts used as part of the Hall-D online systems. This will include the languages (python, perl, bash, ...) used for the command-line, batch-mode, cron-job, and GUI scripts. How the scripts will be maintained and how editing access will be restricted will also be included.

Plan DAQ Software Run Control

Plan for implementing the CODA 3 Run Control in the Hall-D online systems. This will include how the configuration will be maintained and how access to editing the configuration will be restricted. Ability to access Run Control from the counting house, the experimental hall, and via a remote, secure connection (for on-call maintenance) will be required. How that will be done while minimizing risk of disrupting operations will be addressed.

The CODA 3 Run Control will have many more user-configurable features. The plan should include a suggested list of features that may be included in the Hall-D implementation of Run Control.

Plan DAQ Software Code Management

A plan for maintaining the online code base. This will include the compiled programs, scripts, and configuration files that comprise the online software systems. This will include a choice of code management system and where it will be hosted. How this integrates with the offline software code base, which will very likely be used as a basis for the L3 event filter, will be addressed.

A build system and directory structure will be included in the plan.

Plan Monitoring Framework

The substantial number of independent monitoring subsystems developed for Hall D need to be coordinated and results presented to operators in a coherent way. Further, the monitoring system must interact with other independent systems such as the alarm system, archiving system, control system, etc. An overall strategy and architecture must be developed to ensure transparent interoperation among all these systems.

The histogram monitoring system will run on a mini-farm of computers. A farm management system needs to be developed to allow integration of the farm in the DAQ run control system.

The IRMIS system needs to be investigated as to whether it is a good choice for capturing information concerning cables and wiring, location, power and control for all aspects of the Online system. A prototype IRMIS system needs to be created to determine this.

Electronic logbook, notebook, knowledge database, documentation/manual database, bug tracking and issue tracking needs must be evaluated and software chosen to meet those needs.

Computer system management needs must be evaluated and an appropriate monitoring/alarm/process management system chosen.

Plan Monitoring Scalers

Scaler information generated by the trigger, DAQ and other systems must be extracted from hardware, then monitored, analyzed and presented to operators and other automated monitoring systems as appropriate. Analyzed and raw scaler information must further be archived, and for critical scalers, archived in multiple places for redundancy. Scaler information in the data stream may need to be diverted into separate data streams for ease of access by the Offline group, and some scaler data may need to be entered into databases.

Finally, alarms need to be generated when automated analysis programs find problems in the scaler data.
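
A minimal sketch of an automated scaler check is given below. It turns two successive counter readings into a rate and tests it against limits. The 32-bit counter width and the function names are assumptions for illustration, and real limits would come from the detector groups.

```python
def scaler_rate_hz(prev_count, curr_count, dt_seconds, counter_bits=32):
    """Rate from two successive readings, allowing one counter wrap-around.

    The default 32-bit counter width is an assumption.
    """
    delta = (curr_count - prev_count) % (1 << counter_bits)
    return delta / dt_seconds


def out_of_range(rate_hz, low_hz, high_hz):
    """True if the rate lies outside the allowed [low_hz, high_hz] window."""
    return not (low_hz <= rate_hz <= high_hz)
```

An analysis program could call `out_of_range` on each new rate and hand any failure to the alarm system.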

Plan Monitoring Histograms

Events taken by the DAQ system must be continuously monitored for quality. The histogram monitoring system must extract a sample of events from the DAQ in real-time, analyze them, generate histograms, then present the information to operators and to other automated monitoring systems. The histograms must be archived periodically, and a reset mechanism must exist to clear histograms e.g. at the beginning of a new run. The system must also be able to read and analyze events from a file and operate independently of the system monitoring events in real-time.
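
The sample, fill, archive and reset cycle described above might look like the following sketch. It uses plain Python lists rather than ROOT and is not the RootSpy API. The class and function names, and the choice to sample every N-th event, are assumptions.

```python
class MonitorHistogram:
    """Fixed-binning 1D histogram with the reset and snapshot hooks named above."""

    def __init__(self, nbins, low, high):
        self.nbins, self.low, self.high = nbins, low, high
        self.counts = [0] * nbins

    def fill(self, value):
        # Values outside [low, high) are ignored in this sketch.
        if self.low <= value < self.high:
            width = (self.high - self.low) / self.nbins
            index = min(int((value - self.low) / width), self.nbins - 1)
            self.counts[index] += 1

    def reset(self):
        # Called, for example, at the beginning of a new run.
        self.counts = [0] * self.nbins

    def snapshot(self):
        # Copy of the bin contents, suitable for periodic archiving.
        return list(self.counts)


def monitor(values, histogram, sample_every=10):
    """Fill the histogram from every `sample_every`-th value in the stream."""
    for i, value in enumerate(values):
        if i % sample_every == 0:
            histogram.fill(value)
```

Because `monitor` only needs an iterable, the same code could be fed from a live event sample or from a file, which matches the requirement that file replay operate independently of real-time monitoring.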

Currently the RootSpy framework, developed within the Offline group but with the Online in mind, appears to be the best foundation for event histogram monitoring.

Finally, alarms need to be generated when automated analysis programs find problems in the histogram data.

Plan Monitoring Remote

A large fraction of detector hardware and some online software is being developed by collaborators from other institutions, and they need to be able to monitor performance of their systems from off site. A system needs to be developed to give them access to almost all of the information available to shift personnel (EPICS, histograms, alarms, archives, etc.), but in a way that satisfies JLab cyber security requirements. In some cases remote collaborators may need to take control of DAQ and other control systems to diagnose and repair problems in their systems.

Plan Monitoring Hardware Status

A large amount of detector hardware must be monitored for health during hall operations, beyond what is done by the EPICS-based control system. Hardware may inject status information periodically into the data stream, and processes must extract this information, archive it, and present it to operators. Other information will need to be proactively extracted from the hardware at appropriate times and in such a way as to not interfere with the high-speed DAQ system. Some information will only be extracted during special runs or calibration procedures. In all cases, action must be taken or alarms must be generated when problems are detected.

This system must be designed to handle a large variety of disparate hardware while minimizing the amount of special programming required and must avoid compromising fast DAQ and other common operations.

Plan Monitoring Process Status

A large number of processes running on a large number of computers in the counting house need to be started, stopped and monitored during operations. These processes run under widely varying conditions. E.g. some need to be started at boot time and run continuously, others just during data taking, others just under special conditions.

The existence and health of all these processes need to be continuously monitored in real time. Alarms need to be generated in case of failed processes, and if operator action is not required the failed processes can be restarted automatically. The monitoring system must be highly and easily configurable, as the critical process list will change fairly often.
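
One pass of such a configurable process check could be sketched as below. The `ProcessPolicy` fields and the three callables (`is_alive`, `restart`, `raise_alarm`) are hypothetical hooks standing in for whatever process management and alarm systems are eventually chosen.

```python
from dataclasses import dataclass


@dataclass
class ProcessPolicy:
    name: str
    auto_restart: bool  # True if no operator action is required on failure


def check_processes(policies, is_alive, restart, raise_alarm):
    """Check each configured process once.

    A failed process always raises an alarm; it is restarted only if its
    policy allows automatic restart.
    """
    for policy in policies:
        if is_alive(policy.name):
            continue
        raise_alarm(f"process {policy.name} is not running")
        if policy.auto_restart:
            restart(policy.name)
```

Keeping the critical process list as data (the `policies` argument) means it can change without touching the checking code.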

Plan Monitoring Trigger

The state-of-the-art high-speed Hall D trigger system must be monitored at all times for proper operation. This includes extraction and monitoring of scaler and data generated by the trigger hardware. This data must be analyzed and compared to expectations based on understanding of the physics involved and the trigger programming. Alarms must be generated and operators notified if problems are detected.

Plan Alarm Sys

The alarm system is foundational for transmission of information to operators and other systems concerning problems detected by online software. It must accept alarm inputs from a wide variety of programs monitoring a large number of disparate systems. Operators need the ability to view alarms in time sequence and/or priority order, and must be able to acknowledge alarms so they no longer appear on critical alarm screens. It must also be possible to "shelve" alarms for some length of time for known problems that cannot be solved quickly. Alarm history must be preserved and be easily viewed.
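
The operator-facing behavior listed above (ordering, acknowledging, shelving, history) can be illustrated with a small in-memory sketch. It is not the SNS EPICS alarm system or any existing package. All names are hypothetical, and the convention that a lower priority number is more urgent is an assumption.

```python
import time
from dataclasses import dataclass


@dataclass
class Alarm:
    source: str
    message: str
    priority: int               # assumption: lower number means more urgent
    raised_at: float
    acknowledged: bool = False
    shelved_until: float = 0.0  # time before which the alarm stays hidden


class AlarmLog:
    def __init__(self):
        self.history = []       # every alarm ever raised is preserved here

    def raise_alarm(self, source, message, priority, now=None):
        now = time.time() if now is None else now
        alarm = Alarm(source, message, priority, raised_at=now)
        self.history.append(alarm)
        return alarm

    def acknowledge(self, alarm):
        alarm.acknowledged = True

    def shelve(self, alarm, seconds, now=None):
        now = time.time() if now is None else now
        alarm.shelved_until = now + seconds

    def active(self, by="priority", now=None):
        """Alarms to show: not acknowledged and not currently shelved."""
        now = time.time() if now is None else now
        shown = [a for a in self.history
                 if not a.acknowledged and a.shelved_until <= now]
        if by == "priority":
            return sorted(shown, key=lambda a: a.priority)
        return sorted(shown, key=lambda a: a.raised_at)
```

Acknowledged and shelved alarms drop off the active view but remain in `history`, so the full record stays available for later review.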

Overall alarm system design is best described in "Alarm Management: Seven Effective Methods for Optimum Performance" by Hollifield and Habibi. The SNS EPICS alarm system was designed according to many of the principles in the book, and currently seems to be the best choice for use in Hall D.

Plan Archiving DAQ Configuration

Separate DAQ configurations will be required for normal data taking and for numerous special and calibration runs. These configurations must be archived for use by offline analysis programs. This includes which trigger is loaded in the hardware, which hardware is participating in the DAQ run, which calibration constants are loaded into the front-end modules, etc.

Plan Archiving Run Info

Offline analysis requires precise information on many details of how runs were taken, hardware and software configurations, and summaries of information gathered during the run. Information in this system must be made available to operators at the time it was taken, as well as years later to analysis groups looking at the raw data.

Plan Archiving Controls

The control system constantly monitors a wide variety of types of hardware. Much of the data collected needs to be archived for later playback by operators, and occasionally for use by offline analysis programs. In some cases years worth of data must be kept, although the time granularity will vary on a case-by-case basis.

The SNS EPICS archiver currently seems to be the best choice for Hall D.

Plan Event Display

Form a plan for the online event display. This display is expected to be running continuously in the counting house to provide a quick visual of individual events being read in from the DAQ. It will also be used to replay events to monitor data integrity and to help debug the DAQ system. The graphics package used and what features the event display must have will be included in the plan. How the event display will interface with the DAQ system to get events will also be addressed.

Plan Storage Management

Plan for online data storage from the DAQ and online systems. This will include hardware systems (raid disks?) to hold the data and how it will be transferred to the Computer Center for permanent storage. Bandwidth requirements for the disk will be included as it may need to support quick replay analysis while still acquiring data. Slow controls values critical for data replay will also need to be copied into long term storage, possibly alongside the event-level data so if/how that is done should be addressed.

Plan Experiment Controls Framework

The Hall D control system will incorporate components based on EPICS, PLC and possibly other types of control systems, and is very likely to be integrated with the AFECS framework. Eventually we need to have a coherent structure which can accommodate all these subsystems and which can be an integral part of the Hall D online software system. This includes developing a directory structure, code management, and in some cases a "makefile" system, which would be usable for all types of applications. The first step for achieving this goal is to create a few small test applications to evaluate different approaches and to choose the best one. We will also need to check interfaces between different types of control systems, such as the EPICS-PLC, Labview-PLC, and EPICS-Labview interfaces. The following tasks will need to be completed for this activity line:

  • Design a directory structure and a makefile scheme.
  • Create an EPICS IOC application and compile the libraries and executables within that directory scheme.
  • Study the possibility and practicality of code management system for PLC programs and HMIs.
  • Create a small PLC-based test application and interface it with EPICS.
  • Thoroughly test PLC-EPICS interface to identify possible problems and ways to mitigate them.
  • Create a small Labview program and interface it with EPICS and evaluate the reliability of such a combined system.
  • Incorporate the aforementioned applications into the AFECS framework.
  • Define guidelines for the most efficient interface between different types of control systems.

Plan Experiment Controls Display Management

The slow controls system in Hall D will require a single Display Management framework to monitor and control different components in the Hall. A careful study needs to be done to identify the requirements for different components of the controls and monitoring systems. We will also need to evaluate existing display management systems which are easy to interface with EPICS and select the one that best matches Hall D needs. The following are the main tasks needed to accomplish this activity and create a plan of action for the next two years:

Plan Experiment Controls Backup/Restore

After an intentional shutdown, a reboot or a power outage, some of the controlled systems may lose the current values of their process variables. This may lead to wrong initialization values on startup, and possibly to serious problems resulting in loss of beam time. As the Hall D controls system makes use of EPICS, PLC and possibly other systems, a backup and restoration strategy needs to be found for each of these types. The following are the main tasks needed to accomplish this activity and create a plan of action for the next two years:

  • Study existing backup/restore options for EPICS-based applications.
  • Design a framework for configuring the backup and restoration of the large number of PVs.
  • Create a prototype EPICS application requiring backup and restoration of variables, and thoroughly test it with the chosen framework.
  • Study how PLC-based applications restore the values of their control tags, and how we could configure the applications such that the desired values are restored.
  • Design a framework for configuring the backup and restoration of the large number of tags.
  • Create a prototype PLC application requiring backup and restoration of variables, and thoroughly test it with the chosen approach.
  • Study the needs for backup/restore for other types of control systems used in the hall, such as LabView.
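
The basic backup and restore cycle behind these tasks can be sketched in a few lines of Python. This is not an existing EPICS or PLC tool. The `read_value` and `write_value` callables are hypothetical hooks for whichever control system holds the variables, and the JSON snapshot file is an assumed storage format.

```python
import json


def backup(names, read_value, path):
    """Read each named variable and save the snapshot to a JSON file."""
    snapshot = {name: read_value(name) for name in names}
    with open(path, "w") as f:
        json.dump(snapshot, f, indent=2, sort_keys=True)
    return snapshot


def restore(path, write_value):
    """Write every saved value back through the supplied hook."""
    with open(path) as f:
        snapshot = json.load(f)
    for name, value in snapshot.items():
        write_value(name, value)
    return snapshot
```

Because the list of names is just an argument, the same two functions could serve PVs, PLC tags or other variables, with only the read and write hooks changing per system.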

Plan Experiment Controls Magnet PS

According to the current plan, Hall D and its tagger hall will house five relatively large magnets (solenoid, tagger magnet, pair spectrometer, sweeping magnet, quadrupole magnet) which will require remote control of their power supplies. The solenoid magnet control system is not included in this activity because it will be provided by the group testing the solenoid. The quadrupole magnet control software is supposed to be provided by the accelerator division. Each of the other three power supplies needs to be examined, if its specs are available or the supply is present at the Lab, in order to determine what kind of control system it will require. Some of these power supplies may be provided by other groups from JLab, and the existing control system will need to be evaluated. The required hardware and software will be identified and a detailed work plan will be created.

Plan Experiment Controls HV

There is a large number of PMT-based detectors in Hall D, and they need to be remotely controlled. According to the current plan some HVs will be provided from CAEN SY1527 mainframes using A1535N, A1535SN and A1535P cards, and the HV for FCAL will be controlled using a CAN-bus based system. All hardware and firmware needs to be studied to find the best options to integrate these HV systems into the Hall D slow controls. This will include the choice of hardware and the set of drivers required for the corresponding implementation. A prototype set of HV GUIs needs to be designed that will allow monitoring and changing the HV settings for various groups of detectors. These screens will need to be designed in such a way that a large number of them can be generated from different detector configurations. We also need to determine how the alarm conditions from a very large number of channels need to be generated from the HV systems to be integrated into the overall Hall D alarm system.

Plan Experiment Controls LV

There is a large number of detectors in Hall D, such as the drift chambers, BCAL SiPMs and tagger microscope SiPMs, which require application of low voltage or bias voltage (LV). The LVs need to be remotely monitored and controlled. The hardware for each of these detector components needs to be studied to determine the best option for integrating them into the experimental control system of Hall D. We will need to determine which of these pieces of hardware, with the associated firmware, need a newly developed EPICS interface. A prototype set of LV control GUIs needs to be designed which will allow monitoring and changing the LV settings for various groups of detectors. These screens will need to be designed in such a way that a large number of them can be generated from different detector configurations. We also need to determine how the alarm conditions from a very large number of channels need to be generated from the LV systems to be integrated into the overall Hall D alarm system. Some of this GUI-related effort will benefit from similar work done during development of the HV GUIs.

Plan Experiment Controls Motors

The current plan for Hall D includes the following remotely movable systems:

  • Goniometer (5 axis)
  • Tagger harp (1 axis)
  • Tagger dump harp (1 axis)
  • Tagger microscope (4 axis)
  • Collimator table (1 axis)
  • Converter/photon harp (1 axis)
  • TAC (1 axis) ?
  • Gamma profiler (1 axis) ?

Each of these applications, except the goniometer, needs to be designed and integrated into the overall controls system. Development of the goniometer controls is included in a different activity line, but it would be nice to have a coherent framework.

Plan Experiment Controls Gas Systems

The CDC and FDC detectors will be using working gases mixed locally in the gas shed near Hall D. A prototype of such a gas mixing and delivery system will be designed and built by the tracking working groups within a year. Development of a control system for this prototype will be beneficial in evaluating various options for the final gas system. Based on this experience we can decide which type of hardware and software would be most appropriate and we can draft a plan for the final gas system.

Plan Experiment Controls Temperature

Temperature monitoring will be needed for the BCAL and FCAL detector systems. At this point there are no detailed designs of these systems, but they are currently being developed. Close coordination between the online software and detector groups is necessary to develop a plan for the temperature control and monitoring systems.

Plan Experiment Controls Target

The JLab target group will design and build the Hall D cryotarget. The cryogenics and the hydrogen/deuterium lines of this system need to be controlled. Currently there is no detailed plan for the target system hardware, and the Hall D slow controls group needs to coordinate the design and construction of the control system with the target group to be able to draft a plan of the efforts for this system.

Plan Experiment Controls Interface with DAQ

Plan for generating the Hall-D specific configuration for interfacing the experiment controls with the DAQ. AFECS (part of CODA 3) will include support for full experiment controls which will be leveraged by the Hall-D online system. This will include checks on various non-DAQ online systems by the DAQ system to help ensure data integrity. How the configurations will be maintained and how access to their modification will be restricted will be addressed in the plan.

Trigger Board Initialization

Level 1 Verification/debugging

Possible Contributors from Other Groups and Their Expertise

  • Plan Front-End Software
    • Dave A, ROC design, readout libraries
    • Bryan M, ROC design, readout libraries
    • Vardan G, run control, automating readout lists
  • Plan DAQ Software Event Unblocking
    • Carl T, event builder design
  • Plan DAQ Software Scripts
    • Vardan G, interface to run control
  • Plan DAQ Software Run Control
    • Vardan G, run control
  • Plan DAQ Software Code Management
  • Plan Monitoring Framework
  • Plan Monitoring Scalers
    • Ed J, hardware
    • William G, hardware
  • Plan Monitoring Histograms
  • Plan Monitoring Remote
  • Plan Monitoring Hardware Status
  • Plan Monitoring Process Status
  • Plan Monitoring Trigger
    • Ed J, hardware
    • William G, hardware
    • Ben R, hardware
    • Hai D, hardware
  • Plan Alarm Sys
  • Plan Archiving DAQ Configuration
    • Vardan G, run control
  • Plan Archiving Run Info
  • Plan Archiving Controls
  • Plan Event Display
  • Plan Storage Management
  • Plan Experiment Controls Framework
  • Plan Experiment Controls Display Management
  • Plan Experiment Controls Backup/Restore
  • Plan Experiment Controls Magnet PS
    • Nerses
  • Plan Experiment Controls HV
    • Nerses, EPICS drivers
  • Plan Experiment Controls LV
    • Nerses, EPICS drivers
  • Plan Experiment Controls Motors
    • Nerses, EPICS drivers
  • Plan Experiment Controls Gas Systems
    • Nerses, EPICS drivers
  • Plan Experiment Controls Temperature
    • Nerses, EPICS drivers
  • Plan Experiment Controls Target
    • Target group, target design
  • Plan Experiment Controls Interface with DAQ
    • Vardan G, AFECS

Possible tasks for DAQ group and others

  • self-configuring readout list with database storage of crate configuration
  • farm manager CODA component
  • disentangler
  • top-level AFECS experiment control supervisor and GUI
  • data file in disentangled raw event format
  • CLARA-enable DANA framework
  • convert Hall D facilities into CLARA services
  • readout/control libraries for Hall D specific modules (do we have any?)
  • implement EPICS driver using ASYN framework, e.g. for Anagate CAN gateway (Nerses?)
  • implement EPICS JavaIOC and driver using Port Driver framework (Nerses?)
  • design and prototype conditions database (Dmitry?)
  • other tasks that do not need DAQ expertise, just good programming skills