GlueX Offline Software Meeting
Wednesday, January 14, 2015
1:30 pm EST
JLab: CEBAF Center F326/327

Agenda

Announcements
1. Volatile disk expanded: reservation 10 -> 20 TB, quota 30 -> 50 TB
2. Marty Wise working on Run Conditions (Control?) Database (RCDB)
3. Computer Center has RHEL7 available for beta testers
4. Work disk full
Review of minutes from January 7 (all)
Data Challenge 3
Software Review Preparations
Commissioning Run Review:
1. Offline Monitoring Report (Kei)
  1. Ran over all files (online plugins, 2-track EVIO skim, REST) 2 weeks ago
  2. Next launch is this Friday
  3. Will be testing EventStore to mark events
  4. Quick update on CentOS 6.5, multithreaded processing
2. Commissioning-branch-to-trunk migration (Simon)
3. Handling changing magnetic field settings (Sean)
4. Analysis of REST file data (Justin)
5. Data Management (Sean)
  1. Storing software information in REST files
  2. EVIO format definition for Level 3 trigger farm
  3. EventStore: implementation plan
Requests to SciComp on farm features (Kei)
1. Tools to track jobs:
  1. Tools to track what percentage of nodes were being used by whom at a given time, preferably in both number of jobs and threads. We can see the pie charts, for example, at http://scicomp.jlab.org/scicomp/#/auger/usage, but would like the information in a form that we can easily access and analyze.
  2. What percentage of nodes is available for each OS at a given time
  3. Tools to track the time a job spends in each stage, such as sitting in the queue, waiting for files from tape, running, etc.
  4. Would it be possible to make the stdout and stderr web-viewable?
  5. If possible, can you add the ability to search by “job name” (every job that includes the search term) on the Auger custom job query website?
2. More general requests:
  1. Better transparency about whether there are problems in the system, such as heavy traffic from users, broken disks, etc. Could there be an email list or webpage for that information?
  2. Clarification of how job 'priority' works between different halls and users.
  3. Would it be possible for the system to automatically resubmit failed jobs if the failure is on the system side (e.g., bad farm nodes, temporary loss of connection)?
3. Additionally, ask for more space on the cache disk?
HDDM versions and backward compatibility
Action Item Review

Communication Information

Remote Connection

The BlueJeans meeting number is 968 592 007.
Join the Meeting via BlueJeans

Slides

Talks can be deposited in the directory /group/halld/www/halldweb1/html/talks/2015 on the JLab CUE. This directory is accessible from the web at https://halldweb1.jlab.org/talks/2015/.
