Planning for The Next GlueX Data Challenge

From GlueXWiki
Revision as of 16:22, 16 July 2012 by Cmeyer (Talk | contribs)


Introduction

This document provides planning information for the second GlueX data challenge, which is planned for early 2013.

Earlier Data Challenges

The first data challenge was carried out in 2011 [1], and studied the reaction γp → π+π−π+n using the full GlueX simulation [2] and reconstruction software [3]. The study included the full Pythia background for about 3.5 hours of 10^8 running, with the appropriate level of the 3π signal injected into the data stream. The size of this first challenge was limited in part by the disk storage available at the remote sites (Connecticut and Indiana) where the work was carried out.

Goals of The Next Data Challenge

The primary goal of the next data challenge is to test as many aspects of the GlueX analysis chain as possible, using a data set large enough to approach what will actually be encountered in processing real GlueX data in 2016. As such, we anticipate that the initial data set will be at least an order of magnitude larger than that used in the earlier challenge. Specific goals are to:

• Check the large-scale batch processing of a large GlueX data set.
• Implement monitoring tools to adequately handle the large-scale production.
• Test schemes for accessing large amounts of processed data for analysis.
• Finalize the DST format for GlueX.
• Test tools (grid-ftp) for moving large amounts of data between JLab and outside sites.
• Develop data analysis tools and frameworks that utilize the DSTs.

In this regard, it is probably useful to break the work into two separate components: one carried out mostly at JLab to gain experience and resolve issues associated with the large-scale batch processing, and a second, remote-site challenge in which events are simulated and then reconstructed to the DST level. Only the DST-level events would then be made available.
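As a minimal sketch of the kind of bulk transfer envisioned in the grid-ftp item above, the standard globus-url-copy client could be driven by a small wrapper like the one below. The hostnames, paths, and tuning values (parallel-stream count, TCP buffer size) are illustrative assumptions, not agreed-upon settings; the helper only prints the command so it can be reviewed before any real transfer.

```shell
#!/bin/sh
# Sketch of a per-file GridFTP transfer command for moving DST files
# between JLab and a remote site. All endpoints here are placeholders.

# build_transfer_cmd prints the globus-url-copy invocation for one file:
#   -p 8       use eight parallel TCP streams (assumed tuning value)
#   -tcp-bs 4M set the TCP buffer size (assumed tuning value)
build_transfer_cmd() {
    src="$1"
    dst="$2"
    echo "globus-url-copy -p 8 -tcp-bs 4M $src $dst"
}

# Example: stage one hypothetical DST file to a hypothetical remote site.
build_transfer_cmd \
    "file:///cache/halld/dc2/dst_run001.evio" \
    "gsiftp://remote.site.edu/gluex/dc2/dst_run001.evio"
```

In a real test the printed command would be executed (with valid grid credentials) rather than echoed, and the stream and buffer settings would be tuned against the measured JLab-to-site bandwidth.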