EventStore Administration

From GlueXWiki
Revision as of 19:20, 5 February 2015 by Sdobbs (Talk | contribs) (Merging Runs)

This page describes procedures used in administering an EventStore installation.

Configuration

Each admin user should have a configuration file at $HOME/.esdb.conf, which contains user:password entries and several other settings, as illustrated below:

# sample .esdb.conf file
gluex:masterpassword
sdobbs:thisisnotapassword
ESMASTER=EventStore@hallddb:3306:/var/log/mysql

The users and passwords are used to control access to the master MySQL databases - management of SQLite databases is not access controlled. Note that the authentication is performed by the EventStore scripts themselves.

NOTE: authentication will change in upcoming versions
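
As a rough illustration of the file layout shown in the sample above, the following Python sketch splits such a file into credentials and settings. The parsing rules (comment lines start with "#", the first ":" or "=" on a line decides its type) are assumptions inferred only from the sample, and parse_esdb_conf is a hypothetical helper, not part of the EventStore scripts.

```python
def parse_esdb_conf(text):
    """Split .esdb.conf content into (credentials, settings).

    Assumed format, inferred only from the sample file:
      - lines starting with '#' are comments
      - 'user:password' lines define credentials
      - 'KEY=value' lines define settings such as ESMASTER
    """
    credentials = {}
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        colon = line.find(":")
        equals = line.find("=")
        # Whichever separator comes first decides the line type, so the
        # colons inside the ESMASTER value are not mistaken for a
        # user:password separator.
        if equals != -1 and (colon == -1 or equals < colon):
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
        elif colon != -1:
            user, password = line.split(":", 1)
            credentials[user.strip()] = password
        else:
            raise ValueError("unrecognized line in configuration file")
    return credentials, settings


sample = """\
# sample .esdb.conf file
gluex:masterpassword
sdobbs:thisisnotapassword
ESMASTER=EventStore@hallddb:3306:/var/log/mysql
"""
creds, settings = parse_esdb_conf(sample)
print(sorted(creds))
print(settings["ESMASTER"])
```

Because the file holds passwords in plain text, it is probably sensible to keep it readable only by its owner.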

Skims and Event Lists

Adding new data to EventStore

The scripts for EventStore data management in GlueX are located in the following directory, where ESBASEDIR is the root EventStore directory:

$ESBASEDIR/src/AdminScripts

Build input files and directories

The batch scripts that drive the indexing and cataloging of data use several text files as inputs:

  • data_location - this specifies the full path to the data files. The glob wildcard "*" can be used to [describe how we use the cache disk]
  • eventstore_location - this specifies the directory where the EventStore files (indices, SQLite DBs, log files) are stored. Nothing else should be stored in this directory, because its contents are deleted whenever the injection script is run.
  • idxa_location - this specifies the mapping between skim name and event list for the given run.

We generate versions of these files for both EVIO and REST data on disk.

The main script for generating these files at JLab is located in misc/build_eventstore_inputs.py. Before you run the script, make sure that the run period and revision are properly set in the script itself, e.g.

RUNPERIOD = "RunPeriod-2014-10"
DATAREVISION = "ver09"

By default, the script processes all available runs and overwrites any existing files. The script also supports running over a user-defined range of runs. For instance, to process new runs 3500-3510, the following command line could be used:

./build_eventstore_inputs.py -b 3500 -e 3510
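
When the new runs are not one contiguous block, it can help to group them into ranges first. The sketch below does that and prints one command line per range. It assumes that -b and -e are inclusive begin and end run numbers, as the example above suggests. contiguous_ranges and build_commands are hypothetical helpers, and the run numbers are made up. Whether successive invocations preserve the files written by earlier ones is not stated here, so check the script before relying on this.

```python
def contiguous_ranges(runs):
    """Group run numbers into inclusive (first, last) ranges of consecutive runs."""
    ranges = []
    for run in sorted(set(runs)):
        if ranges and run == ranges[-1][1] + 1:
            # Extend the current range by one run.
            ranges[-1] = (ranges[-1][0], run)
        else:
            # Start a new range at this run.
            ranges.append((run, run))
    return ranges


def build_commands(runs, script="./build_eventstore_inputs.py"):
    """Return one argument list per contiguous range of runs."""
    return [[script, "-b", str(first), "-e", str(last)]
            for first, last in contiguous_ranges(runs)]


# Hypothetical list of newly arrived runs, in no particular order.
new_runs = [3500, 3501, 3502, 3510, 3509]
for command in build_commands(new_runs):
    print(" ".join(command))
```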

Indexing Runs

setenv EVENTSTORE_OUTPUT_GRADE "recon-unchecked"
setenv EVENTSTORE_WRITE_TIMESTAMP "20150201"
setenv DATA_VERSION_NAME "recon_RunPeriod-2014-10_20150123_ver09"
#
setenv EVENTSTORE_BASE_DIR "/work/halld/EventStore/RunPeriod-2014-10/ver09.1"


Merging Runs

Once the EventStore information for individual runs has been created, we can merge the information for the processed runs into the master DB. The script that performs this is merge.sh.

# example merge.sh settings
export MyWorkDir=/work/halld/EventStore/RunPeriod-2014-10/ver09.1/merge
export MyESDir=/work/halld/EventStore/RunPeriod-2014-10/ver09.1/rest_index
 
export MasterDB=EventStore@hallddb.jlab.org:3307

Notes:

  • MyESDir points to the directory containing the SQLite files, which is conventionally the same directory that holds the index files. The script uses find to build a list of the SQLite files. A local gzipped tar archive of the SQLite files is made in case merging fails.
  • MyWorkDir is where several files related to the merging are kept. The run numbers of any failed runs are written to a text file in this directory named failed.lst.
  • MasterDB points to the master database. A MySQL DB can be specified, as in the example above, or a SQLite master DB can be used by specifying a file name.
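
The find-and-archive step described in the notes above can be approximated in Python as follows. This is only a sketch of the idea, not the logic of merge.sh itself: the ".sqlite" file suffix, the file names, and both helper functions are assumptions made for illustration, and a temporary directory stands in for MyESDir.

```python
import os
import tarfile
import tempfile


def find_sqlite_files(es_dir, suffix=".sqlite"):
    """Return sorted paths of files under es_dir whose names end with suffix."""
    found = []
    for root, _dirs, files in os.walk(es_dir):
        for name in files:
            if name.endswith(suffix):
                found.append(os.path.join(root, name))
    return sorted(found)


def backup_files(paths, archive_path):
    """Write the given files into a gzipped tar archive and return its path."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            tar.add(path)
    return archive_path


# Demonstration on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = os.path.join(tmp, "run_003500")
    os.makedirs(run_dir)
    for name in ("example.sqlite", "example.log"):
        open(os.path.join(run_dir, name), "w").close()

    sqlite_files = find_sqlite_files(tmp)
    archive = backup_files(sqlite_files, os.path.join(tmp, "sqlite_backup.tar.gz"))
    with tarfile.open(archive) as tar:
        archived_names = tar.getnames()

print(len(sqlite_files), "SQLite file(s) archived")
```

Taking the backup before the merge starts means the per-run SQLite files can be restored and the merge retried if a run fails part way through.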

Merging procedure:

  1. Run merge.sh
  2. Check $MyWorkDir/failed.lst
  3. Fix and iterate
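
Step 2 can be automated with a small check like the one below. It assumes that failed.lst holds one entry per line and that a missing or empty file means no run failed. Neither assumption is confirmed here, and read_failed_runs is a hypothetical helper. A temporary directory stands in for $MyWorkDir, and the run numbers are made up.

```python
import os
import tempfile


def read_failed_runs(work_dir, filename="failed.lst"):
    """Return the non-empty lines of failed.lst, or [] if the file is absent."""
    path = os.path.join(work_dir, filename)
    if not os.path.exists(path):
        return []
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


# Demonstration on a throwaway directory standing in for $MyWorkDir.
with tempfile.TemporaryDirectory() as work_dir:
    before_merge = read_failed_runs(work_dir)
    with open(os.path.join(work_dir, "failed.lst"), "w") as handle:
        handle.write("3501\n\n3507\n")
    failed = read_failed_runs(work_dir)

if failed:
    print("runs to fix and re-merge:", ", ".join(failed))
else:
    print("no failed runs recorded")
```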

Making Data Accessible

Once all the data has been checked and the EventStore metadata has been created, injected, and merged into the main DB, the data version can then be moved to a readable grade for general use. The script that performs this action is moveGrade.sh. These variables must be properly set:

# example moveGrade.sh settings
export MyDB=EventStore@hallddb.jlab.org:3307
export OldGrade=recon-unchecked
export NewGrade=recon
export MyDataVersionName=recon_RunPeriod-2014-10_20150123_ver09
export OldTime=20150201
export NewTime=20150208
 
export MyLogDir=/work/halld/EventStore/RunPeriod-2014-10/ver09.1/logs

Notes:

  • MyDB should point to the master database that you merged into in the previous step.
  • OldGrade is the grade you injected the data with; NewGrade is the final grade. For a more detailed discussion of the grades used by GlueX, see here.
  • OldTime is the timestamp you injected the data with. NewTime is the timestamp that users will access the data with. Note that there does not have to be any particular relation between these times - NewTime can even be before OldTime, if you want. A classic trick when processing a dataset incrementally (say, during data taking) is to inject each group of runs into an -unchecked grade with its own timestamp, and then move every group to the same final timestamp as the rest of the runs.
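
The incremental trick can be sketched as follows: several groups of runs, each injected with its own timestamp, are all mapped to one final timestamp. The variable names follow the moveGrade.sh example above. The injection timestamps are invented, the YYYYMMDD format is inferred from the examples, and running moveGrade.sh once per OldTime is an assumption rather than documented behavior.

```python
from datetime import datetime


def check_timestamp(stamp):
    """Return stamp if it is a valid YYYYMMDD date string, else raise ValueError."""
    if len(stamp) != 8 or not stamp.isdigit():
        raise ValueError("timestamp must be 8 digits: %r" % stamp)
    datetime.strptime(stamp, "%Y%m%d")  # rejects impossible dates such as 20151301
    return stamp


def move_grade_settings(old_times, new_time, base):
    """Return one settings dict per injection timestamp, all sharing new_time."""
    check_timestamp(new_time)
    plans = []
    for old_time in sorted(set(old_times)):
        check_timestamp(old_time)
        settings = dict(base)
        settings["OldTime"] = old_time
        settings["NewTime"] = new_time
        plans.append(settings)
    return plans


base = {
    "MyDB": "EventStore@hallddb.jlab.org:3307",
    "OldGrade": "recon-unchecked",
    "NewGrade": "recon",
    "MyDataVersionName": "recon_RunPeriod-2014-10_20150123_ver09",
}
# Hypothetical injection timestamps for three groups of runs.
plans = move_grade_settings(["20150201", "20150203", "20150205"], "20150208", base)
for plan in plans:
    print(" ".join("%s=%s" % item for item in plan.items()))
```

Nothing in the sketch requires NewTime to be later than any OldTime, which matches the note above that the two timestamps need not be related.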