EventStore Administration

From GlueXWiki
Revision as of 19:10, 5 February 2015 by Sdobbs (Talk | contribs) (Making Data Accessible)

This page describes procedures used in administering an EventStore installation.

Configuration

Each admin user should have a configuration file at $HOME/.esdb.conf, which contains user:password entries and several other settings, as illustrated below:

# sample .esdb.conf file
gluex:masterpassword
sdobbs:thisisnotapassword
ESMASTER=EventStore@hallddb:3306:/var/log/mysql

The users and passwords are used to control access to the master MySQL databases - management of SQLite databases is not access controlled. Note that the authentication is performed by the EventStore scripts themselves.
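To make the file layout concrete, here is a minimal Python sketch that splits a .esdb.conf file into a user/password map and a settings map. It assumes that lines starting with "#" are comments, that settings take the form KEY=VALUE, and that any other non-empty line is a user:password entry. The function name parse_esdb_conf is hypothetical, and the EventStore scripts may well parse the file differently.

```python
def parse_esdb_conf(text):
    """Split .esdb.conf content into (users, settings) dictionaries.

    Assumed layout (not confirmed against the EventStore scripts):
      - blank lines and lines starting with '#' are ignored
      - KEY=VALUE lines are settings (e.g. ESMASTER)
      - user:password lines are credentials
    """
    users, settings = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        # Treat the line as a setting only if no ':' precedes the '=';
        # setting values such as ESMASTER contain ':' themselves, and a
        # password might contain '='.
        if sep and ":" not in key:
            settings[key.strip()] = value.strip()
        elif ":" in line:
            user, password = line.split(":", 1)
            users[user.strip()] = password
    return users, settings


sample = """# sample .esdb.conf file
gluex:masterpassword
sdobbs:thisisnotapassword
ESMASTER=EventStore@hallddb:3306:/var/log/mysql
"""
users, settings = parse_esdb_conf(sample)
```

Because the file holds passwords in plain text, it is sensible to keep it readable only by its owner.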

NOTE: The authentication mechanism will change in upcoming versions.

Skims and Event Lists

Adding new data to EventStore

The scripts for EventStore data management in GlueX are located in the following directory, where ESBASEDIR is the root EventStore directory:

$ESBASEDIR/src/AdminScripts

Build input files and directories

The batch scripts that drive the indexing and cataloging of data use several text files as inputs:

  • data_location - this specifies the full path to the data files. The glob wildcard "*" can be used to [describe how we use the cache disk]
  • eventstore_location - this specifies the directory where the EventStore files (indices, SQLite databases, log files) are stored. Nothing else should be stored in these directories, because they are deleted whenever the injection script is run.
  • idxa_location - this specifies the mapping between skim name and event list for the given run.

We generate versions of these files for both EVIO and REST data on disk.

The main script for generating these files at JLab is located in misc/build_eventstore_inputs.py. Before you run the script, make sure that the run period and revision are properly set in the script itself, e.g.

RUNPERIOD = "RunPeriod-2014-10"
DATAREVISION = "ver09"

By default, the script processes all available runs and overwrites any existing files. It can also be restricted to a user-defined set of runs. For instance, to process new runs 3500-3510, the following command line could be used:

./build_eventstore_inputs.py -b 3500 -e 3510
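The run-range selection can be pictured with the short Python sketch below. It is not the code of build_eventstore_inputs.py: only the -b and -e flags come from the command line above, while the helper names, the inclusive treatment of both bounds, and the run numbers are assumptions made for illustration.

```python
import argparse


def select_runs(available_runs, begin=None, end=None):
    """Return the runs in [begin, end]; a missing bound means 'no limit'."""
    return [run for run in sorted(available_runs)
            if (begin is None or run >= begin) and (end is None or run <= end)]


def parse_args(argv):
    """Parse the two range options; both are optional integers."""
    parser = argparse.ArgumentParser(description="run-range selection sketch")
    parser.add_argument("-b", type=int, default=None, dest="begin",
                        help="first run to process")
    parser.add_argument("-e", type=int, default=None, dest="end",
                        help="last run to process")
    return parser.parse_args(argv)


args = parse_args(["-b", "3500", "-e", "3510"])
runs_on_disk = [3498, 3500, 3505, 3510, 3512]  # made-up run numbers
selected = select_runs(runs_on_disk, args.begin, args.end)
```

With no options given, both bounds are None and every available run is selected, which mirrors the default behavior described above.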

Indexing Runs

setenv EVENTSTORE_OUTPUT_GRADE "recon-unchecked"
setenv EVENTSTORE_WRITE_TIMESTAMP "20150201"
setenv DATA_VERSION_NAME "recon_RunPeriod-2014-10_20150123_ver09"
#
setenv EVENTSTORE_BASE_DIR "/work/halld/EventStore/RunPeriod-2014-10/ver09.1"


Merging Runs

export MyWorkDir=/work/halld/EventStore/RunPeriod-2014-10/ver09.1/merge
export MyESDir=/work/halld/EventStore/RunPeriod-2014-10/ver09.1/rest_index
 
export MasterDB=EventStore@hallddb.jlab.org:3307


Making Data Accessible

Once all the data has been checked and the EventStore metadata has been created, injected, and merged into the main DB, the data version can be moved to a readable grade for general use. The script that performs this action is moveGrade.sh. The following variables must be set correctly:

# example moveGrade.sh settings
export MyDB=EventStore@hallddb.jlab.org:3307
export OldGrade=recon-unchecked
export NewGrade=recon
export MyDataVersionName=recon_RunPeriod-2014-10_20150123_ver09
export OldTime=20150201
export NewTime=20150208
 
export MyLogDir=/work/halld/EventStore/RunPeriod-2014-10/ver09.1/logs

Notes:

  • MyDB should point to the master database that you merged into in the previous step.
  • OldGrade is the grade you injected the data with; NewGrade is the final grade. For a more detailed discussion of the grades used by GlueX, see here.
  • OldTime is the timestamp you injected the data with. NewTime is the timestamp that users will access the data with. Note that there does not have to be any particular relation between these times - NewTime can even be before OldTime, if you want. A classic trick when processing a dataset incrementally (say, during data taking) is to inject each group of runs into an -unchecked grade with its own timestamp, and then move every group to the same timestamp as the rest of the runs.
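The incremental trick in the last note can be illustrated with a toy in-memory model in Python. This is only a sketch of the bookkeeping idea: it does not reflect the EventStore database schema or what moveGrade.sh does internally, and the function move_grade, the dictionary layout, the run numbers, and the second injection timestamp (20150205) are all made up for the example.

```python
def move_grade(entries, version, old_grade, old_time, new_grade, new_time):
    """Relabel every entry matching (version, old_grade, old_time).

    Returns the number of entries that were moved.
    """
    moved = 0
    for entry in entries:
        key = (entry["version"], entry["grade"], entry["time"])
        if key == (version, old_grade, old_time):
            entry["grade"], entry["time"] = new_grade, new_time
            moved += 1
    return moved


VERSION = "recon_RunPeriod-2014-10_20150123_ver09"

# Two groups of runs injected on different days into the unchecked grade.
catalog = [
    {"run": 3500, "version": VERSION, "grade": "recon-unchecked", "time": 20150201},
    {"run": 3501, "version": VERSION, "grade": "recon-unchecked", "time": 20150201},
    {"run": 3510, "version": VERSION, "grade": "recon-unchecked", "time": 20150205},
]

# Move each group separately, but always to the same final grade and timestamp.
moved_counts = [
    move_grade(catalog, VERSION, "recon-unchecked", old_time, "recon", 20150208)
    for old_time in (20150201, 20150205)
]
```

After both moves, every run carries the grade recon and the timestamp 20150208, so users see one consistent dataset regardless of when each group was injected.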