Guide to roll-your-own python hddm transforms

From GlueXWiki
Revision as of 13:37, 24 July 2018 by Jonesrt (Talk | contribs) (example 3: writing Monte Carlo events)

Jump to: navigation, search

At one time, if you wanted to catenate two similar hddm files, you might do it with "hddmcat", or maybe with "hddmcp", or "hddmcp_c". Or if you have been in the collaboration long enough, you might remember our old friend "hddm_merge_files". Each of these utilities had a specific purpose at one time, no two of them are quite identical. There are others that do slightly different things, such as "hddm-xml", "xml-hddm", "hddm_cull_events", etc. The purpose of this page is to convince you that ALL of these are obsolete, now that the hddm python module is available. Using the python shell, you can perform a much more general set of actions on hddm input files, including merging, splitting based on event inspection, and general transforms of the event content. All of this is illustrated with examples below.

First, you need to make sure that your shell environment is properly configured for using the standard GlueX software stack. To verify this, try the following command at your linux shell prompt. If this command returns without an error, you are all set to go. If it complains about the hddm_s module not being found, you need to find the appropriate setup_env script for your login shell, source it and try again.

python -c 'import hddm_s'

example 1: merging files

In the first example, we learn how to catenate two hddm files named "t1.hddm" and "t2.hddm" into a single output file "tcat.hddm" with a single command.

python -c 'import hddm_s;fout=hddm_s.ostream("tcat.hddm");[[fout.write(rec) for rec in hddm_s.istream(fin)] for fin in ("t1.hddm","t2.hddm")]'

This is more readable if we save it to a file and write it on multiple lines, as follows.

import hddm_s
fout=hddm_s.ostream("tcat.hddm")
fout.compression = hddm_s.k_z_compression
for fin in ("t1.hddm","t2.hddm"):
  for rec in hddm_s.istream(fin):
    fout.write(rec)

In this example, I introduced one new feature, the ability to turn on compression in an output stream simply by assigning one of the predefined constants to the compression attribute. If you save the above program to a file x.py and execute it as "python x.py" then it will produce a compressed version of tcat.hddm that you created earlier with the one-liner above. Clearly I can extend the number of input files by adding to the list ("t1.hddm", "t2.hddm"). Alternately, I can specify a list of input files on the command line and replace "("t1.hddm", "t2.hddm") with sys.argv[1:]. You would also need to add "import sys" to the file header.

If you are a python user then you know that the command "pydoc hddm_s" gives you a ready-made man page-style documentation on all of the object types and functions provided by the hddm_s module. The "_s" in hddm_s stands for simulation hddm files. For working with REST files, you would use hddm_r. Other than the one-letter difference, the usage is same.

example 2: splitting files

Suppose you have generated a large bggen sample, and you want to split it into separate datasets depending on how many pi0's there are in each event. The following example does this by looking in the Monte Carlo record at the head of each event and counting the number of pi0's that it finds there. The particle types and their corresponding integer codes are listed in the header file particleType.h in the GlueX standard includes. If you change the "t1.hddm" to the name of your input hddm file and run it, it will write out separate files with names like "sample_1pi0.hddm", "sample_5pi0.hddm", etc., according to how many pi0s there were in the event. Note that if your Monte Carlo event file contains unstable particles that decay into pi0's then where an event ends after the split may depend on whether you run it on the file upstream or downstream from hdgeant/hdgeant4.

import hddm_s
pi0 = 7
fout = {}
for rec in hddm_s.istream("t1.hddm"):
  npi0=0
  for prod in rec.getProducts():
    if prod.type == pi0:
      npi0 += 1
  if not npi0 in fout:
    fout[npi0] = hddm_s.ostream("sample_{0}pi0.hddm".format(npi0))
  fout[npi0].write(rec)

example 3: writing Monte Carlo events

Suppose you have your own event generator that you want to use with hdgeant/hdgeant4, but the output from the generator is in plain text. This example provides a bare-bones starting point for writing your own converter to translate from whatever ascii format your generator uses to the hddm format that is recognized by hdgeant/hdgeant4. to be continued

example 4: reordering your events

example 5: stripping out bits of hit information

example 6: mining data, making histograms