A Version Management System for GlueX GlueX-Doc-2793-v16
Mark M. Ito
Jefferson Lab
May 8, 2019
Abstract:
A system for building and managing GlueX software is
described. The goal is to insulate the user from the need to master
the details of building each of several software packages as well
as from the details of setting up a working environment. Multiple
versions of each of several packages can be maintained
simultaneously. Particular combinations of package versions can be
specified succinctly in an XML configuration file and this file can
be used both to guide a complete build of all needed packages and to
set up the shell environment to use the resulting build.
There are several software components that are needed to build and use
GlueX software. Most of them are assumed to be provided by the native
operating system or distribution, but there are some that have to be
built by the GlueX user. They are:
build_scripts: scripts to manage building and the shell environment
Xerces-C: for reading XML files
CERNLIB: to support GEANT 3 simulations
GEANT4: simulation engine
ROOT: general purpose HENP toolkit
EVIO: CODA format data handling library
CCDB: Calibration Constants Database
RCDB: Run Conditions Database
JANA: event-based analysis framework
HDDS: detector geometry specification library
sim-recon: simulation and reconstruction for GlueX
Detailed description of these packages will not be given here; please
see the GlueX Offline Software wiki page for more information.
There are in general multiple versions (releases) of each of these
packages and it is often convenient to have access to more than one
version of a package built and available for use. In addition some
packages depend on one or several others for libraries and include
files.
3 The Directory Structure
The VMS directory structure supports multiple versions of each
package. For an example see Fig. . In the
figure, ``gluex_top'' is a generic name; each installation may choose
a different directory name. VMS looks for the name of this directory
in the environment variable GLUEX_TOP.
Figure:
The directory structure.
Under gluex_top, each package has its own container directory (e.g., JANA, hdds, sim-recon), and within each package container
directory one or more specific versions of that package are built.
Many of the scripts and makefiles described in this note require that
the environment variable BUILD_SCRIPTS be defined and point to
an instance of this directory.
Facility is provided for setting environment variables necessary both
for building the software and for using it. Both Bourne-shell-like and
C-shell-like shells are supported, but real testing has only been done
with bash and tcsh. In the following all examples will be appropriate
for bash. Note that whenever a script like foo.sh is mentioned, there
is also a foo.csh in the build_scripts directory.
5.1 Low-Level Environment Set-Up: gluex_env.(c)sh
The gluex_env.(c)sh script will define all environment
variables needed to run and build GlueX software. It takes as input
the home directories of each package as found in the environment
variables listed in Table .
Table:
Packages and their home directories.
Package       Home Directory Variable
Xerces-C      XERCESCROOT
CERNLIB       CERN
LAPACK/BLAS   see Section
Geant4        GEANT4_HOME
ROOT          ROOTSYS
EVIO          EVIOROOT
CCDB          CCDB_HOME
RCDB          RCDB_HOME
JANA          JANA_HOME
HDDS          HDDS_HOME
sim-recon     HALLD_HOME
Given the home directory of a package, there may be other
environment variables that need to be set, with values derived
from the home directory. gluex_env.(c)sh takes care of this. For
example, XERCES_INCLUDE is used in the build system and must be
defined as $XERCESCROOT/include. Directories containing
binaries must be added to the PATH variable, and similarly for
LD_LIBRARY_PATH and PYTHONPATH. For these path
variables, directories are always added at the front, with any
pre-existing directories kept on the list.
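The derived-variable and front-of-path rules can be illustrated with a minimal sketch. This is not the code in gluex_env.sh; the helper name prepend_path and the xerces-c/prod location are assumptions made for illustration only.

```shell
# Hypothetical helper (not taken from gluex_env.sh): put a directory at the
# front of a colon-separated path variable, keeping whatever was already there.
prepend_path() {
    # $1 = name of the path variable, $2 = directory to add
    eval "pp_current=\${$1:-}"
    if [ -n "$pp_current" ]; then
        eval "$1=\"\$2:\$pp_current\""
    else
        eval "$1=\"\$2\""
    fi
    export "$1"
}

# Assumed home directory; a pre-defined XERCESCROOT is respected.
XERCESCROOT=${XERCESCROOT:-/usr/local/gluex/xerces-c/prod}
# Variable derived from the home directory, as described in the text.
export XERCES_INCLUDE="$XERCESCROOT/include"
# New directories go to the front; pre-existing entries stay on the list.
prepend_path LD_LIBRARY_PATH "$XERCESCROOT/lib"
prepend_path PATH "$XERCESCROOT/bin"
```

Because new entries are placed in front, a GlueX-provided library or binary is found before a system-provided one with the same name.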
If GLUEX_TOP is defined in advance, the pre-defined value will
be used. If not it will be defined as /usr/local/gluex.
If BUILD_SCRIPTS is defined in advance, again the pre-defined
value will be used. If not defined it will be defined as $GLUEX_TOP/build_scripts.
If any of the home directories are defined before sourcing gluex_env.(c)sh, those values will be respected. If any are not
defined, then a default will be provided, usually the prod
directory in the package container directory. Because of this
behavior, the user can define as many or as few of the home
directories as desired in advance of sourcing gluex_env.(c)sh,
letting the script finish the environment settings keying off of the
user definitions (or lack thereof). The user is thus only responsible
for setting the values of desired home directories. A side effect of
this behavior is that the environment that results is
non-deterministic in the sense that the result depends on the values
of pre-existing home directory variables. Different initial conditions
will give different environments.
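The ``respect what is already defined, otherwise fall back to a default'' rule can be sketched with shell parameter expansion. The defaults for GLUEX_TOP and BUILD_SCRIPTS are the ones quoted above; the function name gluex_defaults and the hdds/prod and sim-recon/prod locations are illustrative assumptions, not a copy of gluex_env.sh.

```shell
# Sketch only: assign a default to each variable that is unset or empty and
# leave pre-defined values untouched.
gluex_defaults() {
    : "${GLUEX_TOP:=/usr/local/gluex}"
    : "${BUILD_SCRIPTS:=$GLUEX_TOP/build_scripts}"
    # Assumed layout: the default home is the prod directory in the container.
    : "${HDDS_HOME:=$GLUEX_TOP/hdds/prod}"
    : "${HALLD_HOME:=$GLUEX_TOP/sim-recon/prod}"
    export GLUEX_TOP BUILD_SCRIPTS HDDS_HOME HALLD_HOME
}

gluex_defaults
echo "HDDS_HOME=$HDDS_HOME"
echo "HALLD_HOME=$HALLD_HOME"
```

Running the same function with HDDS_HOME already set gives a different HDDS_HOME in the result, which is the dependence on initial conditions noted above.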
The final step in gluex_env.(c)sh is to check the resulting
environment for consistency using the prerequisites system. Each home
directory is checked for a prerequisites version set file. Those files
list versions of prerequisite packages used at build time. The
build-time versions are checked against the versions used in the
just-set-up environment and warnings are printed when mismatches are
detected. See Section for the
details.
Since gluex_env.(c)sh is sensitive to definitions hanging
around in the environment, there is a script provided that will undo
all GlueX-related definitions: gluex_env_clean.(c)sh. Sourcing
it will eliminate unintended consequences from previously made
definitions. For the path variables the script only removes the
GlueX-related elements leaving all others present in the path.
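How path variables might be cleaned can be sketched as follows. The helper strip_under is hypothetical, and treating ``GlueX-related'' as ``located under a given top directory'' is an assumption; the actual gluex_env_clean.(c)sh may decide differently.

```shell
# Hypothetical helper: drop every element of a colon-separated list that lies
# under a given top directory and keep the rest in their original order.
strip_under() {
    # $1 = colon-separated path list, $2 = top directory (e.g. $GLUEX_TOP)
    su_result=
    su_old_ifs=$IFS
    IFS=:
    for su_dir in $1; do
        case "$su_dir" in
            "$2"|"$2"/*) ;;   # under the top directory: skip it
            *) su_result="${su_result:+$su_result:}$su_dir" ;;
        esac
    done
    IFS=$su_old_ifs
    printf '%s\n' "$su_result"
}

# Prints /usr/bin:/bin
strip_under "/home/gluex/gluex_top/root/prod/bin:/usr/bin:/bin" /home/gluex/gluex_top
```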
The most common reason to have a custom script is when you want to use
a package that is outside the standard directory structure. Since gluex_env.(c)sh will respect a pre-defined value of any of the
home directories, this can be done without making a private version of
gluex_env.(c)sh. See Fig. for an
example. Here a version of sim-recon built in a non-standard location
(the user's home directory) will be used in the resulting environment.
Figure:
Example of a custom set-up script. The build of sim-recon in
the user's home directory will be used. All other packages will be set up
to use their default builds under /home/gluex/gluex_top. Note that GLUEX_TOP and BUILD_SCRIPTS are defined explicitly rather than letting them
default.
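A set-up script matching this caption might look like the following sketch. The location $HOME/sim-recon for the private build is an assumption; the guard around gluex_env.sh is only there so the sketch does nothing harmful on a machine without the scripts.

```shell
# Custom set-up sketch; $HOME/sim-recon is a hypothetical private build location.
export GLUEX_TOP=/home/gluex/gluex_top
export BUILD_SCRIPTS=$GLUEX_TOP/build_scripts
# Pre-define only the home directory that should differ from the default.
export HALLD_HOME=$HOME/sim-recon
if [ -r "$BUILD_SCRIPTS/gluex_env.sh" ]; then
    # Fills in every home directory that was not set above.
    . "$BUILD_SCRIPTS/gluex_env.sh"
else
    echo "gluex_env.sh not found under $BUILD_SCRIPTS" >&2
fi
```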
Another way to get a non-default environment is to use the versioning
system to set home directory locations which in turn are respected by
gluex_env.(c)sh. The versioning system is described in detail
in Section , but an example set-up script is
shown in Fig. . In this example, package version
information is contained in the XML file my_versions.xml in the
user's home area. An example version set file is shown in Fig. . Alternate combinations of package versions can be tried by making alternate versions of the version set file.
Figure:
Example of a set-up script driven by a private version
file. See Section for the format of the
file.
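A sketch of such a script follows. It assumes that gluex_env_version.sh accepts the path of the version set file as its argument when sourced; check the script in your copy of build_scripts before relying on that. Passing arguments to a sourced script is a bash feature.

```shell
# Version-file-driven set-up sketch (argument convention is an assumption).
export GLUEX_TOP=/home/gluex/gluex_top
export BUILD_SCRIPTS=$GLUEX_TOP/build_scripts
VERSION_XML=$HOME/my_versions.xml
if [ -r "$BUILD_SCRIPTS/gluex_env_version.sh" ] && [ -r "$VERSION_XML" ]; then
    . "$BUILD_SCRIPTS/gluex_env_version.sh" "$VERSION_XML"
else
    echo "set-up script or $VERSION_XML not found; environment unchanged" >&2
fi
```

To try a different combination of package versions, point VERSION_XML at another version set file and source the script in a fresh shell.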
At JLab there is a script that packages the steps shown in Fig. , with the appropriate values of GLUEX_TOP and BUILD_SCRIPTS. It uses the latest version of version.xml. It can also serve as an example for using a custom version of version.xml. To use it, for bash:
Each of the packages has its own native build system and each build
system has its own set of details that have to be understood. In
addition, the technology used to do the build varies from system to
system. It may be make, imake, cmake, SCons, or something else. The
VMS system makes a choice of build options for each package so that
the user need not master these details.
The VMS build system is implemented in GNU Make. These makefiles
invoke the native package-specific build system. The top-level
makefile builds packages into the standard directory structure
described in Section . It in turn uses a ``package makefile''
for each package (e.g., Makefile_jana,
Makefile_sim-recon). Invoking make with a package makefile will
build that package with the home directory placed in the current
working directory. In other words, the package makefiles have no knowledge of
the directory structure within which they are used; they just build
locally. Only the top-level makefile knows about the directory structure.
The all target builds every package that Makefile_all knows
about. The gluex target builds only the packages necessary for
using GlueX software. The gluex_jlab target is the same as gluex
except that it does not include cernlib_build (useful for JLab public
builds where we use the community-built versions of CERNLIB).
Makefile_all should always be invoked from the $GLUEX_TOP directory.
Each of the individual package targets (e.g., evio_build
and hdds_build) uses the corresponding package
makefile. Directories are created and the package makes are executed
in a way that gives the directory structure described in
Section . To do this, the package container
directory is created if it does not exist and the requested package
makefile is invoked from within the package container directory.
Of course, each individual package build target can be invoked
directly. More on this in Section
The target cernlib_debug is non-standard. If this target is
invoked, it will create a separate container directory cernlib_debug for the debug versions. Also, it does not have its
own package makefile; rather, on 64-bit machines it invokes Makefile_cernlib_Vogt with command line options that cause
appropriate debug compiler flags to be used.
To use the resulting debug version of CERNLIB, the CERN variable
must be set to point to the cernlib_debug directory, either explicitly as an environment variable (as described in Section ) or by using the home attribute in the package element of a version set file (as described in Section ).
6.2 The Package Makefiles
Each of the package makefiles is sensitive to environment variables
that control which version of the package to build. The makefiles
themselves take care of obtaining the source code. In general, there
are two ways to get the code: downloading a tarball or checking the
code out from a version control repository, although the latter option
is not available for all packages.
In each case there is a standard system for distributing tarballs marked
with the version name. Each package has different conventions, but the
package makefiles have knowledge of the appropriate convention
coded in. Also, the name of the home directory created depends on the name
that appears in the tarball (with exceptions as mentioned in
Section ). Note that the version variable
can be set on the make command line as well.
Some packages can be checked out from a Subversion or Git
repository. If that is the desired source of the code, the version
variable should not be set. Instead a URL variable should be used to
specify the location of the repository. For example:
will cause the HDDS package makefile to check out the master branch of
the HDDS Git repository at GitHub. The names of the variables for
other packages are JANA_URL (Subversion), CCDB_URL
(Git), RCDB_URL (Git), and SIM_RECON_URL (Git)
respectively.
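A repository-based HDDS build might be requested along the following lines. The variable name HDDS_URL is inferred from the pattern of the names listed above, and the repository URL is an assumption; substitute the location you actually use. The guard keeps the sketch harmless where build_scripts is not installed.

```shell
# Assumed repository location (not taken from this note).
HDDS_URL=https://github.com/JeffersonLab/hdds
export HDDS_URL
# The version variable should not be set when a URL is used.
unset HDDS_VERSION
if [ -r "${BUILD_SCRIPTS:-}/Makefile_hdds" ]; then
    make -f "$BUILD_SCRIPTS/Makefile_hdds"
else
    echo "Makefile_hdds not found; would run: make -f \$BUILD_SCRIPTS/Makefile_hdds"
fi
```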
For packages that use a Git repository, there are two additional
variables that can be used to control the checkout. SIM_RECON_BRANCH is used to check out a specific branch and SIM_RECON_HASH is used to check out a specific commit of
sim-recon. If one is set, the other should not be. There are analogous
variables for CCDB, RCDB, and HDDS.
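Since the branch and hash variables are mutually exclusive, a small pre-flight check can catch the mistake before a long build starts. The function below is a hypothetical helper, not part of build_scripts, and the names HDDS_BRANCH, CCDB_HASH, and so on are assumed to follow the SIM_RECON pattern.

```shell
# Hypothetical pre-flight check: fail when both the branch and the hash
# variable of a package are set.
check_branch_hash() {
    # $1 = variable prefix, e.g. SIM_RECON, HDDS, CCDB or RCDB
    eval "cbh_branch=\${${1}_BRANCH:-}"
    eval "cbh_hash=\${${1}_HASH:-}"
    if [ -n "$cbh_branch" ] && [ -n "$cbh_hash" ]; then
        echo "error: ${1}_BRANCH and ${1}_HASH are both set; choose one" >&2
        return 1
    fi
    return 0
}

SIM_RECON_BRANCH=test_stuff
unset SIM_RECON_HASH
check_branch_hash SIM_RECON && echo "sim-recon checkout settings are consistent"
```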
6.2.3 Extra Features for Specific Packages
SCons Options for Makefile_sim-recon
For the sim-recon package, there is a variable, SIM_RECON_SCONS_OPTIONS that can be defined, either in the shell
environment or on the make command line, that will supply optional
arguments to the scons command invoked by make. For example,
make -f $BUILD_SCRIPTS/Makefile_sim-recon \
SIM_RECON_SCONS_OPTIONS='SHOWBUILDS=1'
will cause SCons to show the compiler commands explicitly.
Building LAPACK/BLAS
The LAPACK and BLAS libraries are needed by CERNLIB. They
are downloaded and built automatically, but rather than being
installed in their own home directory, the ``install'' target of Makefile_lapack adds them to the lib directory of your CERNLIB
build.
After setting the desired values of the version environment variables
and/or the URL environment variables (see
Section ), you can invoke Makefile_all with the target(s) needed or with a high-level target
like gluex,
cd $GLUEX_TOP
make -f $BUILD_SCRIPTS/Makefile_all gluex
If some of the versions of individual packages requested already
exist, then make will do the usual thing: try to remake them and find
that there is nothing to do.
Since the individual package makefiles build in the local directory,
they can be used directly by going to the appropriate container
directory. For example,
cd $GLUEX_TOP/sim-recon
make -f $BUILD_SCRIPTS/Makefile_sim-recon SIM_RECON_VERSION=1.4.0
Note that in this example the version is specified on the make command
line rather than through an environment variable. That is not
necessary; it is an option supported by make and defining SIM_RECON_VERSION in the environment would work as well. Also
note that doing the build in the sim-recon container directory is not
necessary for the build to succeed; any directory will work. In this
example however we are adding to an existing standard directory
structure so we cd to the standard directory.
The versioning system uses an XML-formatted version set file to specify
both package version information and package home directory definition
in the shell environment. An example file is shown in
Fig. .
Figure:
An example version set file. version_1.7.xml is shown.
There is only one type of element, the package. Attributes are:
name: The name of the software package.
version: The version number of the package.
url: A URL to be used to check out (Subversion) or clone
(Git) the code. The URL should point to an appropriate repository.
branch: When using a Git repository, the branch to be
checked out.
hash: When using a Git repository, the hash of the commit to be checked out.
dirtag: A string (directory tag) to be appended to the
standard directory name of the package when it is built.
home: Force the location of the package home directory when setting up the environment.
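To make the attributes concrete, the sketch below writes a small version set file and lists the packages in it. Everything in the file is illustrative: the root element name (gversions), the version number, the repository URL, and the home path are assumptions, not values taken from a real release.

```shell
# Write a purely illustrative version set file to a scratch location.
version_file=${TMPDIR:-/tmp}/my_versions.xml
cat > "$version_file" <<'EOF'
<gversions>
  <package name="hdds" version="3.3" dirtag="xerces_test"/>
  <package name="sim-recon" url="https://github.com/JeffersonLab/sim-recon" branch="test_stuff"/>
  <package name="jana" home="/home/username/jana_test"/>
</gversions>
EOF
# List the package names found in the file, one per line.
sed -n 's/.*<package name="\([^"]*\)".*/\1/p' "$version_file"
```

The first element requests a tarball build with a directory tag, the second a Git clone of a named branch, and the third points the environment at a pre-built package.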
As we saw in Section ,
environment setting via gluex_env.(c)sh is sensitive to the
definition of the package home variables. In
Section , we saw that the package
makefiles are sensitive to either the version-defining environment
variables or the URL-defining environment variables, using them to choose the
version of code to build. The version set file can be used to define both
classes of variables. In this way it can be used both to build a
consistent set of packages and to set up the environment to use the
build. Executing
$BUILD_SCRIPTS/version.pl version_1.7.xml
where for the purposes of this example version_1.7.xml is the
file shown in Fig. and GLUEX_TOP is
/home/gluex/gluex_top, creates the output shown in
Fig. .
Figure:
Output of $BUILD_SCRIPTS/version.pl.
Since you would want these commands applied to the current shell level, in practice you use
eval `$BUILD_SCRIPTS/version.pl version_1.7.xml`
Following this step, one normally would invoke gluex_env.(c)sh
to complete the set-up of the environment.
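The need for eval can be seen with a stand-in for version.pl. The function below is hypothetical and its output format (export statements, one per variable) is an assumption about what the real script prints for bash; the point is only that a program printing commands changes nothing until the current shell evaluates them.

```shell
# Stand-in for version.pl (hypothetical): it only prints shell commands.
fake_version_pl() {
    echo 'export HDDS_VERSION=3.3;'
    echo 'export HDDS_HOME=/home/gluex/gluex_top/hdds/hdds-3.3;'
}

# Running it by itself just produces text; this shell is unchanged.
fake_version_pl > /dev/null
# eval applies the printed commands to the current shell.
eval "$(fake_version_pl)"
echo "$HDDS_HOME"
```

The $(...) form used here is equivalent to the backquotes shown above.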
In this example, the variable definitions come (mostly) in pairs, a
version variable and a home directory variable. The version variable
affects only the build process since the corresponding package
makefile keys off it (see
Section ). The home directory variable
affects the build as well in that it tells the package makefile where
to find any prerequisite packages and in addition it affects use of a
build via its effect on path variables.
Finally, the script gluex_env_version.(c)sh combines use of
version.pl and gluex_env.(c)sh to more conveniently set
up the environment. We have already seen an example of its use in
Fig. . The script uses version.pl as
shown above to set the stage for an invocation of gluex_env.(c)sh.
We saw in Section that a URL variable
can be used to instruct the package makefiles to get the source code
from a version control repository rather than downloading a tarball. The
url attribute in the package element calls out the value of the
URL to use directly. In a particular package element, either the
version attribute or the url attribute should
appear; if both appear then the version attribute will
be used (i.e., the tarball).
If the url attribute is used, each package makefile
will interpret the URL as is appropriate for that package, either as a
Subversion repository or a Git repository; there can be only one
answer and it is coded into the package makefile. For Git
repositories, the optional branch attribute controls which
branch is checked out. If it is absent, the master branch is
used. For example, to clone sim-recon and check out the branch
test_stuff, use
This will cause the SIM_RECON_BRANCH variable to be set in
the environment. Similarly the hash attribute can be used to
specify the hash of the particular commit to be checked-out.
Note that for Subversion repositories, the branch specification is
encoded in the URL itself and the branch and hash attributes are ignored.
7.4 Directory Tags
The dirtag attribute can be used to distinguish different builds
of a package where the only difference between them is the version(s)
of one or more prerequisite packages. The string used is arbitrary. A
directory tag can be attached to either a source directory made from a
tarball or one from a source code repository. The tag name is appended
after a caret symbol (^), for example,
in a version set file would cause version.pl to add an additional
variable to the environment:
setenv HDDS_DIRTAG xerces_test
and Makefile_hdds would then produce a directory named
hdds-3.3^xerces_test, with source code obtained from the standard
tarball, hdds-3.3-src.tar.gz in this case.
The corresponding home directory variable will also reflect the
directory tag, of course.
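The naming rule can be sketched in a few lines. The function pkg_dir_name is a hypothetical helper that mirrors the convention described above; it is not the code used in Makefile_hdds.

```shell
# Hypothetical helper: build the directory name from package name, version,
# and an optional directory tag appended after a caret.
pkg_dir_name() {
    # $1 = package name, $2 = version, $3 = directory tag (may be empty)
    pdn_tag=${3:-}
    printf '%s\n' "$1-$2${pdn_tag:+^$pdn_tag}"
}

HDDS_VERSION=3.3
HDDS_DIRTAG=xerces_test
pkg_dir_name hdds "$HDDS_VERSION" "$HDDS_DIRTAG"   # prints hdds-3.3^xerces_test
pkg_dir_name hdds "$HDDS_VERSION"                  # prints hdds-3.3
```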
There are many possible meanings for the directory tag. It could mark
different combinations of prerequisites as well as designating
packages where the source code does not come from a standard
source (tarball or repository). Because of the large number of
possibilities, the form of the tag string is left to the user; no
assumption is made about its meaning.
Often one wants to use a build of a specific package that lies outside
of the standard directory structure. This can be put into the
environment by setting the home attribute of the corresponding
package element. For example
Note that this feature is mainly useful for creating an environment
for use; when building it (a) gives no guidance on where the source
code should come from and (b) does not cause the build to be done in
the named directory. It is useful when a pre-built package needs to be
referenced for the current task.
The names of the files are of two main types, (1) version sets that
correspond to a periodic package update release and (2) version sets
that correspond to a ``launch'' over GlueX data.
File names for package update releases are of the form
version_i.j.k.xml, where i, j, and k are a
major version number for the version set, a minor version number,
and a sub-minor version number. For example, version_4.1.0.xml.
File names for launches are of the form launch-name_i.xml, where launch-name corresponds to the name
assigned to the launch and i is a version number. For example
recon-2018_08-ver00_1.xml
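The two naming conventions can be told apart with simple shell patterns. The function below is a hypothetical illustration; shell globs are looser than a strict grammar, so it is a sketch of the convention rather than a validator.

```shell
# Hypothetical classifier for version set file names.
classify_version_file() {
    case "$1" in
        version_[0-9]*.[0-9]*.[0-9]*.xml) echo release ;;  # version_i.j.k.xml
        version_*) echo unknown ;;                         # other version_ names
        *_[0-9]*.xml) echo launch ;;                       # launch-name_i.xml
        *) echo unknown ;;
    esac
}

classify_version_file version_4.1.0.xml            # prints release
classify_version_file recon-2018_08-ver00_1.xml    # prints launch
```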
8 The Prerequisites System
Each package may or may not have a build dependency on other packages
under the VMS. For example a particular version of sim-recon can be
built against any of a number of versions of HDDS, including custom
versions provided by the user. To ensure that the environment being
set up has a consistent set of package versions, a facility is
provided to warn the user if possible inconsistencies are detected.
At build time, a version XML file is created in the home directory of
a package if that package has dependencies on others in the
system. For example, $HALLD_HOME will have the file
sim-recon_prereqs_version.xml, listing the versions used to build
sim-recon. An example is shown in Fig.
Figure:
An example of sim-recon_prereqs_version.xml.
At set-up time, when gluex_env.(c)sh is invoked, if a version
file with prerequisites is found in the package home directory, then
each package in that file is checked for version consistency. A match
is sought between the version number specified in the version set file and
the version number encoded in the home directory for the prerequisite
package, i.e., the directory defined as home in the environment
being set up. Here the version from the home directory is extracted in
one of three ways depending on how the package was built:
Tar File. If the source code came from a tar file, then
the version number is parsed out of the name of the home directory.
Subversion Check-Out. If the source was checked out of a
subversion repository, the svn info command is used to get the
name of the subversion directory checked out and the version is
parsed from that directory name.
Git Clone and Check-Out. If the source code was
checked-out from a Git repository, the git remote -v command
is issued and the URL is parsed from the ``fetch'' line. The branch
is obtained from the git status command. If the prerequisite
file does not contain a branch specification, then the version check
will require the master branch to have been checked out. At this
writing, version checking for specific commit hashes has not been
implemented.
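For the tar file case, the check amounts to parsing a version out of a directory name and comparing it with the version recorded at build time. The sketch below uses hypothetical helper names and assumes home directories named package-version, optionally followed by a caret and a directory tag; the real check in gluex_env.(c)sh may differ in detail.

```shell
# Hypothetical helper: extract the version from a home directory name such as
# /home/gluex/gluex_top/hdds/hdds-3.3^xerces_test
version_from_home() {
    # $1 = home directory, $2 = package name
    vfh_base=${1##*/}          # strip leading directories
    vfh_base=${vfh_base%%^*}   # strip an optional ^dirtag
    printf '%s\n' "${vfh_base#"$2"-}"   # strip the "package-" prefix
}

# Hypothetical check: warn when the build-time version and the version found
# in the environment disagree.
check_prereq() {
    # $1 = package, $2 = version recorded at build time, $3 = home directory
    cp_found=$(version_from_home "$3" "$1")
    if [ "$cp_found" != "$2" ]; then
        echo "warning: $1 version $2 was used at build time, environment has $cp_found" >&2
        return 1
    fi
    return 0
}

check_prereq hdds 3.3 '/home/gluex/gluex_top/hdds/hdds-3.3^xerces_test' && echo "hdds version is consistent"
```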
If a version mismatch is found, a warning is written to the screen.
There are cases where the home directory does not contain any
information about the source of its source code. For example, the code
could have come via the svn export command or the git archive command. If such a build (for example HDDS) is a
prerequisite of another package (for example, sim-recon), then the
dependent package (sim-recon) will usually have listed the
prerequisite (HDDS) with neither a version nor a url
attribute defined in its (sim-recon's) prerequisite file. In that
case, a warning will be issued noting the absence of both version and url attributes.
The gluex_install system uses build_scripts to create a
complete install of GlueX software from scratch. This is especially
useful for new machines.
No interaction from the user should be required to get a successful
build. The only assumption made is that the basic packages that come
in a minimal install are present. The definition of minimal
depends on the installation. In all cases, the distribution was tested
by first installing from a DVD or CD ISO image. Typically, the ``live
DVD'' version was chosen since that installs the smallest number of
packages.
The scripts have been tested on the distributions listed in
Table .
1. System Update. It is recommended that you update your
system to the latest versions of all system-supplied
software. For RedHat-like distributions, do a ``yum update''. For
Debian-like systems, do an ``apt-get update'' followed by an ``apt-get upgrade''.
2. Get the Scripts. A tar file with the scripts described
here is available at
3. Prerequisites: gluex_prereq_distribution.sh. The
prerequisites script installs packages from the distribution
repository necessary for the GlueX build. As such, it must be
executed by root. In addition it makes some symbolic links in system
directories that are necessary for the cernlib build. These scripts
are specific to particular distributions. You must run this
script from inside the ``gluex_install'' directory created when you
get the scripts (see Get the Scripts, step 2).
4. Install: gluex_install.sh. Creates a directory,
``gluex_top'', in the current working directory to house the build,
sets up an environment, downloads all source files, and builds all
libraries and executables needed to run GlueX software. The install
assumes a directory structure that accommodates multiple versions of
the GlueX packages if they are needed later. The script is
distribution independent.
After the build is complete, there are two files in the gluex
directory, setup.sh and setup.csh, that can be used to set up the
complete GlueX environment under Bourne-like shells or C-like shells
respectively.