GlueX and the OSG, Meeting on Resource Contribution, March 31, 2017
From GlueXWiki
Present: Sean Dobbs (Northwestern), Paul Eugenio (Florida State), Rob Gardner (Chicago), Ken Herner (Fermilab), Richard Jones (UConn), Zisis Papandreou (Regina), Suchandra Thapa (Chicago)
Introduction
- Richard: described the genesis of this meeting, a discussion at the last OSG all-hands meeting. Rob volunteered to help bring up university-based clusters on the OSG.
- Current deployment from UConn and Northwestern is about 1,000 cores, but GlueX could use more.
- Rob: Adding resources is easier now than it used to be.
- Want to instantiate Hosted Compute Elements. Local cluster admins are shielded from a lot of the details.
- Jobs from GlueX would have priority on our resources; excess demand could overflow onto non-GlueX OSG resources.
- Local admin work: how software is expected to be handled
- CernVM File System (CVMFS) is the consensus choice for software distribution
- Squid service recommended to cache software access.
- Suchandra will collect info about resources from local admin
- Some requirements
- outbound IP connectivity
- RHEL6, CentOS6, Scientific Linux 6
- RHEL7 support in near future
- Batch systems supported: PBS, Condor, Slurm, Sun Grid Engine
- SSH login for Suchandra
- Nodes need not be completely uniform (memory, cores, etc.)
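As an illustration only, the requirements above could be turned into a simple pre-check that a site admin runs before contacting Suchandra. This is a minimal sketch, not an OSG tool: the `SiteInfo` record, its field names, and the `unmet_requirements` function are hypothetical, and the supported OS and batch-system values are copied from the list in these minutes.

```python
from dataclasses import dataclass
from typing import List

# Values copied from the requirements list in these minutes.
SUPPORTED_OS = {"RHEL6", "CentOS6", "Scientific Linux 6"}
SUPPORTED_BATCH = {"PBS", "Condor", "Slurm", "Sun Grid Engine"}


@dataclass
class SiteInfo:
    """Hypothetical record of what a local admin might report."""
    name: str
    operating_system: str
    batch_system: str
    outbound_ip: bool
    ssh_login_granted: bool


def unmet_requirements(site: SiteInfo) -> List[str]:
    """Return the listed requirements that the site does not yet meet."""
    problems = []
    if not site.outbound_ip:
        problems.append("outbound IP connectivity")
    if site.operating_system not in SUPPORTED_OS:
        problems.append("unsupported OS: " + site.operating_system)
    if site.batch_system not in SUPPORTED_BATCH:
        problems.append("unsupported batch system: " + site.batch_system)
    if not site.ssh_login_granted:
        problems.append("SSH login for Suchandra")
    return problems


# Example with a made-up site: everything is in place except the SSH login.
site = SiteInfo("example-cluster", "CentOS6", "Slurm", True, False)
print(unmet_requirements(site))  # ['SSH login for Suchandra']
```

Node uniformity is deliberately not checked, since the minutes note that nodes need not be completely uniform.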
Potential resource contributing GlueX institutions
- University of Connecticut (Richard Jones)
- Experience with several generations of OSG job submission systems, Compute Element already running.
- Docker containers: Richard asked about feasibility. Rob: several sites are using these already, but their use has not yet been formalized or documented. Ken: by the time GlueX starts in earnest this could be more fully developed.
- Northwestern University (Sean Dobbs)
- Compute Element also currently running, with 200 to 300 cores. Brought online in the context of the GlueX data challenges.
- Carnegie Mellon University (Curtis Meyer)
- Indiana University (Matt Shepherd)
- Florida State University (Paul Eugenio)
- Access to several hundred cores and 100 TB of disk from HMP Grid Resources at FSU. Good network connectivity (Florida Lambda Rail)
- Will have new faculty hire in this area
- Florida International University (Jorge Rodriguez)
- Running a CMS Tier 3 currently, fully integrated with OSG, extensive experience.
- University of Regina (Zisis Papandreou)
- Currently involved with Compute Canada (WestGrid). Prospective participation at some future date.
OSG Submit Host at JLab
- Submit host recently installed
- Scientific Computing at JLab did the installation in consultation with OSG experts (i.e., Edgar)
- Login with JLab credentials for authorized users
Using centrally managed on-campus resources
- Paul's department has a buy-in to the HPC machine on campus. Can that be contributed?
- Rob: this has been done, e.g., ATLAS using Blue Waters at NCSA. It involves customization.
- Such sites often suffer from "scheduling capacity" problems. Large collaborations can provide back-fill jobs and can withstand preemption.
Milestones
- Richard:
- need to get started before Fall run becomes a distraction
- need to demonstrate demand for new Submit Host at JLab
- Sean:
- Will need a simulation campaign to support the large amount of data just taken (Spring 2017 run)
- Aim for early July for having significant capacity online.
Next Steps
- GlueX: identify site to go first with Hosted Compute Element instantiation
- Site Admin will contact Suchandra to get technical process rolling
- Points of Contact:
- OSG: Rob
- GlueX: Mark
- Next Meeting: Friday, April 28