GlueX and the OSG, Meeting on Resource Contribution, March 31, 2017

Present: Sean Dobbs (Northwestern), Paul Eugenio (Florida State), Rob Gardner (Chicago), Ken Herner (Fermilab), Richard Jones (UConn), Zisis Papandreou (Regina), Suchandra Thapa (Chicago)

Introduction

  • Richard: described the genesis of this meeting, a discussion at the last OSG all-hands meeting. Rob volunteered to help bring up university-based clusters on the OSG.
    • Current deployment from UConn and Northwestern is about 1,000 cores, but GlueX could use more.
  • Rob: Adding resources is easier now than it used to be.
    • Want to instantiate Hosted Compute Elements. Local cluster admins are shielded from a lot of the details.
    • Jobs from GlueX would have priority on our resources; demand could overflow onto non-GlueX OSG resources.
    • Local admin work: how software is expected to be handled
      • CernVM File System (CVMFS) is the consensus choice for software distribution
      • Squid service recommended to cache software access
    • Suchandra will collect info about resources from local admins
    • Some requirements
      • outbound IP connectivity
      • RHEL6, CentOS6, Scientific Linux 6
      • RHEL7 support in near future
      • Batch systems supported: PBS, Condor, Slurm, Sun Grid Engine
      • SSH login for Suchandra
      • Nodes need not be completely uniform (memory, cores, etc.)
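The site requirements listed above can be sketched as a simple pre-check that a local admin might run against a description of their cluster before contacting OSG. This is only an illustrative sketch: the `SiteInfo` class, its field names, and the `unmet_requirements` function are hypothetical and not part of any OSG tool. The accepted operating systems and batch systems are copied from the list above.

```python
from dataclasses import dataclass
from typing import List

# Values taken from the requirements list in these minutes.
SUPPORTED_OS = {"RHEL6", "CentOS6", "Scientific Linux 6"}
SUPPORTED_BATCH = {"PBS", "Condor", "Slurm", "Sun Grid Engine"}


@dataclass
class SiteInfo:
    """Hypothetical description of a candidate site (names are assumptions)."""
    name: str
    os_release: str
    batch_system: str
    outbound_ip: bool        # worker nodes can open outbound IP connections
    ssh_login_for_osg: bool  # SSH login available for the OSG contact


def unmet_requirements(site: SiteInfo) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if not site.outbound_ip:
        problems.append("outbound IP connectivity is required")
    if site.os_release not in SUPPORTED_OS:
        problems.append("unsupported OS: " + site.os_release)
    if site.batch_system not in SUPPORTED_BATCH:
        problems.append("unsupported batch system: " + site.batch_system)
    if not site.ssh_login_for_osg:
        problems.append("SSH login for the OSG contact is required")
    return problems


if __name__ == "__main__":
    # Example with made-up values for an imaginary site.
    site = SiteInfo("example-site", "CentOS6", "Slurm", True, True)
    for problem in unmet_requirements(site):
        print(problem)
```

Node uniformity is deliberately not checked, since the minutes note that nodes need not be completely uniform.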

Potential resource contributing GlueX institutions

  • University of Connecticut (Richard Jones)
    • Experience with several generations of OSG job submission systems, Compute Element already running.
    • Docker containers: Richard asked about feasibility. Rob: several sites using these already. Use has not been formalized and documented. Ken: by the time GlueX starts in earnest this could be more fully developed.
  • Northwestern University (Sean Dobbs)
    • Compute Element also currently running, with 200 to 300 cores. Brought online in the context of GlueX data challenges.
  • Carnegie Mellon University (Curtis Meyer)
  • Indiana University (Matt Shepherd)
  • Florida State University (Paul Eugenio)
    • Access to several hundred cores and 100 TB of disk from HMP Grid Resources at FSU. Good network connectivity (Florida Lambda Rail)
    • Will have new faculty hire in this area
  • Florida International University (Jorge Rodriguez)
    • Running a CMS Tier 3 currently, fully integrated with OSG, extensive experience.
  • University of Regina (Zisis Papandreou)
    • Currently involved with Compute Canada (WestGrid). Prospective participation at some future date.

OSG Submit Host at JLab

  • Submit host recently installed
  • Scientific Computing at JLab did the installation in consultation with OSG experts (i.e., Edgar)
  • Log-in with JLab credentials for authorized users

Using centrally managed on-campus resources

  • Paul's department has a buy-in to the HPC machine on campus. Can that be contributed?
  • Rob: this has been done, e.g., ATLAS using Blue Waters at NCSA. Involves customization.
    • Such sites often suffer from "scheduling capacity" problems. Large collaborations can provide back-fill jobs and can withstand preemption.

Milestones

  • Richard:
    • Need to get started before the Fall run becomes a distraction
    • Need to demonstrate demand for the new Submit Host at JLab
  • Sean:
    • Will need a simulation campaign to support the large amount of data just taken (Spring 2017 run)
  • Aim for early July to have significant capacity online.

Next Steps

  • GlueX: identify the site to go first with Hosted Compute Element instantiation
  • Site admin will contact Suchandra to get the technical process rolling
  • Points of Contact:
    • OSG: Rob
    • GlueX: Mark
  • Next Meeting: Friday, April 28