

NOTICE: Nodes belonging to the birthright, chem, cnms, ccsd, bsd, mstd, ntrc, nucphys, virtues, theory, qmc, and ccsi groups have now moved to the Slurm scheduler and can be accessed from or-slurm-login.ornl.gov. See the Running Jobs documentation for more details.


SHPC Condo PBS Resource Queues

Three condo groups, ARM in the CADES Open SHPC Condo and NSET (formerly GIST) and NSED in the CADES Moderate SHPC Condo, use Moab/Torque/PBS to schedule jobs. The SHPC Condos have separate PBS and Slurm partitions and login addresses.

  • CADES Open SHPC Condo (ARM): or-condo-login.ornl.gov
  • CADES Moderate SHPC Condo (NSET, NSED): mod-condo-login.ornl.gov

Slurm Transition

Moab is no longer supported, so all of the condo groups will eventually be moved to the Slurm partition. Each of these three condo groups therefore has a Slurm testbed available for preparing its codes for that transition.

Open Condo Slurm Testbed

Slurm is available in the CADES Open SHPC Condo through the load-balanced login nodes or-slurm-login.ornl.gov. These nodes can be ssh'd into directly from the ORNL network:

$ ssh <uid>@or-slurm-login.ornl.gov

Use the following group-specific batch directives in your Slurm batch script. Full batch script examples can be found in Execute a Slurm Job.

#SBATCH -A arm
#SBATCH -p testing
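As a minimal sketch of how these two directives fit into a complete batch script, the example below combines them with a few common Slurm options. The node count, time limit, job name, and the echo payload are illustrative assumptions, not site requirements; the Execute a Slurm Job page remains the reference for full examples.

```shell
#!/bin/bash
# Account and partition, taken from the directives above.
#SBATCH -A arm
#SBATCH -p testing
# Assumptions for illustration only: one node, ten minutes, hypothetical job name.
#SBATCH -N 1
#SBATCH -t 00:10:00
#SBATCH -J slurm_test

# Placeholder workload: report the job ID and the node the job landed on.
# SLURM_JOB_ID is set by Slurm inside a job; "unset" is printed when run outside one.
msg="Job ${SLURM_JOB_ID:-unset} running on $(hostname)"
echo "$msg"
```

Saved as slurm_test.sh (a hypothetical file name), the script would be submitted from a Slurm login node with `sbatch slurm_test.sh`.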

Moderate Condo Slurm Testbed

Slurm is available in the mod-condo through the login nodes mod-slurm-login01.ornl.gov and mod-slurm-login02.ornl.gov. These two nodes are reached by first logging into one of the existing mod-condo login nodes and then connecting to a Slurm login node:

$ ssh <uid>@mod-condo-login01.ornl.gov
$ ssh <uid>@mod-slurm-login01.ornl.gov

Use the following group-specific batch directives in your Slurm batch script. Full batch script examples can be found in Execute a Slurm Job.

NSED:
#SBATCH -A nsed
#SBATCH -p testing
NSET:
#SBATCH -A nset
#SBATCH -p testing

This page describes the PBS resource queues in the CADES SHPC Condo environment: the hardware in each queue, queue access policies, quality of service specifications, and the PBS directives required to access each queue. The two tables below list the technical specifications of each resource queue available in the CADES SHPC Condos. For more information on submitting a job to a resource queue, view the Execute a Job page. The PBS directives required to submit to each queue are listed in the PBS Directive section near the bottom of the page.

Open Condo PBS Queues

| Name | # Nodes | Cores | Micro arch. | RAM | Local Scratch | GPU | GPU Details |
|---|---|---|---|---|---|---|---|
| batch | 25 | 32 (156), 36 (70) | Haswell, Broadwell | 125G (204), 250G (22) | 233G (164), 1.9T (62) | N/A | N/A |
| gpu_ssd | 2 | 36 | Broadwell | 250G | 1.8T | 1x K80 (GK210) | 2x 12G GDDR5, Kepler |
| arm_high_mem | 28 | 36 | Broadwell | 250G | 1.8T | N/A | N/A |
| Total: | 55 | | | | | | |

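Moderate Condo PBS Queues

| Name | # Nodes | Cores | Micro arch. | RAM | Local Scratch | GPU | GPU Details |
|---|---|---|---|---|---|---|---|
| batch | 155 | 32 | Haswell | 128G | 250G | N/A | N/A |
| dell_gpu | 9 | 36 | Broadwell | 256G | N/A | 4x K80 (GK210) | 2x 12G GDDR5, Kepler |
| Total: | 204 | | | | | | |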

The two tables below list the number of nodes allocated in each resource queue to each group:

Open Condo PBS Nodes by Group

| Group | Nodes |
|---|---|
| cades-arm | batch: 25, gpu_ssd: 2, arm_high_mem: 28 |

Moderate Condo PBS Nodes by Group

| Group | Nodes |
|---|---|
| cades-gist | dell_gpu: 9 |
| cades-nsed | batch: 155 |

PBS Resource Queue Access

Groups that have purchased nodes in a resource queue have priority access to those resources when submitting jobs. Groups that have not purchased resources in a queue can still submit jobs to it, but their jobs will be preempted and restarted if a user with priority submits a job that needs the resources.

📝 Note: Jobs using the burst QoS may be preempted at any time.

To submit a job with priority access to your group's resources, specify the 'standard' (std), 'long' (long), or 'development' (devel) Quality of Service (QoS) setting in your job script or qsub command. The std QoS is the default and does not need to be specified. To submit jobs to a resource queue that your group does not have priority access to, you need to submit a 'burst' job. The differences between the QoS settings are described in the next section on QoS specification. Details on how to submit jobs with and without priority are listed for each group in the PBS directive section near the bottom of the page.

PBS Resource queue group access quick reference:

| Purchased Queue Resources | Can Submit Jobs | QoS Type | Job Flags |
|---|---|---|---|
| Yes | Yes | std, long, devel | Preemptor |
| No | Yes | burst | Preemptee, Restartable |
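The QoS can also be given on the qsub command line rather than in the job script. The sketch below assumes a hypothetical helper, qsub_cmd, and a hypothetical script name, my_job.pbs. It only prints the command, so the line can be checked before it is run. The group, account, and queue directives are assumed to stay in the script itself.

```shell
#!/bin/bash
# qsub_cmd QOS SCRIPT
# Hypothetical helper: print a qsub command line that requests the given QoS.
# Only the four QoS names used on this page are accepted.
qsub_cmd() {
    case "$1" in
        std|long|devel|burst) ;;
        *) echo "unknown QoS: $1" >&2; return 1 ;;
    esac
    echo "qsub -l qos=$1 $2"
}

# Dry run: show the command for a 'long' job.
qsub_cmd long my_job.pbs
```

Running the printed line on a login node would submit the job. A burst job would additionally need the burst group and account directives listed for each group near the bottom of the page.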

PBS Quality of Service (QoS) Specification

There are four primary Quality of Service levels that can be specified for a PBS job: standard, long, burst, and development. The tables below show the differences in maximum walltime and priority (preemptability) between these service levels:

Open Research Condo PBS Partition

| QoS Name | Preemptable | Max Walltime |
|---|---|---|
| devel | No | 00:04:00:00 |
| burst | Yes | 02:00:00:00 |
| std | No | 02:00:00:00 |
| long | No | 14:00:00:00 |

Moderate Condo PBS Partition

| QoS Name | Preemptable | Max Walltime |
|---|---|---|
| burst | Yes | 05:00:00:00 |
| std | No | 02:00:00:00 |
| long | No | 05:00:00:00 |
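A requested walltime can be checked against these limits before submitting. The sketch below assumes the Max Walltime values are written as DD:HH:MM:SS, which the tables do not state explicitly. It uses the Open Research Condo limits, and the helper names are hypothetical.

```shell
#!/bin/bash
# walltime_to_seconds DD:HH:MM:SS
# Assumption: four colon-separated fields meaning days, hours, minutes, seconds.
walltime_to_seconds() {
    echo "$1" | awk -F: '
        NF == 4 { print $1 * 86400 + $2 * 3600 + $3 * 60 + $4; ok = 1 }
        END     { exit !ok }'
}

# Max walltimes copied from the Open Research Condo PBS Partition table.
max_walltime_open() {
    case "$1" in
        devel)     echo "00:04:00:00" ;;
        burst|std) echo "02:00:00:00" ;;
        long)      echo "14:00:00:00" ;;
        *)         return 1 ;;
    esac
}

# fits_qos QOS REQUESTED_WALLTIME
# Succeeds when the request does not exceed the QoS limit.
fits_qos() {
    max=$(max_walltime_open "$1") || return 2
    [ "$(walltime_to_seconds "$2")" -le "$(walltime_to_seconds "$max")" ]
}

# Example: a three-day job exceeds std (two days) but fits long (fourteen days).
if fits_qos std 03:00:00:00; then echo "fits std"; else echo "too long for std"; fi
if fits_qos long 03:00:00:00; then echo "fits long"; else echo "too long for long"; fi
```

For the Moderate condo, the values in max_walltime_open would be replaced with those from the Moderate Condo PBS Partition table.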

Resource Queue PBS Directives

The PBS directives required to submit jobs to each resource queue are listed below, organized by group. Each group has two lists of directives:

  • 'Standard PBS directives' : These directives are used to submit non-burst (std,long,devel) jobs.
  • 'Burst PBS directives' : These directives are used to submit burst jobs.

📝 Note: The following syntax indicates that you should pick one of the options in brackets. Lines without brackets can be copied without any changes.

  • [ option_a | option_b | ... ]

Group List:

📝 Note: If you do not see your group listed, please contact the CADES team and include:

  • Help with SHPC Condo Registration
  • UCAMS ID or XCAMS ID, contact information, reason for requesting an SHPC Condo allocation, and the name of your directorate and division.

Resource Queue Directives by Group

Atmospheric Radiation Measurement (ARM)

Standard PBS directives:

#PBS -W group_list=cades-arm
#PBS -A arm
#PBS -q [batch|gpu_ssd|arm_high_mem]
#PBS -l qos=[std|long|devel]
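With one option picked from each bracketed list, a complete job script might look like the sketch below. The job name, node request, walltime, and the echo payload are illustrative assumptions; only the group, account, queue, and QoS lines come from the directives above.

```shell
#!/bin/bash
# Hypothetical job name.
#PBS -N arm_test
#PBS -W group_list=cades-arm
#PBS -A arm
# One queue picked from [batch|gpu_ssd|arm_high_mem].
#PBS -q batch
# One QoS picked from [std|long|devel].
#PBS -l qos=std
# Assumptions for illustration only: one 32-core node for one hour.
#PBS -l nodes=1:ppn=32
#PBS -l walltime=01:00:00

# Start in the directory the job was submitted from (set by PBS inside a job).
cd "${PBS_O_WORKDIR:-.}" || exit 1

# Placeholder workload: report the job ID and the node the job landed on.
msg="Job ${PBS_JOBID:-unset} running on $(hostname)"
echo "$msg"
```

Saved as arm_test.pbs (a hypothetical file name), the script would be submitted with `qsub arm_test.pbs`.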

Burst PBS directives:

#PBS -W group_list=cades-user
#PBS -A arm-burst
#PBS -q [batch|gpu|gpu_p100|gpu_ssd|high_mem|arm_high_mem|chem_high_mem|dhigh_mem|skylake]
#PBS -l qos=burst
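Because burst jobs can be preempted and restarted, it helps if the job can pick up where it left off. The sketch below shows one possible pattern. It records the last completed step in a checkpoint file in the submission directory. The file name, step count, node request, and walltime are all hypothetical, and the echo stands in for real work.

```shell
#!/bin/bash
#PBS -W group_list=cades-user
#PBS -A arm-burst
# One queue picked from the bracketed list above.
#PBS -q batch
#PBS -l qos=burst
# Assumptions for illustration only: one 32-core node for one hour.
#PBS -l nodes=1:ppn=32
#PBS -l walltime=01:00:00

cd "${PBS_O_WORKDIR:-.}" || exit 1

# Hypothetical checkpoint file holding the number of the last completed step.
checkpoint="burst_checkpoint.txt"
total_steps=5

# Resume after the last completed step if a previous run was preempted.
last_done=0
if [ -f "$checkpoint" ]; then
    last_done=$(cat "$checkpoint")
fi

step=$((last_done + 1))
while [ "$step" -le "$total_steps" ]; do
    echo "running step $step"      # replace with real work
    echo "$step" > "$checkpoint"   # record progress only after the step finishes
    step=$((step + 1))
done
```

If the job is restarted after preemption, steps already recorded in the checkpoint file are skipped. Delete the file to start from the first step again.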

National Security Emerging Technologies (NSET) (formerly Geographic Data Science (GDS))

Standard PBS directives:

#PBS -W group_list=cades-gist
#PBS -A gist
#PBS -q dell_gpu
#PBS -l qos=[std|long]

NSET does not have burst access in Moab.

Nuclear Science and Engineering Directorate (NSED)

Standard PBS directives:

#PBS -W group_list=cades-nsed
#PBS -A nsed
#PBS -q batch
#PBS -l qos=[std|long]

NSED users can use the burst QoS to run two additional jobs as burst jobs.

Burst PBS directives:

#PBS -W group_list=cades-nsed
#PBS -A nsed
#PBS -q [batch|dell_gpu]
#PBS -l qos=burst

Moderate Condo Burst (mod-burst)

Users who have been approved for the Mod-burst UCAMS group may run up to 50 burst jobs in Moderate.

Burst PBS directives:

#PBS -W group_list=cades-mod-burst
#PBS -A mod-burst
#PBS -q [batch|dell_gpu]
#PBS -l qos=burst
