PBS Job Submission

Introduction

To use the cluster efficiently, cluster management software, the Portable Batch System (PBS), has been installed. After logging on to the cluster, you are on the Front-End node, and any program you start runs immediately on that node. This interactive mode is convenient for simple commands such as ls and vi, or for compiling a program. Long computing jobs, however, should be submitted through a queuing system such as PBS. A submitted job waits in a queue for its turn and is then sent to one or more Compute nodes, where it has dedicated processors until it finishes. As a result, the job runs faster and the cluster is used more efficiently.

PBS Commands

PBS supplies both command-line commands and a graphical interface. These are used to submit, monitor, modify, and delete jobs. The following are some frequently used PBS user commands and their functions:

  • qsub is used to submit the job
  • qstat is used to see the status of the submitted jobs and the cluster
  • qdel is used to cancel a submitted job

PBS Script File

To submit a job, it must be described by a script file, in which the names of the programs to be executed and other parameters are specified. The two examples below are script files for submitting a serial (single-processor) job and a parallel (MPI) job.

Serial Job Sample Script

You can copy the text below and save it as your script. The script file name can be anything. Then edit the file to change the job name on the 5th line; the job name is your choice. Edit the 16th line to specify the directory in which the job should start (the working directory). Edit the 22nd line (the last line) to specify your program name.

#! example of job file to submit serial applications
#! lines starting with #PBS are options for the qsub command

#! Name of job
#PBS -N Test_Serial
#! Specify the shell type
#PBS -S /bin/bash

#! Specify the queue
#PBS -q default

#! Mail the user when the job terminates or aborts
#PBS -m ae

#!change the working directory (default is home directory)
cd <working directory>
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`

#! Running the "a.out" program.
./a.out

Optionally, you can also name the error and output files by adding the following lines before the line #!change the working directory (default is home directory):

#! Name of output files for std output and error;
#! if none are specified, the defaults are <job-name>.o<job number> and <job-name>.e<job number>
#PBS -e test.err
#PBS -o test.log
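
To make the default naming concrete, here is a minimal sketch that prints the two default file names for a given job name and job number. The function name default_output_files is hypothetical and is not part of PBS; it only applies the <job-name>.o<job number> pattern described in the comment above.

```shell
# Hypothetical helper (not a PBS command): print the default stdout and
# stderr file names for a job, following the <job-name>.o<job number>
# and <job-name>.e<job number> pattern.
default_output_files() {
    # $1: job name given with #PBS -N, $2: job number
    echo "$1.o$2"
    echo "$1.e$2"
}

default_output_files Test_Serial 837
# prints:
#   Test_Serial.o837
#   Test_Serial.e837
```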

Parallel Job Sample Script

To run an MPI parallel job, you can copy the text below and save it as your script. The script file name can be anything. Then edit the file to change the job name on the 5th line; the job name is your choice. Edit the 9th line to specify the number of nodes required. Because each node has 2 processors, the number of nodes is half the number of your parallel processes. Your job will not start until as many nodes as specified here are free. Edit the 21st line to specify the directory in which the job should start (the working directory). Edit the 33rd line to specify your program name in the mpirun command.

#! example of job file to submit parallel MPI applications
#! lines starting with #PBS are options for the qsub command

#! Name of job
#PBS -N Test_MPI

#! Number of nodes (in this case 4 nodes with 2 CPUs each are requested)
#! The total number of processes passed to mpirun will be nodes*ppn
#PBS -l nodes=4:ppn=2

#! Specify the shell type
#PBS -S /bin/bash

#! Specify the queue
#PBS -q default

#! Mail the user when the job terminates or aborts
#PBS -m ae

#!change the working directory (default is home directory)
cd <working directory>
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
echo This job runs on the following processors:
echo `cat $PBS_NODEFILE`
 
#! Create a machine file for MPICH
cat $PBS_NODEFILE > machine.test
 
#! Run the parallel MPI executable with 8 processes (nodes*ppn)
#! Running the "a.out" program.
/opt/mpich/intel/bin/mpirun -np 8 -machinefile machine.test a.out
rm -f machine.test
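
The script above writes the process count (8) by hand, so it has to be kept in step with the nodes and ppn values. As a sketch of one way to avoid that, the helpers below count the processes from the node file and work out how many nodes to request. The function names count_procs and nodes_for_procs are hypothetical, and the sketch assumes that $PBS_NODEFILE holds one line per allocated processor and that every node has 2 processors, as described above.

```shell
# Hypothetical helper: count the processes to start, assuming the node
# file (such as $PBS_NODEFILE) has one non-empty line per processor.
count_procs() {
    # $1: path to the node file
    grep -c . "$1"
}

# Hypothetical helper: number of nodes to request for a given process
# count, assuming 2 processors per node (odd counts are rounded up).
nodes_for_procs() {
    # $1: number of parallel processes
    echo $(( ($1 + 1) / 2 ))
}

# Inside a job script the count could replace the hard-coded 8:
#   NP=$(count_procs "$PBS_NODEFILE")
#   /opt/mpich/intel/bin/mpirun -np "$NP" -machinefile machine.test a.out
```

With these assumptions, nodes_for_procs 8 gives 4, which matches the nodes=4:ppn=2 request in the sample script.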

Optionally, you can also name the error and output files by adding the following lines before the line #!change the working directory (default is home directory):

#! Name of output files for std output and error;
#! if none are specified, the defaults are <job-name>.o<job number> and <job-name>.e<job number>
#PBS -e test.err
#PBS -o test.log

Job Submission

On the Itanium cluster, PBS is the primary scheduler. To submit a job, use the command below; you only have to change the name of the PBS script file.

/opt/torque/bin/qsub <pbs_script_file>

For example

[pao@cluster pao]$ /opt/torque/bin/qsub test.pbs 
837.cluster.hpcc.nectec.or.th

837.cluster.hpcc.nectec.or.th is the job identifier returned by PBS. You can use this identifier to refer to your job in later commands. To check the status of your job, use the "qstat" command, for example:

[pao@cluster pao]$ /opt/torque/bin/qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
824.cluster      ReD1000          c00b010          02:15:42 R default         
825.cluster      ReD1200          c00b010          02:15:30 R default         
826.cluster      ReD1400          c00b010          02:15:15 R default         
827.cluster      ReD1600          c00b010          02:15:02 R default         
828.cluster      ReD1800          c00b010          02:14:48 R default         
829.cluster      ReD2000          c00b010          02:14:32 R default         
830.cluster      ReD2500          c00b010          02:08:43 R default         
831.cluster      ReD3000          c00b010          02:08:33 R default         
835.cluster      ABINIT           c00c00x          00:56:43 R default
837.cluster      Test_Serial      pao              00:01:43 R default

When enough processors are free to satisfy the request, the job's state changes from "Q" (queued) to "R" (running). When the job has completed, it no longer appears in the "qstat" listing.
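
To check the state of a single job without reading the whole listing, the state column can be picked out of the qstat output. The following is a minimal sketch, assuming the default qstat layout shown above, where the state is the second-to-last column. The function name job_state is hypothetical.

```shell
# Hypothetical helper: print the state letter (Q, R, ...) of one job.
# $1: job id as it appears in the first qstat column, e.g. 837.cluster
# stdin: qstat output in the default layout shown above
job_state() {
    # The state is the second-to-last field on the matching line.
    awk -v id="$1" '$1 == id { print $(NF - 1) }'
}

# Possible use on the cluster:
#   /opt/torque/bin/qstat | job_state 837.cluster
```

If the job is not in the listing, for instance because it has already finished, the function prints nothing.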

To delete a job that you have submitted to the queue, use the "qdel" command:

/opt/torque/bin/qdel <job_number>

For example

/opt/torque/bin/qdel 837.cluster.hpcc.nectec.or.th
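
Because qsub prints the job identifier, a script can capture it and reuse it rather than retyping it. The sketch below shows this pattern in comments and adds a small helper that extracts the leading job number from the full identifier. The function name job_number and the variable JOBID are hypothetical.

```shell
# Hypothetical helper: strip the server suffix from a full job identifier,
# e.g. 837.cluster.hpcc.nectec.or.th -> 837
job_number() {
    echo "${1%%.*}"
}

# Possible use on the cluster: capture the identifier printed by qsub,
# then pass it to qstat and qdel.
#   JOBID=$(/opt/torque/bin/qsub test.pbs)
#   /opt/torque/bin/qstat "$JOBID"
#   /opt/torque/bin/qdel "$JOBID"
```

The job number on its own is also the number that appears in the default output file names.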

PBS Documentation and Man Pages

PBS user commands
PBS Man pages
PBS FAQ
