PBS Job Submission
Introduction
To use the cluster efficiently, cluster management software, the Portable Batch System (PBS), has been installed. After logging on to the cluster, you are on the Front-End node, and any program you start runs immediately on that node. This is called interactive mode; it is convenient for simple commands such as ls and vi, or for compiling a program. Long computing jobs should instead be submitted through a queuing system such as PBS. A submitted job waits in a queue for its turn and is then sent to one or more Compute nodes, where it has dedicated processors until it finishes. The job therefore runs faster and the cluster is used more efficiently.
PBS Commands
PBS supplies both command-line commands and a graphical interface. These are used to submit, monitor, modify, and delete jobs. The following are some frequently used PBS user commands and their functions:
- qsub is used to submit the job
- qstat is used to see the status of the submitted jobs and the cluster
- qdel is used to cancel a submitted job
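The examples later on this page call these commands by their full path under /opt/torque/bin. The following is a minimal sketch, assuming that location also applies to your account. It adds the directory to PATH for the current shell session and reports which of the three commands can be found. The variable name PBS_BIN is only illustrative.

```shell
# Assumption: the PBS commands live in /opt/torque/bin,
# as in the Job Submission examples on this page.
PBS_BIN=/opt/torque/bin

# Append the directory to PATH only if it is not already there.
case ":$PATH:" in
    *":$PBS_BIN:"*) ;;
    *) PATH="$PATH:$PBS_BIN" ;;
esac
export PATH

# Report which of the commands listed above are available.
for cmd in qsub qstat qdel; do
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "$cmd found at $(command -v "$cmd")"
    else
        echo "$cmd not found"
    fi
done
```

To make the change permanent, the same lines could be placed in your shell startup file.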
PBS Script File
To submit a job, it must be described by a script file, in which the names of the programs to be executed and other parameters are specified. The two examples below are script files for submitting a serial (single processor) job and a parallel (MPI) job.
Serial Job Sample Script
Copy the text below and save it as your script; the script file name can be anything. Then edit the 5th line (#PBS -N) to set a job name of your choice. Edit the 16th line (cd) to specify the directory in which the job should start (the working directory). Edit the 22nd line (the last line) to specify your program name.
#! example of job file to submit serial applications
#! lines starting with #PBS are options for the qsub command

#! Name of job
#PBS -N Test_Serial

#! Specify the shell type
#PBS -S /bin/bash

#! Specify the queue
#PBS -q default

#! Mail the user when the job terminates or aborts
#PBS -m ae
#!change the working directory (default is home directory)
cd <working directory>
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`

#! Running the "a.out" program.
./a.out
If you want, you can also name the error and output files by adding the following lines before the line "#!change the working directory (default is home directory)":
#! Names of the output files for standard output and standard error;
#! if not specified, the defaults are <job-name>.o<job number> and <job-name>.e<job number>
#PBS -e test.err
#PBS -o test.log
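To see what the default file names look like, the pattern <job-name>.o<job number> can be filled in by hand. The sketch below uses the job name from the sample script and an example job identifier of the kind qsub prints (shown in the Job Submission section). It assumes the job number is the part of the identifier before the first dot. The variable names are only illustrative.

```shell
JOB_NAME=Test_Serial                    # value given to #PBS -N
JOB_ID=837.cluster.hpcc.nectec.or.th    # example identifier as printed by qsub
JOB_NUM=${JOB_ID%%.*}                   # keep the part before the first dot

OUT_FILE="${JOB_NAME}.o${JOB_NUM}"      # default standard output file
ERR_FILE="${JOB_NAME}.e${JOB_NUM}"      # default standard error file

echo "$OUT_FILE"
echo "$ERR_FILE"
```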
Parallel Job Sample Script
To run an MPI parallel job, copy the text below and save it as your script; the script file name can be anything. Then edit the 5th line (#PBS -N) to set a job name of your choice. Edit the 9th line (#PBS -l) to specify the number of nodes required. Because each node has 2 processors, the number of nodes is half the number of your parallel processes. Your job will not start until that many nodes are free. Edit the 21st line (cd) to specify the directory in which the job should start (the working directory). Edit the 33rd line to specify your program name, and the matching number of processes after -np, in the mpirun command.
#! example of job file to submit parallel MPI applications
#! lines starting with #PBS are options for the qsub command

#! Name of job
#PBS -N Test_MPI

#! Number of nodes (in this case I require 4 nodes with 2 CPUs each)
#! The total number of processes passed to mpirun will be nodes*ppn
#PBS -l nodes=4:ppn=2

#! Specify the shell type
#PBS -S /bin/bash

#! Specify the queue
#PBS -q default

#! Mail the user when the job terminates or aborts
#PBS -m ae

#!change the working directory (default is home directory)
cd <working directory>
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
echo This job runs on the following processors:
echo `cat $PBS_NODEFILE`

#! Create a machine file for MPICH
cat $PBS_NODEFILE > machine.test

#! Run the parallel MPI executable with 8 processors (nodes*ppn)
#! Running the "a.out" program.
/opt/mpich/intel/bin/mpirun -np 8 -machinefile machine.test a.out
rm -f machine.test
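The node count for the "#PBS -l" line follows from the number of MPI processes by simple arithmetic. A minimal sketch, assuming 2 processors per node as stated above. The variable names are only illustrative, and the division rounds up so that an odd process count still fits.

```shell
NP=8      # number of MPI processes wanted (example value)
PPN=2     # processors per node on this cluster

# Round up: (NP + PPN - 1) / PPN in integer arithmetic.
NODES=$(( (NP + PPN - 1) / PPN ))

# Print the resource request line to put in the script.
echo "#PBS -l nodes=${NODES}:ppn=${PPN}"
```

With NP=8 this gives nodes=4, matching the sample script.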
If you want, you can also name the error and output files by adding the following lines before the line "#!change the working directory (default is home directory)":
#! Names of the output files for standard output and standard error;
#! if not specified, the defaults are <job-name>.o<job number> and <job-name>.e<job number>
#PBS -e test.err
#PBS -o test.log
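The sample script hard-codes the process count (-np 8). The count can instead be read from the node file. This sketch assumes, as is usual for Torque, that $PBS_NODEFILE contains one line per allocated processor. Outside a PBS job the variable is unset, so the sketch builds a stand-in file with hypothetical host names (node01, node02) to let you try it on the Front-End node.

```shell
# Outside a PBS job, create a stand-in node file (hypothetical host names).
if [ -z "${PBS_NODEFILE:-}" ]; then
    PBS_NODEFILE=$(mktemp)
    printf '%s\n' node01 node01 node02 node02 > "$PBS_NODEFILE"
fi

# One line per allocated processor, so the line count is the process count.
NP=$(wc -l < "$PBS_NODEFILE")
NP=$((NP))    # normalise: some wc implementations pad the number with spaces

echo "allocated processors: $NP"

# Show how many processors were allocated on each host.
sort "$PBS_NODEFILE" | uniq -c
```

Inside a job script, $NP could then be passed to mpirun after -np so that the script stays correct when the nodes request changes.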
Job Submission
On the Itanium cluster, PBS is the primary scheduler. To submit a job on the Itanium cluster, use the following command; you only have to change the name of the PBS script.
/opt/torque/bin/qsub <pbs_script_file>
For example
[pao@cluster pao]$ /opt/torque/bin/qsub test.pbs
837.cluster.hpcc.nectec.or.th
837.cluster.hpcc.nectec.or.th is the job identifier returned by PBS. Use it to refer to your job in later commands. To check the status of your job, use the qstat command, for example:
[pao@cluster pao]$ /opt/torque/bin/qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
824.cluster      ReD1000          c00b010          02:15:42 R default
825.cluster      ReD1200          c00b010          02:15:30 R default
826.cluster      ReD1400          c00b010          02:15:15 R default
827.cluster      ReD1600          c00b010          02:15:02 R default
828.cluster      ReD1800          c00b010          02:14:48 R default
829.cluster      ReD2000          c00b010          02:14:32 R default
830.cluster      ReD2500          c00b010          02:08:43 R default
831.cluster      ReD3000          c00b010          02:08:33 R default
835.cluster      ABINIT           c00c00x          00:56:43 R default
837.cluster      Test_Serial      pao              00:01:43 R default
If enough processors are free to satisfy the request, the job's state (the S column) changes from "Q" (queued) to "R" (running). When the job has completed, it no longer appears in the qstat output.
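The state column can also be picked out by a script. A minimal sketch, assuming the qstat output has the six-column layout shown above and that job names contain no spaces. job_state is a hypothetical helper, not a PBS command.

```shell
# Hypothetical helper: print the state column (S) for one job,
# reading qstat-style output from standard input.
job_state() {
    awk -v id="$1" '$1 == id { print $5 }'
}

# Try it on one line copied from the listing above.
sample='837.cluster      Test_Serial      pao              00:01:43 R default'
state=$(printf '%s\n' "$sample" | job_state 837.cluster)
echo "state of 837.cluster: $state"

# On the Front-End node the same helper can read live qstat output.
QSTAT=/opt/torque/bin/qstat
if [ -x "$QSTAT" ]; then
    "$QSTAT" | job_state 837.cluster
fi
```

An empty result means the job is not in the listing, which, per the description above, is what happens once it has finished.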
To delete a job that you have submitted to the queue, use:
/opt/torque/bin/qdel <job_number>
For example
/opt/torque/bin/qdel 837.cluster.hpcc.nectec.or.th
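The submit, check, and delete steps of this section can be strung together in a small wrapper. This is only a sketch. It assumes qsub prints the job identifier on standard output, as in the example above, and it uses the script name test.pbs from that example. The qdel line is left commented out so that the job is not cancelled by accident.

```shell
PBS_BIN=/opt/torque/bin    # command location used on this page
SCRIPT=test.pbs            # PBS script name from the example above

if [ -x "$PBS_BIN/qsub" ] && [ -f "$SCRIPT" ]; then
    # Capture the identifier that qsub prints, e.g. 837.cluster.hpcc.nectec.or.th
    if JOBID=$("$PBS_BIN/qsub" "$SCRIPT"); then
        echo "submitted $JOBID"
        "$PBS_BIN/qstat" "$JOBID"       # show the status of this job only
        # "$PBS_BIN/qdel" "$JOBID"      # uncomment to cancel the job
    else
        echo "qsub failed" >&2
    fi
else
    echo "qsub or $SCRIPT not found; run this on the cluster Front-End node"
fi
```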
PBS Documentation and Man Pages
PBS user command
PBS Man pages
PBS FAQ
