HOW-TO: Sun GridEngine (SGE) on the CTBP Cluster
- Introduction to SGE
- Writing and Submitting Batch Jobs
- Monitoring and Controlling Jobs
- Site Scheduling Policies
- Sample SGE scripts
Introduction to SGE
Sun Grid Engine has a large set of programs that let the user
submit and delete jobs, check job status, and get information about
available queues and environments. For the normal user, knowledge
of the following basic commands should be sufficient to get started
with Grid Engine and have full control of their jobs:
| qconf | Shows (-s options) the configurations and access permissions only. For example, qconf -sql gives a list of all available queues. |
| qdel | Lets the user delete their own jobs only. |
| qhost | Displays status information about Sun Grid Engine execution hosts. |
| qmod | Modifies the status of your jobs (for example suspend/resume). |
| qmon | Provides the X-windows GUI command interface. |
| qstat | Provides a status listing of all jobs and queues associated with the cluster. |
| qsub | Is the user interface for submitting a job to Grid Engine. |
Writing and Submitting Batch Jobs
To run a job with Grid Engine you submit it from the command line
or the GUI. First, write a batch script file that contains all the
commands and environment requests that you want for this job. If,
for example, test.sh is the name of the script file, use the
command "qsub" to submit the job:
qsub test.sh
If the submission of the job is successful, you will see this
message:
your job 1 ("test.sh") has been submitted.
After that, you can monitor the status of your job with the
command "qstat" or the GUI QMON.
When the job is finished you will have two output files,
"test.sh.o1" (standard output) and "test.sh.e1" (standard error),
named after the script and the job number.
In Grid Engine, a job is a batch script that contains, in addition
to normal UNIX commands, special comment lines marked by the
leading prefix "#$".
The first line of the batch file is
#!/bin/bash
which is the default shell interpreter for Grid Engine. You can
force Grid Engine to use your preferred shell interpreter (tcsh for
example) by adding this line to your script file:
#$ -S /bin/tcsh
To tell Grid Engine to run the job from the current working
directory, add this script line:
#$ -cwd
To pass an environment variable VAR (or a list of variables
separated by commas) to the job, use the -v option like this:
#$ -v VAR
(#$ -V passes all variables listed by env.)
To redirect the standard output and standard error, give the full
path name of the file for each:
#$ -o <path_name>
#$ -e <path_name>
The "#$" prefix accepts the same options as the qsub command line,
so check the qsub man pages for the full list.
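Because the "#$" lines mirror the qsub command line, it can help to see them as the option string they stand for. The following is a minimal sketch, not part of Grid Engine; the function name embedded_options is made up for this illustration.

```shell
#!/bin/bash
# Print the options embedded in a script's "#$" lines as one
# space-separated string, the way they would be typed after "qsub".
# embedded_options is a hypothetical helper name, not an SGE command.
embedded_options() {
    # Keep only lines starting with "#$", strip the prefix, join with spaces.
    sed -n 's/^#\$[[:space:]]*//p' "$1" | paste -sd ' ' -
}

# usage: embedded_options serial.sh
```

For a script containing "#$ -cwd" and "#$ -S /bin/tcsh", this prints "-cwd -S /bin/tcsh".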
Here is a serial sample script that has to be modified to fit
your case:
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#
#$ -M myemail
#$ -e error_file
#$ -o output_file
date
sleep 10
date
Insert your email address after the "#$ -M", and insert the full
path name of the files to which you want to redirect the standard
output and standard error after the "#$ -o" and "#$ -e" statements,
respectively. Note that with "#$ -j y" the standard error is merged
into the standard output, so the "#$ -e" file is not used.
Note also that qsub accepts shell scripts only, not executable
files, and that shell scripts need to be executable. If that is not
the case, run the command
chmod u+rwx serial.sh
After that, to submit the job you simply type
qsub serial.sh
From the command line you can use the same options and type:
qsub -cwd -v VAR -o /home/user -e /home/user serial.sh
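The executable-bit check and the submission can be combined in a small wrapper. This is only a sketch: the function name submit_checked and the QSUB override variable (handy for a dry run) are assumptions of this example, not Grid Engine features.

```shell
#!/bin/bash
# Submit a script only after checking that it exists and is executable.
# Extra arguments are passed to qsub in front of the script name.
# QSUB is a hypothetical override, e.g. QSUB=echo for a dry run.
submit_checked() {
    local script=$1
    shift
    if [ ! -f "$script" ]; then
        echo "error: $script not found" >&2
        return 1
    fi
    if [ ! -x "$script" ]; then
        echo "error: $script is not executable; run: chmod u+x $script" >&2
        return 1
    fi
    "${QSUB:-qsub}" "$@" "$script"
}

# usage: submit_checked serial.sh -cwd -v VAR -o /home/user -e /home/user
```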
An example of a parallel job using 2 processors:
#!/bin/bash
#
#$ -pe mpi 2
#$ -cwd
#$ -j y
#$ -S /bin/bash
#
/opt/mpich/gnu/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines \
-nolocal /opt/hpl/gnu/bin/xhp
To actually submit this parallel job:
qsub test.sh
Note: In order to submit your jobs on the cluster your account must be
set up for password-less ssh login to the nodes.
To do this perform the following on ctbp1.ucsd.edu:
cd $HOME
ssh-keygen -t rsa1 -N "" -f $HOME/.ssh/identity
ssh-keygen -t rsa -N "" -f $HOME/.ssh/id_rsa
ssh-keygen -t dsa -N "" -f $HOME/.ssh/id_dsa
cd .ssh
touch authorized_keys authorized_keys2
cat identity.pub >> authorized_keys
cat id_rsa.pub id_dsa.pub >> authorized_keys2
chmod 640 authorized_keys authorized_keys2
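After running the key setup commands above, you may want to confirm that a public key really ended up in the authorized keys file. The sketch below assumes the file names used above; key_is_authorized is a hypothetical helper name.

```shell
#!/bin/bash
# Succeed if the public key stored in PUB_FILE appears as a whole line
# of AUTH_FILE. key_is_authorized is a made-up name for this sketch.
key_is_authorized() {
    local pub_file=$1 auth_file=$2
    [ -s "$pub_file" ] && [ -f "$auth_file" ] &&
        grep -qxF -f "$pub_file" "$auth_file"
}

# usage:
#   key_is_authorized "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys2" \
#       && echo "id_rsa.pub is authorized"
```

This only checks the file contents; a test login to a compute node is still the final proof that password-less ssh works.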
Monitoring and Controlling Jobs
After submitting your job to Grid Engine you may track its
status by using either the qstat command, the GUI
interface QMON, or by email.
Monitoring with qstat
The qstat command provides the status of all
jobs and queues in the cluster. The most useful options are:
- qstat: Displays list of all jobs with no queue status information.
- qstat -u hpc1***: Displays list of all jobs belonging to user hpc1***
- qstat -f: Gives full information about jobs and queues.
- qstat -j [job_id]: Gives the reason why the pending job (if any) is not being scheduled.
You can refer to the man pages for a complete description of all
the options of the qstat command.
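The plain qstat listing can be summarised with standard text tools. The sketch below assumes the usual layout of that listing (two header lines, then one line per job with the state in the fifth column, and no spaces in job or user names); count_job_states is a hypothetical helper name.

```shell
#!/bin/bash
# Count jobs per state from "qstat" output read on standard input.
# Assumes two header lines and the job state in column 5.
count_job_states() {
    awk 'NR > 2 && NF >= 5 { n[$5]++ } END { for (s in n) print s, n[s] }' | sort
}

# usage: qstat -u "$USER" | count_job_states
```

Each output line is a state code followed by the number of jobs in that state, for example "r" for running and "qw" for waiting in the queue.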
Monitoring Jobs by Electronic Mail
Another way to monitor your jobs is to have Grid Engine notify
you by email about the status of the job.
In your batch script or from the command line use the
-m option to request that an email be sent
and the -M option to specify the email address to which
it should be sent. This will look like:
#$ -M myaddress@work
#$ -m beas
The -m option selects the events after which you want to receive
email. You can choose to be notified at the beginning (b) or end (e)
of the job, or when the job is aborted (a) or suspended (s), as in
the sample script lines above.
From the command line you can use the same options, for
example:
qsub -M myaddress@work -m be job.sh
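To double-check which events a given -m string selects, the letters can be spelled out. This is a small illustrative sketch; describe_mail_events is a hypothetical name, and the letter meanings (b, e, a, s, plus n for no mail) follow the qsub man page.

```shell
#!/bin/bash
# Translate the letters given to "-m" into the events they select.
describe_mail_events() {
    local flags=$1 out="" rest c
    while [ -n "$flags" ]; do
        rest=${flags#?}        # everything after the first letter
        c=${flags%"$rest"}     # the first letter itself
        case $c in
            b) out="$out begin" ;;
            e) out="$out end" ;;
            a) out="$out abort" ;;
            s) out="$out suspend" ;;
            n) out="$out none" ;;
            *) echo "unknown -m letter: $c" >&2; return 1 ;;
        esac
        flags=$rest
    done
    echo "${out# }"
}

# usage: describe_mail_events beas
```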
How do I control my jobs
Based on the status of the job displayed, you can control the
job with the following actions:
- Modify a job: As a user, you have certain rights that apply exclusively to your jobs. The Grid Engine command used is qmod. Check the man pages for the options that you are allowed to use.
- Suspend (or resume) a job: This uses the UNIX kill command and applies only to running jobs. In practice you type
qmod -s job_id to suspend, or qmod -us job_id to resume
(where job_id is given by qstat or qsub). Note that qmod -r reschedules a job rather than resuming it.
- Delete a job: You can delete a job that is running or spooled in the queue by using the qdel command like this
qdel job_id (where job_id is
given by qstat or qsub).
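When jobs are submitted from a script, the job_id can be captured from the confirmation line that qsub prints. The sketch below assumes the message format shown earlier on this page ("your job 1 ... has been submitted") and does not handle array jobs; job_id_from_qsub is a hypothetical helper name.

```shell
#!/bin/bash
# Extract the numeric job ID from the confirmation line printed by qsub,
# read on standard input. Prints nothing if no such line is found.
job_id_from_qsub() {
    sed -n 's/^[Yy]our job \([0-9][0-9]*\) .*/\1/p'
}

# usage:
#   job_id=$(qsub test.sh | job_id_from_qsub)
#   qdel "$job_id"
```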
Monitoring and controlling with QMON
You can also use the GUI QMON, which provides a convenient
window dialog specifically designed for monitoring and controlling
jobs; its buttons are self-explanatory.
For further information, see the SGE User's Guide (PDF, HTML).
Site Scheduling Policies
Note: these policies may be changed at any time; please check this page for updates.
To see the current SGE queue settings, execute q_settings on
ctbp1.ucsd.edu (the cluster frontend).
- The maximum walltime is set to 48 hours (2 days). The default is 30 min; you can change this limit with:
#$ -l h_rt=XX:00:00
- The maximum number of processors per user is 48.
- There is 1 node (2 processors, 4GB RAM) dedicated to debugging runs. The hard wall clock time limit there is 30 min. To use these two processors, request at most 30 min of wallclock time and your job will be scheduled there:
#$ -l h_rt=00:30:00
You can also access this node through the qrsh facility. Run the command qrsh -l h_rt=00:29:59 and you should immediately get a prompt on the debugging node. You can use the node as you would the frontend, with the addition that you can run interactive jobs there for up to 30 min.
- Parallel jobs (i.e., jobs requesting more than 1 CPU) have higher
priority than serial, single-CPU jobs. If there are several
serial jobs in the queue and a parallel job is submitted, the
parallel job will most likely skip the queued serial
jobs and be scheduled ahead of them. This policy is set up to
encourage parallel job submission on the cluster.
- Jobs requiring a large amount of memory can request a large memory node with this statement:
#$ -l mem_free=1G
This will guarantee that the job will be sent to one of the 2GB RAM nodes. To check how much free memory is available per node use
qhost -F mem_free
and to see a list of nodes which will be considered for a large memory job:
qhost -l mem_free=1G
- A parallel environment mpi-uni is also defined. It guarantees that only 1 processor per node will be assigned, leaving the other processor idle. This can be used for jobs which require large memory or heavy IO resources. It can be requested, for example, with:
#$ -pe mpi-uni 2
- Part of the cluster, or even all nodes, can be reserved for a user
or group if there is a justified need for this. This will be
determined on a case-by-case basis. If you would like to reserve any
of the CTBP resources please contact the CTBP help desk.
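The walltime limits listed above can be turned into a directive line with a small helper. This is a sketch under the limits stated on this page (48 hours maximum); the function name h_rt_directive is hypothetical.

```shell
#!/bin/bash
# Print an "#$ -l h_rt=HH:MM:00" line for a run time given in minutes.
# Fails if the request is not between 1 minute and the 48-hour maximum.
h_rt_directive() {
    local minutes=$1 max_minutes=$((48 * 60))
    if [ "$minutes" -le 0 ] || [ "$minutes" -gt "$max_minutes" ]; then
        echo "error: run time must be between 1 and $max_minutes minutes" >&2
        return 1
    fi
    printf '#$ -l h_rt=%02d:%02d:00\n' $((minutes / 60)) $((minutes % 60))
}

# usage: h_rt_directive 90
```

A request of 30 minutes or less yields a line such as "#$ -l h_rt=00:30:00", which according to the policy above makes the job eligible for the debugging node.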
Note: If you are generating and submitting several SGE scripts on the fly, please
make sure the individual scripts are submitted with a pause of at least
10-15 sec between them (e.g., use sleep 20 in your submitting
loop). Submitting a lot of SGE scripts at the same time puts a
significant strain on SGE resources.
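A submitting loop with the requested pause might look like the sketch below. The function name submit_all and the PAUSE and QSUB variables are assumptions of this example (they make the pause length adjustable and allow a dry run); only qsub and sleep come from the text above.

```shell
#!/bin/bash
# Submit each script in turn, sleeping between submissions.
# PAUSE (seconds, default 20) and QSUB (default qsub) are hypothetical
# knobs for this sketch, e.g. QSUB=echo for a dry run.
submit_all() {
    local pause=${PAUSE:-20} first=1 script
    for script in "$@"; do
        if [ "$first" -eq 0 ]; then
            sleep "$pause"      # pause before every submission but the first
        fi
        first=0
        "${QSUB:-qsub}" "$script" || return 1
    done
}

# usage: submit_all job_*.sh
```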
If you have any questions or concerns about these policies please
contact the CTBP help desk (ctbp-help @ ctbp.ucsd.edu).
Sample SGE scripts
- An example of a simple APBS serial job:
#!/bin/csh -f
#$ -cwd
#
#$ -N serial_test_job
#$ -m e
#$ -e sge.err
#$ -o sge.out
# requesting 12hrs wall clock time
#$ -l h_rt=12:00:00
/soft/linux/pkg/apbs/bin/apbs inputfile >& outputfile
- An example script for running the executable a.out in parallel on
8 CPUs. (Note: For your executable to run in parallel it must be
compiled with a parallel library like MPICH, LAM/MPI, PVM, etc.)
This script shows file staging, i.e., using the fast local filesystem
/scratch on the compute node in order to eliminate speed
bottlenecks.
#!/bin/csh -f
#$ -cwd
#
#$ -N parallel_test_job
#$ -m e
#$ -e sge.err
#$ -o sge.out
#$ -pe mpi 8
# requesting 10hrs wall clock time
#$ -l h_rt=10:00:00
#
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
set orig_dir=`pwd`
echo This job runs on the following processors:
cat $TMPDIR/machines
echo This job has allocated $NSLOTS processors
# copy input and support files to a temporary directory on compute node
set temp_dir=/scratch/`whoami`.$$
mkdir $temp_dir
cp input_file support_file $temp_dir
cd $temp_dir
/opt/mpich/intel/bin/mpirun -v -machinefile $TMPDIR/machines \
-nolocal -np $NSLOTS $HOME/a.out ./input_file >& output_file
# copy files back and clean up
cp * $orig_dir
rm -rf $temp_dir
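The file staging pattern used in the script above (copy inputs to a scratch directory, run there, copy results back, clean up) can also be written as a reusable bash function. This is a minimal sketch, not the site's method: run_staged and the SCRATCH variable are hypothetical names, and the scratch location defaults to /tmp so the sketch also runs outside the cluster (set SCRATCH=/scratch on a compute node).

```shell
#!/bin/bash
# Run a command in a private scratch directory and copy results back.
# Usage: run_staged ORIG_DIR INPUT_FILE... -- COMMAND [ARG...]
run_staged() {
    local orig_dir=$1 temp_dir status
    shift
    temp_dir=$(mktemp -d "${SCRATCH:-/tmp}/${USER:-job}.XXXXXX") || return 1
    # Copy the input files until the "--" separator is reached.
    while [ $# -gt 0 ] && [ "$1" != "--" ]; do
        cp "$orig_dir/$1" "$temp_dir/" || { rm -rf "$temp_dir"; return 1; }
        shift
    done
    if [ "$1" = "--" ]; then
        shift
    fi
    # Run the command inside the scratch directory; the subshell keeps
    # the caller's working directory unchanged.
    ( cd "$temp_dir" && "$@" )
    status=$?
    # Copy everything back and clean up, even if the command failed.
    cp "$temp_dir"/* "$orig_dir"/
    rm -rf "$temp_dir"
    return $status
}

# usage: run_staged "$PWD" input_file support_file -- ./a.out ./input_file
```

Unlike the csh script, the sketch uses mktemp to pick a unique directory name and returns the exit status of the command, so a failed run can be detected by the caller.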
- An example of SGE script for Amber users (parallel run, 4 CPUs,
with input file generated on the fly):
#!/bin/csh -f
#$ -cwd
#
#$ -N amber_test_job
#$ -m e
#$ -e sge.err
#$ -o sge.out
#$ -pe mpi 4
# requesting 6hrs wall clock time
#$ -l h_rt=6:00:00
#
setenv MPI_MAX_CLUSTER_SIZE 2
# export all environment variables to SGE
#$ -V
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
echo This job runs on the following processors:
cat $TMPDIR/machines
echo This job has allocated $NSLOTS processors
set in=./mdin
set out=./mdout
set crd=./inpcrd.equil
cat <<eof > $in
short md, nve ensemble
&cntrl
ntx=7, irest=1,
ntc=2, ntf=2, tol=0.0000001,
nstlim=1000,
ntpr=10, ntwr=10000,
dt=0.001, vlimit=10.0,
cut=9.,
ntt=0, temp0=300.,
&end
&ewald
a=62.23, b=62.23, c=62.23,
nfft1=64,nfft2=64,nfft3=64,
skinnb=2.,
&end
eof
set sander=/soft/linux/pkg/amber8/exe.parallel/sander
set mpirun=/opt/mpich/intel/bin/mpirun
# needs prmtop and inpcrd.equil files
$mpirun -v -machinefile $TMPDIR/machines -nolocal -np $NSLOTS \
$sander -O -i $in -c $crd -o $out < /dev/null
/bin/rm -f $in restrt
Please note that if you are running parallel amber8 you must include
the following in your .cshrc:
# Set P4_GLOBMEMSIZE environment variable used to reserve memory in bytes
# for communication with shared memory on dual nodes
# (optimum/minimum size may need experimentation)
setenv P4_GLOBMEMSIZE 32000000
- An example of an SGE script for an APBS job (parallel run, 8 CPUs,
running the example input file which is included in the APBS distribution,
/soft/linux/src/apbs-0.3.1/examples/actin-dimer):
#!/bin/csh -f
#$ -cwd
#
#$ -N apbs-PARALLEL
#$ -e apbs-PARALLEL.errout
#$ -o apbs-PARALLEL.errout
#
# requesting 8 processors
#$ -pe mpi 8
echo -n "Running on: "
hostname
setenv APBSBIN_PARALLEL /soft/linux/pkg/apbs/bin/apbs-icc-parallel
setenv MPIRUN /opt/mpich/intel/bin/mpirun
echo "Starting apbs-PARALLEL calculation ..."
$MPIRUN -v -machinefile $TMPDIR/machines -np 8 -nolocal \
$APBSBIN_PARALLEL apbs-PARALLEL.in >& apbs-PARALLEL.out
echo "Done."
- An example of an SGE script for a parallel CHARMM job (4 processors):
#!/bin/csh -f
#$ -cwd
#
#$ -N charmm-test
#$ -e charmm-test.errout
#$ -o charmm-test.errout
#
# requesting 4 processors
#$ -pe mpi 4
# requesting 2hrs wall clock time
#$ -l h_rt=2:00:00
#
echo -n "Running on: "
hostname
setenv CHARMM /soft/linux/pkg/c31a1/bin/charmm.parallel.092204
setenv MPIRUN /soft/linux/pkg/mpich-1.2.6/intel/bin/mpirun
echo "Starting CHARMM calculation (using $NSLOTS processors)"
$MPIRUN -v -machinefile $TMPDIR/machines -np $NSLOTS -nolocal \
$CHARMM < mbcodyn.inp > mbcodyn.out
echo "Done."
- An example of an SGE script for a parallel NAMD job (8 processors):
#!/bin/csh -f
#$ -cwd
#
#$ -N namd-job
#$ -e namd-job.errout
#$ -o namd-job.out
#
# requesting 8 processors
#$ -pe mpi 8
# requesting 12hrs wall clock time
#$ -l h_rt=12:00:00
#
echo -n "Running on: "
hostname
/soft/linux/pkg/NAMD/namd2.sh namd_input_file > namd2.log
echo "Done."
- An example of an SGE script for a parallel Gromacs job (4 processors):
#!/bin/csh -f
#$ -cwd
#
#$ -N gromacs-job
#$ -e gromacs-job.errout
#$ -o gromacs-job.out
#
# requesting 4 processors
#$ -pe mpich 4
# requesting 8hrs wall clock time
#$ -l h_rt=8:00:00
#
echo -n "Running on: "
cat $TMPDIR/machines
setenv MDRUN /soft/linux/pkg/gromacs/bin/mdrun-mpi
setenv MPIRUN /soft/linux/pkg/mpich/intel/bin/mpirun
$MPIRUN -v -machinefile $TMPDIR/machines -nolocal -np $NSLOTS \
$MDRUN -v -nice 0 -np $NSLOTS -s topol.tpr -o traj.trr \
-c confout.gro -e ener.edr -g md.log
echo "Done."
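The sample scripts above share the same header lines and differ mainly in job name, processor count, run time, and command. If you generate many such scripts, a template function along these lines may help. It is only a sketch: make_job_script is a hypothetical name, and the "mpi" parallel environment and file naming simply follow the examples on this page.

```shell
#!/bin/bash
# Print a parallel job script to standard output.
# Usage: make_job_script NAME SLOTS HOURS COMMAND...
make_job_script() {
    local name=$1 slots=$2 hours=$3
    shift 3
    # "\$" keeps the "#$" prefix literal inside the here-document.
    cat <<EOF
#!/bin/bash
#\$ -cwd
#\$ -S /bin/bash
#\$ -N $name
#\$ -e $name.err
#\$ -o $name.out
#\$ -pe mpi $slots
#\$ -l h_rt=$hours:00:00
$*
EOF
}

# usage (single quotes keep $NSLOTS unexpanded until the job runs):
#   make_job_script demo 4 2 'echo Running on $NSLOTS processors' > demo.sh
```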
Please direct any questions or comments related to this web page to
ctbp-help @ ctbp.ucsd.edu
Last modified: February 02 2008 09:20:50 pm.