Using the Rivanna Cluster
About the Rivanna Cluster:
In 2014, the College of Arts and Sciences, the Engineering
School, the UVa Library and the Data Science Institute purchased a
powerful new computing cluster called "rivanna". If you do
computationally intensive work, you might benefit from using it.
The new cluster has 7,000 cores, 1.4 PB of temporary storage
and fast InfiniBand interconnects between nodes. It runs a minimal
version of the CentOS operating system. The cluster is managed by
the UVa Advanced Research Computing Services (ARCS) group.
Usage of rivanna is metered in terms of "service
units" (currently equal to the number of CPU hours used), with
each research group being allocated some number of service units. New
research groups are given a free allocation of 5,000 service units,
and can purchase more later or be granted more by request.
For the most up-to-date information about rivanna, see
the ARCS web page: http://arcs.virginia.edu.
The following information is intended to help Physics users get
up and running quickly on the cluster.
You'll need to do two things before you can begin using rivanna:
- Create a "MyGroups" group to identify the members of your
research group. The people in this group will be allowed to
use part of your research group's allocation of service units
on rivanna. If you have an existing MyGroups group, you can use it.
To create a new MyGroups group, visit:
- Request a startup allocation of service units. You can
do this by filling out this form:
or through the contact form here:
Using the Cluster:
- Logging in:
Once you've been granted an allocation, you can ssh into the cluster at:
Use your Eservices password to log in.
You can try things out interactively there, and it won't count against
your allocation.
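For example (the hostname below is just a placeholder; use the login
address given above):

ssh -Y mst3k@rivanna.hpc.virginia.edu

The "-Y" option turns on X forwarding, which you'll need later if you
want to run graphical programs like "sview".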
- Storage Space:
At least two chunks of storage space will be available to you on
rivanna:
- Your home directory.
This is your ITS home directory space, and will usually be limited
to 4 GB. When you log into the cluster, your initial working
directory will be here.
- Scratch space.
Rivanna has a 1.4 PB pool of scratch space located under the
directory /scratch. Each user has his or her own space there, with a
name like "/scratch/mst3k". Users are normally limited to a maximum of
10 TB here. Unused files stored in the scratch space will be deleted
after about 28 days.
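To see how much space you're using in each area, you can use standard
commands like these (the scratch path assumes the layout described
above):

du -sh ~              # total size of your home directory
du -sh /scratch/mst3k # total size of your scratch space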
- Available Software:
I've compiled geant4 version 10.00.p02 and associated packages. In
order to use these, you'll need to create or modify your
.bashrc and .bash_profile files and log in again. You can download
an example .bashrc file here and an example
.bash_profile here. The .bashrc file
must contain the following lines, and .bash_profile must source
.bashrc:
# Load modules:
module load xerces-c-devel
module load cmake/3.0.2
module load gccxml
module load hepmc
module load clhep
module load root
module load openscientist
module load pythia6
module load pythia8
module load cernlib
module load geant4
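For reference, a minimal .bash_profile that sources .bashrc might look
like this (a bare-bones sketch; the downloadable example may contain
more):

# Source .bashrc if it exists:
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi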
- Other loadable modules:
If you need other software besides whatever's available by default,
type "module avail" to see optional pre-installed software. If you
see something you need in the resulting list, you can make it
available to you by typing "module load whatever". The command
"module list" will show you a list of the modules you've loaded.
If you need software that isn't available, please contact
email@example.com and we'll work with ARCS to try to
get it installed for you.
Submitting Batch Jobs:
Once you have a program you'd like to submit as a batch job, you'll
need to write a "slurm" script. (Slurm is the batch queue management
system used on rivanna.)
The required slurm script will be different depending on whether you
want to submit a single job, or a group of related jobs. Here is
an example for each.
- First, here's an example of a slurm script for a single task:

#!/bin/bash
#SBATCH --output=testing.out
#SBATCH --error=testing.err
#SBATCH --mail-user=mst3k@virginia.edu
#SBATCH --partition=serial
pawX11 -b testing.kumac
The #SBATCH lines tell the queue system about the job. Initially
you'll want to at least set the --output and --error files, which
define where stdout and stderr from your job will go. The --mail-user
option doesn't currently do anything, but it might be good to put your
e-mail address there, just in case this feature is enabled later. The
last line is just the command you want to run. The current working
directory will automatically be set to the directory you're in
when you submit the job.
- Now, here's an example showing a slurm script appropriate for
submitting a group of related tasks (for example,
to analyze a group of related data sets, or to run a simulation
several times with different parameters):
#!/bin/bash
#SBATCH --partition=serial
# Set the number of tasks in the following line:
#SBATCH --ntasks=10
# Customize this section to meet your needs.
# In this example, we submit tasks to analyze many data files.
# Each data file has a name like "run*.rz". We want to analyze
# runs 5001 through 5010 (10 runs), so we've set "ntasks" to 10
# above, and we loop over those run numbers below.
for i in $(seq 5001 5010); do
    # Define the variable "command" so that it does what you want
    # (here, a hypothetical "analyze" program that reads one file):
    command="./analyze run$i.rz"
    echo "Submitting task $i: $command"
    srun --cpus-per-task=1 --cpu_bind=cores --exclusive --nodes=1 --ntasks=1 $command 1> slurm-$i.out 2> slurm-$i.err &
done
wait  # Don't exit until all of the background tasks have finished.
Assuming that your slurm script is called "testing.slurm", you could
submit it to the batch queues by typing:

sbatch testing.slurm
The slurm system divides the cluster's resources into several
sections called "partitions". (Many other batch systems would
use the term "queues" for these.) Most users will want to use
either the "serial" partition, which
includes most of the
cluster's computing resources, or the "economy" partition.
The "economy" partition includes 768 cores that can be used
at no charge. You can choose which partition your jobs will
use by modifying the "--partition" line in your slurm files.
(See the example above.)
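For example, to send your jobs to the free partition, your script
would include a line like:

#SBATCH --partition=economy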
Note that there's also a "parallel" partition, intended for
parallel jobs. It's important to remember that "parallel" is just a
name: your jobs don't need to be parallelized (e.g., by using MPI)
in order to use this partition, and using this partition won't do any
magic parallelization of your jobs.
For more information about each of these partitions, and a few
others, see the ARCS web page.
You can watch the progress of your batch jobs with:

squeue -u mst3k

(replacing "mst3k" with your own user ID).
The command "sacct" will also give you a summary of your running
and completed jobs, along with their exit status.
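For example, a typical invocation might look like this (the format
fields shown are just common choices):

sacct --format=JobID,JobName,Partition,State,ExitCode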
To get an overview of the queues, type "sview". (This is an X
program, so you'll need to be sure you ssh in with X forwarding
turned on.) Remember that, in slurm's terminology, a batch queue is
called a "partition".
ARCS provides more information about using slurm on rivanna here:
Complete documentation about slurm can be found here:
Running Graphical Programs:
If you need to use graphical tools while developing your programs, the
cluster supports the "x2go" protocol. This is similar to (and based on)
NX. To connect, use the command x2goclient on our CentOS 6 machines,
or download other x2go clients for Windows or OS X.
Viewing Your Remaining Allocation:
The command "allocations" will show you how many CPU hours remain
in your allocation.