Amarel Compute Cluster - General
Date: February 6, 2024 11:31 AM
Notes:
- When making files, make sure the ‘group owner’ is the Holmes Lab group, g_ah1491_1. To change the group owner, run in the terminal:
$ chgrp g_ah1491_1 /path/to/file.ext
(or, for a folder and its contents, $ chgrp -R g_ah1491_1 /path/to/folder)
- To see who the group owner is, run $ ls -l. The file/folder will be listed as rwxrwxrwx owner group ...
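For example (a hedged sketch; the folder path is a placeholder), you can list any files under a folder whose group owner is not the lab group, then fix them in one pass:
find /projects/f_ah1491_1/my-folder ! -group g_ah1491_1                               # list files with the wrong group owner
find /projects/f_ah1491_1/my-folder ! -group g_ah1491_1 -exec chgrp g_ah1491_1 {} +   # change their group owner to the lab group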
- Amarel does maintenance 1-2 days a month: you can’t connect to the compute nodes, see or edit your files, and any running jobs will be paused (though they won’t be stopped). You can see when maintenance days are scheduled here: https://oarc.rutgers.edu/amarel-system-status
Personal Storage
Each NetID gets personal storage of:
- /home/NetID > 100GB of storage, not fastest, backed up
- /scratch/NetID > 1TB of storage, fast, up to 2TB before purging, not backed up
Best practices
- Move files to your /scratch directory and run your job from that location.
- Don’t leave files sitting unused (unaccessed) for a long time because you may lose them to the 90 day purge process.
- Frequently check the utilization or quota for your /home and /scratch directories to ensure that they don’t become unusable due to over-filling with files.
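A quick way to check usage is du (a hedged sketch; du works anywhere, though OARC may also provide dedicated quota tools described in the Cluster User Guide):
du -sh /home/$USER      # total size of your /home directory
du -sh /scratch/$USER   # total size of your /scratch directory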
The general approach for using /scratch is to copy your job’s files (input files, libraries, etc.) to /scratch, run your job and write output files to /scratch, then move the files you need to save to your /home or /projects directory.
Helpful Code Snippets:
You can stage needed files in /scratch from within a job (either within your job script for a batch job or on the command line during an interactive job):
mkdir /scratch/$USER/$SLURM_JOB_NAME-$SLURM_JOB_ID                                  # create a per-job staging directory
scp my-dir-of-input-files.tar.gz /scratch/$USER/$SLURM_JOB_NAME-$SLURM_JOB_ID       # copy your inputs into it
tar -zxf /scratch/$USER/$SLURM_JOB_NAME-$SLURM_JOB_ID/my-dir-of-input-files.tar.gz -C /scratch/$USER/$SLURM_JOB_NAME-$SLURM_JOB_ID   # unpack them inside the staging directory
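At the end of the job, move what you need to keep back to permanent storage, as described above (a minimal sketch; my-output-dir and my-folder are placeholders):
cp -r /scratch/$USER/$SLURM_JOB_NAME-$SLURM_JOB_ID/my-output-dir /projects/f_ah1491_1/my-folder/   # save outputs to lab storage (or your /home)
rm -rf /scratch/$USER/$SLURM_JOB_NAME-$SLURM_JOB_ID                                                # clean up the staging directory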
Holmes Lab Storage
Holmes lab storage and user group is located in Amarel at /projects/f_ah1491_1
To be added to the user group to access this storage, email help@oarc.rutgers.edu with your NetID and CC Avram, avram.holmes@rutgers.edu. If no response, email pgarias@oarc.rutgers.edu
Storage capacity: 100TB
For file structure and norms, see:
Permissions
Example: use getfacl to view the current permissions for ‘examplescript’, and setfacl to give user ‘netID’ read, write, and execute (rwx) permissions on ‘examplescript’:
getfacl examplescript
setfacl -m u:netID:rwx examplescript
To see what permissions you have in a directory, you can do
ls -ld /home/netID
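To share a whole folder with the lab group rather than a single user (a hedged sketch using the group from the Notes above; the folder path is a placeholder):
setfacl -R -m g:g_ah1491_1:rwx /projects/f_ah1491_1/my-folder   # grant the lab group rwx on the folder and everything in it
getfacl /projects/f_ah1491_1/my-folder                          # confirm the new ACL entry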
Slurm Job
Slurm jobs (sending jobs to be run on the compute cluster) should be used for everything EXCEPT downloads from the internet. Downloads from the internet should be run on login nodes (see below). All other jobs should be packaged and run via Slurm on the compute nodes.
- Save your script as a scriptname.sh file (or if it’s a python script, scriptname.py)
- Create a shell script
- Open a new file in a text editor (BBEdit, TextEdit, VSCode, etc.)
- paste this code:
#!/bin/bash
#SBATCH --partition=p_dz268_1
#SBATCH --job-name=name.sh
#SBATCH --cpus-per-task=9
#SBATCH --mem=1G
#SBATCH --time=2-00:00:00
#SBATCH --output=/path/batch_jobs/out/name_%A.out
#SBATCH --error=/path/batch_jobs/err/name_%A.err

module purge

# Activate the holmesenv virtual environment to use installed packages
eval "$(conda shell.bash hook)"  # Properly initialize Conda
conda activate /projects/community/holmesenv  # change to whatever conda env you need

# Run the Python script (or bash)
python3 /projects/f_ah1491_1/analysis_tools/script.py
- Change --time=2-00:00:00 (i.e., 2 days = 48 hours) to however much time you think you’ll need. The maximum you can request is 2 weeks, but the more time you request, the longer your Slurm job will sit in the queue before running.
- To estimate timing, try downloading 1 subject file and time how long the download takes, then multiply that by number of subjects
- Change python3 /projects/f_ah1491_1/analysis_tools/script.py to whatever script you want to run
- Change /projects/community/holmesenv to whatever conda environment you need, or keep this as the default
- Change the #SBATCH --output and #SBATCH --error paths
- if you use a fixed name like ‘name.out’ that doesn’t change per job, it will be overwritten each time you run this job, so the err and out files will only be from the most recent run
- if you want to keep the err and out files from each run, use a name like name_%A.out
- %A = job ID (IMPORTANT if running a job array: %A is the overall array job ID)
- other ways to name:
- %N = node
- %j = job allocation number
- %a = array index (IMPORTANT if running a job array)
- change #SBATCH --job-name=name.sh to a name you want to see in the list of running jobs when you call sacct (see the job-array sketch after this list)
- make it short: sacct or watch only shows the first 8 characters of this name
- it doesn’t need to be consistent with anything else
- it can also use the % options listed above (IMPORTANT if running a job array)
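For a job array, a minimal sketch of how the header and script change (the array range, paths, and subjects.txt are hypothetical placeholders, and this assumes your script takes the subject as an argument; %A is the array job ID and %a is the task index, as above):
#SBATCH --array=1-10                                   # run 10 copies of this job, with SLURM_ARRAY_TASK_ID = 1..10
#SBATCH --output=/path/batch_jobs/out/name_%A_%a.out   # one out file per array task
#SBATCH --error=/path/batch_jobs/err/name_%A_%a.err    # one err file per array task

subject=$(sed -n "${SLURM_ARRAY_TASK_ID}p" subjects.txt)   # pick the Nth line of a subject list
python3 /projects/f_ah1491_1/analysis_tools/script.py "$subject"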
Save this file as run_scriptname.sh, naming it something relevant to the package + shell.
Make sure both files (your script and run_scriptname.sh) are in the SAME folder in your home directory, or somewhere else on Amarel, not on your local computer.
- open a terminal ($ indicates a terminal entry)
- $ cd /home/netID/folder... ← replace with wherever your run_scriptname.sh files are saved
- $ chmod ugo+rwx filename.ext
- $ chmod ugo+rwx run_filename.ext
- $ chmod ugo+rwx dirname
- $ sbatch run_filename.sh
- check in the terminal using sacct to see if your job worked; make sure the State says "RUNNING" (see the example after this list)
- 2 days a month is maintenance, so jobs will say “failed” during those times. Maintenance calendar: https://oarc.rutgers.edu/amarel-system-status/
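A hedged example of checking on a specific job after submitting (the job ID and output path are placeholders; sbatch prints the job ID when you submit):
sacct -j 12345678 --format=JobID,JobName,State,Elapsed   # check the state of job 12345678
cat /path/batch_jobs/out/name_12345678.out               # read what the job has printed so far
cat /path/batch_jobs/err/name_12345678.err               # read any error messages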
Helpful commands
If anything weird comes up, you can cancel all of your jobs with scancel -u <your NetID> (or cancel a single job with scancel <jobID>)
sacct -e shows all the fields you can pull up for existing/past jobs
sacct --state=FAILED (or RUNNING, PENDING, COMPLETED) filters jobs by state
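For example (a sketch; the date is a placeholder), to list everything of yours that failed since a given date, or to clear out only your queued jobs:
sacct --state=FAILED --starttime=2024-02-01 --format=JobID,JobName%30,State,Elapsed,ExitCode   # your failed jobs since Feb 1
scancel -u $USER --state=PENDING                                                               # cancel only your pending (queued) jobs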
From Cluster User Guide:
Common commands
Sending Files to and from Amarel
Let’s assume you’re logged in to a local workstation or laptop and not connected to Amarel. To send files from your local system to your Amarel /home directory,
scp file-1.txt file-2.txt <NetID>@amarel.rutgers.edu:/home/<NetID>
To pull a file from your Amarel /home directory to your laptop (note the “.” at the end of this command),
scp <NetID>@amarel.rutgers.edu:/home/<NetID>/file-1.txt .
If you want to copy an entire directory and its contents using scp, you’ll need to “package” your directory into a single, compressed file before moving it:
tar -czf my-directory.tar.gz my-directory
After moving it, you can unpack that .tar.gz file to get your original directory and contents:
tar -xzf my-directory.tar.gz
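Putting those steps together (a hedged sketch using the same <NetID> placeholders as above): package the directory on your local machine, copy the archive over, then unpack it on Amarel:
tar -czf my-directory.tar.gz my-directory                                             # on your laptop: package the directory
scp my-directory.tar.gz <NetID>@amarel.rutgers.edu:/home/<NetID>/                     # send the archive to your Amarel /home
ssh <NetID>@amarel.rutgers.edu "cd /home/<NetID> && tar -xzf my-directory.tar.gz"     # unpack it on Amarel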
A handy way to synchronize a local file or entire directory between your local workstation and the Amarel cluster is to use the rsync utility. First, let’s sync a local (recently updated) directory with the same directory stored on Amarel:
rsync -trlvpz work-dir gc563@amarel.rutgers.edu:/home/gc563/work-dir
In this example, the rsync options I’m using are:
- t (preserve modification times)
- r (recursive, sync all subdirectories)
- l (preserve symbolic links)
- v (verbose, show all details)
- p (preserve permissions)
- z (compress transferred data)
To sync a local directory with updated data from Amarel:
rsync -trlvpz <your NetID>@amarel.rutgers.edu:/home/<your NetID>/work-dir work-dir
Here, we’ve simply reversed the order of the local and remote locations.
For added security, you can use SSH for the data transfer by adding the e option followed by the protocol name (SSH, in this case):
rsync -trlvpze ssh <your NetID>@amarel.rutgers.edu:/home/<your NetID>/work-dir work-dir
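Before running a large sync, you can preview what would be transferred without changing anything (a hedged tip; --dry-run is a standard rsync option):
rsync -trlvpz --dry-run work-dir <your NetID>@amarel.rutgers.edu:/home/<your NetID>/work-dir   # list what would be copied, without copying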
Modules available in Amarel (2024)
Help desk: email help@oarc.rutgers.edu
Amarel Info:
Amarel OS: CentOS Linux release 7.9.2009 (Core)
holmesenv modules/packages installed (in /projects/community/holmesenv):
BASICS (From [Amarel User Guide](https://sites.google.com/view/cluster-user-guide#h.17qhrejyd98m)):
Cluster User Guide:
Getting access (requesting an account)
Boilerplate for proposal development
Granting access to files and folders for other users
Job partitions (job submission queues)
Using the Open OnDemand interface
Storage file sets and how to use them
Basics of moving files to/from the cluster
Transferring files with external institute using cloud bucket
Transferring files using Globus Personal Connect
Transferring files to cloud storage using rclone
Setting-up your rclone configuration on Amarel
Passwordless access and file transfers using SSH keys
Parallel (multicore MPI) job example
Parallel interactive job example
Connecting your lab’s systems to Amarel
Snapshots of /home and /projects data
Terminal Commands
conda activate /projects/community/holmesenv #activate holmesenv conda
cd /home/netID #change working directory
ls #contents of wd
tree #structure of files within wd
cd .. #goes to parent dir
mkdir dirName #make directory
rm filename #remove file
mv /sourceDir /movingLocation #move (or rename) a file or directory
cat file.extension #displays file contents
vim file.extension #open the file in the vim editor (scrollable; type :q to quit)
sacct #see all the jobs you're running
#optional specifications for sacct:
sacct --format=JobId,JobName%50,Partition%15,State,Elapsed,ExitCode,Start,End --starttime=2024-06-08T22:43:21
watch -n 1 squeue -u netID # View in real time all the jobs you're running
#ctrl+C to exit
#Use terminal packages
module use /projects/community/modulefiles
module load FreeSurfer/7.4.1-ez82
module load fsl/6.0.0-gc563
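To see what else is available after adding the community modulefiles, and what you currently have loaded (standard module commands):
module avail   # list all modules you can load
module list    # show the modules currently loaded in your session
module purge   # unload everything if your environment gets into a weird state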
Notes