Compute Resource
Note: To access links on this page, you must be connected to UMD's GlobalProtect VPN.
Zaratan Basics
Please refer to the official documentation:
- UMD HPC Wiki: https://hpcc.umd.edu
- Zaratan Wiki: https://hpcc.umd.edu/home/clusters/zaratan/
Start with the Getting Started and Knowledge Base sections. These pages explain:
- How to log in to Zaratan
- Basic Linux shell usage
- SLURM job submission and monitoring commands
It is essential that you read and understand these topics before running jobs. Knowing how to submit, monitor, and troubleshoot jobs will save time and prevent misuse of the cluster.
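For example, a minimal batch script looks like the following sketch; the resource values and file names here are placeholders, and Zaratan-specific partition and account flags are covered in the wiki pages above:

#!/bin/bash
#SBATCH --job-name=example        # name shown in squeue
#SBATCH --time=01:00:00           # wall-clock limit (HH:MM:SS)
#SBATCH --ntasks=1                # one task
#SBATCH --cpus-per-task=4         # CPU cores for the task
#SBATCH --mem=16G                 # memory for the whole job
#SBATCH --output=%x-%j.out        # log file named <jobname>-<jobid>.out

python train.py                   # placeholder: replace with your workload

Save it as job.sh and submit it with sbatch job.sh; SLURM prints the job ID it assigns.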
Login Node Etiquette
Do not run compute-heavy tasks on the login nodes. These nodes are shared by all users and are intended only for:
- Submitting jobs
- Monitoring jobs
- Editing files and scripts
- Transferring data
Running large tasks on the login node can slow down or hang sessions for everyone. Always submit compute workloads through SLURM so they run on proper compute nodes.
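The standard SLURM commands cover everyday monitoring from the login node, for example:

sbatch job.sh         # submit a batch script
squeue -u $USER       # list your pending and running jobs
scancel <jobid>       # cancel a job
sacct -j <jobid>      # accounting details for a running or finished job

Here <jobid> is the ID printed by sbatch.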
Dataset and Model Storage
Your home directory quota is 10 GB. Do not store large datasets, models, or checkpoints in your home directory.
For storage guidelines, see the HPC Storage Tiers page: https://hpcc.umd.edu/kb/storage/#high-performancescratch-storage-tier
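A quick way to check how close you are to the quota (Zaratan may also provide its own quota-reporting tools; see the storage page above):

du -sh ~              # total size of your home directory
du -sh ~/scratch/*    # per-directory usage under scratch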
Recommended Practice
Use scratch storage or project storage for:
- Datasets
- Model checkpoints
- Caches for tools such as Conda, Hugging Face, and DeepSpeed
Scratch is high-performance storage designed for large files and frequent reads/writes.
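For example, to stage a dataset into scratch (the paths here are hypothetical):

mkdir -p ~/scratch/datasets
rsync -a --info=progress2 /path/to/local/dataset/ ~/scratch/datasets/my_dataset/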
Setting Cache Locations to ~/scratch
Create directories first:
mkdir -p ~/scratch/conda_pkgs
mkdir -p ~/scratch/hf_cache
mkdir -p ~/scratch/deepspeed_cache
Conda Cache
Add this to your ~/.condarc:
pkgs_dirs:
- ~/scratch/conda_pkgs
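You can confirm conda picked up the setting with:

conda config --show pkgs_dirs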
HuggingFace Cache
Add to ~/.bashrc (or your job script):
export HF_HOME=~/scratch/hf_cache
export TRANSFORMERS_CACHE=~/scratch/hf_cache   # kept for older transformers versions; deprecated in favor of HF_HOME in recent releases
export HF_DATASETS_CACHE=~/scratch/hf_cache
DeepSpeed Cache
Add to ~/.bashrc (or your job script):
export DS_ACCELERATOR_CACHE=~/scratch/deepspeed_cache
export DEEPSPEED_CACHE=~/scratch/deepspeed_cache
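DeepSpeed also JIT-compiles its C++/CUDA ops through PyTorch's extension loader, which caches builds under ~/.cache/torch_extensions by default. Redirecting that cache too keeps the builds out of your home quota:

mkdir -p ~/scratch/torch_extensions
export TORCH_EXTENSIONS_DIR=~/scratch/torch_extensions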
Reload your shell config:
source ~/.bashrc
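Then confirm the variables took effect, e.g.:

echo $HF_HOME         # should print your scratch path (with ~ expanded)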
Usage Budget
Our allocation is limited. Please keep compute usage under roughly 9,000 SUs per team and storage under 400 GB per team, and be mindful and efficient with resource requests.
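The Zaratan wiki describes how allocations and SUs are tracked. As a generic SLURM sketch (the account name and date are placeholders), sreport can summarize usage per account:

sreport cluster AccountUtilizationByUser Accounts=<your_account> Start=2025-01-01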
If you have questions or run into issues, please reach out early — we're here to help you use the cluster effectively!