Compute Resource

Note: To access links on this page, you must be connected to UMD's GlobalProtect VPN.

Zaratan Basics

Please refer to the official Zaratan documentation at https://hpcc.umd.edu.

Start with the Getting Started and Knowledge Base sections, which explain how to connect to the cluster, submit and monitor jobs, and troubleshoot problems.

It is essential that you read and understand these topics before running jobs. Knowing how to submit, monitor, and troubleshoot your jobs will save time and prevent misuse of the cluster.


Login Node Etiquette

Do not run compute-heavy tasks on the login nodes. These nodes are shared by all users and are intended only for lightweight work such as editing files, managing your environment, and submitting jobs.

Running large tasks on the login node can slow down or hang sessions for everyone. Always submit compute workloads through SLURM so they run on proper compute nodes.
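
For example, a minimal batch script might look like the sketch below. The partition, account, and resource values are placeholders, and train.py stands in for your own program; check the Zaratan documentation for the partition and account names that apply to your allocation.

#!/bin/bash
#SBATCH --job-name=train-example      # name shown in squeue
#SBATCH --time=04:00:00               # wall-clock limit
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=32G
#SBATCH --gres=gpu:1                  # request one GPU, if needed
#SBATCH --partition=gpu               # placeholder partition name
#SBATCH --account=your_allocation     # placeholder account name

source ~/.bashrc
python train.py

Submit it with sbatch job.sh and check its status with squeue -u $USER.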


Dataset and Model Storage

Your home directory quota is 10GB. Do not store large datasets, models, or checkpoints in your home directory.

For storage guidelines, see the HPC Storage Tiers page: https://hpcc.umd.edu/kb/storage/#high-performancescratch-storage-tier

Recommended Practice

Use scratch storage or project storage for large datasets, model weights, checkpoints, and other large files.

Scratch is high-performance storage designed for large files and frequent reads/writes.
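
A common pattern is to keep bulky data in scratch and point to it from your home directory with a symlink. The paths below are only illustrations:

mkdir -p ~/scratch/datasets
mv ~/my_dataset ~/scratch/datasets/                 # move bulky data out of home
ln -s ~/scratch/datasets/my_dataset ~/my_dataset    # keep a convenient path in home
du -sh ~/scratch/*                                  # check how much scratch you are using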

Setting Cache Locations to ~/scratch

Create directories first:

mkdir -p ~/scratch/conda_pkgs
mkdir -p ~/scratch/hf_cache
mkdir -p ~/scratch/deepspeed_cache

Conda Cache

Add this to your ~/.condarc:

pkgs_dirs:
  - ~/scratch/conda_pkgs
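
You can confirm that conda picked up the new package cache location with:

conda config --show pkgs_dirs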

HuggingFace Cache

Add to ~/.bashrc (or your job script):

export HF_HOME=~/scratch/hf_cache
export TRANSFORMERS_CACHE=~/scratch/hf_cache
export HF_DATASETS_CACHE=~/scratch/hf_cache
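
Recent versions of the transformers library treat TRANSFORMERS_CACHE as deprecated in favor of HF_HOME, so setting HF_HOME alone is usually enough; the extra variables are harmless and keep older versions happy. To confirm the variables are set in a new shell or job:

env | grep -E 'HF_|TRANSFORMERS_CACHE'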

DeepSpeed Cache

Add to ~/.bashrc (or your job script):

export DS_ACCELERATOR_CACHE=~/scratch/deepspeed_cache
export DEEPSPEED_CACHE=~/scratch/deepspeed_cache
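
DeepSpeed also JIT-compiles some of its ops through PyTorch's extension builder, which by default caches those builds under ~/.cache/torch_extensions in your home directory. If you rely on those ops, you may want to redirect that cache as well (the directory name below is just a suggestion):

mkdir -p ~/scratch/torch_extensions
export TORCH_EXTENSIONS_DIR=~/scratch/torch_extensions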

Reload your shell config:

source ~/.bashrc

Usage Budget

Our allocation is limited. Please keep your compute usage under ~9,000 service units (SU) per team and your storage usage under 400 GB per team. Be mindful and efficient with resource requests.
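
To keep an eye on your usage, you can review recent jobs with Slurm's accounting tools and check your scratch footprint. Zaratan may also provide its own allocation-balance command; see the Knowledge Base for details.

sacct -u $USER --starttime=now-7days --format=JobID,JobName,Partition,Elapsed,AllocTRES%40
du -sh ~/scratch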

If you have questions or run into issues, please reach out early — we're here to help you use the cluster effectively!



