RCI Cluster
A powerful shared compute cluster with 100+ A100 GPUs, accessed via login nodes. Ideal for large experiments, sweeps, and long jobs.
Access
ssh username@login3.rci.cvut.cz
Use login nodes only for job submission and environment setup (they don't have GPUs).
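For convenience you can add a host alias to your SSH config. A minimal sketch, assuming the alias name rci is free in your config and username is your RCI login:

# ~/.ssh/config -- the alias name "rci" is arbitrary
Host rci
    HostName login3.rci.cvut.cz
    User username

After that, ssh rci is enough to connect.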
Storage
Refer to the storage info page and use /scratch, /home, or /storage/plzen1/home/username as appropriate.
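A common pattern is to keep code in your home directory and write datasets and checkpoints to scratch. A minimal sketch; the exact scratch layout below is illustrative, so check the storage info page for the real paths:

mkdir -p /scratch/$USER/my_experiment   # illustrative per-user scratch directory
df -h /scratch /home                    # check free space before large writes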
Interactive Jobs
Launch a debugging session:
srun --partition=interactive --gres=gpu:1 --mpi=pmix --mem=25G --ntasks-per-node=1 --pty bash -i
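Once the session starts you are on a GPU node. A quick sanity check, assuming you have loaded the PyTorch module and activated your environment as in the job script below:

nvidia-smi                                                   # the allocated GPU should be listed
python -c "import torch; print(torch.cuda.is_available())"   # should print True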
Job Submission
Submit with:
sbatch bash_scripts/my_job.sh
Example job script:
#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --output=output.log
#SBATCH --error=error.log
#SBATCH --gres=gpu:1      # request one GPU
#SBATCH --mem=25G
#SBATCH --ntasks=1

module load PyTorch/2.5.1-foss-2023b-CUDA-12.4.0   # cluster-provided PyTorch module
source path/to/my_env/bin/activate                 # activate your virtual environment
python train.py
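After submitting, you can follow the job's output as it runs (output.log matches the --output path in the script):

tail -f output.log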
Monitoring Jobs
squeue -u username   # List your running and pending jobs
scancel <job_id>     # Cancel a job
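For jobs that have already finished, sacct queries Slurm's accounting database (which fields are populated depends on the cluster's accounting setup):

sacct -j <job_id> --format=JobID,JobName,State,Elapsed,MaxRSS   # resource usage of a completed job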
Array Jobs
Useful for launching multiple jobs with a single script:
#!/bin/bash
#SBATCH --job-name=multi_train
#SBATCH --output=logs/out_%A_%a.log   # %A = array job ID, %a = array task ID
#SBATCH --error=logs/err_%A_%a.log    # note: the logs/ directory must exist before submission
#SBATCH --array=0-3
#SBATCH --gres=gpu:1
#SBATCH --mem=32G
#SBATCH --ntasks=1

module load PyTorch/2.5.1-foss-2023b-CUDA-12.4.0
source path/to/my_env/bin/activate

# List of configs to run; each task picks one via its SLURM_ARRAY_TASK_ID
CONFIGS=("imagenet" "coco" "voc" "laion")
python train.py --dataset ${CONFIGS[$SLURM_ARRAY_TASK_ID]}
Submit with:
sbatch run_array.sh
Each array task runs train.py with a different dataset.
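To avoid occupying too many GPUs at once, Slurm's array syntax accepts a concurrency limit after a % sign (the limit of 2 below is illustrative):

#SBATCH --array=0-3%2   # run at most 2 of the 4 tasks simultaneously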
Debugging
- Jupyter on RCI (see the sketch below)
- Tutorial on interactive Python development on RCI GPU nodes
- Use interactive jobs for runtime debugging
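A common pattern for Jupyter is to start the server inside an interactive GPU job and tunnel its port through the login node. A sketch; the port, the <node> placeholder, and the assumption that Jupyter is installed in your environment are all illustrative:

# inside an interactive job on the GPU node:
jupyter lab --no-browser --ip=0.0.0.0 --port=8888

# on your local machine, forward the port through the login node
# (<node> is the GPU node's hostname, shown in your srun session):
ssh -L 8888:<node>:8888 username@login3.rci.cvut.cz

Then open http://localhost:8888 locally.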
File Sync
Same options as non-RCI servers:
- PyCharm deployment
- rsync, scp, git (example below)
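For example, to push a local project to the cluster with rsync (paths and the --exclude pattern are illustrative):

rsync -avz --exclude '.git' ./my_project/ username@login3.rci.cvut.cz:/home/username/my_project/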