HPC Resource Management System — User Guide

Introduction

Welcome to the Adamas University shared High Performance Computing (HPC) resource management system. This platform lets students and researchers request CPU and GPU resources through a web portal, obtain administrator approval, and then run compute jobs using simple wrapper scripts. You do not need to write SLURM batch scripts or submit jobs to SLURM yourself; the wrapper scripts handle submission for you.

How the System Works — Architecture Overview

Below is a high-level architecture diagram of the system:

[Figure: HPC architecture diagram]

Component Summary

React Frontend (NGINX in Docker): The web portal where users submit requests and admins approve them. The React app is served as static files by NGINX inside Docker.
Node.js Backend (Host): Runs on the HPC host to execute privileged provisioning scripts using sudo.
MongoDB (Docker): Stores requests, approvals, and metadata.
CI/CD (GitHub Actions): Builds and deploys the frontend, database configurations, and backend.
Provisioning Scripts: create_cpu_user.sh and create_gpu_user.sh create Linux accounts and prepare the user environment.
Wrapper Scripts: ~/bin/submit_cpu_job.sh and ~/bin/submit_gpu_job.sh submit jobs to SLURM for you.
SLURM: Enforces CPU/GPU partitions and schedules jobs.
XFS Quotas: Enforced on /data to limit storage usage.

End-to-End Workflow

1. You submit a resource request through the web portal.
2. An administrator reviews your request and approves or rejects it.
3. If approved, a Linux account is created and the correct wrapper script is added to your ~/bin directory based on your request (CPU-only or GPU).
4. You receive credentials by email and connect via SSH (or VS Code Remote SSH).
5. You use the wrapper script to run jobs; SLURM schedules and enforces resource limits.

Applying for Compute Resources

How to Access and Log In

1. Open the web portal URL provided by Adamas University.
2. Sign in using your university credentials.
3. Go to New Request to open the resource request form.

Request Form Fields (Step-by-Step)

Fill in the form carefully. These details are used for approval, scheduling, and creating your Linux account. Example values for each field are listed in the table at the end of this section.

Personal and academic details
- Enter your full name, institutional email, and department/lab.
- Use your official email because approval notifications and credentials will be sent there.
Project identification
- Add a short Project Title that you can recognize later.
- Provide a concise Project Summary describing what you will run and why HPC is needed.
Compute type (CPU or GPU)
- Choose GPU Required = Yes only if your work requires GPU acceleration.
- If you choose No, you will get a CPU-only environment and only the CPU job script.
Resource sizing
- Specify CPU cores and RAM required for your workload.
- If GPU is required, include the number of GPUs.
Schedule window
- Provide Estimated Start Date/Time and Estimated End Date/Time.
- This helps administrators plan availability and approve requests faster.
Review and submit
- Double-check all values for accuracy.
- Submit the form and wait for admin approval.

Field	Description	Example
Full Name	Your legal name	Priya Sharma
Email	Institutional email	priya@adamasuniversity.ac.in
Department	Your department or lab	Bioinformatics
Project Title	Short project name	RNA-Seq Analysis
Project Summary	Brief description of work	Transcriptome analysis for cohort X
CPU Cores	Requested CPU cores	16
RAM (GB)	Requested memory	64
GPU Required	Yes/No	Yes
GPU Count	Number of GPUs (if required)	1
Estimated Start Date/Time	When you plan to begin	2026-02-10 10:00
Estimated End Date/Time	When you expect to finish	2026-03-05 18:00

Approval Workflow

Admin Review

Administrators review requests in the admin panel. They can approve or reject based on resource availability and policy.

What Approval Means

Approved:
- Your Linux account is created automatically based on your request type.
- Your home directory contains the correct wrapper script:
  - CPU-only users: ~/bin/submit_cpu_job.sh
  - GPU users: ~/bin/submit_gpu_job.sh
- You receive an email with login credentials and instructions for SSH or VS Code Remote SSH.
Rejected:
- You receive a notification with the reason for rejection.
- You may edit and re-submit the request if allowed by policy.

Running Your Jobs

Accessing the HPC System

You can connect using:

SSH terminal
VS Code Remote SSH
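
Both options can share one OpenSSH client configuration entry, since VS Code Remote SSH reads ~/.ssh/config. The sketch below prints such an entry. The alias adamas-hpc, the host name hpc.example.edu, and the user name priya are placeholders, not real values; use the host and user name from your approval email.

```shell
# Print an OpenSSH client config entry for the cluster.
# The alias, host name, and user name are supplied by the caller;
# the values in the example below are hypothetical placeholders.
print_ssh_host_entry() {
    local alias_name="$1"
    local host_name="$2"
    local user_name="$3"
    printf 'Host %s\n    HostName %s\n    User %s\n' \
        "$alias_name" "$host_name" "$user_name"
}

# Example, run on your own machine (hypothetical values):
#   print_ssh_host_entry adamas-hpc hpc.example.edu priya >> ~/.ssh/config
#   ssh adamas-hpc
```

After the entry is in place, the same alias should appear in the VS Code Remote SSH host list.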

Using Wrapper Scripts

The system provides one wrapper script in your ~/bin directory, depending on your approved request:

CPU-only users: ~/bin/submit_cpu_job.sh
GPU users: ~/bin/submit_gpu_job.sh

You do not need to submit jobs with SLURM commands directly; these scripts handle submission for you.
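
If you are unsure which wrapper your account received, a small check like the following can tell you. It is a minimal sketch: the function name list_wrappers is made up for this guide, and it only looks for the two script names listed above.

```shell
# List whichever submit wrapper(s) exist and are executable in a bin
# directory. Defaults to ~/bin, where this guide says they are installed.
list_wrappers() {
    local bin_dir="${1:-$HOME/bin}"
    local name
    for name in submit_cpu_job.sh submit_gpu_job.sh; do
        if [ -x "$bin_dir/$name" ]; then
            printf '%s\n' "$bin_dir/$name"
        fi
    done
}

# Example, run on the cluster after logging in:
#   list_wrappers
```

No output means neither script was found; in that case, contact an administrator.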

CPU Job Example

~/bin/submit_cpu_job.sh train_cpu.py

GPU Job Example

~/bin/submit_gpu_job.sh train_gpu.py

Where Job Output Goes

When you run a job using the wrapper script, a slurm_jobs directory is created automatically (if it does not already exist). SLURM output and error logs are stored there:

Standard output: .out files
Errors: .err files

The wrapper script will print the log file path after submission.
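
To make the behaviour above concrete, here is a hypothetical sketch of what a wrapper of this kind might do. It is not the actual submit_cpu_job.sh, which may work differently. The partition name cpu, the use of python3, and the function name sketch_submit_cpu are all assumptions for illustration. The function only prints the sbatch command instead of running it.

```shell
# HYPOTHETICAL sketch, not the real ~/bin/submit_cpu_job.sh.
# Assumptions: a partition named "cpu", scripts run with python3, and
# log files named slurm-<JOB_NAME>-<JOB_ID>.out/.err under slurm_jobs.
sketch_submit_cpu() {
    local script="$1"
    local job_name
    job_name="$(basename "${script%.*}")"   # train_cpu.py -> train_cpu
    local log_dir="slurm_jobs"
    mkdir -p "$log_dir"                     # create the log directory if missing
    # %x and %j are sbatch filename patterns for the job name and job ID.
    # Dry run: print the command rather than submitting anything.
    printf 'sbatch --job-name=%s --partition=cpu --output=%s/slurm-%%x-%%j.out --error=%s/slurm-%%x-%%j.err --wrap="python3 %s"\n' \
        "$job_name" "$log_dir" "$log_dir" "$*"
}

# Example:
#   sketch_submit_cpu train_cpu.py --epochs 5
```

Under these assumptions the job name is derived from the script name, which would explain the slurm-<JOB_NAME>-<JOB_ID> pattern in the log file names.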

Common Commands

Use these commands in your SSH session to monitor usage and jobs:

# List files and disk usage
ls -lh
du -sh .

# Check running and queued jobs
squeue -u "$USER"

# View job details (replace <JOB_ID>)
scontrol show job <JOB_ID>

# View SLURM output logs (replace <JOB_NAME> and <JOB_ID>)
cat slurm_jobs/slurm-<JOB_NAME>-<JOB_ID>.out

# View SLURM error logs (replace <JOB_NAME> and <JOB_ID>)
cat slurm_jobs/slurm-<JOB_NAME>-<JOB_ID>.err
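
When you do not remember the job name or ID, a helper like this can locate the newest output log for you. It is a minimal sketch that assumes the logs sit in a slurm_jobs directory under your current working directory; the function name latest_log is made up for this guide.

```shell
# Print the most recently modified .out log in a log directory
# (default: slurm_jobs in the current directory).
# Prints nothing if the directory or the logs do not exist.
latest_log() {
    local log_dir="${1:-slurm_jobs}"
    ls -t "$log_dir"/*.out 2>/dev/null | head -n 1
}

# Examples:
#   tail -n 50 "$(latest_log)"    # last 50 lines of the newest log
#   tail -f "$(latest_log)"       # follow the log while the job runs
```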

Example Workflows

Example 1: CPU Job (Data Preprocessing)

1. Upload your dataset to /data/<your-username>/project1/.
2. Start a CPU job using the wrapper script:

~/bin/submit_cpu_job.sh preprocess.py --input "/data/$USER/project1/raw.csv" --output "/data/$USER/project1/clean.csv"

3. Monitor job status:

squeue -u "$USER"

4. Check results in /data/$USER/project1/.
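
The upload step in this example can be done from your own machine with rsync over SSH. This is a sketch under assumptions: it requires rsync on both ends, and the login priya@hpc.example.edu is a placeholder for the credentials in your approval email. The function name upload_dataset is made up for this guide.

```shell
# Copy the contents of a local directory to a directory on the cluster.
# Run this on YOUR OWN machine, not on the cluster.
upload_dataset() {
    local local_dir="$1"     # e.g. ./project1
    local remote="$2"        # e.g. priya@hpc.example.edu (hypothetical)
    local remote_dir="$3"    # e.g. /data/priya/project1
    # -a keeps timestamps and permissions, -v lists files, -z compresses in transit.
    # The trailing slash on the source copies the directory's contents.
    rsync -avz "${local_dir%/}/" "$remote:${remote_dir%/}/"
}

# Example (hypothetical values):
#   upload_dataset ./project1 priya@hpc.example.edu /data/priya/project1
```

If rsync is not available, scp -r is a common alternative.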

Example 2: GPU Job (Model Training)

1. Upload your training data to /data/<your-username>/project2/.
2. Start a GPU job using the wrapper script:

~/bin/submit_gpu_job.sh train.py --data "/data/$USER/project2" --epochs 50

3. Monitor job status:

squeue -u "$USER"

4. Review the output log file and results in /data/$USER/project2/.
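
For long training runs you may prefer to wait for the job automatically instead of re-running squeue by hand. The helper below is a minimal sketch: the function name wait_for_job and the 30-second default interval are choices made for this guide, and the job ID is the one shown by squeue -u "$USER".

```shell
# Block until a job no longer appears in the queue, then print a message.
wait_for_job() {
    local job_id="$1"
    local interval="${2:-30}"    # seconds between checks
    # squeue -h omits the header line; -j restricts output to one job.
    # Empty output (or an error for an unknown job) ends the loop.
    while squeue -h -j "$job_id" 2>/dev/null | grep -q .; do
        sleep "$interval"
    done
    echo "Job $job_id is no longer in the queue"
}

# Example (replace <JOB_ID>):
#   wait_for_job <JOB_ID> && tail -n 20 slurm_jobs/*-<JOB_ID>.out
```

Leaving the queue does not mean the job succeeded, so check the .out and .err logs afterwards.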

Troubleshooting & Tips

Common Error Cases

"Permission denied" when writing to /data: You may be in the wrong directory. Use your assigned path under /data/$USER/.
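Job stays in PENDING: The cluster may be busy, or the job may need more resources than are currently free. The NODELIST(REASON) column of squeue shows why the job is waiting. If the wait is long, try fewer resources or a shorter time limit.
Job fails quickly: Check the .out and .err logs in slurm_jobs for error messages, missing files, or environment issues.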
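
For the PENDING case, SLURM reports a reason code for each waiting job. The sketch below turns the two most common codes into plain language. The function name explain_reason and the wording of the hints are made up for this guide; other codes depend on how the cluster is configured.

```shell
# Translate a SLURM pending reason code into a short hint.
explain_reason() {
    case "$1" in
        Resources)
            echo "Waiting for the requested CPUs, memory, or GPUs to become free" ;;
        Priority)
            echo "Other jobs with higher priority are ahead of yours in the queue" ;;
        None|"")
            echo "No pending reason reported" ;;
        *)
            echo "Unrecognized here; look up this reason code in the SLURM documentation: $1" ;;
    esac
}

# Example on the cluster (replace <JOB_ID>); %r prints the reason code:
#   explain_reason "$(squeue -h -j <JOB_ID> -o '%r')"
```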

How to Check SLURM Output and Error Logs

# Replace with your actual output file name
cat slurm_jobs/slurm-<JOB_NAME>-<JOB_ID>.out

# Check error logs for failures
cat slurm_jobs/slurm-<JOB_NAME>-<JOB_ID>.err

# If the wrapper script printed a log path, use that file directly
cat /path/to/your/slurm-output.log

Best Practices

Request only the resources you need.
Use /data/$USER/ for project storage and clean up unused files.
Start with small test jobs before launching large runs.
Keep your job commands reproducible (store scripts in your project folder).
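
When cleaning up, it helps to see which folders take the most space, since storage on /data is limited by quota. This is a minimal sketch; the function name largest_entries is made up for this guide, and sizes are reported in kilobytes.

```shell
# Show the largest files and folders directly inside a directory,
# biggest first. Sizes are in kilobytes (du -k).
largest_entries() {
    local dir="${1:-.}"
    local count="${2:-5}"
    du -sk -- "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Example on the cluster:
#   largest_entries "/data/$USER" 10
```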