HPC Resource Management System — User Guide

Introduction

Welcome to the Adamas University shared High Performance Computing (HPC) resource management system. This platform helps students and researchers request CPU and GPU resources through a web portal, get administrator approval, and then run compute jobs using simple wrapper scripts. You never need to submit jobs to SLURM directly; the wrapper scripts handle submission for you.

How the System Works — Architecture Overview

Below is a high-level architecture diagram of the system:

[Figure: HPC architecture diagram]

Component Summary

End-to-End Workflow

  1. You submit a resource request through the web portal.
  2. An administrator reviews your request and approves or rejects it.
  3. If approved, a Linux account is created and the correct wrapper script is added to your ~/bin directory based on your request (CPU-only or GPU).
  4. You receive credentials by email and connect via SSH (or VS Code Remote SSH).
  5. You use the wrapper script to run jobs; SLURM schedules and enforces resource limits.

Applying for Compute Resources

How to Access and Log In

  1. Open the web portal URL provided by Adamas University.
  2. Sign in using your university credentials.
  3. Go to New Request to open the resource request form.

[Figure: Login page]

Request Form Fields (Step-by-Step)

Fill in the form carefully. These details are used for approval, scheduling, and creating your Linux account. Example fields are shown below.

  1. Personal and academic details

    • Enter your full name, institutional email, and department/lab.
    • Use your official email because approval notifications and credentials will be sent there.
  2. Project identification

    • Add a short Project Title that you can recognize later.
    • Provide a concise Project Summary describing what you will run and why HPC is needed.
  3. Compute type (CPU or GPU)

    • Choose GPU Required = Yes only if your work requires GPU acceleration.
    • If you choose No, you will get a CPU-only environment and only the CPU job script.
  4. Resource sizing

    • Specify CPU cores and RAM required for your workload.
    • If GPU is required, include the number of GPUs.
  5. Schedule window

    • Provide Estimated Start Date/Time and Estimated End Date/Time.
    • This helps administrators plan availability and approve requests faster.
  6. Review and submit

    • Double-check all values for accuracy.
    • Submit the form and wait for admin approval.

Field                      Description                   Example
-------------------------  ----------------------------  -----------------------------------
Full Name                  Your legal name               Priya Sharma
Email                      Institutional email           priya@adamasuniversity.ac.in
Department                 Your department or lab        Bioinformatics
Project Title              Short project name            RNA-Seq Analysis
Project Summary            Brief description of work     Transcriptome analysis for cohort X
CPU Cores                  Requested CPU cores           16
RAM (GB)                   Requested memory              64
GPU Required               Yes/No                        Yes
GPU Count                  Number of GPUs (if required)  1
Estimated Start Date/Time  When you plan to begin        2026-02-10 10:00
Estimated End Date/Time    When you expect to finish     2026-03-05 18:00

Approval Workflow

Admin Review

Administrators review requests in the admin panel. They can approve or reject based on resource availability and policy.

What Approval Means

Running Your Jobs

Accessing the HPC System

You can connect using either of the following:

    • SSH from a terminal
    • VS Code Remote SSH
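
As a minimal sketch of a terminal connection, the snippet below builds the SSH target from a username and hostname. Both values are placeholders (hpc.example.edu is not the real hostname); use the ones from your credentials email.

```shell
# Placeholder values (assumptions): replace with the username and
# hostname from your credentials email.
HPC_USER="your-username"
HPC_HOST="hpc.example.edu"

# Print the user@host target so it can be reused by ssh and scp.
hpc_target() {
    printf '%s@%s\n' "$HPC_USER" "$HPC_HOST"
}

# Show the command that would open an interactive session.
echo "ssh $(hpc_target)"
```

Running the printed command opens an interactive session. VS Code Remote SSH accepts the same user@host target.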

Using Wrapper Scripts

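The system provides one wrapper script in your ~/bin directory, depending on your approved request:

    • submit_cpu_job.sh for CPU-only requests
    • submit_gpu_job.sh for GPU requests

You do not need to write SLURM submission commands yourself; these scripts handle job submission for you.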
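
If you are unsure which wrapper script your account received, a quick check like the sketch below can tell you. The script names are taken from the examples in this guide; the function name find_wrapper is hypothetical.

```shell
# Report which wrapper script is present in a bin directory.
# Script names come from the examples in this guide; find_wrapper
# itself is a hypothetical helper, not part of the system.
find_wrapper() {
    bin_dir="${1:-$HOME/bin}"
    for script in submit_gpu_job.sh submit_cpu_job.sh; do
        if [ -x "$bin_dir/$script" ]; then
            printf '%s\n' "$bin_dir/$script"
            return 0
        fi
    done
    echo "No wrapper script found in $bin_dir" >&2
    return 1
}

# Check the default location (~/bin); ignore failure so the shell keeps going.
find_wrapper || true
```

The function prints the full path of the script it finds, so the result can be used directly, for example `"$(find_wrapper)" train_cpu.py`.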

CPU Job Example

~/bin/submit_cpu_job.sh train_cpu.py

GPU Job Example

~/bin/submit_gpu_job.sh train_gpu.py

Where Job Output Goes

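When you run a job using the wrapper script, a slurm_jobs directory is created automatically (if it does not already exist). SLURM output and error logs are stored there:

    • slurm_jobs/slurm-<JOB_NAME>-<JOB_ID>.out (standard output)
    • slurm_jobs/slurm-<JOB_NAME>-<JOB_ID>.err (error output)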

The wrapper script will print the log file path after submission.
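
If you have lost track of the printed path, the sketch below finds the most recently modified output log. It assumes the logs follow the slurm-<JOB_NAME>-<JOB_ID>.out naming shown in this guide; the function name latest_log is hypothetical.

```shell
# Print the most recently modified SLURM output log in a directory.
# Assumes the slurm-<JOB_NAME>-<JOB_ID>.out naming used in this guide;
# latest_log is a hypothetical helper name.
latest_log() {
    log_dir="${1:-slurm_jobs}"
    # ls -t sorts newest first; errors are hidden if no logs exist yet.
    ls -t "$log_dir"/slurm-*.out 2>/dev/null | head -n 1
}

newest="$(latest_log)"
if [ -n "$newest" ]; then
    tail -n 20 "$newest"   # show the last 20 lines of the newest log
else
    echo "No output logs found in slurm_jobs yet."
fi
```

Run it from the directory that contains slurm_jobs, or pass another directory as the first argument to latest_log.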

Common Commands

Use these commands in your SSH session to monitor usage and jobs:

# List files and disk usage
ls -lh
du -sh .

# Check running and queued jobs
squeue -u "$USER"

# View job details (replace <JOB_ID>)
scontrol show job <JOB_ID>

# View SLURM output logs (replace <JOB_NAME> and <JOB_ID>)
cat slurm_jobs/slurm-<JOB_NAME>-<JOB_ID>.out

# View SLURM error logs (replace <JOB_NAME> and <JOB_ID>)
cat slurm_jobs/slurm-<JOB_NAME>-<JOB_ID>.err
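
The squeue command above can also be wrapped in a small polling loop so that you are told when a job has left the queue. This is a sketch only: the function names job_in_queue and wait_for_job are hypothetical, and the exact-match check assumes a plain (non-array) job ID.

```shell
# Return success while the given job ID is still listed by squeue.
# squeue -h suppresses the header and -o "%i" prints job IDs only.
# Assumes a plain (non-array) job ID; job_in_queue is a hypothetical name.
job_in_queue() {
    squeue -h -u "$USER" -o "%i" | grep -Fxq "$1"
}

# Poll until the job leaves the queue, then report.
# Second argument: polling interval in seconds (default 30).
wait_for_job() {
    while job_in_queue "$1"; do
        sleep "${2:-30}"
    done
    echo "Job $1 is no longer in the queue."
}
```

For example, `wait_for_job <JOB_ID>` checks every 30 seconds. A job leaving the queue does not mean it succeeded, so check the .out and .err logs afterwards.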

Example Workflows

Example 1: CPU Job (Data Preprocessing)

  1. Upload your dataset to /data/<your-username>/project1/.
  2. Start a CPU job using the wrapper script:
~/bin/submit_cpu_job.sh preprocess.py --input /data/$USER/project1/raw.csv --output /data/$USER/project1/clean.csv
  3. Monitor job status:
squeue -u "$USER"
  4. Check results in /data/$USER/project1/.
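
The upload step can be done with scp from your local machine. The sketch below only prints the command so you can review it first. The username and hostname are placeholders, remote_project_dir is a hypothetical helper, and the sketch assumes the project directory already exists and is writable by you.

```shell
# Build the remote project directory path used in the examples.
# remote_project_dir is a hypothetical helper name.
remote_project_dir() {
    printf '/data/%s/%s/\n' "$1" "$2"
}

# Placeholder values (assumptions): replace with your own.
HPC_USER="your-username"
HPC_HOST="hpc.example.edu"

# Print the scp command that would copy raw.csv into project1.
echo "scp raw.csv $HPC_USER@$HPC_HOST:$(remote_project_dir "$HPC_USER" project1)"
```

Once the placeholders are filled in, run the printed command from the local directory that contains raw.csv.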

Example 2: GPU Job (Model Training)

  1. Upload your training data to /data/<your-username>/project2/.
  2. Start a GPU job using the wrapper script:
~/bin/submit_gpu_job.sh train.py --data /data/$USER/project2 --epochs 50
  3. Monitor job status:
squeue -u "$USER"
  4. Review the output log file and results in /data/$USER/project2/.

Troubleshooting & Tips

Common Error Cases

How to Check SLURM Output and Error Logs

# Replace with your actual output file name
cat slurm_jobs/slurm-<JOB_NAME>-<JOB_ID>.out

# Check error logs for failures
cat slurm_jobs/slurm-<JOB_NAME>-<JOB_ID>.err

# If the wrapper script printed a log path, use that file directly
cat /path/to/your/slurm-output.log
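
For long error logs, filtering for likely failure lines can be quicker than reading the whole file. The sketch below uses grep with a small pattern list; the patterns are an assumption, not an exhaustive or official set, and scan_err_log is a hypothetical helper name.

```shell
# Print lines (with line numbers) from a log that match common
# failure patterns. The pattern list is an assumption, not an
# exhaustive set; scan_err_log is a hypothetical helper name.
scan_err_log() {
    grep -n -i -E 'traceback|error|killed|out of memory|no such file' "$1"
}
```

For example, `scan_err_log slurm_jobs/slurm-<JOB_NAME>-<JOB_ID>.err` prints each matching line prefixed by its line number. The function returns a non-zero status when nothing matches, which can itself be a useful signal.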

Best Practices