The current infrastructure comprises:
- Seven nodes, each with 4x 80 GB NVIDIA A100 GPUs sliced into 4x 20 GB MIG instances.
- Twenty-eight (28) 32-core nodes, each with 512 GB of memory and approximately 10 TB of volatile-scratch disk space.
- Eighteen (18) NVIDIA Tesla P6 GPUs with 16 GB of memory (compatible with the CUDA, OpenGL, OpenCL, and Vulkan APIs).
- One AMD FirePro S7150 GPU with 8 GB of memory (compatible with the DirectX, OpenGL, OpenCL, and Vulkan APIs).
- One node with six (6) NVIDIA V100 GPUs.
Job management is handled by the Slurm Workload Manager.
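As a rough illustration, jobs on a Slurm-managed cluster like this one are typically submitted as batch scripts via `sbatch`. This is a minimal sketch only: the resource values are placeholders, and the GPU request syntax (`--gres`) depends on how this cluster's administrators have configured GRES and MIG, which is not specified here.

```shell
#!/bin/bash
# Minimal Slurm batch script sketch. All values below are illustrative
# placeholders, not this cluster's actual partitions or limits.
#SBATCH --job-name=example
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:1   # one GPU (or a MIG slice, if so configured)

srun hostname
```

Submit with `sbatch job.sh` and check queue status with `squeue -u $USER`.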
The cluster mounts multi-TB, NFS-provided storage, which serves both persistent-scratch data (not backed up) and persistent-store data (backed up).