Factory-AI for Deep Learning Purposes

Tutorial, CEA List, SIALV Laboratory, 2022

Taught how to use the internal HPC cluster effectively to run and optimize Deep Learning experiments.

The course included:

  • SLURM Tutorial:
    • Core concepts: nodes, jobs, submissions, and the queue
    • How to submit a job
      • configuring jobs to make full use of partition resources
      • multi-node and multi-process settings
  • PyTorch Tutorial:
    • DataLoader
    • Distributed training
      • combined with SLURM, using the environment variables SLURM sets for each task
    • Avoiding unnecessary CPU/GPU synchronization
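
To illustrate the idea of driving distributed training from SLURM's environment variables, here is a minimal sketch. It is not the course material itself. It assumes one SLURM task per process and maps the per-task variables SLURM sets (SLURM_PROCID, SLURM_NTASKS, SLURM_LOCALID) to the RANK, WORLD_SIZE, and LOCAL_RANK variables read by PyTorch's "env://" initialization. The helper name slurm_to_torch_env, the "localhost" fallback, and the port 29500 are illustrative assumptions. On a multi-node job, MASTER_ADDR should be exported by the batch script, for example as the first host of the job's node list.

```python
import os
from typing import Mapping


def slurm_to_torch_env(env: Mapping[str, str]) -> dict:
    """Hypothetical helper: translate SLURM per-task variables into the
    variables that torch.distributed's env:// initialization expects."""
    rank = int(env["SLURM_PROCID"])         # global rank of this task across all nodes
    world_size = int(env["SLURM_NTASKS"])   # total number of tasks in the job
    local_rank = int(env["SLURM_LOCALID"])  # rank of this task within its node
    return {
        "RANK": str(rank),
        "WORLD_SIZE": str(world_size),
        "LOCAL_RANK": str(local_rank),
        # Assumption: the batch script exports MASTER_ADDR/MASTER_PORT;
        # "localhost" is only a sensible fallback for single-node jobs.
        "MASTER_ADDR": env.get("MASTER_ADDR", "localhost"),
        # Assumption: 29500 is an arbitrary, commonly used free port.
        "MASTER_PORT": env.get("MASTER_PORT", "29500"),
    }


if __name__ == "__main__" and "SLURM_PROCID" in os.environ:
    # Only runs inside a SLURM task (e.g. launched with srun).
    os.environ.update(slurm_to_torch_env(os.environ))
    # With PyTorch installed, each task would then typically call:
    #   torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
    #   torch.distributed.init_process_group(backend="nccl", init_method="env://")
    print(f"rank {os.environ['RANK']} of {os.environ['WORLD_SIZE']}")
```

Keeping the mapping in a plain function makes it easy to check outside the cluster and leaves the choice of backend and device placement to the training script.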

After this course, a noticeable number of experiments ran faster on the cluster, and computational resources were used more efficiently.