Accelerating Machine Learning: Tim's Journey with the LUMI Supercomputer

LUMI Supercomputer

Advancing machine learning algorithms with supercomputing

The Processing Speech and Images (PSI) Group at KU Leuven performs demand-driven image and audio processing research. Tim Lebailly, PhD student at PSI, aims to improve current machine learning algorithms for AI through his research.
In self-supervised learning, a relatively new branch of machine learning research, a model is trained to learn from the data itself, without the need for explicit and costly labels or supervision. However, to compensate for the lack of labels, the model must be trained on even larger quantities of data, which requires more computational resources. This is where HPC comes into play.

Tim Lebailly (PSI, KU Leuven): ‘When using supercomputing, the processing of huge image datasets can be parallelised on many GPUs, which is crucial to reduce the runtime to reasonable values, e.g., days. A single laptop cannot even store the model and its gradients in memory. Even if that were the case, training our model on a laptop CPU would take many years. One run of our method will process about three billion images throughout the whole training.’
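As an illustration of what this kind of parallelisation typically looks like in practice, below is a minimal, hypothetical sketch of data-parallel training with PyTorch's DistributedDataParallel. The framework, launcher (e.g. torchrun), model, dataset and hyperparameters are all placeholders and assumptions, not Tim's actual setup:

    # Minimal, illustrative sketch of multi-GPU data-parallel training.
    # One process is started per GPU by a launcher such as torchrun,
    # which sets RANK, WORLD_SIZE and LOCAL_RANK in the environment.
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader, DistributedSampler
    from torchvision import datasets, transforms, models

    def main():
        dist.init_process_group(backend="nccl")  # maps to RCCL on AMD GPUs
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        # Placeholder model; each process holds a full copy of the weights.
        model = models.resnet50().cuda(local_rank)
        model = DDP(model, device_ids=[local_rank])

        # Placeholder dataset; the sampler gives each rank a distinct shard.
        dataset = datasets.FakeData(transform=transforms.ToTensor())
        sampler = DistributedSampler(dataset)
        loader = DataLoader(dataset, batch_size=64, sampler=sampler, num_workers=8)

        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
        loss_fn = torch.nn.CrossEntropyLoss()

        for epoch in range(10):
            sampler.set_epoch(epoch)  # reshuffle shards each epoch
            for images, labels in loader:
                images = images.cuda(local_rank)
                labels = labels.cuda(local_rank)
                loss = loss_fn(model(images), labels)
                optimizer.zero_grad()
                loss.backward()   # gradients are averaged across all GPUs here
                optimizer.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

Each process works on its own shard of the image dataset and the gradients are synchronised after every backward pass, which is what lets the total runtime shrink roughly in proportion to the number of GPUs.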

Since Belgium is part of the LUMI consortium, Belgian researchers could apply to participate in LUMI pilot phases before the supercomputer was fully operational. Tim participated in the second pilot phase of LUMI, which aimed to test the scalability of the GPU partition (LUMI-G) and generate workloads on the GPUs. 

More straightforward research process thanks to LUMI 


Tim Lebailly (PSI, KU Leuven) is very excited about LUMI: “LUMI made it easier to perform my research since I can conduct numerous experiments in parallel. If I attempted the same on Tier-1, I would be occupying the entire GPU partition exclusively, causing some jobs to remain in the queue for a long time. However, on LUMI, I only utilise a small fraction of the supercomputer's capacity, enabling me to schedule all my jobs simultaneously. This makes the research process much more straightforward.”

Challenges of running machine learning workloads on AMD GPUs
LUMI features a lot of new technology, including the first generation of AMD GPUs for high-performance computing.
Tim Lebailly (PSI, KU Leuven): “Since this is so new, online documentation is very limited compared to NVIDIA hardware. So, getting my machine learning workloads to run on AMD GPUs required some persistence. But now that I have the set-up working, it's basically the same as running on NVIDIA GPUs, and the performance is more or less the same.”
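The portability Tim mentions largely comes from the ROCm build of PyTorch reusing the familiar CUDA-style API, so code written for NVIDIA GPUs usually runs unchanged. A small illustrative check, assuming a ROCm-enabled PyTorch installation (not Tim's actual environment):

    # Quick check that a ROCm build of PyTorch sees the AMD GPUs.
    # The CUDA-style API is reused unchanged, which is why code written for
    # NVIDIA GPUs typically runs as-is on AMD hardware.
    import torch

    print(torch.__version__)              # ROCm builds report e.g. "2.x.x+rocm..."
    print(torch.cuda.is_available())      # True on AMD GPUs with the ROCm backend
    print(torch.cuda.device_count())
    print(torch.cuda.get_device_name(0))  # e.g. an AMD Instinct device

    x = torch.randn(4096, 4096, device="cuda")  # "cuda" also targets AMD GPUs under ROCm
    y = x @ x                                   # matrix multiply executes on the GPU
    print(y.device)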

From Karolina to LUMI: Tim's experience with various supercomputers
Tim is an experienced HPC user: he previously worked with the supercomputer Karolina in the Czech Republic and Meluxina in Luxembourg.
When comparing these supercomputers to LUMI, Tim adds: “The user experience was good on Karolina and Meluxina, and the support teams were very responsive. Because they have NVIDIA GPUs similar to those on Hortense, it was straightforward to port my application there. On LUMI, you can ask for far more resources and get scheduled almost instantly, whereas smaller supercomputers like Karolina simply cannot offer that because of their limited resources.”


Running in containers on different supercomputers
Tim Lebailly (PSI, KU Leuven): “I had some general framework where I could easily move from supercomputer to supercomputer, and I was running in containers, which was quite easy for me. But for LUMI, of course, I needed a different container because of the AMD GPUs, but it was still quite easy.”

Geert Jan Bex (Consultant/analyst, EuroCC Belgium/Vlaams Supercomputer Centrum): “Containers play an increasingly important role in the deployment of HPC workloads. Since there are a couple of pitfalls when using containers on supercomputing infrastructure, compute centres such as the Vlaams Supercomputer Centrum provide training and support on this topic.”


LUMI Support: facilitating communication and exchange of experiences
The LUMI support team organised the course “Introduction to LUMI-G hardware and programming environment” to introduce all pilot users to LUMI-G, providing valuable tips and tricks.

Tim Lebailly: “This course day allowed me to learn from other pilot users and their experiences, in addition to the support team's help. This was a valuable opportunity to gain insights and knowledge, making the experience even more helpful.”

A group chat was also established for all pilot users to make communication more convenient.
Tim Lebailly: “Initially, I was one of the pilot users, and the experience was new for everyone involved, including the support team. This made it much simpler for me to ask questions and go through the chat history rather than directly reaching out to the support team, who were likely receiving similar questions from numerous users. Overall, the group chat was incredibly helpful in facilitating communication between pilot users and the support team.” 

LUMI enables running advanced models and conducting multiple experiments simultaneously
When asked why he would recommend LUMI to other users, he states:

“If you need to conduct numerous experiments simultaneously or perform a large-scale search, LUMI is a great option. Additionally, if you need to run more advanced models that are too demanding for Tier-1, LUMI is the way to go. For instance, I plan on running certain models on LUMI that require a minimum of 128 GPUs with 40 gigabytes of memory each, which is not feasible on Tier-1. However, it can be accomplished on LUMI.”
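As a rough, hypothetical back-of-envelope illustration of why per-GPU memory matters for such runs (the parameter count and optimiser below are placeholders, not Tim's actual model):

    # Hypothetical back-of-envelope for per-GPU training memory
    # (placeholder numbers, not Tim's actual model).
    params = 300e6                        # number of model parameters
    bytes_per_value = 4                   # fp32
    weights = params * bytes_per_value
    grads = params * bytes_per_value      # one gradient value per parameter
    adam_states = 2 * params * bytes_per_value  # Adam keeps two moments per parameter

    fixed_gb = (weights + grads + adam_states) / 1e9
    print(f"weights + gradients + optimizer states: {fixed_gb:.1f} GB per GPU")
    # Roughly 4.8 GB before counting activations; activations grow with the
    # per-GPU batch size and image resolution and usually dominate, which is
    # why large training runs need GPUs with tens of gigabytes of memory each.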
 

Empowering Breakthroughs in Self-Supervised Learning for Vision-Based Industries through LUMI's Capacity
Tim's research holds great potential for industries that rely on vision data, such as scene understanding for self-driving cars or robotic surgery. Given the importance of self-supervised learning in the broader field of AI, LUMI's capacity to facilitate this research empowers researchers to make breakthroughs in the field.


Relevant papers: 

  1. Paper ‘Cross-View Online Clustering for Dense Visual Representation Learning.’
    Research facilitated by the LUMI supercomputer, owned by the EuroHPC Joint Undertaking, hosted by CSC (Finland) and the LUMI consortium. This paper was accepted at the Conference on Computer Vision and Pattern Recognition (CVPR), one of the leading conferences in machine learning and computer vision.
     
  2. Paper ‘Adaptive Similarity Bootstrapping for Self-Distillation.’
    Research facilitated by the LUMI supercomputer, owned by the EuroHPC Joint Undertaking, hosted by CSC (Finland) and the LUMI consortium. 

 

 

Tim Lebailly (PSI, KU Leuven)


Tim Lebailly obtained his master's degree in Data Science from the Swiss Federal Institute of Technology Lausanne (EPFL) in 2021. He is currently pursuing a PhD at the PSI (Processing Speech and Images) lab of KU Leuven under the supervision of Prof. Tinne Tuytelaars. His research interests lie mostly at the intersection of computer vision and machine learning.