User Story: Improving Qualitative Research Transcription with Whisper and HPC
In the world of qualitative research, interviews are a goldmine of insight — but only if they can be fully and accurately transcribed. For many researchers, the process of turning hours of recordings into text is an exhausting bottleneck: it can take five to ten hours of work to transcribe just one hour of audio. Even then, accuracy is far from guaranteed.
Existing tools often fail to handle multiple speakers, require tedious voice training, or produce transcripts that demand extensive manual correction. Worse still, for projects dealing with sensitive or personal data, the use of commercial online transcription services is ruled out entirely due to confidentiality and GDPR constraints.
"In social sciences, the content of an interview is deeply personal," says Jonathan Dedonder, Research Logistician at UCLouvain’s IACCHOS Institute. "We simply cannot risk uploading this kind of data to external servers. We needed a secure, efficient, high-quality solution we could trust."
From Personal Experiment to HPC-Driven Solution
When OpenAI released Whisper, a state-of-the-art speech recognition model, Jonathan decided to experiment. Running it locally on his own machine, he quickly saw its potential — but also hit its limits. Larger, more accurate models were too heavy for his machine, and processing was slow.
Turning to the Centre de Calcul Intensif et de Stockage de Masse (CISM) — the high-performance computing and mass storage centre of UCLouvain, and a key member of CÉCI (the Consortium des Équipements de Calcul Intensif, the HPC consortium of French-speaking universities in Belgium) — he found the missing piece. The CISM team deployed Whisper on dedicated GPU computing resources available at UCLouvain, ensuring fast processing without compromising data security.
"Once we moved to CISM’s infrastructure, the difference was night and day," Jonathan recalls. "What used to take me hours could now be done in minutes, and without ever leaving a secure local environment."
From UCLouvain to the Whole CÉCI Network
Initially available only to UCLouvain researchers through the CISM, the Whisper service proved so effective that it inspired a wider rollout. Today, all CÉCI users — across the French-speaking universities of Belgium — can access Whisper through Lyra, the CÉCI cluster dedicated to GPU-based workloads.
This makes the secure, high-quality transcription solution available to a much broader research community, without compromising on confidentiality, speed, or accuracy.
A Transformation in Research Workflow
The impact was immediate and profound. An hour of audio that once took most of a working day to transcribe by hand now takes five to ten minutes. The accuracy of Whisper’s output drastically reduces the need for corrections, allowing researchers to move quickly from raw recordings to data analysis.
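As a rough illustration of what these figures imply, the short Python sketch below compares the two ranges quoted in this article: five to ten hours of manual work per hour of audio, against five to ten minutes with Whisper on GPU. The function names and the 20-hour corpus are hypothetical, chosen only for the example, and the result is a back-of-the-envelope estimate rather than a benchmark.

```python
# Figures quoted in this article, per hour of recorded audio.
MANUAL_HOURS = (5, 10)      # manual transcription: five to ten hours
WHISPER_MINUTES = (5, 10)   # Whisper on GPU: five to ten minutes


def speedup_range(manual_hours=MANUAL_HOURS, whisper_minutes=WHISPER_MINUTES):
    """Return the (lowest, highest) speed-up factor implied by the two ranges."""
    manual_minutes = [h * 60 for h in manual_hours]
    lowest = min(manual_minutes) / max(whisper_minutes)
    highest = max(manual_minutes) / min(whisper_minutes)
    return lowest, highest


def corpus_estimate(audio_hours):
    """Return ((manual_lo, manual_hi), (whisper_lo, whisper_hi)) in hours of work."""
    manual = tuple(audio_hours * h for h in MANUAL_HOURS)
    whisper = tuple(audio_hours * m / 60 for m in WHISPER_MINUTES)
    return manual, whisper


if __name__ == "__main__":
    print(speedup_range())      # speed-up factor, lowest and highest
    print(corpus_estimate(20))  # hypothetical corpus: 20 hours of interviews
```

On these figures, transcription is somewhere between 30 and 120 times faster. For the hypothetical 20-hour corpus, 100 to 200 hours of manual work becomes roughly two to three hours of processing, before any time spent on review and correction.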
Equally important is the security of the process: data never leaves the controlled environment of CISM and CÉCI infrastructure, ensuring full compliance with GDPR and research ethics standards.
While speaker diarization still has room for improvement, the combined solution has already revolutionised transcription practices in the social sciences and humanities. CÉCI and CISM also offer regular training sessions to help researchers take advantage of the tool.
HPC as an Enabler for the Humanities
High-Performance Computing (HPC) is often associated with physics, climate modelling, or genomics, but the Whisper project shows its transformative potential in the social sciences and humanities.
"HPC doesn’t just make things faster," Jonathan points out. "It makes entirely new workflows possible. Without CISM and CÉCI to provide the GPU cluster, we couldn’t run these large models securely and at scale."
With this capability, qualitative research teams can now handle larger datasets, process them more efficiently, and open new possibilities for collaborative, ethical, and reproducible research.
Looking Ahead
Whisper on HPC infrastructure has already proven itself as a cornerstone tool bridging AI and the humanities. As models evolve and diarization improves, the solution should become even more precise, further reducing manual intervention.
For the researchers who once spent days hunched over headphones and keyboards, this is more than just a productivity boost — it’s a fundamental shift in how qualitative research is conducted. By bringing together AI innovation, local high-performance infrastructure, and the collaborative strength of CÉCI, Belgian universities have turned a long-standing obstacle into an opportunity.
Jonathan Dedonder

Jonathan Dedonder is Research Logistician at UCLouvain’s IACCHOS Institute. He supports researchers in collecting, processing, analysing, storing and protecting data, both qualitative and quantitative, so they can spend less time on technical tasks. He holds a PhD in psychology and has worked at UCLouvain since 2007. He delivers training with SMCS on qualitative data analysis and tools, and with CISM on Whisper transcription. He is also part of the Wallonia-Brussels Federation’s Data Ambassadors network for research data awareness.