Improving chocolate with supercomputing

How HPC can help steer the cocoa fermentation process

Fermentation, the first step in chocolate production
Chocolate is made from the fruits of the cocoa tree. The fruits (also called pods) are handpicked in cocoa plantations situated around the equator (Central and South America, Africa, and South-East Asia).[1]

Inside the pods, the cocoa beans are surrounded by a sugary pulp. The beans, which are the seeds of the cocoa tree, have an intensely bitter taste and must be fermented to develop flavour. Once the pods are opened, their contents ferment spontaneously in the open air, mainly in wooden boxes or heaps. After fermentation, the cocoa beans must be dried before they can be transported to chocolate manufacturing facilities, where they are processed into chocolate.[2]

Images: cocoa pods; beans and pulp [2]

Fermentation is thus the first step in the production of chocolate: microorganisms that are naturally present in the environment ferment the carbohydrate-containing pulp. During fermentation, yeasts produce mainly ethanol, lactic acid bacteria produce mainly lactic acid, and acetic acid bacteria produce mainly acetic acid. The process takes up to six days and also yields several flavour precursors that eventually contribute to the chocolate taste.

Images: wet beans; fermented beans

IMDO – unravelling the mystery of food fermentation processes
The research of the Research Group of Industrial Microbiology and Food Biotechnology (IMDO) of the Vrije Universiteit Brussel (VUB) deals with the qualitative and quantitative study of the microbial species diversity, community dynamics, and meta-metabolomics of fermented food ecosystems. The fundamental aims of these studies are to unravel why certain microorganisms prevail in certain ecosystems and how their competitiveness and functionality can be explained biochemically and molecularly.

Prof. Stefan Weckx (IMDO-VUB): “By gaining knowledge and expertise on food fermentation processes, we can develop new starter cultures to control and even steer the fermentation process. For example, in the case of cocoa: a farmer starts a fermentation of 500 kilograms of wet beans. If we control the fermentation process, we are sure the fermentation will be successful. We know the farmer will be able to market those well-fermented beans to the middleman so the beans can be processed into chocolate. Also, by changing the yeasts in the starter culture, we can influence the flavour of the chocolate.”

Shotgun metagenomic sequencing – breaking DNA into smaller pieces
DNA sequencing [3] is the determination of the order of the building blocks, or nucleotides, that make up a DNA molecule (in other words, the determination of the DNA code). Next-generation sequencing (or high-throughput sequencing) allows researchers to determine all genomic DNA present in a given complex sample. This technology provides unprecedented possibilities to investigate the microbiomes involved in food fermentation processes, since it reveals both the microorganisms present (taxonomic classification) and the genes their genomes harbour.

For about a decade, IMDO-VUB has applied shotgun metagenomic sequencing to unravel the microbial composition and elucidate the genetic potential of fermented food ecosystems. In this approach, a sample is taken from, for example, a fermenting cocoa pulp-bean mass, after which the microbial cells are separated. Total DNA is extracted from the microbial cells, then purified and fragmented into smaller pieces, which are sequenced.
Three research questions are addressed: (i) what is the microbial composition of these ecosystems; (ii) what is the adaptation to and functional role of the microorganisms in these ecosystems; and (iii) where do the microorganisms come from?
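
To make the idea of taxonomic classification of shotgun reads more concrete, here is a toy sketch in Python. It is not the IMDO-VUB workflow, and the article does not say which classifier the group uses. The sketch assumes a simple k-mer voting scheme, random stand-in "genomes", and hypothetical names such as yeast_like; real reference genomes are millions of nucleotides long and real classifiers are far more sophisticated.

```python
import random
from collections import Counter

K = 12  # k-mer length; a hypothetical choice for this toy example


def kmers(seq, k=K):
    """Return the set of all overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def random_genome(length, rng):
    """Generate a random DNA string as a stand-in for a reference genome."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def shotgun_reads(genome, read_len, n_reads, rng):
    """Mimic shotgun fragmentation: sample reads at random positions."""
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        reads.append(genome[start:start + read_len])
    return reads


def build_index(references):
    """Map every k-mer to the set of reference names that contain it."""
    index = {}
    for name, seq in references.items():
        for km in kmers(seq):
            index.setdefault(km, set()).add(name)
    return index


def classify(read, index):
    """Assign a read to the reference sharing the most k-mers with it."""
    votes = Counter()
    for km in kmers(read):
        for name in index.get(km, ()):
            votes[name] += 1
    if not votes:
        return "unclassified"
    return votes.most_common(1)[0][0]


rng = random.Random(42)  # fixed seed so the toy run is reproducible

# Hypothetical reference "database" of three made-up organisms.
references = {
    "yeast_like": random_genome(2000, rng),
    "lactic_acid_bacterium_like": random_genome(2000, rng),
    "acetic_acid_bacterium_like": random_genome(2000, rng),
}
index = build_index(references)

# Simulated mixed sample: different numbers of reads per organism.
sample = []
for name, n_reads in [("yeast_like", 30),
                      ("lactic_acid_bacterium_like", 20),
                      ("acetic_acid_bacterium_like", 10)]:
    sample.extend(shotgun_reads(references[name], 100, n_reads, rng))

# Estimated microbial composition of the sample, in reads per organism.
composition = Counter(classify(read, index) for read in sample)
print(composition)
```

Counting the classified reads per organism gives a rough picture of the microbial composition, which corresponds to research question (i). In this toy setup every read is an exact copy of part of a reference, so classification is trivial; sequencing errors, closely related species, and organisms missing from the database are what make the real problem hard.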

Growing datasets and databases require more computing power
Because of technological advancements, the number of sequence reads in the average DNA sequence dataset per sample keeps increasing. In addition, research groups have to submit the datasets they obtain to public sequence databases, so those databases keep growing as well. With growing datasets and databases, the computational requirements for taxonomic classification of the metagenomic sequence reads also increase, to the point where supercomputing is the only practical way to perform it.
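
A rough back-of-the-envelope calculation shows how the workload grows with the dataset. The Python sketch below assumes a k-mer based classifier that performs one database lookup per k-mer in each read. The read length, k-mer length, and read counts are hypothetical illustration values, not figures from IMDO-VUB.

```python
def kmer_lookups(n_reads, read_len, k):
    """Number of k-mer database lookups needed to classify n_reads reads.

    Each read of length read_len contains (read_len - k + 1) overlapping
    k-mers, and this simple model assumes one lookup per k-mer.
    """
    return n_reads * (read_len - k + 1)


READ_LEN = 150  # hypothetical read length in nucleotides
K = 31          # hypothetical k-mer length

# Hypothetical dataset sizes, growing by a factor of ten each time.
for n_reads in (1_000_000, 10_000_000, 100_000_000):
    lookups = kmer_lookups(n_reads, READ_LEN, K)
    print(f"{n_reads:>11,} reads -> {lookups:>14,} lookups")
```

In this model the number of lookups grows linearly with the number of reads, while the memory needed to hold the reference index grows with the size of the database. Both trends push the analysis beyond what a small local cluster can handle.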

Scaling up from Tier-3 to Tier-1

Stefan Weckx: "In the early days, we were able to perform the computations on our own small cluster at the research group (Tier-3). But we needed more capacity and made the transfer to VUB-VSC’s Hydra (Tier-2). This required some knowledge of how a supercomputer works (e.g., the submission of jobs). Once the team obtained that knowledge, the step from Tier-2 to Tier-1 was easier."

IMDO-VUB was one of the first research groups aiming to run bioinformatics projects on the VSC Tier-1 supercomputing infrastructure.

Stefan Weckx: "When applying for Tier-1 for the first time, we were the new kids on the block, so to speak. We used algorithms different from what was in use on the Tier-1 at that time. When writing the project request, we had to switch our mindset to the technical aspects of a supercomputer. We reconsidered our workflow to make it more efficient and did some benchmarking. Scaling up to Tier-1 required some work but was a necessary step, taken with the great help of the VSC support teams. Now we are happy with the upscaling and very satisfied with the VSC infrastructure."

Relevant links: https://imdo.vub.ac.be

[1] YouTube video VUB

[2] The production of chocolate

[3] DNA Sequencing (Wikipedia)

Prof. dr. Stefan Weckx - IMDO-VUB

Prof. Dr. Stefan Weckx is an associate professor at the Faculty of Sciences and Bioengineering Sciences of the Vrije Universiteit Brussel (VUB). He obtained an MSc in Biochemistry in 1996 and a PhD in Sciences in 2004, both at the University of Antwerp, Belgium. As a PhD student, he stayed as a Marie-Curie training fellow at the European Bioinformatics Institute in Hinxton, Cambridge, UK. After obtaining his PhD, he was a postdoctoral fellow at the MicroArray Facility of VIB in Leuven, Belgium.
In 2006, he joined the Research Group IMDO of the VUB as a postdoctoral fellow. He supervises the molecular (micro)biological research and initiated research on (meta)genomics and bioinformatics to investigate food fermentation processes and promising strains isolated from those fermentation matrices, a research line that has become increasingly prominent in the overall research approach of IMDO. He teaches courses in omics, bioinformatics, industrial biotechnology, and industrial systems biology. He is involved in policy on High-Performance Computing at the Vrije Universiteit Brussel and the Flemish Supercomputer Center.