Interview Series: Talks with em. prof. dr. Dirk Roose - Part 1

Em. prof. dr. Dirk Roose

Prof. Roose, for readers who are not yet familiar with you, could you briefly introduce yourself and describe your work? 

I am Dirk Roose. I studied engineering at the Department of Computer Science of KU Leuven, with a major in applied mathematics, now called mathematical engineering.

In 1985, when I finished my PhD, I wanted to change my research subject. At that time, the numerical community was quite interested in parallel computing and parallel algorithms. There were experiments with the first MIMD (Multiple Instruction, Multiple Data) machines. For example, the team of Geoffrey Fox at Caltech (Pasadena, CA) developed an experimental machine, the ‘Cosmic Cube’ [1], using a hypercube interconnect with the help of Intel.
One of our former students was involved in that project, giving me the opportunity for a research stay in the team at Caltech.

Intel picked up Caltech’s idea and developed some prototype machines, the iPSC [2] (‘Intel Personal SuperComputer’) series, also using a hypercube interconnect.

After my stay at Caltech, I travelled north to the headquarters of Intel, where I took the preliminary steps to buy an Intel iPSC/2 computer. An iPSC/2 consisted of up to 128 386/387 processors interconnected as a cluster with distributed memory. What we now have in a classical cluster was already available in this prototype machine. The iPSC/2 was not really a supercomputer because it was a small system with simple processors.

With research funds from KU Leuven, we could buy a 16-processor iPSC/2, the first one to be installed in Europe; it even required a special export licence. It was a very interesting system to experiment with and to experience the reality of parallel computing.

At that time, when people thought about parallel algorithms, they often considered very fine-grained algorithms. But during the whole history of parallel computing, the time to start up a message in distributed memory systems was and is much longer than the time to do a floating point operation. So, fine-grained algorithms don’t work in practice. To achieve high performance, you need a coarse-grained approach. 
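A simple model makes this concrete (the numbers below are illustrative orders of magnitude, not measurements from the machines discussed here): the time to send a message of $n$ words can be written as

$$T_{\mathrm{msg}}(n) = \alpha + \beta\, n,$$

where $\alpha$ is the start-up time (latency) and $\beta$ the time per transferred word. With $\alpha$ on the order of microseconds and the time per floating point operation on the order of nanoseconds, a single message start-up costs as much as thousands of floating point operations. An algorithm therefore only scales if each message is accompanied by a correspondingly large amount of local computation, which is exactly what a coarse-grained decomposition provides.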

iPSC/2 Outside and Inside
iPSC/2 at KU Leuven

During a research stay in Germany in 1988, I worked at GMD (now part of the Fraunhofer-Gesellschaft) on HPC. GMD was working on the Suprenum [3] project to build a German supercomputer. The supercomputer was not a success, but they did interesting research on HPC. GMD also had a Connection Machine CM-2 [4], quite exciting to program since it was a SIMD (Single Instruction, Multiple Data) machine.

Connection Machine CM-2

Was this Connection Machine mainly targeted at AI applications?

It was mainly targeted at AI, indeed. The idea was to mimic a neural network by using many 1-bit processors. In the eighties, it was too expensive to build in a lot of logic, so it was a SIMD machine: all processors operated in lockstep. It was really a challenge to program the CM. It was a very interesting machine, but in the end it was mainly used for numerical simulation, by coupling 32 of these 1-bit processors with a floating point processor. At that time, there were no powerful AI algorithms. Both the knowledge about AI and the algorithms needed to use such a machine effectively were lacking.

In the 1980s, several companies built their own processors for parallel systems, for example nCube [5]. In Europe, we had the Transputer initiative [6]. Even at that time, some people and agencies said ‘we should build a European supercomputer’. A number of companies, Meiko Scientific in the UK [7], Parsytec [8] in Germany and some others, built clusters based on the Transputer, supported by EU funding.
 

A system built around a special processor, like the nCube or the Transputer, could give good performance. Still, it was difficult to build a new generation of these dedicated processors in time to remain competitive with mainstream processors. Thus, the HPC world took the other route, namely using commodity processors and communication systems to build parallel systems.

Having these prototype machines (iPSC/2, Meiko) in our research group was very important for our work.
Our large research project on parallel numerical algorithms, which started after my PhD, focused on CFD (computational fluid dynamics) and load balancing. We were involved in two European projects, and we developed tools for load balancing numerical simulations that were later used in a Japanese research project.

Was Kurt Lust (currently LUMI User Support Specialist at UAntwerpen) part of your research group at that point?

Yes, when Kurt started his PhD, he worked on a European project on CFD funded by ESA, together with the Von Karman Institute (VKI) and a German group. Our team mainly worked on the computer science aspects, like load balancing.

But after a while, we could no longer keep up with the new machines. We could not afford them, and Flanders and Belgium had no supercomputing facility. Our research on HPC stopped after about ten years: without up-to-date equipment, while other groups had powerful machines, we could not contribute to HPC any more. I shifted back to my original research topic, numerical algorithms for non-linear dynamical systems.

But in 2010, we became active in HPC again through the Exascience project, together with teams from the other Flemish universities and imec, funded by Intel and IWT (now VLAIO).
 

What were the challenges when you started, and how did you address these?

In the eighties, supercomputing was rare and few researchers had access to supercomputers. In the nineties, supercomputer centres were set up in most countries, some at a university but mainly at a regional level. For example, in Germany several regions had a supercomputer centre, jointly financed by universities and industry. In Belgium, we tried, but we were not able to set up a centre. We were really lagging behind other countries.

In the 1990s, I was involved in discussions at the national level to buy a supercomputer and make supercomputer access available in Belgium as well. Still, it never materialised, because we are such a divided country.

We were joking that the only place for a Belgian supercomputer was in Ukkel, at federal institutes like the KMI (the Royal Meteorological Institute). So, for a long time, there was no supercomputer available in Belgium.

Now and then, one of the universities bought a machine that could be considered a supercomputer, but this was not followed up by new investments. So sometimes we had a decent machine, but after some years it was outdated and not replaced.

What are the current challenges? 

Students do not learn about parallel programming early enough. In some computer science courses, parallel or concurrent programming is taught, but, in my opinion, not early enough.

I hope that when students think about parallelism in algorithms and use it early enough in their education, it will be easier for them to develop efficient parallel algorithms.

Also, in many cases we are still basically hand-coding parallelism via MPI and/or OpenMP. Frameworks like Hadoop and Spark are still limited in terms of acceptance and possibilities. Not all problems can be cast in a map-reduce formalism.

In my course on parallel computing, where I taught the basics of OpenMP and MPI, computer science students often said: “This is so primitive”. One has to think about so many details, which makes parallel programming in OpenMP and MPI, and especially their combination, difficult.
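To illustrate what this hand-coding looks like, here is a minimal hybrid MPI + OpenMP sketch in C. It is a generic example (a distributed partial sum), not code taken from the course, and it assumes an MPI library and an OpenMP-capable compiler are available:

```c
/* Minimal hybrid MPI + OpenMP example: each MPI rank owns a slice of the
   index range, OpenMP threads share the work within a rank, and the
   rank-local results are combined with MPI_Reduce. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;
    /* MPI_THREAD_FUNNELED: only the main thread of each rank calls MPI. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Split the range 1..N over the ranks (last rank takes the remainder). */
    const long N = 100000000L;
    long chunk = N / size;
    long lo = (long)rank * chunk + 1;
    long hi = (rank == size - 1) ? N : lo + chunk - 1;

    /* Threads within a rank share this loop; the reduction clause combines
       their partial sums. */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (long i = lo; i <= hi; i++)
        local += 1.0 / (double)i;

    /* Combine the per-rank sums on rank 0. */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("harmonic sum H(%ld) ~ %.12f\n", N, global);

    MPI_Finalize();
    return 0;
}
```

Even this toy program already forces the programmer to deal with thread-support levels, the distribution of indices over ranks, a per-rank reduction and a global reduction. It would typically be compiled with something like mpicc -fopenmp and launched with mpirun, with an environment variable such as OMP_NUM_THREADS controlling the number of threads per rank.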

Quite a long time ago, there was a lot of research in computer science on the automatic parallelisation of Fortran and C programs, but these efforts were not very successful. Of course, OpenMP does a kind of automatic parallelisation, but you still have to give detailed instructions to the compiler.
You can avoid many of the technical details by using a library. For numerical simulation, very good libraries exist in which the most common algorithms are implemented and can run in parallel. That makes life easier for programmers, but often they still have to do the hard work of using MPI and/or OpenMP themselves.
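For instance, a dense matrix product written against the standard BLAS interface remains a single library call; when the program is linked against a multithreaded BLAS (OpenBLAS, Intel MKL, ...), that call runs in parallel without the caller writing any OpenMP or MPI. A minimal sketch, assuming a CBLAS header and library are installed:

```c
/* Dense matrix product C = A*B via the CBLAS interface; the parallelism
   (if any) lives inside the BLAS library, not in this code. */
#include <cblas.h>
#include <stdio.h>

int main(void)
{
    enum { n = 512 };                             /* matrix dimension */
    static double A[n * n], B[n * n], C[n * n];   /* row-major storage */

    for (int i = 0; i < n * n; i++) {
        A[i] = 1.0;
        B[i] = 2.0;
        C[i] = 0.0;
    }

    /* C = 1.0 * A * B + 0.0 * C */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, A, n, B, n, 0.0, C, n);

    /* Each entry of C should equal 2*n = 1024. */
    printf("C[0] = %.1f (expected %.1f)\n", C[0], 2.0 * n);
    return 0;
}
```

How many threads the library uses is then controlled by the BLAS implementation itself (typically through an environment variable) rather than by the application code.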

How long have you been involved in HPC? Why has this domain continued to fascinate you?

I have been involved in HPC since 1986. One of the fascinating aspects is the tremendous growth in performance, not only floating point speed but also memory and communication performance. There are always applications that can use all these resources; researchers and industry keep asking for more.

Another fascinating aspect for me is the cyclic evolution of HPC: some things come back in another form. For example, the idea of vector processing from the seventies and eighties later returned, in a limited form, in commodity processors.

Also, shared versus distributed memory: the first supercomputers were shared memory systems with a small number of vector processors (e.g. Cray [9]). Afterwards, clusters with distributed memory became the norm, starting with the hypercube systems we already talked about. Now we have up to 128 cores per node, so within a node we again have shared memory. This also means that some lessons learned in the past about parallel programming and algorithms can be reused in some form.

As for my own research: I am an applied mathematician, but I also like to work on concrete aspects. Taking the hardware architecture into account when developing numerical algorithms is a challenge, but that makes it fascinating and fun for me. I like this mix of technology and conceptual thinking.
 


[1] http://calteches.library.caltech.edu/3419/1/Cubism.pdf

[2] https://en.wikipedia.org/wiki/Intel_iPSC

[3] https://de.wikipedia.org/wiki/SUPRENUM 

[4] https://en.wikipedia.org/wiki/Connection_Machine

[5] https://en.wikipedia.org/wiki/NCUBE

[6] https://en.wikipedia.org/wiki/Transputer

[7] https://en.wikipedia.org/wiki/Meiko_Scientific

[8] https://en.wikipedia.org/wiki/Parsytec

[9] https://www.hpe.com/us/en/compute/hpc/cray.html 

About em. prof. dr. Dirk Roose

Em. prof. dr. Dirk Roose is an emeritus professor at KU Leuven, Department of Computer Science. He chaired the KU Leuven HPC Steering Committee, played an important role in the creation of the VSC (Vlaams Supercomputer Centrum) and chaired its user committee. Much of his research dealt with numerical methods and software tools for HPC.

He also worked on the simulation of nonlinear systems, multi-scale modelling in materials science, and discrete optimisation. He taught courses on numerical mathematics, algorithms for HPC, nonlinear systems, genetic and evolutionary algorithms, among others. 

The Master in Mathematical Engineering at KU Leuven was established on his initiative.
Among other roles, he is an editor of the Lecture Notes in Computational Science and Engineering book series (Springer).