LUMI

The LUMI system

LUMI is one of the two EuroHPC pre-exascale systems currently being assembled. It is a mostly GPU-accelerated system built by HPE-Cray using AMD EPYC CPUs and AMD Instinct GPUs, with an architecture similar to that of the US exascale system Frontier, and it will be operated by CSC, the national supercomputer centre of Finland. The other pre-exascale system, Leonardo, is based on Intel CPUs and NVIDIA GPUs and will be operated by CINECA, the Italian national supercomputer centre. A third pre-exascale system may be assembled at the Barcelona Supercomputing Center. EuroHPC is currently also installing five petascale systems. A brief overview of those systems is available on the web site of the EuroHPC Joint Undertaking.

LUMI is jointly funded by EuroHPC and a consortium of ten countries, including Belgium. It is a Cray EX system with several partitions:

  • A CPU-based partition of 1536 compute nodes with 128 cores each. The peak performance of this partition is just over 7.5 Pflops (see the worked estimate after this list). General availability is planned towards the end of 2021.

  • A GPU-based partition with a peak performance of around 550 Pflops in double precision. Each node will be equipped with 4 powerful future-generation AMD GPUs and a 64-core AMD CPU. Due to component shortages in the computer market, the date of general availability is still uncertain, but it is currently expected in late spring or early summer 2022.

  • A small partition for data analytics and visualisation to process results from the two main partitions locally. These nodes will have visualisation-oriented GPUs and a fairly large amount of memory. General availability will follow shortly after that of the CPU-based partition.

  • A small container-based cloud partition, mainly meant to run microservices that support work done on the compute partitions of the system. General availability of this partition is not yet known.

  • An early access platform based on an earlier generation of AMD GPUs will be available together with the CPU-based partition, to start porting codes to the AMD GPU architecture.
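
As a back-of-the-envelope check of the CPU partition's peak figure: assuming 64-core AMD EPYC processors clocked at roughly 2.45 GHz with two 256-bit FMA units per core, i.e. 16 double-precision flops per core per cycle (typical figures for this processor generation, not published LUMI specifications), one gets

    1536 nodes × 128 cores/node × 16 flops/cycle × 2.45 GHz ≈ 7.7 Pflops,

which is consistent with the quoted peak of just over 7.5 Pflops.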

Servicing these partitions are three storage systems:

  • A 7 PB flash-based storage layer, organised as a parallel file system, with extreme I/O bandwidth and high IOPS capability.

  • An 80 PB disk-based main storage system with a parallel file system (Lustre).

  • A 30 PB encrypted object storage system (using Ceph) for storing, sharing and staging data.

The flash and main storage systems will be available from the start, while the object storage system will only become available at a later date yet to be determined. Also note that LUMI is a compute service, not a long-term data archiving or publishing service; other European and national projects take care of those tasks.

Given the enormous investment, it is important that scientific production can start as early as possible. This implies that the system will not be fully operational from the start: many of the services will only become available over time, as setting them up is very time-consuming. This is why the visualisation and cloud partitions and the object storage system will not be available from day one. Similarly, work is ongoing on a web-based portal that will offer an elementary GUI to run visualisation software (but don’t expect a full GNOME desktop as on your Linux workstation), Jupyter notebooks and RStudio; it too will only become available at a later date.

Getting compute time on LUMI

50% of the compute time on LUMI will be distributed via EuroHPC calls. The other 50% is distributed among the participating countries according to their contributions to the consortium. Belgium, as the second largest participant, will get 7.4% of the total compute time on LUMI for its own allocation programs.

Both the EuroHPC track and the Belgian national track to get access to LUMI will offer different access modes, including tracks for academic research, open industrial research, commercial access, and preparatory access to prepare a larger proposal or to develop exascale software.

The first call in the EuroHPC program for compute time on LUMI is expected in early spring 2022. A preliminary version of the access policy is available as EuroHPC JU decision 06/2021. Calls will be published on the EuroHPC web page and pages of the EU.

No decision has been made yet on the allocation policy for the Belgian fraction of the LUMI compute time. Calls will be published on this page.

Support

LUMI documentation is available on the LUMI support web site.

LUMI has a distributed support model.

  • Support questions about access via the Belgian fraction of the compute time, and all issues involving user account creation, compute time allocation and disk quota, should be directed to the LUMI-BE help desk at lumi-be-support [at] enccb [dot] be.

  • It is not yet clear who will provide support for those issues to users getting access via one of the EuroHPC calls.

  • Once you have your account and your compute and disk allocations, the LUMI User Support service takes over. The preferred way of contacting user support is via the LUMI support web site. However, the resources of the LUMI User Support Team are limited: from the start of the project it was decided that participating countries should also offer support to their users. The team can certainly assist in installing software, but cannot install every highly specialised package, and certainly cannot be expected to debug faulty installation procedures or rewrite installation procedures that do not support the HPE-Cray Programming Environment at the core of the LUMI software stack. There are also experts in code porting in the team, but their role is to give advice, not to do the actual porting, and the team does not have the resources to debug applications either. Though some people in the support team have a background in a particular science field, the team is too small to have domain knowledge in all relevant fields, so don’t expect knowledge of all possible algorithms for a particular problem or of every application code. The LUMI User Support Team will also be assisted by a few experts from HPE-Cray and AMD, and will take care of contacting those experts, or a national support service, for issues that are better dealt with by those services.

LUMI User Support will function in a very different mode from user support services at Tier-2 university HPC centres. Data privacy regulations and data confidentiality are very important on a supercomputer that will also see industrial use. This is why questions about account creation and allocations of compute time and storage for the Belgian compute fraction have to be directed to the Belgian help desk: LUMI support does not have access to that data and only sees your account name. Similarly, the members of the LUMI User Support Team cannot see the files in your account or the content of your jobs, so you will have to gather all data needed to handle your support question yourself and pass it on to the team. LUMI User Support Team members have regular user accounts with little to no extra privileges.

Training

The LUMI User Support Team will also provide training at different levels to our (prospective) users. Training will focus on the actual use of LUMI and the HPE-Cray programming environment installed on LUMI, and on HIP, OpenMP and possibly SYCL, the main programming approaches for AMD GPUs (a minimal HIP sketch follows below). Many of these courses will be organised in close collaboration with HPE-Cray and AMD. We also plan an annual hackathon to port software to LUMI with the support of HPE-Cray and AMD experts. Other European programs take care of more domain-specific courses, and basic training on parallel computing is covered by the local courses organised by the VSC and CÉCI. We also plan a regular user meeting at the Belgian level.
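
To give a flavour of what HIP code looks like, here is a minimal sketch of a vector addition on the GPU. This is a generic illustration of the HIP programming model rather than LUMI-specific code: the kernel and variable names are ours, error checking is omitted for brevity, and the exact compiler set-up on LUMI will depend on the programming environment that gets installed.

    #include <hip/hip_runtime.h>
    #include <cstdio>

    // Each GPU thread computes one element of c = a + b.
    __global__ void vecadd(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main()
    {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);

        // Managed memory is accessible from both host and device.
        float *a, *b, *c;
        hipMallocManaged((void **)&a, bytes);
        hipMallocManaged((void **)&b, bytes);
        hipMallocManaged((void **)&c, bytes);
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        // Launch enough blocks of 256 threads to cover all n elements.
        vecadd<<<(n + 255) / 256, 256>>>(a, b, c, n);
        hipDeviceSynchronize();

        printf("c[0] = %f\n", c[0]);  // expect 3.0
        hipFree(a); hipFree(b); hipFree(c);
        return 0;
    }

The triple-chevron launch syntax shown here is handled by the hipcc compiler driver (e.g. hipcc vecadd.cpp -o vecadd); the hipLaunchKernelGGL macro is an equivalent alternative. Readers familiar with CUDA will recognise the model: HIP is deliberately close to CUDA to ease porting.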

Specific LUMI trainings and events organised by the LUMI support team will be announced on the LUMI web site.

More info?