The HPC group offers several hot and interesting topics for bachelor and master theses, as well as for individual projects at both levels, in the area of parallel and distributed computing. Come and join our team!

Check out the Completed Theses and Student Projects for earlier work.

The topics listed below are only examples of what you could work on in our team.
This list is not actively maintained; interested students should therefore contact us for further details on an existing topic, for updates on new hot topics, or to discuss a topic of their own interest.


LB4MPI, a Modern MPI Load Balancing Library in C++

Load imbalance across distributed-memory compute nodes is a critical performance degradation factor. The goal of this work is to modernize the code of the DLS4LB library into LB4MPI, a C++ MPI load balancing library. The library should handle the distribution of computations as well as the distribution of data. Application data can be centralized, replicated, or distributed. LB4MPI should be able to learn the data distribution from the user and adjust this distribution dynamically during execution.


Fault Tolerance against Silent Data Corruption (SDC) in Scientific Applications

Silent data corruptions are very common in modern HPC systems. SDCs can occur due to bit flips in memory or system buses that do not directly cause a system failure but may alter the final result of the application. Replication is an established fault tolerance method. Robust dynamic load balancing (rDLB) is a robust scheduling method for parallel loops that employs replication to tolerate failures and severe perturbations in computing systems. Selective particle replication (SPR) is a method for detecting silent data corruptions in smoothed particle hydrodynamics (SPH) simulations. The goal of this work is to combine the SPR approach with rDLB, i.e., particles (loop iterations) selected by SPR for replication will be scheduled and load balanced using rDLB, to achieve an SDC-tolerant, load-balanced, high-performance SPH simulation.


Dynamic loop scheduling at scale

Load imbalance in scientific applications is one of the most significant performance degradation factors. Dynamic loop scheduling (DLS) is essential for improving application performance, especially when scaling to a large number of processing elements. The goal of this work is to examine the performance of various applications with different DLS techniques under strong and weak scaling, and to assess the usefulness and effectiveness of DLS techniques at large scale. Experiments may combine native execution, up to the limits of the available HPC resources, with simulations using our in-house loop scheduling simulator, LoopSim.


Multi-level robust scheduling 

High performance computing (HPC) systems offer multiple levels of hardware parallelism, e.g., cores, sockets, and nodes. Correspondingly, the HPC software stack usually supports multiple levels of parallelism matching the hardware levels, e.g., thread and process levels. Various scheduling methods are employed at every level of hardware and software parallelism (more information is on the MLS project page). The goal of this project is to collect scheduling information from the various levels of parallelism and employ it for fault tolerance.


Algorithms and Experiments for Quantum Computing (Master Thesis)

Quantum computing (QC) is radically different from the conventional computing approach. Based on quantum bits that can be zero and one at the same time, a quantum computer acts as a massively parallel device with an exponentially large number of computations taking place simultaneously. This will make problems tractable that are intractable even for the most powerful classical supercomputers. While the physics behind QC was explored a hundred years ago, implementations are still at an early stage of development, yet major companies as well as research funding agencies are currently investing massively in this direction. In this master thesis, you will explore this fascinating field and gain hands-on experience with QC simulators and early systems.


What is your name, benchmark scheduler?

Numerous benchmarks and parallel workloads are available in the HPC community, and they are believed to employ very good schedulers. However, the documentation accompanying these workloads does not provide details of the scheduling techniques and algorithms involved. During this thesis, the scheduling algorithms used in HPC workloads will be identified and the findings will be assessed comparatively.


Identification and analysis of the communication behavior of parallel applications

The execution of applications on parallel computing systems requires that application processes communicate during their execution. Understanding the communication behavior of parallel applications is important for optimizing their parallel execution. The communication patterns can be represented as process graphs (or networks) and/or task graphs. This work involves (1) the identification and classification of communication behavior types from various synthetic and real parallel applications and (2) the investigation of the similarities and differences between the process graphs and the task graphs of single parallel applications. To realize this work, synthetic communication patterns may be developed, and the communication behavior of real applications will be extracted and classified based on their execution traces.


From OTF2 traces to the SimGrid toolkit

OTF2 refers to the Open Trace Format (version 2), a format used to store the execution traces of applications as a sequence of events. Understanding the traces helps in analyzing the behavior of applications during execution. The goal is to develop a tool that reads OTF2 trace files as input, extracts the structure of the application and its execution times, and uses this information to build a simulator that simulates the application via the SimGrid simulation framework's programming interfaces. The developed tool will be used to automatically create inputs for simulating the execution of parallel applications by reading their execution traces.
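A core step such a tool performs is matching ENTER/LEAVE events to recover per-region execution times. The whitespace-separated line format below is a simplified, made-up stand-in for OTF2 events; a real tool would use the OTF2 reader library and feed the extracted durations into SimGrid's interfaces.

```cpp
#include <map>
#include <sstream>
#include <stack>
#include <string>
#include <utility>

// Accumulates total time per region from a toy event stream of the form
// "ENTER <region> <time> ... LEAVE <region> <time>", matching each LEAVE
// with the most recent unmatched ENTER (a stack handles nesting).
std::map<std::string, double> region_times(const std::string& trace) {
    std::map<std::string, double> total;
    std::stack<std::pair<std::string, double>> open;  // (region, enter time)
    std::istringstream in(trace);
    std::string kind, region;
    double t;
    while (in >> kind >> region >> t) {
        if (kind == "ENTER") {
            open.push({region, t});
        } else if (kind == "LEAVE" && !open.empty()) {
            total[open.top().first] += t - open.top().second;
            open.pop();
        }
    }
    return total;
}
```

For example, `region_times("ENTER compute 0 LEAVE compute 5 ENTER mpi 5 LEAVE mpi 7")` attributes 5 time units to `compute` and 2 to `mpi`.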

Efficient Task Scheduling on Heterogeneous Devices

Scheduling task graphs on heterogeneous devices (CPUs and GPUs) is necessary in modern computing platforms. Moreover, given the variety of workloads that HPC systems need to manage (HPC, ML, and Big Data), we would like to implement efficient task scheduling on CPUs and GPUs. List scheduling is one of the algorithms used to schedule tasks with data dependencies: tasks are scheduled on the available computing resources according to their priority and target platform. We use list scheduling over multiple computing platforms and evaluate how the schedule is affected, altered, and migrated with respect to the tasks' platform requirements, the optimal makespan, and the synchronization costs incurred to balance the task-to-resource allocation. The objectives are: (1) analyze existing task scheduling algorithms for tasks with data dependencies on heterogeneous devices, e.g., HeteroPrioDep; (2) evaluate task scheduling on heterogeneous devices with respect to computation, communication, and synchronization costs.