Theses | High Performance Computing Group

BACHELOR, MASTER THESES AND MASTER PROJECTS (Selection)

Excited about parallel and distributed computing? So are we.

The HPC group offers a dynamic environment for Bachelor’s and Master’s theses, as well as individual (master) student projects. Whether you’re interested in cutting-edge research topics or want to turn your own idea or passion for performance into a project or thesis, we are here to support you.

Explore our Completed Theses and Student Projects to see the type of impactful and interesting work students have done in the past in collaboration with our group in high performance parallel and distributed computing.

As a starting point, below is a sample list of thesis and project topics. This list is not updated regularly; if something catches your eye, or if you have your own idea in mind, reach out to us. We’re happy to discuss current topics and opportunities.

LB4MPI, a Modern MPI Load Balancing Library in C++

Load imbalance across distributed-memory compute nodes is a critical performance degradation factor. The goal of this work is to modernize the code of the DLS4LB library into a C++ MPI load balancing library. The library should be able to handle the distribution of computations as well as the distribution of data. Application data can be centralized, replicated, or distributed. The LB4MPI library should be able to learn the data distribution from the user and to adjust this distribution dynamically during execution.

Fault Tolerance of Silent Data Corruption (SDC) in Scientific Applications

Silent data corruptions (SDCs) are very common in modern HPC systems. SDCs can occur due to bit flips in memory or system buses that do not directly cause a system failure but can alter the final result of the application. Replication is an established fault tolerance method. Robust dynamic load balancing (rDLB) is a robust scheduling method for parallel loops that employs replication to tolerate failures and severe perturbations in computing systems. Selective particle replication (SPR) is a method for detecting silent data corruptions in smoothed particle hydrodynamics (SPH) simulations. The goal of this work is to combine SPR with rDLB, i.e., the particles (loop iterations) that SPR selects for replication will be scheduled and load balanced using rDLB, to achieve an SDC-tolerant, load-balanced, high-performance SPH simulation.
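The detection side of replication can be illustrated with a minimal, sequential Python sketch. It assumes a hypothetical `update_particle` function standing in for one SPH particle update and takes the set of selected particles as given; SPR's actual selection criterion and the rDLB scheduling of the replicas are not modeled. A single bit flip is injected to emulate an SDC, and a mismatch between the primary and the replicated result flags the particle.

```python
import struct


def flip_bit(x: float, bit: int) -> float:
    """Flip one bit of an IEEE-754 double to emulate a silent data corruption."""
    (as_int,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


def update_particle(p: float) -> float:
    """Hypothetical stand-in for one SPH particle update (one loop iteration)."""
    return 0.5 * p + 1.0


def step_with_selective_replication(particles, selected, faulty=None, bit=52):
    """Update all particles and re-execute only the selected ones.

    `faulty` is the index of a particle whose primary result is corrupted by a
    single bit flip (fault injection for this example). Returns the primary
    results and the indices where primary and replica disagree.
    """
    results, suspects = [], []
    for i, p in enumerate(particles):
        primary = update_particle(p)
        if i == faulty:
            primary = flip_bit(primary, bit)
        if i in selected and update_particle(p) != primary:
            suspects.append(i)  # detected SDC; two copies cannot tell which is correct
        results.append(primary)
    return results, suspects


particles = [1.0, 2.0, 3.0, 4.0]
print(step_with_selective_replication(particles, {1, 3}, faulty=3)[1])  # [3]
print(step_with_selective_replication(particles, {1, 3}, faulty=0)[1])  # []
```

With only two copies, a mismatch detects but cannot correct the corruption. The second call shows that a corruption in an unreplicated particle goes unnoticed, which is the trade-off selective replication makes in exchange for lower overhead.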

Dynamic loop scheduling at scale

Load imbalance in scientific applications is one of the major performance degradation factors. Dynamic loop scheduling (DLS) is essential for improving the performance of applications, especially when scaling to a large number of processing elements. The goal of this work is to examine the performance of various applications with different DLS techniques under strong and weak scaling and to assess the usefulness and effectiveness of DLS techniques at large scale. Experiments could combine native runs up to the limit of the available HPC resources with simulations using our in-house loop scheduling simulator LoopSim.
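DLS techniques differ mainly in how many loop iterations (the chunk size) a processing element receives per scheduling request; plain self-scheduling, for instance, hands out one iteration at a time. The following Python sketch computes the chunk sequences of three well-known techniques from their textbook definitions: STATIC, guided self-scheduling (GSS), and factoring (FAC2). It is an illustration only and does not reflect how LoopSim or any particular runtime implements them.

```python
def static_chunks(n, p):
    """STATIC: one chunk of about n/p iterations per processing element."""
    base, extra = divmod(n, p)
    return [base + (1 if i < extra else 0) for i in range(p)]


def gss_chunks(n, p):
    """GSS: each request receives ceil(R/p) of the R remaining iterations."""
    chunks, remaining = [], n
    while remaining > 0:
        c = (remaining + p - 1) // p  # integer ceil(remaining / p)
        chunks.append(c)
        remaining -= c
    return chunks


def fac2_chunks(n, p):
    """FAC2: batches of p equal chunks; each batch covers about half of the remaining iterations."""
    chunks, remaining = [], n
    while remaining > 0:
        c = (remaining + 2 * p - 1) // (2 * p)  # integer ceil(remaining / (2p))
        for _ in range(p):
            if remaining == 0:
                break
            take = min(c, remaining)
            chunks.append(take)
            remaining -= take
    return chunks


print(static_chunks(100, 4))  # [25, 25, 25, 25]
print(gss_chunks(100, 4))     # [25, 19, 14, 11, 8, 6, 5, 3, 3, 2, 1, 1, 1, 1]
print(fac2_chunks(100, 4))    # four chunks each of 13, 6, 3, 2, and 1
```

Large early chunks keep the scheduling overhead low, while the shrinking chunks near the end of the loop give the scheduler room to even out imbalance among processing elements.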

Multi-level robust scheduling

High performance computing (HPC) systems offer multiple levels of parallelism, e.g., cores, sockets, and nodes. Correspondingly, the HPC software stack usually supports multiple levels of parallelism that mirror the hardware levels, e.g., the thread and process levels. Various scheduling methods are employed at every level of hardware and software parallelism (more information is available on the MLS project page). The goal of this project is to use scheduling information from the various levels of parallelism and employ it for fault tolerance.

Algorithms and Experiments for Quantum Computing (Master Thesis)

Quantum computing (QC) is radically different from the conventional computing approach. Because quantum bits (qubits) can be zero and one at the same time, a quantum computer acts as a massively parallel device with an exponentially large number of computations taking place at the same time. This is expected to make certain problems tractable that are intractable even for the most powerful classical supercomputers. While the physics behind QC was explored about a hundred years ago, implementations are still at an early stage of development. Nevertheless, major companies and research funding agencies are currently investing heavily in this direction. In this master thesis, you will explore this fascinating field and gain hands-on experience with QC simulators and early systems.
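Why simulating QC on classical machines is hard can be seen from a minimal state-vector sketch: an n-qubit state holds 2^n complex amplitudes, so memory and work grow exponentially with n. The Python sketch below applies a Hadamard gate to each qubit of |0...0>, producing an equal superposition of all 2^n basis states. It is a toy illustration, not the interface of any particular QC simulator.

```python
import math


def apply_hadamard(state, target, n):
    """Apply a Hadamard gate to qubit `target` of an n-qubit state vector."""
    h = 1 / math.sqrt(2)
    new_state = list(state)
    step = 1 << target
    for i in range(1 << n):
        if i & step == 0:  # pair basis states that differ only in the target bit
            a, b = state[i], state[i | step]
            new_state[i] = h * (a + b)
            new_state[i | step] = h * (a - b)
    return new_state


def uniform_superposition(n):
    """Start in |0...0> and apply a Hadamard gate to every qubit."""
    state = [0j] * (1 << n)
    state[0] = 1 + 0j
    for q in range(n):
        state = apply_hadamard(state, q, n)
    return state


print(len(uniform_superposition(10)))  # 1024 amplitudes for 10 qubits
```

Each additional qubit doubles the length of the state vector, which is why exact classical simulation quickly runs out of memory as the number of qubits grows.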

What is your name, benchmark scheduler?

There are numerous benchmarks and parallel workloads available in the HPC community. They are believed to employ very good schedulers, yet the documentation accompanying these workloads does not describe the scheduling techniques or algorithms they use. In this thesis, the scheduling algorithms used in HPC workloads will be identified and the findings will be assessed comparatively.

Identification and analysis of the communication behavior of parallel applications

The execution of applications on parallel computing systems requires that application processes communicate during their execution. Understanding the communication behavior of parallel applications is important for optimizing their parallel execution. Communication patterns can be represented as process graphs (or networks) and/or task graphs. This work involves (1) identifying and classifying communication behavior types in various synthetic and real parallel applications and (2) investigating the similarities and differences between the process graph and the task graph of individual parallel applications. To this end, synthetic communication patterns may be developed, and the communication behavior of real applications will be extracted from their execution traces and classified.
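As a minimal sketch of a first step, assuming message records of the hypothetical form (source rank, destination rank, bytes) have already been extracted from a trace, the Python code below aggregates them into a weighted, directed process graph and computes its density. Density is only one simple feature that may help separate, for example, ring-like from all-to-all patterns; it is not an established classification method.

```python
from collections import defaultdict


def process_graph(messages):
    """Aggregate (src, dst, nbytes) records into a weighted, directed process graph."""
    edges = defaultdict(int)
    for src, dst, nbytes in messages:
        edges[(src, dst)] += nbytes  # total bytes sent from src to dst
    return dict(edges)


def density(edges, num_procs):
    """Fraction of all possible directed process pairs that actually communicate."""
    possible = num_procs * (num_procs - 1)
    return len(edges) / possible if possible else 0.0


ring = [(i, (i + 1) % 4, 1024) for i in range(4)]
all_to_all = [(s, d, 8) for s in range(4) for d in range(4) if s != d]
print(density(process_graph(ring), 4))        # about 0.33
print(density(process_graph(all_to_all), 4))  # 1.0
```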

From OTF2 traces to the SimGrid toolkit

OTF2 refers to the Open Trace Format Version 2, a format used to store the execution traces of applications as a sequence of events. Understanding the traces helps in analyzing the behavior of applications during execution. The goal is to develop a tool that reads OTF2 trace files as input, extracts the structure and execution times of the application, and uses this information to build a simulator of the application based on the programming interfaces of the SimGrid simulation framework. The developed tool will be used to automatically create inputs for simulating the execution of parallel applications from their execution traces.
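One building block of such a tool is turning ENTER/LEAVE region events into durations. The Python sketch below assumes the events have already been read with the OTF2 reader API and converted into hypothetical (location, timestamp, kind, region) tuples; that reading step and the SimGrid side are omitted. It computes the inclusive time spent in each region per location by matching ENTER and LEAVE events with a stack.

```python
from collections import defaultdict


def region_durations(events):
    """Sum the inclusive time spent in each region, per location.

    `events` is an iterable of (location, timestamp, kind, region) tuples, where
    kind is "ENTER" or "LEAVE" and the events of each location are in time order.
    """
    open_regions = defaultdict(list)  # stack of (region, start time) per location
    totals = defaultdict(int)
    for loc, ts, kind, region in events:
        if kind == "ENTER":
            open_regions[loc].append((region, ts))
        elif kind == "LEAVE":
            entered, start = open_regions[loc].pop()
            if entered != region:
                raise ValueError(f"LEAVE {region} does not match ENTER {entered} on {loc}")
            totals[(loc, region)] += ts - start
    return dict(totals)


events = [
    (0, 0, "ENTER", "main"),
    (0, 2, "ENTER", "MPI_Send"),
    (0, 5, "LEAVE", "MPI_Send"),
    (0, 10, "LEAVE", "main"),
]
print(region_durations(events))  # {(0, 'MPI_Send'): 3, (0, 'main'): 10}
```

Subtracting the time of nested regions (here, 3 of the 10 time units of `main` are spent in `MPI_Send`) would give exclusive times, which are likely closer to the computation amounts a simulator needs between communication calls.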

Efficient Task Scheduling on Heterogeneous Devices

Scheduling task graphs on heterogeneous devices, such as CPUs and GPUs, is necessary on modern computing platforms. Moreover, given the variety of workloads that HPC systems need to manage (HPC, ML, and Big Data), we would like to implement efficient task scheduling on CPUs and GPUs. List scheduling is one of the algorithms used to schedule tasks with data dependencies: tasks are scheduled onto the available computing resources according to their priority and target platform. We apply list scheduling across multiple computing platforms and evaluate how the schedule can be affected, altered, and migrated with respect to the tasks' platform requirements, the optimal makespan, and the synchronization costs incurred to balance the task-to-resource allocation. The objectives are: (1) analyze existing task scheduling algorithms for tasks with data dependencies on heterogeneous devices, e.g., HeteroPrioDep; (2) evaluate task scheduling on heterogeneous devices with respect to computation, communication, and synchronization costs.
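To illustrate list scheduling on heterogeneous devices, the Python sketch below follows the well-known HEFT (Heterogeneous Earliest Finish Time) heuristic without its insertion-based slot filling: tasks are prioritized by their upward rank, computed from average costs, and in that order each task is placed on the device that gives it the earliest finish time. HEFT serves here only as a familiar stand-in; it is not HeteroPrioDep, and the task graph, costs, and device names are made up for the example.

```python
def upward_rank(tasks, succ, avg_cost, comm):
    """rank(t) = avg_cost[t] + max over successors s of (comm[(t, s)] + rank(s))."""
    memo = {}

    def rank(t):
        if t not in memo:
            memo[t] = avg_cost[t] + max(
                (comm.get((t, s), 0) + rank(s) for s in succ.get(t, [])), default=0)
        return memo[t]

    return {t: rank(t) for t in tasks}


def list_schedule(tasks, succ, cost, comm):
    """HEFT-style list scheduling without insertion.

    cost[t][d]: execution time of task t on device d (assumed positive).
    comm[(t, s)]: transfer time when t and its successor s run on different devices.
    Returns the device chosen for each task and each task's finish time.
    """
    devices = list(cost[tasks[0]])
    avg = {t: sum(cost[t].values()) / len(devices) for t in tasks}
    ranks = upward_rank(tasks, succ, avg, comm)
    pred = {t: [] for t in tasks}
    for t, successors in succ.items():
        for s in successors:
            pred[s].append(t)

    device_free = {d: 0.0 for d in devices}  # time at which each device becomes idle
    placement, finish = {}, {}
    # With positive costs, decreasing upward rank is a valid topological order.
    for t in sorted(tasks, key=lambda t: ranks[t], reverse=True):
        best = None
        for d in devices:
            # Data from a predecessor on another device arrives after its transfer time.
            ready = max((finish[p] + (0 if placement[p] == d else comm.get((p, t), 0))
                         for p in pred[t]), default=0.0)
            eft = max(ready, device_free[d]) + cost[t][d]
            if best is None or eft < best[0]:
                best = (eft, d)
        placement[t], finish[t] = best[1], best[0]
        device_free[best[1]] = best[0]
    return placement, finish


tasks = ["A", "B", "C", "D"]
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
cost = {"A": {"cpu": 2, "gpu": 3}, "B": {"cpu": 6, "gpu": 2},
        "C": {"cpu": 3, "gpu": 3}, "D": {"cpu": 2, "gpu": 2}}
comm = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): 1}
placement, finish = list_schedule(tasks, succ, cost, comm)
print(placement)             # {'A': 'cpu', 'B': 'gpu', 'C': 'cpu', 'D': 'cpu'}
print(max(finish.values()))  # 8.0
```

In this sketch a communication cost is charged only when a predecessor runs on a different device; the synchronization costs mentioned above are not modeled.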