MSc Projects

Below is a selection of the currently available MSc projects. More information about other MSc projects can be found in the presentation.

A LaTeX template for an MSc thesis report can be downloaded here.

Current MSc Projects

[CE-PRJ-2018-04]  Optimizing Machine Learning Algorithms on Quantum Annealing Architectures

Machine Learning (ML) algorithms have shown significant progress in the past decade, enabling applications in various fields ranging from image recognition to automated driving. One of the important challenges facing ML is the high computational complexity needed to train the algorithms and to optimize the hyper-parameter space. In order to address this problem, innovative solutions such as quantum annealers have been proposed by a number of leading organizations (such as Google with the Quantum AI project). These solutions essentially offload the calculation onto the physical evolution of a quantum mechanical system.

More precisely, quantum annealers are physical devices that solve (certain) NP-complete optimization problems by exploiting quantum mechanics. In order to use a quantum annealer, one first has to translate (or compile) the optimization problem into an energy minimization problem tailor-made for the annealer. The annealer is then initialized and given the time to stabilize to its ground state, which corresponds to the optimal solution of the optimization problem. This approach promises to overcome computational complexities that are challenging for classical machines.
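As a concrete illustration of the energy-minimization formulation, the sketch below encodes a toy max-cut instance as a QUBO (quadratic unconstrained binary optimization) problem and finds its ground state by brute force; an annealer would instead reach this state through physical evolution. This is a self-contained toy, not code for any particular annealer.

```python
import itertools

def qubo_energy(Q, x):
    """Energy of binary assignment x under QUBO matrix Q: E(x) = x^T Q x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def brute_force_ground_state(Q):
    """Exhaustively find the minimum-energy assignment (what the annealer
    approximates physically). Exponential in n; only for toy instances."""
    n = len(Q)
    best = min(itertools.product([0, 1], repeat=n),
               key=lambda x: qubo_energy(Q, x))
    return best, qubo_energy(Q, best)

# Max-cut on a triangle graph, encoded as a QUBO: maximizing the number of
# cut edges is minimizing E(x) = sum over edges of (-x_i - x_j + 2*x_i*x_j).
edges = [(0, 1), (1, 2), (0, 2)]
n = 3
Q = [[0.0] * n for _ in range(n)]
for i, j in edges:
    Q[i][i] -= 1   # linear terms go on the diagonal (x_i^2 == x_i)
    Q[j][j] -= 1
    Q[i][j] += 2   # quadratic coupling between the edge endpoints
x, e = brute_force_ground_state(Q)
```

On a triangle, at most two edges can be cut, so the ground-state energy is -2; the compilation step of this project would produce such QUBO instances automatically and map them onto the annealer's hardware graph.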

This project aims to design an automated translator (or compiler) that maps optimization problems onto quantum annealers. Currently, the translation occurs in two steps and we believe both steps can be improved by considering the structural properties of the underlying optimization problem. For example, the second step relies strongly on a specific graph representation of the system, and the graph topology may potentially be better exploited. 

We will integrate the solution into the quantum computing architecture being developed at TU Delft in order to drive actual quantum annealer systems, and work towards creating a demo of the capabilities of such a system to optimize ML algorithms. Due to the multidisciplinary nature of this project, we can accommodate students looking for either an engineering-oriented or a more theoretical approach to this problem. Interested? Contact Zaid Al-Ars


[CE-PRJ-2018-03]  High-Performance Accelerators & Programming for Modern FPGA-based Data Centers

Big Data analytics is emerging as a vital process for many application domains such as machine learning, earth systems, and bioinformatics. At the same time, the forthcoming end of Moore’s law forces us to rethink computer organization and programming techniques. Therefore, to meet the ever-increasing computing needs, the High-Performance Computing (HPC) industry has begun to adopt reconfigurable hardware such as Field-Programmable Gate Arrays (FPGAs) and integrate it into Data Centers. FPGAs are seen as a powerful technology to provide application-specific extensions (i.e., hardware accelerators) leading to substantially better performance and power efficiency. According to Intel’s technology roadmap, by 2020 30% of Data Centers will include at least one FPGA. However, FPGA programming is still an active area of research, and more work is needed to provide the much needed (programming) infrastructure (e.g., compilers, libraries, etc.) to fully exploit the acceleration potential of FPGAs.

Within this context, several projects are possible: 

- Domain-Level Synthesis for Reconfigurable Computing. Analyze, design, and implement a Temporal-to-Spatial (T2S) proof-of-concept compiler that is able to generate high-performance accelerators automatically. This project runs in collaboration with Intel Labs, Santa Clara, California.

- Performance-Constrained Minimum-Area Scheduling for High-Level Synthesis. Traditionally, the compiler scheduling problem is a resource-constrained maximum-performance optimization problem. However, in a cloud environment the dual problem of minimizing the resources used under performance constraints is equally important. Therefore, in this project you will develop heuristics to solve this dual problem and implement them in the DWARV C-to-VHDL hardware compiler. This project is part of an existing collaboration with IBM T.J. Watson, New York.

- High-Performance Solvers for Fast Flow Simulations. In this project, you will have to understand existing iterative solvers used in several application domains such as seismic imaging and oil reservoir simulators. Consequently, you will have to design and implement custom FPGA accelerators and evaluate the acceleration potential compared with state-of-the-art HPC software libraries such as PETSc. The project is run in collaboration with Big Data Accelerate B.V., a start-up company from the Computer Engineering Laboratory at TU Delft. Internships are possible.

- High-Level Synthesis for Modern Data Centers. The DWARV C-to-VHDL High-Level Synthesis (HLS) tool is a research compiler developed at TU Delft. Currently, the compiler supports only embedded accelerators, i.e., it generates a flexible but non-standard hardware interface. In this project, the student will have to understand the HLS steps in order to extend the interface generation to support modern accelerator interfaces such as IBM’s OpenCAPI and/or Intel’s CCI-P. Adjusting the scheduler according to the type of memory interface targeted is a prerequisite.
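To illustrate the dual scheduling problem from the second project above, the toy sketch below uses unit-latency list scheduling to find the smallest number of functional units that still meets a latency deadline. This is a simplified stand-in under strong assumptions (unit latency, one resource type), not the DWARV scheduler; all names are illustrative.

```python
def list_schedule(ops, deps, resources):
    """Resource-constrained list scheduling with unit-latency operations.
    Returns the schedule length in cycles when at most `resources`
    functional units may issue per cycle. `deps[v]` lists v's predecessors."""
    done, cycle = set(), 0
    while len(done) < len(ops):
        # An op is ready when all its predecessors finished in earlier cycles.
        ready = [v for v in ops if v not in done
                 and all(p in done for p in deps.get(v, []))]
        for v in ready[:resources]:   # issue up to `resources` ops this cycle
            done.add(v)
        cycle += 1
    return cycle

def min_resources(ops, deps, deadline):
    """Smallest unit count whose list schedule meets the deadline
    (the minimum-area side of the scheduling duality)."""
    for r in range(1, len(ops) + 1):
        if list_schedule(ops, deps, r) <= deadline:
            return r
    return None

# Diamond dependence graph: a -> {b, c} -> d
ops = ["a", "b", "c", "d"]
deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
```

For this diamond, a 3-cycle deadline requires two units (b and c must issue together), while relaxing the deadline to 4 cycles lets a single unit suffice, which is exactly the area-versus-performance trade-off the project explores.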

For further information, please contact Razvan Nane.

[CE-PRJ-2018-02] Utilization of Quantum Computers to Optimize Machine Learning Algorithms 

Quantum computing promises to revolutionize the way we process information and to provide significant speedups in computational performance compared to conventional computers. A number of organizations, including TU Delft in collaboration with Intel, are already able to manufacture chips that can perform operations on dozens of qubits at the same time. However, the high noise levels associated with reading quantum information pose a difficult challenge to the practical utilization of these systems. In our group, we are investigating possible approaches to use the available computational power of these quantum systems to perform useful data processing despite the high noise levels. One possible field of application is optimizing the parameter space of machine learning algorithms using quantum computers. In this project, we will first investigate available quantum machine learning (QML) algorithms published in the literature and identify the challenges to implementing them in practice. Then we will investigate the characteristics of the quantum systems developed at TU Delft and identify their suitability for implementing QML algorithms. Finally, we will propose adapted QML algorithms that are better suited for practical application in the field, possibly trading off accuracy and/or computational capacity. Interested? Contact Zaid Al-Ars
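As a toy illustration of the readout-noise problem mentioned above, the sketch below simulates a noisy single-qubit measurement and shows how repeated shots with majority voting suppress the error. It is a purely classical simulation under an assumed error rate, not code for the actual TU Delft hardware.

```python
import random

random.seed(42)

def noisy_readout(true_bit, p_flip):
    """One measurement of a qubit's state, flipped with probability p_flip."""
    return true_bit ^ (random.random() < p_flip)

def majority_readout(true_bit, p_flip, shots):
    """Estimate the state from repeated measurements via majority vote,
    a standard classical strategy for coping with readout noise."""
    ones = sum(noisy_readout(true_bit, p_flip) for _ in range(shots))
    return 1 if ones * 2 > shots else 0

# With a 15% readout error, a single shot is wrong about 15% of the time,
# while 101 shots with majority voting are almost always correct.
errors_single = sum(noisy_readout(1, 0.15) != 1 for _ in range(1000))
errors_voted = sum(majority_readout(1, 0.15, 101) != 1 for _ in range(1000))
```

The repetition cost is exactly the kind of accuracy-versus-computational-capacity trade-off that adapted QML algorithms would need to manage.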

[CE-PRJ-2018-01] High-Performance Big Data Analytics Using Hardware Acceleration

Big data frameworks (such as Spark and Hadoop) and applications are often implemented in a variety of languages (commonly Python, R, Scala, and Java) and platforms, all using their own interpreters, virtual machines, or other mechanisms to process and store data. A format that unifies the way in which applications and frameworks store data in memory has recently been specified through the Apache Arrow project. This prevents serialization overhead between languages (but also between nodes in a cluster), and allows better use of hardware resources (such as vectorization and caches). Some of the advantages that the framework offers are also beneficial in the case of FPGA acceleration of big data applications.

At the Computer Engineering lab, we have recently created a hardware generation framework which takes the Arrow format specification of data-structures, and generates a hardware interface based on this. To obtain the data through this interface, the user has to supply only an index of the data item, which is similar to the way the Arrow software APIs work. This makes it much easier to interface with data stored in the Arrow format, as an FPGA developer does not have to bother working with memory addresses anymore, but can now just use table indices instead. This is currently being tested thoroughly and is already operational on Amazon EC2 F1 class instances.
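The index-based access described above can be illustrated with a minimal pure-Python mimic of Arrow's variable-length binary column layout: a contiguous values buffer plus an offsets buffer, so looking up element i needs only the index, with all address arithmetic derived from the offsets. This is a sketch of the idea, not the actual Arrow library or the hardware interface itself.

```python
class VarBinaryColumn:
    """Minimal mimic of Arrow's variable-length binary column layout:
    element i lives at values[offsets[i]:offsets[i+1]]. A hardware
    interface likewise only needs the index i; the memory addresses
    are computed from the offsets buffer, not supplied by the user."""

    def __init__(self, items):
        self.values = b"".join(items)          # one contiguous data buffer
        self.offsets = [0]                     # offsets buffer (n+1 entries)
        for it in items:
            self.offsets.append(self.offsets[-1] + len(it))

    def __getitem__(self, i):
        return self.values[self.offsets[i]:self.offsets[i + 1]]

col = VarBinaryColumn([b"spark", b"hadoop", b"arrow"])
```

Because both buffers are contiguous, this layout maps naturally onto burst reads from memory, which is what makes automatic hardware-interface generation from the Arrow specification attractive.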

The generation framework currently only generates a single interface to a single column in a data table. However, to achieve maximum throughput it is often necessary to instantiate multiple processing units that work simultaneously on multiple items in the same column or in multiple columns. An automated way to generate parallel interfaces to the data is therefore desired.

In this master thesis project, the student will therefore have to: understand the Arrow format thoroughly; understand the hardware interface that is generated by our current framework from Arrow data types; and specify a number of use cases where multiple units must operate in parallel. A working example application where tweets are matched to a regular expression is already given. Finally, the student will have to design a method for an FPGA programmer to specify the parallel structure of the interface to the data. This will be done by extending the interface generation step to allow not just one interface to the Arrow data to be instantiated, but N interfaces, automatically. For now, we assume that all internal buses used are AXI4.
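One simple way to derive N parallel interfaces is to split a column's row-index range into contiguous chunks, one per interface. The helper below is hypothetical and only sketches that partitioning step; the real generator would emit N hardware address generators over AXI4 rather than Python ranges.

```python
def partition_indices(num_rows, n_interfaces):
    """Split the row-index range [0, num_rows) of one Arrow column into
    n_interfaces contiguous half-open chunks of near-equal size, one per
    parallel interface. Hypothetical helper for illustration only."""
    base, extra = divmod(num_rows, n_interfaces)
    ranges, start = [], 0
    for k in range(n_interfaces):
        size = base + (1 if k < extra else 0)   # spread the remainder
        ranges.append((start, start + size))
        start += size
    return ranges
```

Contiguous chunks keep each interface's accesses sequential (friendly to bursts); an interleaved assignment would be an alternative design choice when load balance across items matters more than locality.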

Interested? Contact Zaid Al-Ars

[CE-PRJ-2017-02] Application-Specific Processor Architecture

Multicore systems are facing increasing limitations in finding sources of thread-level parallelism in applications to effectively utilize their resources. At the same time, the increasing power consumption of multicore systems is forcing a structural reduction of their maximum achievable performance to prevent exceeding their thermal design power budget. One of the main limitations of using multicore systems is the mismatch between the resources available in the architecture (number of cores, memory size, etc.) and the requirements of the applications. This project aims to investigate the opportunities of creating reconfigurable processor architectures on FPGAs that can match the resources needed by the application, thereby ensuring efficient utilization of the available hardware. We will first identify a set of applications with various processing, memory, and interconnect requirements and evaluate their architectural needs. Then we will define a number of optimal architectural templates that can process these applications efficiently. Finally, we will merge these templates into a unified reconfigurable architecture and evaluate its performance for the selected applications. Such an architecture promises to strike an optimal trade-off between performance, power efficiency, and programmability of processing systems. Interested? Contact Zaid Al-Ars

[CE-PRJ-2017-01] Multi-GPU Implementation of HPC Brain Simulations

High-performance solutions have become imperative for efficient simulation efforts conducted within Computational Neuroscience. Computational Neuroscience attempts to develop advanced in-silico models of brain functionality as a basis for extensive behavioral studies. Often such biophysically-meaningful networks are very computationally demanding. GPUs have been identified as a crucial HPC technology for use in the field. Even though rudimentary single-node HPC acceleration is relatively easy to implement (especially with GPU-MATLAB integration, for example), the performance provided is often not enough for simulating networks large enough to come close to the biological systems under study. In this topic, we want to explore the possibilities that multiple GPU fabrics running simultaneously have to fill this performance requirement. Of special interest are low-latency accelerator interconnects such as the NVidia NVLink technology. As a representative benchmark, we use a highly advanced, computationally demanding brain model, developed by Erasmus MC for detailed in-house behavioral studies of the Olivocerebellar system in the human cerebellum. This thesis topic is a joint venture between the Erasmus Medical Center (Neuroscience dept.) and TU Delft (Computer Engineering). Interested? Contact Zaid Al-Ars

[CE-PRJ-2016-04] FPGAs on FPGAs: Enabling FPGA Design Using Intermediate HW Logic Fabrics

At the computer engineering laboratory at TU Delft we are striving to find solutions to implement a heterogeneous cluster or cloud, which consists of nodes with CPUs, GPUs, and FPGAs as computing units. This heterogeneous cluster/cloud should make effective use of the power of GPU/FPGA accelerators to solve Big Data problems. The long-term vision is to map applications or parts of applications automatically to specific computing units, without the application programmer having to deal with low-level issues. When such a mechanism is available, it is expected that high-level programmers who program in, for example, Java or Scala, on top of frameworks such as Spark, will be able to unlock more of the raw computing power that is made available by the hardware. Especially for FPGAs, HLS tools can already provide some of this functionality, but they still do not solve the problem of having no binary compatibility between different FPGAs, which might be essential in distributing compiled applications in a cluster. For normal CPUs, this is often solved by using a virtual machine such as the JVM. Analogous to this mechanism, FPGAs could implement coarse-grained reconfigurable arrays (of basic logic blocks), which in the context of FPGAs are often called Overlay Architectures or Intermediate Fabrics.

The thesis project will consist of the following parts, answering the following research questions. 1) Find out the current state of the art in overlay architectures / intermediate fabrics. 2) Determine a set of common use cases within the Big Data context that could benefit from acceleration with such a fabric. 3) Implement such a fabric, thereby gaining a detailed understanding of the challenges for such architectures, possibly improving on the state of the art, which could result in publishable work. 4) Map the use-case application onto the fabric (manual mapping is acceptable at this stage of the project). 5) Report performance measurements and give recommendations regarding the potential of the implemented platform w.r.t. the long-term vision. Interested? Contact Zaid Al-Ars

[CE-PRJ-2016-03] Modelling Processor Execution Time Performance

As a technical leader, ASML would like to promote cutting-edge research to find a long-term, structural solution that mitigates, very early in a project, the risk of software not being able to meet its worst-case execution time (WCET) deadline. If this is achieved, the decision about the computing platform can be made very early in the project cycle, saving a considerable amount of software developer effort and project costs. Moreover, it will help bring certainty to project plans. The long-term research objective hence is to build a model for the high-accuracy estimation of the workload and WCET performance. A toolset (CARM2G) has already been developed which can accurately predict performance for a multi-processor, multi-core execution platform. The first step in this assignment is to evaluate whether the CARM2G toolset meets the estimation accuracy requirements and whether it is sufficient for modelling the application and the computing platform for one of the subsystems (SPM) developed by ESD. Interested? Contact Zaid Al-Ars
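As a minimal illustration of WCET estimation, the sketch below models a program as a control-flow DAG with a worst-case cost per block and takes the longest path through it. Real tools such as CARM2G model far more (resource contention, caches, multi-processor effects), so this is only a conceptual toy with illustrative names.

```python
from functools import lru_cache

def wcet(costs, succs, entry):
    """Worst-case execution time of a control-flow DAG: the longest path
    from `entry`, summing per-block worst-case costs. `succs[v]` lists
    the successors of block v; leaf blocks have no successors."""
    @lru_cache(maxsize=None)
    def longest(v):
        return costs[v] + max((longest(s) for s in succs.get(v, [])),
                              default=0)
    return longest(entry)

# Toy CFG: A branches to B or C, both rejoin at D.
costs = {"A": 3, "B": 5, "C": 2, "D": 4}
succs = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
```

Here the worst case follows A-B-D (3 + 5 + 4 = 12 time units), even if the B branch is rarely taken at runtime; that pessimism is exactly why high-accuracy estimation models are valuable.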

[CE-PRJ-2016-02] Acceleration of Big Data Algorithms for Behavioral Experiments

In many mammals, the movement of facial whiskers is characteristic of their brain activity, similar to what finger movement is in humans. Neuroscientists can deduce a plethora of information on behavior by mounting whisker-tracking experiments, i.e. experiments where animals (typically mice or rats) are tracked for their whisker movements subject to various stimuli such as an air puff in the eyes, auditory stimuli, and so on. The Erasmus MC developed an experimental setup which records whisker movements on head-fixed mice. Recording is done through a high-speed camera that generates large amounts of image stacks which are then sent to a computer for post-processing through a powerful yet slow Matlab program. Current experiment runs generate 15 seconds of whisker-tracking video, which occupies 2-4 GB of disk space to store and takes about 2 weeks of post-processing in Matlab. At the moment, dozens of videos are generated per week, which not only puts high pressure on the storage equipment needed but is also detrimental to the fast and efficient analysis of the behavioral experiments. The goal of the student in this thesis is to study the open-source Matlab code and port the compute- and data-intensive parts of it to a high-performance, FPGA-based computing platform (Maxeler). This not only will accommodate experiments in the lab but will also be the first, crucial step towards supporting closed-loop behavioral experiments, where specific whisker movements will evoke (in real time) a suitable response by the analysis machine, enabling a crucial class of neuroscientifically relevant experiments. Interested? Contact Zaid Al-Ars

[CE-PRJ-2016-01] Public Gateway to Massive Neuron Simulations

The modeling of neurons, especially at biological and physiological accuracy, has been proven to be a very complex and computationally demanding task. Also, in many cases, the neuron-modeling implementation is difficult for non-experts to command, as is its optimal mapping on the proper computing infrastructure. Moreover, handling lengthy simulations at runtime is an especially difficult task, given the workload complexity. As a result, it is very desirable to hide the complex details of neuron modeling under a user interface that will accept the parameters of a neuron simulation and only provide the respective result. It is noteworthy that communities facing similar computational workloads do follow the paradigm of a centralized gateway to heavy simulations: for instance, the nanotechnology community features the nanohub portal. The envisioned interface is an online gateway that will accept neuron-modeling workloads and return the respective results. The gateway will function as a front-end to the greater neuron-modeling infrastructure. The actual workload will be delegated to multi-/many-core or accelerator-based platforms. The student will receive the back-end implementation as legacy code and will be required to develop the online front-end. Specifications will be drawn for the functionality and parametrization of the front-end, and a working prototype will be developed. The intention is that, by the end of the thesis, an alpha version of the gateway is publicly released. Interested? Contact Zaid Al-Ars

[CE-PRJ-2015-05] Multi-FPGA Implementation of Artificial-Cerebellum Computational Model

Over the last decade, an increasing amount of effort has been spent on constructing and, then, simulating powerful brain models that can greatly help in unraveling the mysteries of the human brain (see, for instance, the EU flagship brain project). While these models are powerful and constantly come closer to real brain functionality, they are typically very computationally intensive, to the point that common platforms such as multicore CPUs fall short of reasonable execution times. We have thus turned to more powerful platforms: FPGAs. While FPGAs receive high marks when it comes to performance acceleration, their limited capacity is not sufficient for implementing large-scale brain simulations comprising (hundreds of) thousands of neurons. The subject of this topic is to extend a currently implemented, biologically-accurate simulation platform (comprising a single FPGA) to incorporate multiple FPGAs. If the inter-FPGA communication challenges are recognized and sufficiently dealt with, this extension is expected to double the achievable real-time brain simulation capabilities with every new FPGA on the stack. The platform is to be used for biophysically-meaningful simulations of Cerebellar microsections in the Neuroscience Department of the Erasmus MC, Rotterdam. The student is expected to analyze the original single-FPGA neural models, identify latency-sensitive sections and potential optimizations and, then, deploy (through use of suitable EDA tools, e.g. Compaan) the original application onto a multi-FPGA arrangement. Interested? Contact Zaid Al-Ars

[CE-PRJ-2015-04] Proposing a Computational Reference Platform for Genomics Analysis Based on the IBM Power 8

In this project, the student is expected to perform a detailed profiling of widely used genomics applications, identify commonalities, and analyze computational bottlenecks of these applications. The student would then optimize these applications for a cutting-edge IBM Power 8 system and construct a hardware cost function for optimal system utilization, including processor, memory, as well as I/O. Based on this information, the student would provide recommendations to modify the system architecture and create an optimal reference platform for genomics applications. Interested? Contact Zaid Al-Ars

[CE-PRJ-2015-03] Design of a Big Data Model to Predict Possible Disease Onset Based on Biometric Measurement Data

In this project, the student is presented with a large set of human health data to be analyzed, such as blood tests, heart rate measurements, x-ray images, etc. This information is treated as an unstructured big data problem that could be scanned for possible data correlations. The student is expected to create automated analysis methods, such as machine learning or genetic algorithms, with specific objective functions that should be able to predict the onset of possible diseases in the data set. Interested? Contact Zaid Al-Ars

[CE-PRJ-2015-02] Acceleration of Cancer Diagnosis Algorithms on Supercomputing FPGA Platforms

In this project, the student is expected to analyze computationally expensive cancer diagnostics algorithms and identify time consuming functions suitable for acceleration on FPGA fabrics. The student would then design and implement hardware kernels of these functions and show their computational effectiveness. Furthermore, a system level implementation should be created with the kernels embedded as part of the algorithms to demonstrate practical viability of the FPGA implementation. Interested? Contact Zaid Al-Ars

[CE-PRJ-2015-01] Evaluation of the Accuracy and Efficiency of DNA Assembly Algorithms

This project aims at identifying an optimal cost-accuracy trade-off for different DNA sequencing and assembly strategies. Various possible assembly strategies will be investigated, including short- and long-read sequencing from the Illumina and PacBio platforms. In this project, we will collaborate with the Broad Institute of MIT and Harvard and the US Joint Genome Institute on their Illumina as well as PacBio data to evaluate the accuracy and efficiency of assembly pipelines using completely finished genomes such as E. coli, M. tuberculosis and S. cerevisiae. Interested? Contact Zaid Al-Ars
