Thesis topics in programming languages

The programming languages group UPLANG at the Department of Information Technology is undertaking research on a number of topics / projects on or around programming languages, including:

Data-race freedom in Python and Java [Tobias Wrigstad]
Floating-point verification [Eva Darulova]
Verified compilation of Erlang [Johannes Borgström, Tjark Weber, Lars-Henrik Eriksson]
Memory safety for managed language (Java, Rust), unmanaged languages (C, C++), and infrastructure languages (Verona) [Tobias Wrigstad, Elias Castegren, Luke Cheeseman]
New concurrency/ownership models (meaning ways to write concurrent programs) [Tobias Wrigstad, Elias Castegren, Luke Cheeseman]
Programming language implementation and tooling for programming language implementation [Elias Castegren]
Type systems and gradual typing [Tobias Wrigstad, Elias Castegren]

All these projects have potential openings for excellent, highly motivated, and self-driven students to do project work for their thesis. Some of these projects are in collaboration with companies such as Microsoft, Oracle, and Ericsson.

The projects are typically supervised by a PhD student with a senior researcher serving as subject reviewer.

Ideally, we are looking for students who are interested in actively cooperating with us — not come up with a specification, run away, and eventually materialise with something. We encourage working from the department for at least part of the week so that you have a fast path to talk to us, and so that you can attend our bi-weekly seminars, other meetings relevant to the project, etc.

Necessary skillset

The projects are all relatively complicated and ambitious and involve both technical heavy lifting and some design work, statistics, or theory. We appreciate that most students have not had much (or any) experience with the specifics of the projects and we are happy to help you pick up any skills necessary. However, we do expect you to be able to program and communicate decently in written form, since there is not enough time to learn that during the project. What programming languages you master, or what kind of writing you are used to is less important.

Some information about tentatively necessary skillsets for the various projects are in order:

Projects around data-race freedom typically involve some work in C and Python, or C++ and Java. Prior knowledge about concurrency and memory models will be helpful. Projects in this space typically revolve around some implementation work in on-going projects.

How to do a thesis project with us

If you are interested in doing a thesis project with us, then please send an email to the person(s) who are listed above for the topics you are interested in. We do get a lot of email, so please bear with us if it takes some time to come back to you. Please include:

A short description of yourself and your interests, and what kinds of projects you could be interested in. Please avoid using AI-generated text for this, we want to understand who we would be working with and what you might be a good fit for.
A CV is very welcome, but you don’t need to build one from scratch if you don’t have one already.
A transcript of your grades. This is especially important if you are not a Swedish student for whom we can easily pull the grades. (Also, let’s be clear that grades certainly aren’t everything so don’t stop yourself from sending the email just because your GPA could be higher.)

Some concrete project ideas in 2025

Note that the ideas below are not exclusive, but may give an idea of the kinds of things happening in our different research projects currently.

Efficient Pattern Compilation for Trieste in Trieste

This thesis work is done in the context of Trieste, an experimental domain-specific language for tree rewriting. Trieste provides pattern-matching capabilities similar to that of functional languages, but since it is embedded in C++, data can be updated in place. Trieste also comes with built-in support for generative testing of individual parts of a rewriting toolchain, such as the passes of a compiler.

Because Trieste is based on pattern matching, performing this efficiently is paramount for getting performance out of the tool. Trieste performs some basic optimisations to speed up pattern matching, but due to the current representation of patterns there are limitations to which kind of optimisations can be performed.

The aim of this project is to change Trieste’s internal representation of patterns so that they have a concrete syntax that can be analysed and compiled efficiently. The baseline is reimplementing the existing optimisations, but the end-goal is compiling patterns into more general decision trees. Furthermore, the analysis and compilation can be done in Trieste itself, meaning Trieste will be able to compile its own pattern matching machinery.

Students interested in this project should email Elias.

Prerequisites

Experience with an imperative programming language (C++ will be useful but is not required)
Compiler Design I

Relevant links

Paper describing Trieste: https://eliasc.github.io/papers/sle24.pdf
GitHub repo of Trieste: https://github.com/microsoft/Trieste

Evaluating QuickCheck for Generative Testing of Compilers

The state-of-the-art in generative testing is QuickCheck, written in Haskell. It is mainly used for property-based testing, where expected properties of data structures are tested by generating random data and checking that the properties hold for that data. This has proved to work well for many kinds of structured data. However, compilers work on programs which have data constraints that are not visible from the syntax alone. The chance of randomly generating a program with well-scoped variables and correctly typed expressions is low, especially as the size of programs increases.

This project is about evaluating how well QuickCheck works for generating programs for compilers, but also comparing it to the generative capabilities of Trieste. We will use both tools to generate programs for different passes of several existing compilers written in Trieste and run the passes on them to measure the code coverage achieved.

Students interested in this project should email Elias.

Prerequisites

Experience with an imperative programming language (C++ will be useful but is not required)
Experience with functional programming (Haskell will be useful but is not required)
Compiler Design I

Relevant links

Paper describing Trieste: https://eliasc.github.io/papers/sle24.pdf
GitHub repo of Trieste: https://github.com/microsoft/Trieste
Wikipedia page of QuickCheck: https://en.wikipedia.org/wiki/QuickCheck

Mechanising a Course on Semantics of Programming Languages in Lean

The course Semantics of Programming Languages (1DL311) is taken by third year bachelor students in the computer science programme. In 2025, the course on logic and proofs taken by first-year bachelor students was updated to include use of the theorem prover Lean. This means that in 2027, the majority of the students taking Semantics of Programming Languages will have some experience working with theorem provers.

This project is about preparing Semantics of Programming Languages for these students by mechanising the course material in Lean. This will include the definitions and material in the course book, as well as the seminar exercises and assignments. The project will also include developing definitions and proof tactics tailored for the course material to make the transition as smooth as possible for the students.

Students interested in this project should email Elias.

Prerequisites

A course on programming language semantics
Experience with functional programming. Experience with a theorem prover is explicitly not required. You are expected to learn Lean during the project

Relevant links

Course web page for Semantics of Programming Languages: https://uppsala.instructure.com/courses/108245
Web site for the theorem prover Lean: https://lean-lang.org

Data-race free Python

We have designed an ownership model for Python to allow parallel execution while preventing data races. This model was designed in cooperation with members from Python’s core team. Now we are modifying CPython to create a working prototype to show the vision in practice. We want to pair this ownership model with Behaviour-Oriented Concurrency to allow programmers to easily write concurrent programs. In this project, we are collaborating with Azure Research (formerly Microsoft Research) and a handful of Python core developers (the former Faster CPython team).

This project has several topics which can be worth exploring as part of a thesis project, described in increasing order of “being technical” below:

We have a working version of Behaviour-Oriented Concurrency (BOC) for Python. However, since BOC is a new invention, we are interested in understanding how programmers interact with the model and what things might be confusing to them. This topic would have a technical part and a non-technical part. The technical part is learning BoC to a high degree. The non-technical part is designing a methodology for carrying out ”user tests” on programmers to investigate how they understand and use BoC and then carrying out the methodology, and analysing and writing up the results.
1. This is a good task for a person that combines an interest in programming languages with an interest in human-computer interaction. To carry out this project, we will look for a subject reviewer with skillsets outside of the UPLANG group.
We do not yet have a deep understanding of how common programming patterns in Python interact with our ideas. In particular, we want to understand what the challenges are when migrating typical Python code to use some combination of immutable objects, ownership, and BoC. This project could include the migration of existing modules to be compatible with the prototype we are creating.
1. This is a good task for someone who wants to do an implementation-driven thesis, and who is comfortable writing code in Python, but is not afraid to debug interpreter crashes in the C-layer. In the thesis specification, we will need to specify what is the Python code to be migrated to using what features that we are adding to Python and how.
Our prototype implementation (in a fork of CPython) is under active development. We are extending this code base at the level of core Python (written in C), core libraries (written in a mix of Python and C) and pure Python libraries. In this code base we are looking for help with implementing different features in our immutability design, our ownership design, or BoC. There is room for multiple students doing different parts of the implementation, writing different theses with some (likely) shared parts that are common to all projects.
1. This is a good task for someone comfortable with reading and writing C code, and debugging fairly complex code. When we collaborate on writing the thesis specification we will flesh out exactly what are the tasks to implement, and discuss how to implement them. This code base is under active development so what is on the menu changes constantly.

Students interested in this project should email Tobias.

Relevant papers

Implementing Behaviour-oriented Concurrency in Scala

The Scala programming language has a large set of very interesting features including capabilities and capture tracking that can be used to reason about data-race freedom and isolation. We are interested in developing libraries that make it possible to use behaviour-oriented concurrency in Scala programs, and seeing what useful properties we can establish in terms of data-race freedom, etc. in Scala programs.

Students interested in this project should email Tobias and/or Luke.

Benchmarking in Java on OpenJDK

Benchmarking is part of our work on developing new algorithms for garbage collection in OpenJDK (such as HCSGC, LR-Z, Monk, and mark-scavenge). To understand the performance characteristics of a new algorithm or a new optimisation of an existing implementation, we need to put the algorithm to work by running different programs under varying loads and constraints and study the results. One of the key influential benchmarks for OpenJDK is SPECjbb2015 which is a large simulation of a webstore. This benchmark gradually increases the load on the machine and studies both throughput (how many items of work could we get done in a time window) and latency (what is the smallest time period in which X% of all requests get a response). We are looking for a student who is a skilled Java programmer to help us look into modernisation of SPECjbb2015 to ensure that results obtained with the modernised benchmark are representative of modern Java applications. In particular we want to be able to change the behaviour of the benchmark to test some aspects of garbage collection that currently is not stressed.

Students interested in this project should email Tobias.

Type Systems

We want to look at topics in type classes and numeric towers in Haskell and ponder how that might be ported to a class-based imperative programming language with budding support for immutability and pattern matching. This is a thesis topic for someone theoretically inclined and is familiar with type classes and Haskell but who is not a stranger to looking at how languages like Java work, interfaces, etc.

Students interested in this project should email Tobias.

Efficient Dynamic Analysis of Floating-point Accuracy (MSc thesis topic)

The goal of this project is to investigate and empirically evaluate dynamic analysis (i.e. testing) approaches to estimate floating-point rounding errors for end-to-end applications in terms of their ability to find large error-inducing inputs as well as their efficiency. Floating-point arithmetic can be tricky for developers, because it is only an approximation of real numbers that one typically has in mind when developing numerical code. Because of this approximation, each arithmetic operation will typically incur a small error that can, however, accumulate over the course of especially larger applications. One conceptually simple way to estimate the rounding error is to run a higher-precision version of the program on the same inputs as the target program, and to compare the results – their difference is the rounding error for the given input. This approach is sometimes called shadow execution. However, since rounding errors are highly input dependent, one may need to run both programs on many inputs to get accurate error estimates. Additionally, it is typically unclear how many inputs are needed in the first place. Existing work [1, 2, 3, 4] has proposed different (heuristic) search techniques to find inputs that produce particularly large rounding errors, but these approaches have been evaluated on relatively small programs only, mostly individual functions, and that often came from libraries. This project will evaluate dynamic analysis via fuzzing of rounding errors using shadow execution on larger real-world programs. Fuzzing [5] has been successfully used to find different kinds of bugs in real-world code, but has not been used to estimate floating-point accuracy yet. Where possible, the project will compare this approach to existing work in terms of rounding error estimates (i.e. size of errors detected) and the efficiency of the approach (i.e. running time of the testing campaign).

The project will include a certain amount of coding, so programming experience in some major programming language and experience with basic bash scripting is necessary.

Efficient Generation of Error-Inducing Floating-Point Inputs via Symbolic Execution, ICSE’20
Efficient search for inputs causing high floating-point errors, PPoPP’14
A Genetic Algorithm for Detecting Significant Floating-Point Inaccuracies, ICSE’15
Efficient Global Search for Inputs Triggering High Floating-Point Inaccuracies, APSEC’17
The Fuzzing Book, https://www.fuzzingbook.org/

Students interested in this project should email Eva.

Large Scale Code Study of Formal Specification Usage on Github (MSc thesis topic)

Several programming languages provide ways to specify properties that should hold at certain points in the program, e.g. in assertion statements, or pre- and postconditions at the entry or exit of functions. These properties can be verified statically by a program verifier, or they can validate the correctness of the program at runtime.

This project will extend a recent large scale code study of projects on Github [1] to investigate the usage of such specifications in real-world code. Using the existing infrastructure, one can, for example, compare how frequently used or how complex such specifications are in different programming languages.

The project will involve researching how such specifications are expressed in different programming languages, extension of the existing framework that is written in Rust, and experimental evaluation and data analysis.

A Large-Scale Study of Floating-Point Usage in Statically Typed Languages. CoRR abs/2509.04936 (2025)

Students interested in this project should email Eva.

Type Inference for Partial Programs in Statically Typed Languages (MSc thesis topic)

Type inference allows compilers and development environments to deduce the types of variables and functions without explicit annotations. Many existing type inference techniques were developed with the purpose of inferring types for complete programs where all referenced functions and libraries are available. In practice, however, code retrieved from the web, software repositories, or written by developers inside IDEs, is often incomplete.

This project will investigate techniques for type inference on partial programs [1-4]. It will explore different trade-offs between accuracy and performance, and evaluate the scalability of such techniques on a large corpus of millions of partial programs previously mined from GitHub. Java is the recommended target language, but alternative statically typed languages can be considered. This project is suitable for students with prior programming experience in a statically typed language (e.g., Java, C, Haskell) and who are comfortable reading research papers involving theoretical concepts.

References

Enabling static analysis for partial java programs, OOPSLA’08
JCoffee: Using Compiler Feedback to Make Partial Code Snippets Compilable, ICSM’20
Inference of static semantics for incomplete C programs, POPL’18
Gradual typing with unification-based inference, DLS’08

Students interested in this project should email Andrea and Anders.

Necessary skillset

How to do a thesis project with us

Some concrete project ideas in 2025

Efficient Pattern Compilation for Trieste in Trieste

Prerequisites

Relevant links

Evaluating QuickCheck for Generative Testing of Compilers

Prerequisites

Relevant links

Mechanising a Course on Semantics of Programming Languages in Lean

Prerequisites

Relevant links

Data-race free Python

Relevant papers

Implementing Behaviour-oriented Concurrency in Scala

Benchmarking in Java on OpenJDK

Type Systems

Efficient Dynamic Analysis of Floating-point Accuracy (MSc thesis topic)

Related work

Large Scale Code Study of Formal Specification Usage on Github (MSc thesis topic)

Type Inference for Partial Programs in Statically Typed Languages (MSc thesis topic)

References