
Arriving at the Cambridge train station on January 7th, I was ready for a one-month secondment as part of the REMODEL Staff Exchange project. This project focuses on the mathematical aspects of deep learning algorithms and their applications, a research topic of key relevance for my PhD work. Cambridge, with its long academic history and many beautiful colleges, is an ideal location for a research stay abroad. My host for this visit was Professor Carola-Bibiane Schönlieb, leader of the Cambridge Image Analysis (CIA) Group.
Understanding Large Language Models
Transformers have revolutionized AI, forming the foundation of large language models (LLMs) such as ChatGPT. These models rely on self-attention mechanisms, enabling them to process and generate text with remarkable fluency and coherence. Despite their success, transformers still present several challenges, including unstable training dynamics, poor interpretability, and significant computational costs. A key issue is how information propagates through transformer layers. Since each layer iteratively refines the input embeddings, the forward pass resembles a dynamical system in which data points evolve over multiple steps. As network depth increases, however, training can become unstable due to vanishing or exploding gradients, leading to unpredictable behavior and degraded performance in LLMs. My research aims to address these issues by viewing transformers through the lens of dynamical systems, providing new insights into their training behavior.
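To make this picture concrete, here is a minimal toy sketch in Python (my own illustration, not an actual transformer: the shared random weights, the tanh update, and the step size are assumptions chosen for demonstration). A stack of residual layers iterates x_{k+1} = x_k + h·f(x_k), and the gradient reaching the input is a product of per-layer Jacobians that can grow or shrink with depth:

```python
import torch

torch.manual_seed(0)

def input_grad_norm(depth: int, step: float, dim: int = 64) -> float:
    """Iterate the residual update x_{k+1} = x_k + step * tanh(W x_k)
    for `depth` layers and return the gradient norm at the input.
    A toy stand-in for a deep residual stack viewed as a discrete
    dynamical system (weights are shared and random, unlike a real model)."""
    W = torch.randn(dim, dim) / dim**0.5      # random layer weights
    x = torch.randn(dim, requires_grad=True)  # "input embedding"
    h = x
    for _ in range(depth):
        h = h + step * torch.tanh(W @ h)      # one residual layer = one time step
    h.sum().backward()                        # backprop a scalar readout
    return x.grad.norm().item()

# Smaller steps keep each layer's Jacobian (I + step * J_f) closer to the
# identity, which tempers how fast gradients grow or shrink with depth.
for step in (1.0, 0.1):
    print(f"step={step}: input gradient norm at depth 100 = "
          f"{input_grad_norm(100, step):.3e}")
```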
Transformers as Dynamical Systems
By interpreting transformer layers as the discretization steps of an ordinary differential equation (ODE), we can analyze their stability and improve their training. This mathematical perspective offers several advantages:
- Preventing Gradient Instabilities: Residual connections in transformers can be understood as discretized time steps of an ODE. By carefully designing these steps, we can prevent gradients from vanishing or exploding, ensuring stable training even in deep networks (the correspondence is spelled out after this list).
- Controlling Model Behavior: Transformers can be formulated using energy-based models, where attention and feed-forward layers correspond to energy functions. This structure allows us to impose constraints on how models generate outputs, reducing issues like biased or harmful content in LLMs (a minimal code sketch of this connection follows the list).
- Enhancing Interpretability: The dynamical systems approach provides a framework to understand how information flows through a network. By identifying conserved quantities or invariant structures, we can gain a deeper understanding of why certain LLMs perform well and how to improve them.
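To spell out the first point: a residual block is a forward Euler step of an ODE (the step size h below is introduced for illustration), and backpropagation multiplies the per-layer Jacobians, so keeping each factor close to the identity controls gradient growth:

```latex
% Residual update = forward Euler discretization (step size h) of an ODE:
\[
  x_{k+1} = x_k + h\, f_{\theta_k}(x_k)
  \qquad\longleftrightarrow\qquad
  \dot{x}(t) = f_{\theta(t)}\bigl(x(t)\bigr).
\]
% Backpropagation through K layers multiplies the per-layer Jacobians:
\[
  \frac{\partial x_K}{\partial x_0}
  = \prod_{k=0}^{K-1} \Bigl( I + h\, \frac{\partial f_{\theta_k}}{\partial x}(x_k) \Bigr),
\]
% so factors close to the identity (small h, or a constrained spectrum of
% the Jacobian of f) keep gradients from vanishing or exploding.
```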
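For the second point, one connection established in the literature (modern Hopfield networks) reads single-query dot-product attention as an update that does not increase an energy function. The sketch below assumes that formulation; `beta` and the pattern matrix `X` are illustrative choices, not values from any specific model:

```python
import torch

def hopfield_attention_step(xi: torch.Tensor, X: torch.Tensor,
                            beta: float = 1.0) -> torch.Tensor:
    """One update xi -> X @ softmax(beta * X^T xi), which coincides with
    single-query dot-product attention over stored patterns (columns of X)
    and, up to constants, does not increase the energy
        E(xi) = -(1/beta) * logsumexp(beta * X^T xi) + 0.5 * ||xi||^2.
    """
    weights = torch.softmax(beta * (X.T @ xi), dim=0)  # attention weights
    return X @ weights

# Stored patterns as columns; start from a noisy version of the first one.
X = torch.randn(32, 8)
xi = X[:, 0] + 0.3 * torch.randn(32)
for _ in range(3):
    xi = hopfield_attention_step(xi, X, beta=4.0)
print(torch.norm(xi - X[:, 0]))  # iterates are typically drawn toward the pattern
```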
Research and Collaboration in Cambridge
Beyond my own project, being in Cambridge has given me the opportunity to engage with a wider research community. I have attended seminars on topics as diverse as new methods for analyzing blood samples, image analysis for recovering sheet music from ancient books, and the question of what happens when large language models are repeatedly trained on generated content.
