ECMI

Trondheim. Traveling as a PhD student in the REMODEL Staff Exchange – Horizon Europe project

Håkon Noren Myhr, PhD student at NTNU

Arriving at the Cambridge train station on January 7th, I was ready for a one-month secondment as part of the REMODEL Staff Exchange project. This project focuses on the mathematical aspects of deep learning algorithms and their applications, a research topic of key relevance for my PhD work. Cambridge, with its long academic history and many beautiful colleges, is an ideal location for a research stay abroad. My host for this visit was Professor Carola-Bibiane Schönlieb, leader of the Cambridge Image Analysis (CIA) Group.

Understanding Large Language Models

Transformers have revolutionized AI and form the foundation of large language models (LLMs) like ChatGPT. These models rely on self-attention mechanisms, which enable them to process and generate text with remarkable fluency and coherence. Despite their success, transformers still present several challenges, including unstable training dynamics, poor interpretability, and significant computational costs. A key issue is the way information propagates through transformer layers. Since each layer iteratively refines the input embeddings, this process resembles a dynamical system in which data points evolve over multiple steps. As the network depth increases, training can become unstable due to vanishing or exploding gradients, and this instability can lead to unpredictable behavior and degraded performance in LLMs. My research aims to address these issues by viewing transformers through the lens of dynamical systems, providing new insights into their training behavior.
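To give a feel for why depth matters here, the following is a minimal toy sketch rather than anything taken from the project itself. It assumes each layer is purely linear with a Jacobian equal to a scaled random orthogonal matrix, so the backpropagated gradient norm is exactly `scale ** depth`. The function name `backprop_gradient_norm` and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np

def backprop_gradient_norm(depth, scale, dim=16, seed=0):
    """Norm of a unit gradient after backpropagating through `depth` toy layers.

    Assumption for illustration: every layer Jacobian is scale * Q with Q
    orthogonal, so each backward step multiplies the gradient norm by `scale`.
    """
    rng = np.random.default_rng(seed)
    grad = np.ones(dim) / np.sqrt(dim)  # unit-norm gradient at the output
    for layer in range(depth):
        # QR factorization of a Gaussian matrix gives a random orthogonal Q
        q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
        grad = (scale * q).T @ grad  # chain rule: multiply by the transposed Jacobian
    return np.linalg.norm(grad)

for scale in (0.9, 1.0, 1.1):
    print(scale, backprop_gradient_norm(depth=50, scale=scale))
```

With 50 layers, a per-layer factor of 0.9 shrinks the gradient norm to roughly 0.005, while a factor of 1.1 inflates it to roughly 117. Real transformer layers are nonlinear and far less uniform, but the same compounding effect is what makes deep stacks sensitive.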

Transformers as Dynamical Systems

By interpreting transformer layers as discretized ordinary differential equations (ODEs), we can analyze their stability and optimize their training. This mathematical perspective offers several advantages: 

At the Department of Applied Mathematics and Theoretical Physics (DAMTP), these ideas fit naturally into ongoing research efforts. Discussions with other researchers in applied mathematics and optimization have already produced new ideas for how to take this research further.
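The ODE interpretation described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the method used in the project: it keeps only a single-head self-attention map as the vector field f, omits layer normalization and the feed-forward sublayer, and introduces a hypothetical `step_size` parameter h. The residual update x_{l+1} = x_l + h f(x_l) is then one forward Euler step for dx/dt = f(x), and the usual residual connection corresponds to h = 1.

```python
import numpy as np

def attention_vector_field(x, W_q, W_k, W_v):
    """Single-head self-attention, used as the right-hand side f(x) of dx/dt = f(x).

    x has shape (n_tokens, d); the weight matrices have shape (d, d).
    """
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # stabilize the softmax numerically
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to one
    return weights @ v

def residual_stack(x0, params, step_size):
    """Apply x_{l+1} = x_l + h * f_l(x_l), one forward Euler step per layer."""
    x = x0
    for W_q, W_k, W_v in params:
        x = x + step_size * attention_vector_field(x, W_q, W_k, W_v)
    return x

# Hypothetical sizes: 4 tokens, embedding dimension 8, 6 layers.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8))
params = [tuple(0.1 * rng.standard_normal((8, 8)) for _ in range(3)) for _ in range(6)]
print(residual_stack(x0, params, step_size=1.0).shape)
```

Writing the layers this way makes the step size explicit, which is what lets tools from numerical analysis (for example, stability conditions on h) be brought to bear on the network.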

Research and Collaboration in Cambridge 

Beyond my own project, being in Cambridge has provided an opportunity to engage with a wider research community. I have attended seminars covering diverse topics such as developing new methods for analyzing blood samples, recovering sheet music from ancient books using image analysis, and what happens when large language models are repeatedly trained on generated content.
