Control-Theoretic Analysis of Deep State Space Models and Transformers
This project explores the control-theoretic foundations of deep state space models (SSMs) and deep attention-based models, focusing on specific properties of their dynamics and training behavior. By bridging control theory and deep learning, the project aims to generate insights that could pave the way for next-generation large language models (LLMs).
Keywords: Control Theory, Deep State Space Models, Mamba, Transformer
In recent years, deep state space models (SSMs), such as Mamba [1], have gained significant attention due to their subquadratic inference complexity compared to traditional attention-based architectures like Transformers [2]. This computational advantage positions SSMs as promising candidates for next-generation foundation models, such as those underlying ChatGPT. Since SSMs are rooted in linear systems and control theory, they can be readily analyzed from a control-theoretic perspective [3,4]. Interestingly, attention mechanisms in Transformers can be reformulated as state-space representations, facilitating direct comparisons with SSMs [4]. This research project offers the opportunity to investigate the potential of SSMs or linearized Transformer variants to replace Transformers as the backbone of next-generation foundation models.
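To make the complexity contrast concrete, below is a minimal, self-contained Python/NumPy sketch (not the Mamba selective-scan implementation; all matrices and the toy input are randomly generated stand-ins). It processes the same input once with a diagonal linear SSM recurrence, whose per-token cost does not depend on sequence length, and once with a causal attention pass that materializes an L x L score matrix.

```python
# Minimal sketch (not the Mamba implementation): a single-channel linear SSM
# processed as a recurrence, x_{k+1} = A x_k + B u_k, y_k = C x_k.
# Per-token cost depends only on the state dimension n, not on the sequence
# length L, whereas full attention builds an L x L score matrix.
import numpy as np

rng = np.random.default_rng(0)

n, L = 16, 512                            # state dimension, sequence length
A = np.diag(rng.uniform(-0.9, 0.9, n))    # stable diagonal state matrix (toy choice)
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))
u = rng.standard_normal((L, 1))           # scalar input sequence

# Recurrent (SSM-style) pass: one state update per token -> cost linear in L.
x = np.zeros((n, 1))
y_ssm = np.zeros(L)
for k in range(L):
    x = A @ x + B * u[k]
    y_ssm[k] = (C @ x).item()

# Attention-style pass on the same sequence: the score matrix alone is L x L,
# i.e. quadratic in sequence length (a causal mask keeps the comparison fair).
Q = K = V = u                             # trivial single-feature "embeddings" for illustration
scores = Q @ K.T
mask = np.tril(np.ones((L, L), dtype=bool))
scores = np.where(mask, scores, -np.inf)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
y_attn = (weights @ V).ravel()

print(y_ssm[:3], y_attn[:3])              # two different models; only the cost profile is the point
```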
This project aims to investigate the control-theoretic properties of trained SSMs and Transformers, e.g., the stability properties of the dynamics matrix. The scope includes analyzing the training dynamics of these models, from initialization to convergence, by studying their control-theoretic properties at standard initialization and after training on benchmark tasks.
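As an illustration of the kind of stability check meant here, the sketch below inspects the eigenvalues of a diagonal dynamics matrix under zero-order-hold discretization. In an actual analysis, `A_cont` and `delta` would be read from a trained checkpoint; here they are random stand-ins loosely following S4/Mamba-style parameterizations.

```python
# Minimal sketch of a post-hoc stability check on an SSM layer's dynamics
# matrix. For a discrete-time recurrence x_{k+1} = A_bar x_k + B_bar u_k,
# asymptotic stability requires spectral radius rho(A_bar) < 1; for the
# underlying continuous-time A, Re(lambda_i) < 0.
import numpy as np

rng = np.random.default_rng(1)

n = 32
# Stand-ins for a learned continuous-time diagonal state matrix and per-channel
# step sizes (illustrative values, not taken from a trained model).
A_cont = -np.exp(rng.standard_normal(n))       # negative real parts -> stable
delta = np.exp(rng.standard_normal(n) - 2.0)   # step sizes > 0

A_bar = np.exp(delta * A_cont)                 # zero-order-hold discretization (diagonal case)

spectral_radius = np.max(np.abs(A_bar))
print("max Re(lambda) of continuous-time A:", A_cont.max())
print("spectral radius of discretized A_bar:", spectral_radius)
print("discrete-time stable:", spectral_radius < 1.0)
```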
[1] Gu and Dao, "Mamba: Linear-Time Sequence Modeling with Selective State Spaces", 2023, https://arxiv.org/abs/2312.00752
[2] Vaswani et al., "Attention Is All You Need", 2017, https://arxiv.org/abs/1706.03762
[3] Amo Alonso et al., "State Space Models as Foundation Models: A Control Theoretic Overview", 2024, https://arxiv.org/abs/2403.16899
[4] Sieber et al., "Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks", 2024, https://arxiv.org/abs/2405.15731
**Requirements:**
Knowledge of linear systems and control theory; basic knowledge of deep learning models (e.g., SSMs, Transformers, RNNs); experience with Python; ideally experience with deep learning tools such as PyTorch or JAX.
1. Analysis of trained SSMs or Transformers with respect to specific control-theoretic properties, either analytically or empirically.
2. Investigation of the training dynamics of SSMs or Transformers, focusing on control-theoretic properties at initialization and after training (see the sketch after this list).
3. Translation of analytical or empirical findings into tangible insights, including recommendations for improved robustness or performance of these models.
Note: Not all of these goals have to be accomplished within one project; they might be split among students, projects, etc.
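As a starting point for goal 2, the following PyTorch sketch trains a tiny diagonal linear SSM on a toy smoothing task and records the spectral radius of the discretized state matrix at initialization and after training. The model, task, and hyperparameters are illustrative stand-ins, not the project's benchmark setups.

```python
# Minimal PyTorch sketch: track a control-theoretic quantity (spectral radius
# of the discretized state matrix) at initialization and after training.
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyDiagonalSSM(nn.Module):
    """Single-input single-output diagonal linear SSM, x_{k+1} = A_bar x_k + B u_k."""
    def __init__(self, state_dim: int = 16):
        super().__init__()
        self.log_neg_a = nn.Parameter(torch.randn(state_dim))        # A_cont = -exp(.) < 0
        self.log_delta = nn.Parameter(torch.randn(state_dim) - 2.0)  # step sizes > 0
        self.B = nn.Parameter(torch.randn(state_dim) / state_dim**0.5)
        self.C = nn.Parameter(torch.randn(state_dim) / state_dim**0.5)

    def a_bar(self) -> torch.Tensor:
        # Zero-order-hold discretization of the diagonal continuous-time matrix.
        return torch.exp(-torch.exp(self.log_delta) * torch.exp(self.log_neg_a))

    def forward(self, u: torch.Tensor) -> torch.Tensor:  # u: (batch, length)
        a_bar = self.a_bar()
        x = torch.zeros(u.shape[0], a_bar.shape[0])
        ys = []
        for k in range(u.shape[1]):
            x = a_bar * x + self.B * u[:, k:k+1]
            ys.append((x * self.C).sum(dim=-1))
        return torch.stack(ys, dim=1)

def spectral_radius(model: TinyDiagonalSSM) -> float:
    return model.a_bar().abs().max().item()

# Toy task: predict an exponential moving average of the input sequence
# (purely illustrative, not one of the project's benchmark tasks).
model = TinyDiagonalSSM()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
u = torch.randn(64, 128)
target = torch.zeros_like(u)
for k in range(1, u.shape[1]):
    target[:, k] = 0.9 * target[:, k - 1] + 0.1 * u[:, k]

print("spectral radius at initialization:", spectral_radius(model))
for step in range(200):
    opt.zero_grad()
    loss = ((model(u) - target) ** 2).mean()
    loss.backward()
    opt.step()
print("spectral radius after training:", spectral_radius(model))
```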