End-to-End Compositional Language Modeling
We are looking for a motivated student to work on a master's project on end-to-end compositional language modeling. Our aim is to understand the underlying dynamics of language generation as well as its characteristics at long sequence distances.
Keywords: Natural Language Processing, Language Generation, Language Modeling, Machine Learning, Compositional Sampling Strategy, Reinforcement Learning
Sequences carry informative characteristics that encode the underlying generation dynamics. Recent studies show that birdsong vocalization and human speech share many commonalities, not only in their sequence generation dynamics but also in how uncertainty develops with pairwise distance. Researchers found that at short distances the mutual information between pairs of elements decays exponentially, as expected from Markovian sampling processes, while at long sequence distances a power-law decay dominates, which is consistent with hierarchical sampling processes. This suggests that vocal elements in long sequences of both birdsong and human speech are governed by similar processes. We therefore wonder whether the same characteristics also hold in natural language generation. In particular, we are interested in: 1) how uncertainty develops over long ranges in natural-language sequences; 2) how different compositions of sampling strategies affect the overall generated language; and 3) how to train the pipeline end-to-end and how to evaluate the generated text.
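As a first, hedged illustration of the kind of analysis involved, the sketch below estimates pairwise mutual information as a function of distance in a symbol sequence and applies it to a synthetic two-state Markov chain (all names are our own illustrative choices, not part of any project codebase):

```python
import math
import random
from collections import Counter

def mutual_information_at_distance(seq, d):
    """Plug-in estimate of I(x_t; x_{t+d}) from empirical pair counts."""
    pairs = list(zip(seq, seq[d:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        # p_ab / (p_a * p_b) simplifies to c * n / (count_a * count_b)
        mi += p_ab * math.log2(c * n / (left[a] * right[b]))
    return mi

# Demo on a synthetic two-state Markov chain (stay probability 0.9):
# for a Markovian source, the curve should fall off roughly exponentially,
# whereas a hierarchical source would show a slower, power-law-like decay.
random.seed(0)
seq = ["a"]
for _ in range(9999):
    stay = random.random() < 0.9
    seq.append(seq[-1] if stay else ("b" if seq[-1] == "a" else "a"))
mi_curve = [mutual_information_at_distance(seq, d) for d in (1, 2, 5, 10, 20)]
```

For real corpora one would compute this curve over word or character sequences and fit exponential versus power-law models to the short- and long-distance regimes.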
In this master's project, you will experiment with state-of-the-art text generation models and deepen your understanding of compositional sampling strategies. You will also learn to conduct explainable research and to evaluate outcomes from multiple perspectives. Most importantly, you will attempt to decipher the underlying mechanisms of computational language generation from a new angle, which could inspire interdisciplinary research between neuroscience and machine learning.
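To make "composing sampling strategies" concrete, here is a minimal stdlib-only sketch (function names are illustrative, not from any particular library) that chains two common strategies, temperature scaling and top-k truncation, before drawing a token:

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of raw logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def sample_top_k(logits, k, temperature=1.0, rng=random):
    """Compose temperature scaling with top-k truncation, then sample.

    Other strategies (nucleus/top-p, repetition penalties, ...) could be
    chained in the same way, which is the kind of composition the project
    would study.
    """
    probs = softmax(logits, temperature)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    r = rng.random() * mass  # sample within the truncated probability mass
    acc = 0.0
    for i in top:
        acc += probs[i]
        if r <= acc:
            return i
    return top[-1]
```

With k=1 this degenerates to greedy decoding; larger k and higher temperature trade determinism for diversity, and studying how such compositions shape long-range sequence statistics is one question the project asks.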
Minimum requirements for this master's project:
- Knowledge in machine learning and programming with Python. Knowledge in natural language processing is a plus.
- Interest in computational methods in deep learning and reinforcement learning
- Experience in data analytics and performance evaluation
What we offer:
- Computational resources as well as a personal workspace at our institute
- Close supervision and discussion
- Chance to publish research outcomes
This master's project will be supervised by Prof. Richard Hahnloser at the Institute of Neuroinformatics, ETH Zurich. If you are interested in working with us on this project, please send your CV together with your latest transcript to yingqiang.gao@ini.ethz.ch. We will then arrange a meeting to discuss the details.
We look forward to your application!