This opportunity is not published. No applications will be accepted.
The needle in a haystack: Extracting information out of chemical process descriptions with GPT
To improve chemical processes, we need to predict process performance and environmental impacts at a very early stage of process design. However, improving those predictions requires more detailed information about the processes. In this master's thesis, you will train a natural language processing model to gather such information from process descriptions.
Keywords: Process engineering, Natural Language Processing, Chemical industry, Process Descriptions
Predicting the performance and environmental impacts of chemical processes requires a lot of information about the process that is usually not available at an early stage of research and development. At EPSE, we try to close this data gap, e.g., through machine-learning-driven estimation approaches that predict process energy demands and raw material inputs. More information about existing processes can help improve estimation approaches for novel chemical processes. This master's thesis aims to adapt existing natural language processing models to extract the relevant information from available process descriptions. Process descriptions are short texts, often found in patents, publications, or textbooks, that describe the sequence of steps performed within a chemical process.
**What will be your contribution?**
We will provide you with a pretrained natural language model. To fine-tune this model for our purpose, you will identify relevant process information in a set of available process descriptions, define a machine-readable data format to collect the process information, and extract this information to generate a training set. After you have trained and tested the model, you will explore how the results can be used to improve machine-learning-driven estimation approaches.
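To give a feeling for what a machine-readable data format and a training record could look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `ProcessStep` fields, the prompt/completion layout, and the example text are hypothetical and do not represent the format used at EPSE. Defining the actual format is part of the thesis.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ProcessStep:
    """One step of a process description (hypothetical schema)."""
    unit_operation: str  # e.g. "distillation"
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    conditions: dict[str, str] = field(default_factory=dict)


def to_training_record(description: str, steps: list[ProcessStep]) -> str:
    """Serialize one annotated description as a single JSON Lines record."""
    record = {
        # Free-text process description the model reads.
        "prompt": description,
        # Structured target the model should learn to produce.
        "completion": json.dumps([asdict(s) for s in steps], sort_keys=True),
    }
    return json.dumps(record)


# Invented example text, not taken from any real patent or publication.
description = (
    "Ethanol and water are fed to a distillation column operated at 1 bar. "
    "The distillate is enriched in ethanol."
)
steps = [
    ProcessStep(
        unit_operation="distillation",
        inputs=["ethanol", "water"],
        outputs=["ethanol-rich distillate"],
        conditions={"pressure": "1 bar"},
    )
]

line = to_training_record(description, steps)
print(line)
```

Writing one such line per annotated description yields a JSON Lines file, a layout many fine-tuning workflows accept. Whether it suits the provided model is an open question for the thesis.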
**What skills do you need?**
• Good understanding of chemical processes and unit operations
• Good programming experience: Python
• Independent and goal-oriented working style
• Above-average grades
• Basic understanding of challenges in early process engineering is a plus
**What do we offer?**
In this master's thesis, you can learn about the chemical industry, improve your coding skills, and become familiar with natural language processing. Furthermore, you will be part of a young and motivated team of researchers and students.
You will develop a natural language processing model to extract machine-readable process information from chemical process descriptions and evaluate the model performance.
Tim Langhorst
Doctoral student
CLA F 15.2
Tannenstrasse 3
8092 Zurich, Switzerland
tlanghorst@ethz.ch