Training of a neural network for speech recognition in medical debriefings
This thesis focuses on training a neural network that processes audio data from video recordings of medical debriefings to detect predefined classes of speech in the communication between debriefers and learners. The network's performance is to be continuously evaluated against manually labeled video data and improved.
Keywords: machine learning, deep learning, neural network, speech recognition, debriefing
In hospitals, health care professionals can train and improve their patient-treatment skills through guided training in which realistic clinical scenarios are simulated. The core element of such simulation-based team training is the debriefing, defined as a guided conversation among participants that aims to explore and understand the relationships among events, actions, thought and feeling processes, and performance outcomes of the simulated situation. To gain insight into what works and what does not work during debriefings, segments of speech are assigned to different classes of communication (such as observation, opinion, illustration, and inquiry). However, this classification is usually done manually and is therefore very time-consuming. The goal of this thesis is to address this problem by providing a neural network that is trained to detect coherent speech segments in video data and to assign each segment to the fitting class of communication.
The thesis focuses on training a neural network (such as OpenAI's Whisper) that processes audio data from video recordings of medical debriefings to detect predefined classes of speech in the communication between debriefers and learners. The work involves close collaboration with our clinical partners from University Hospital Zurich (USZ), who will provide the classification scheme (called DE-CODE) as well as a large set of labeled video data from debriefing sessions (in German and Swiss German). The network's performance is to be continuously evaluated against the manually labeled video data (e.g. by ground truth comparison) and improved (in terms of e.g. accuracy and F1 score).
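The ground truth comparison mentioned above could look roughly like the following minimal sketch. It assumes that predicted and manually assigned labels are already aligned segment by segment. The class names are taken from the examples in this description and are not the full DE-CODE scheme; the label lists are invented purely for illustration, and the function names are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of segments whose predicted class matches the manual label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred):
    """F1 score per class, computed as 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the manual label differs
            fn[t] += 1  # manual label t was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Invented example data: one label per speech segment.
manual = ["observation", "opinion", "inquiry", "inquiry", "illustration", "opinion"]
predicted = ["observation", "inquiry", "inquiry", "inquiry", "illustration", "opinion"]

print(f"accuracy: {accuracy(manual, predicted):.3f}")
print(f"macro F1: {macro_f1(manual, predicted):.3f}")
for label, score in per_class_f1(manual, predicted).items():
    print(f"  {label}: {score:.3f}")
```

Reporting per-class F1 alongside accuracy is useful when some classes of communication occur far less often than others, since accuracy alone can hide poor performance on rare classes.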
- Motivation to apply machine learning in a real-world setting
- Interest in the analysis of human speech and communication
- Ability to switch between German and English
- Competence to work independently and scientifically
Success in product development depends heavily on the competence and skills of teams and individuals. This is why we dedicate our research to creating knowledge that enables the value-adding use of new technologies, and to making this knowledge tangible and teachable. Industrial and clinical needs are the driving forces for our interdisciplinary research. Our work is distinguished by a variety of methods, ranging from simulation to validation of real applications. Our research changes the way we develop products, and our expertise changes the way we create sustainable value.
- Neural network for speech recognition and classification
- Performance evaluation and improvement
- Interdisciplinary project with clinical partners