This opportunity is not published. No applications will be accepted.
Active learning for anomaly detection in environmental data
Keywords: Machine Learning, Active learning, Environmental monitoring, Anomaly detection
In situ sensors for environmental monitoring are increasingly used in both engineered and natural systems. Given the growing quantity of data, dedicated tools are needed to ensure high data quality.
Anomaly detection aims to identify unusual patterns or outliers that do not conform to the expected behaviour of the systems under investigation. Automated anomaly detection procedures flag data values that fail at least one of several plausibility tests. These procedures differ from manual anomaly detection, which requires expert handling, is usually conducted by visual inspection of the raw or pre-processed data, and is extremely time-consuming.
Hence, automated anomaly detection plays an important role in environmental monitoring, since it can save time and cost and provide immediate, actionable information.
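To make the idea of plausibility tests concrete, the following is a minimal sketch of two such tests, a range check and a step check. The function name `flag_implausible`, the thresholds and the temperature readings are illustrative assumptions, not part of the project or its data.

```python
from typing import List, Optional


def flag_implausible(values: List[float], lower: float, upper: float,
                     max_step: float) -> List[bool]:
    """Return one flag per reading; True marks a value that fails a test."""
    flags: List[bool] = []
    last_accepted: Optional[float] = None  # most recent unflagged reading
    for value in values:
        # Range test: the reading must lie within plausible bounds.
        out_of_range = value < lower or value > upper
        # Step test: the reading must not jump too far from the last accepted one.
        too_steep = (last_accepted is not None
                     and abs(value - last_accepted) > max_step)
        flagged = out_of_range or too_steep
        flags.append(flagged)
        if not flagged:
            last_accepted = value
    return flags


# Hypothetical water temperature readings in degrees Celsius.
readings = [12.1, 12.3, 12.2, 45.0, 12.4, 19.0, 12.5]
print(flag_implausible(readings, lower=0.0, upper=30.0, max_step=5.0))
```

Here the value 45.0 fails the range test and 19.0 fails the step test. Comparing against the last accepted reading, rather than the previous raw reading, is one possible design choice: it avoids flagging the first good value after a spike, but it would keep flagging values after a genuine level shift.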
One class of methods in automated anomaly detection for environmental monitoring is based on supervised machine learning algorithms, which rely on labelled data to train the algorithm. The disadvantage of supervised anomaly detection is that the labelling task is cumbersome and, in turn, expensive and time-consuming. Unsupervised machine learning algorithms, on the other hand, do not require labelled data, but they might perform worse than supervised methods.
Active learning (often grouped with weakly or semi-supervised learning) is a way of training machine learning models by iteratively querying the user for the labels of a subset of training instances. This approach can be useful since only a fraction of the labelled data (or of the effort necessary to label it) is needed to train the algorithm. It is hypothesised that active learning enables extremely fast deployment of machine learning models in real-world applications.
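The query loop described above can be sketched as pool-based active learning with uncertainty sampling. This is only an illustration under stated assumptions: the one-dimensional anomaly scores, the midpoint threshold model, the helper names `fit_threshold` and `active_learning`, and the simulated expert (`oracle`) are all hypothetical, and uncertainty sampling is just one of several possible query policies.

```python
from typing import Callable, Dict, List, Tuple


def fit_threshold(labelled: Dict[int, bool], scores: List[float]) -> float:
    """Toy model: midpoint between the mean normal and mean anomalous score."""
    normal = [scores[i] for i, is_anomaly in labelled.items() if not is_anomaly]
    anomalous = [scores[i] for i, is_anomaly in labelled.items() if is_anomaly]
    return (sum(normal) / len(normal) + sum(anomalous) / len(anomalous)) / 2


def active_learning(scores: List[float], oracle: Callable[[int], bool],
                    seed_indices: List[int],
                    budget: int) -> Tuple[float, Dict[int, bool]]:
    """Query up to `budget` labels; the seed must contain both classes."""
    labelled = {i: oracle(i) for i in seed_indices}
    for _ in range(budget):
        threshold = fit_threshold(labelled, scores)
        unlabelled = [i for i in range(len(scores)) if i not in labelled]
        if not unlabelled:
            break
        # Uncertainty sampling: ask about the point closest to the boundary.
        query = min(unlabelled, key=lambda i: abs(scores[i] - threshold))
        labelled[query] = oracle(query)
    return fit_threshold(labelled, scores), labelled


# Hypothetical anomaly scores; the "expert" is simulated from ground truth.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 2.0, 2.5, 3.0, 0.15, 0.9, 1.4]
truth = [s > 1.0 for s in scores]
threshold, labelled = active_learning(scores, lambda i: truth[i],
                                      seed_indices=[0, 7], budget=3)
print(f"threshold={threshold:.2f}, labels used={len(labelled)} of {len(scores)}")
```

In this toy run, five labels out of eleven are enough for the threshold to separate the two classes, because each query targets the instance the current model is least sure about.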
At this stage of the research, it is key to understand how active learning methods affect the accuracy of anomaly detection, how to design the policy by which the domain expert is queried, and how many samples need to be queried to reach satisfactory model performance.
In this project, ground-truth labelled data is critical to the evaluation. Therefore, labelled data from several infrastructures that produce environmental data will be used, providing real-case applications and expert knowledge. Here, the targeted anomalies are faulty behaviours in engineered systems and natural events in natural systems.
The main goal of this Master thesis project is to evaluate whether active learning can provide an adequate level of accuracy for anomaly detection in environmental systems.
More specifically, the project includes the following tasks:
1. Literature review of current active learning approaches, to acquire an understanding of the research problem and project premises.
2. Benchmarking of existing unsupervised and supervised machine learning models using existing data sets from environmental monitoring.
3. Evaluation of active learning approaches and comparison with unsupervised and supervised approaches. This includes assessing how many data points need to be queried and what kind of sample selection policy is adequate.
4. Documentation of the research in a 30-page report.
Specific information / Requirements
- Interest in environmental monitoring and machine learning
- Motivation and initiative
Contact information
Dr. Stefania Russo
Email: Stefania.russo@eawag.ch