Causal Machine Learning and Data Fusion with Experts in the Loop for Spinal Cord Injury (SCI)
Causal discovery aims to find causal relations from data and is increasingly important in fields such as health science. Despite the growing body of work applying causal discovery methods with expert knowledge, few studies examine the uncertainty of that knowledge (what if the expert is wrong?). This matters because causal discovery with expert knowledge in scientific fields calls for caution, and an approach that takes expert uncertainty into account will be more robust to bias introduced by individuals. We therefore aim to develop an iterative causal discovery method with experts in the loop that enables continual interaction and calibration between experts and data.
In addition, fusing datasets from different sources is essential for holistic discovery and reasoning. This project will therefore also develop machine learning and data fusion methods that work across distinct contexts within the scope of SCI.
Based on the qualifications of the candidates, we can arrange a subsidy/allowance to cover traveling or living costs.
Causal graphical models (CGMs) can be determined via randomized controlled experiments (the gold standard for establishing causal relations and quantifying causal effects), learned purely from data [1] (i.e., structure learning or causal discovery), or elicited from domain experts based on their knowledge and experience. It is also feasible to learn a CGM from data combined with prior expert knowledge when the data are unreliable because of insufficient sample size or measurement error.
Early work in causal discovery used constraint-based approaches such as the PC and FCI algorithms. These methods exploit conditional independence relationships among variables to infer causal structure, and they can restrict the space of output graphs through conditional independence assumptions or background knowledge about causal relationships [2].
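The conditional independence idea behind constraint-based methods can be illustrated with a minimal sketch. It assumes linear-Gaussian data and uses a partial-correlation test with a Fisher z-transform, which is one common choice and not necessarily the test used in this project. The simulated chain X -> Z -> Y and all function names are hypothetical and serve illustration only.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """Correlation of x and y after linearly adjusting for one variable z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def fisher_z_pvalue(r, n, cond_size):
    """Two-sided p-value for the null hypothesis of zero (partial) correlation."""
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - cond_size - 3) * abs(z)
    return math.erfc(stat / math.sqrt(2))


# Hypothetical data from the chain X -> Z -> Y with unit-variance Gaussian noise.
random.seed(0)
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
z = [xi + random.gauss(0, 1) for xi in x]
y = [zi + random.gauss(0, 1) for zi in z]

r_marginal = pearson(x, y)          # X and Y are dependent on their own
r_partial = partial_corr(x, y, z)   # but close to independent given Z
p_marginal = fisher_z_pvalue(r_marginal, n, cond_size=0)
p_partial = fisher_z_pvalue(r_partial, n, cond_size=1)
```

A constraint-based algorithm would read a small `p_marginal` as evidence that X and Y are connected, and a small `r_partial` as support for removing the direct X-Y edge, because Z separates them.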
Score-based approaches, such as the GES (Greedy Equivalence Search) algorithm, aim to find the best causal structure that optimizes a scoring criterion, e.g., the BIC (Bayesian Information Criterion) score. These methods often incorporate expert knowledge through prior specifications or domain-specific constraints on the model parameters [3, 4, 5].
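The following minimal sketch shows score-based selection with a hard expert constraint. It assumes linear-Gaussian variables and writes BIC in its "log-likelihood minus penalty" form, so higher is better and additive constants are dropped. The data, the candidate graphs and the helper names (`residual_ss`, `local_bic`, `violates`) are hypothetical. It scores a few hand-picked graphs and is not an implementation of GES.

```python
import math
import random


def residual_ss(target, predictors):
    """Residual sum of squares of an OLS fit with intercept (Gram-Schmidt).

    Assumes the predictors are not perfectly collinear.
    """
    def center(v):
        m = sum(v) / len(v)
        return [a - m for a in v]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    basis = []
    for p in predictors:
        q = center(p)
        for b in basis:
            coef = dot(q, b) / dot(b, b)
            q = [qi - coef * bi for qi, bi in zip(q, b)]
        basis.append(q)
    r = center(target)
    for b in basis:
        coef = dot(r, b) / dot(b, b)
        r = [ri - coef * bi for ri, bi in zip(r, b)]
    return dot(r, r)


def local_bic(data, node, parents):
    """BIC contribution of one node given its parents (higher is better)."""
    n = len(data[node])
    rss = residual_ss(data[node], [data[p] for p in parents])
    n_params = len(parents) + 2  # intercept, one slope per parent, noise variance
    return -0.5 * n * math.log(rss / n) - 0.5 * n_params * math.log(n)


def graph_bic(data, graph):
    """Decomposable score: sum of local scores; graph maps node -> parent list."""
    return sum(local_bic(data, node, parents) for node, parents in graph.items())


def violates(graph, forbidden_edges):
    """True if the graph contains any (cause, effect) edge the expert forbids."""
    return any(cause in graph[effect] for cause, effect in forbidden_edges)


# Hypothetical data from the chain X -> Z -> Y.
random.seed(0)
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
z = [xi + random.gauss(0, 1) for xi in x]
y = [zi + random.gauss(0, 1) for zi in z]
data = {"X": x, "Z": z, "Y": y}

candidates = {
    "empty": {"X": [], "Z": [], "Y": []},
    "chain": {"X": [], "Z": ["X"], "Y": ["Z"]},      # X -> Z -> Y
    "reversed": {"Y": [], "Z": ["Y"], "X": ["Z"]},   # Y -> Z -> X
}
scores = {name: graph_bic(data, g) for name, g in candidates.items()}

# Hypothetical expert statement: Y does not cause Z.
forbidden = [("Y", "Z")]
allowed = {k: s for k, s in scores.items() if not violates(candidates[k], forbidden)}
best_allowed = max(allowed, key=allowed.get)
```

The two chains are Markov equivalent, so this score cannot tell them apart from data alone. The forbidden edge is what singles out `chain`, which shows how background knowledge narrows the set of graphs a score-based search has to consider.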
Recent advances in causal discovery include hybrid approaches that combine elements of constraint-based and score-based methods. They aim to leverage the strengths of both families to improve the accuracy and efficiency of causal structure learning, and they may incorporate expert knowledge through domain-specific constraints and scoring functions.
In addition to observational data, causal discovery methods have also been developed to handle experimental data, where interventions or randomized experiments are conducted. These methods utilize the causal effects of interventions to infer causal relationships. Expert knowledge can be incorporated by designing well-controlled experiments or interventions based on domain expertise.
Despite the growing body of work on structure learning with expert knowledge in scientific fields, few studies examine the uncertainty of that knowledge (what if the expert is wrong?). This matters in medical applications, where expert knowledge must be used cautiously: experts can sometimes provide incorrect knowledge, leading to serious mistakes. An approach that accounts for expert uncertainty during CGM construction will be more robust to bias introduced by individuals. This idea is made possible by recent progress toward evaluating and falsifying a given CGM from observational data without ground truth [6]. We therefore hope to develop an iterative causal discovery method with experts in the loop for continual interaction and calibration between experts and data. In parallel, we are developing an interactive user interface that lets medical experts input their knowledge and edit the learned graph.
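One simple way to picture expert uncertainty is a soft prior: each expert statement carries a confidence instead of being a hard constraint. The sketch below is an illustrative assumption, not the method this project will develop. It adds a log prior over edges to a data score. The data scores are made-up numbers, and the names `expert_log_prior` and `rank_graphs` are hypothetical.

```python
import math


def expert_log_prior(edges, beliefs):
    """Log prior of a graph under independent expert edge beliefs.

    beliefs maps a directed edge (cause, effect) to the probability the
    expert assigns to that edge being present.
    """
    total = 0.0
    for edge, confidence in beliefs.items():
        present = edge in edges
        total += math.log(confidence if present else 1.0 - confidence)
    return total


def rank_graphs(data_scores, graphs, beliefs):
    """Order graph names by data score plus expert log prior, best first."""
    combined = {
        name: data_scores[name] + expert_log_prior(graphs[name], beliefs)
        for name in graphs
    }
    return sorted(combined, key=combined.get, reverse=True)


# Two candidate graphs over A and B; the edge sets are illustrative.
graphs = {"with_edge": {("A", "B")}, "without_edge": set()}

# Made-up log-scores: the data favour the graph without the edge by 10 units.
data_scores = {"with_edge": -1010.0, "without_edge": -1000.0}

# The expert asserts A -> B. A moderately confident expert is overruled by
# the data, while a near-certain expert behaves almost like a hard constraint.
moderate = rank_graphs(data_scores, graphs, {("A", "B"): 0.9})
near_certain = rank_graphs(data_scores, graphs, {("A", "B"): 0.99999})
```

With a confidence of 0.9 the data win and `without_edge` ranks first. With a confidence of 0.99999 the expert's claim dominates. Treating confidence as a tunable quantity is what lets a possibly wrong expert be corrected by the data.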
References
[1] Glymour, C., Zhang, K., & Spirtes, P. (2019). Review of causal discovery methods based on graphical models. Frontiers in genetics, 10, 524.
[2] Flores, M. J., Nicholson, A. E., Brunskill, A., Korb, K. B., & Mascaro, S. (2011). Incorporating expert knowledge when learning Bayesian network structure: a medical case study. Artificial intelligence in medicine, 53(3), 181-204.
[3] O’Donnell, R. T., Nicholson, A. E., Han, B., Korb, K. B., Alam, M. J., & Hope, L. R. (2006). Causal discovery with prior information. In AI 2006: Advances in Artificial Intelligence: 19th Australian Joint Conference on Artificial Intelligence, Hobart, Australia, December 4-8, 2006. Proceedings 19 (pp. 1162-1167). Springer Berlin Heidelberg.
[4] Perković, E., Kalisch, M., & Maathuis, M. H. (2017). Interpreting and using CPDAGs with background knowledge. arXiv preprint arXiv:1707.02171.
[5] Kleinegesse, S., Lawrence, A. R., & Chockler, H. (2022). Domain Knowledge in A*-Based Causal Discovery. arXiv preprint arXiv:2208.08247.
[6] Eulig, E., Mastakouri, A. A., Blöbaum, P., Hardt, M., & Janzing, D. (2023). Toward Falsifying Causal Graphs Using a Permutation-Based Test. arXiv preprint arXiv:2305.09565.
1. Literature Review on Causal Discovery: You will study classic and state-of-the-art causal discovery methods (preferably on real datasets with mixed-type variables) that incorporate expert knowledge, leading to a systematic review of causal discovery methods in different categories (constraint-based vs. score-based) with different assumptions (e.g., latent variables, causal mechanism/function forms, and cycles) for different settings (e.g., static data or time series).
2. Data Exploration and Analysis: You will explore and analyze datasets collected for HAPI (Hospital-Acquired Pressure Injury) and Hospital Stay after PI Surgeries. The data cover hundreds of patients and dozens of mixed-type variables, including patients’ demographic features, lab test values, and health conditions observed during their stay. Apart from clinical data, you will also have access to multivariate time series extracted from bio-signals of people with SCI under intervention experiments. We aim to learn the causal relations among the variables of interest from these data together with experts’ knowledge. Additionally, we will examine public datasets with ground-truth graphs from other fields, such as heart disease and earth science.
3. Methodology Development and Deployment: You will help develop an iterative algorithm for causal discovery with the expert in the loop and validate it against benchmarks using simulated and real datasets. There are two potential directions to explore: (a) How does correct expert knowledge reduce the search space of graphs and thereby improve the accuracy and efficiency of the algorithm? (b) How can prior knowledge be encoded, or a post-modification strategy be designed, so that uncertainty in the experts’ input is accounted for and the algorithm becomes more robust? Based on the methodology, an interactive web/application interface is expected that lets doctors input their knowledge and edit the learned graph. If possible, you will assist in deriving mathematical guarantees for the iterative algorithm in terms of convergence and stability.
4. Presentation and Documentation: You will prepare a high-quality manuscript for publication with a clear and engaging presentation of the results and methodology.
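The question in task 3 of how correct expert knowledge reduces the search space of graphs can be made concrete on a toy scale. The sketch below enumerates all DAGs over a few nodes by brute force and counts how many survive required and forbidden edges. Brute force is feasible only for very small graphs and is used here purely for illustration. The node names and helper functions are hypothetical.

```python
from itertools import product


def is_acyclic(nodes, edges):
    """Kahn-style check: repeatedly strip nodes that have no incoming edge."""
    remaining = set(nodes)
    while remaining:
        sources = [
            v for v in remaining
            if not any(e[1] == v and e[0] in remaining for e in edges)
        ]
        if not sources:
            return False  # every remaining node has a parent, so there is a cycle
        remaining -= set(sources)
    return True


def enumerate_dags(nodes):
    """All DAGs over the given list of nodes, each as a frozenset of (cause, effect)."""
    pairs = [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]
    dags = []
    for states in product((None, "forward", "backward"), repeat=len(pairs)):
        edges = set()
        for (a, b), state in zip(pairs, states):
            if state == "forward":
                edges.add((a, b))
            elif state == "backward":
                edges.add((b, a))
        if is_acyclic(nodes, edges):
            dags.append(frozenset(edges))
    return dags


def apply_knowledge(dags, required=(), forbidden=()):
    """Keep only graphs that contain every required edge and no forbidden edge."""
    return [
        g for g in dags
        if all(e in g for e in required) and not any(e in g for e in forbidden)
    ]


all_dags = enumerate_dags(["A", "B", "C"])
with_required = apply_knowledge(all_dags, required=[("A", "B")])
with_forbidden = apply_knowledge(all_dags, forbidden=[("B", "A")])
```

On three nodes, a single required edge leaves 8 of the 25 DAGs, while a single forbidden edge leaves 17. The number of DAGs grows super-exponentially with the number of nodes, so such constraints matter far more on realistic variable sets, provided the expert knowledge is correct.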
1. Gain unique access and first-hand experience at one of the leading institutions for long-term health management: the Swiss Paraplegic Center in Nottwil.
2. Learn and apply state-of-the-art research methods in Health Data Science.
1. Strong interest and background in (probabilistic) graphical models and causal discovery.
2. Knowledge of virtual environments (conda / docker).
3. Strong experience with Python (preferred).
4. Structured and reliable working style.
5. Ability to work independently on a challenging topic.
Host: Dr. Diego Paez (SCAI-Lab, ETHZ | SPZ)
Supervision: Dr. Diego Paez & Yanke Li (SCAI-Lab, ETHZ | SPZ)
Please send your CV and the latest transcript of records from your studies to Yanke Li (yanke.li@hest.ethz.ch).