Master Thesis / Internship / Semester Project: Digitization of large 12-lead ECG Image database
12-lead electrocardiograms (ECGs) are still documented solely on paper in many hospitals, especially in the Global South. These paper records document a multitude of conditions recorded in many different countries. Our lab has access to a dataset of ECG photos / scans from more than 8000 patients, each showing 12-lead signals printed on physical paper sheets. The dataset comprises 12-lead ECG image records from more than 35 hospital sites across Europe. The primary objective of this project is to develop an automated digitization pipeline that converts a raw image scan in .png format into 12 vectorized ECG time series in WFDB format.
Keywords: Spinal Cord Injury, Computer Vision, CV, Machine Learning, Deep Learning, AI, Signal Processing, ECG, Medical Data, Healthcare
**Problem Definition**
The 12-lead ECG scans are usually noisy and contain a red or black grid, depending on the image format (color or black-and-white). Some ECG images were taken with a camera while the sheet was held in the hand. In these images, the paper is often tilted and the image contains shadow artifacts or wrinkles. Furthermore, the 12 lead signals were not printed in a standardized format. Often, the signals are printed on the paper in a 3x4 matrix format, with the lead identifier letters written next to the start location of each signal. Sometimes, however, all leads are printed at full length, one above the other. A robust solution for these problems must be developed; a preliminary draft of the pipeline steps is given in the ‘Your Tasks’ section.
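As an illustration of the tilt problem described above, the following minimal sketch estimates the rotation of a sheet from points that lie on one nominally horizontal grid line. It assumes such points have already been detected (for example by a line detector, which is not shown), and the function name `estimate_tilt_degrees` is hypothetical. It is one possible starting point, not the method prescribed by the project.

```python
import math


def estimate_tilt_degrees(points):
    """Estimate the tilt of a nominally horizontal line.

    points: list of (x, y) pixel coordinates assumed to lie on one grid line.
    Fits y = a*x + b by ordinary least squares and returns atan(a) in degrees.
    Image coordinates are assumed (y grows downward), so the sign needed to
    undo the tilt depends on the rotation convention of the imaging library.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("points must span more than one x position")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return math.degrees(math.atan(sxy / sxx))


# Hypothetical points on a grid line that moves 10 px in y per 100 px in x.
tilt = estimate_tilt_degrees([(0, 0), (100, 10), (200, 20)])
print(f"estimated tilt: {tilt:.2f} degrees")
```

In practice the estimate would likely be averaged over many grid lines to reduce the influence of wrinkles and shadows.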
**Your Tasks**
1. Conduct extensive literature research on state-of-the-art ECG digitization systems.
2. Develop an automatic image rotation algorithm for horizontal alignment.
3. Implement a network for grid removal. In this step, the grid size should be extracted in pixels so that the ECG signal can be rescaled later.
4. Propose a solution for denoising the images after grid removal. The remaining image should consist only of the lead signals and the lead identifier letters.
5. Apply a pretrained character recognition model to read the lead identifier letters.
6. Detect the relevant image regions for each individual ECG lead signal.
7. Develop an algorithm for conversion of the signal into physical units and potential subsequent post-processing steps.
8. Evaluate the performance of the digitization against manually extracted features that were recorded in the hospital.
9. Digitize the whole dataset, which includes images of more than 8000 patients recorded at 30+ hospital sites across Europe.
10. Make the tool accessible online / via app. (optional)
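Tasks 3 and 7 are linked by the grid size: once the grid spacing is known in pixels, a traced lead can be rescaled into seconds and millivolts. The sketch below shows that conversion under the common paper calibration of 25 mm/s and 10 mm/mV with 1 mm small squares. These values are assumptions, since individual hospital sites may print with other settings, and the function name and parameters are hypothetical.

```python
def pixels_to_physical(trace_rows, baseline_row, px_per_mm,
                       paper_speed_mm_s=25.0, gain_mm_mv=10.0):
    """Convert a traced lead into time (s) and voltage (mV) samples.

    trace_rows: one row index per image column, following the ECG trace.
    baseline_row: row index of the isoelectric line (assumed to be known).
    px_per_mm: grid spacing from the grid-removal step, in pixels per mm.
    paper_speed_mm_s, gain_mm_mv: assumed calibration (25 mm/s, 10 mm/mV).
    """
    if px_per_mm <= 0:
        raise ValueError("px_per_mm must be positive")
    px_per_second = px_per_mm * paper_speed_mm_s
    px_per_mv = px_per_mm * gain_mm_mv
    times_s = [col / px_per_second for col in range(len(trace_rows))]
    # Image rows grow downward, so a smaller row index means a higher voltage.
    volts_mv = [(baseline_row - row) / px_per_mv for row in trace_rows]
    return times_s, volts_mv


# 100 px above the baseline at 10 px/mm is 10 mm, i.e. 1 mV at 10 mm/mV.
times_s, volts_mv = pixels_to_physical([100, 0, 100], baseline_row=100,
                                       px_per_mm=10)
print(times_s, volts_mv)
```

With these assumptions the effective sampling rate equals `px_per_mm * paper_speed_mm_s`, so a scan at 10 px/mm yields 250 samples per second. The resulting series would usually be resampled to a fixed rate before export.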
**Your Profile**
- Strong experience with Python
- Strong background in Computer Vision (preferred)
- Background in Signal Processing and Filtering (preferred)
- Structured and reliable working style
- Ability to work independently on a challenging problem
The main goal of this project is to develop a pipeline for automated 12-lead ECG image digitization. This pipeline would be deployed to digitize the whole dataset, which comprises multiple scans or photographs for more than 8000 patients. The validity of the digitized signals can be tested using features that were measured manually in the hospital and that can also be extracted automatically from the digitized ECG signals for comparison. Finally, the project could optionally be deployed in an online interface or as an app, so that anyone can upload a .png file and download the digitized signal as a WFDB signal file. This would make digital ECG signal analysis possible in countries where printed ECG signals are still the standard, and would open the possibility of earlier and more refined disease detection for a large percentage of the global population with little access to modern ECG analysis tools.
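The validation idea above can be sketched as a paired comparison between a manually measured feature and the same feature extracted from the digitized signal. The example below assumes heart rate in beats per minute as the feature and uses made-up numbers purely for illustration. The function name `compare_features` and the choice of bias and mean absolute error as metrics are assumptions, not requirements of the project.

```python
from statistics import mean


def compare_features(manual, digitized):
    """Compare paired feature values from manual and automatic extraction.

    Returns the bias (mean of digitized minus manual) and the mean absolute
    error, both in the unit of the feature.
    """
    if not manual or len(manual) != len(digitized):
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [d - m for m, d in zip(manual, digitized)]
    return {"bias": mean(diffs), "mae": mean([abs(x) for x in diffs])}


# Illustrative, made-up heart rates in bpm for three records.
manual_bpm = [60.0, 75.0, 90.0]
digitized_bpm = [62.0, 74.0, 90.0]
print(compare_features(manual_bpm, digitized_bpm))
```

A small bias with a large mean absolute error would point to noisy but unbiased digitization, while a large bias would suggest a systematic scaling problem such as a wrong paper speed.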
Host: Dr. Diego Paez (SCAI-Lab, ETHZ | SPZ)
Please send your CV and latest transcript to: Dr. Diego Paez-Granados (diego.paez@hest.ethz.ch)