Removing Markers in Images for Hand Pose Estimation
We want to investigate whether optical markers can be removed from hand images, and how this removal affects a downstream hand pose estimation pipeline.
Keywords: Hand pose estimation, Optical Tracking, Deep Learning, Image Infilling
In recent years, estimating hand poses from single or multiple RGB images has become popular. With the advent of Deep Learning, vast improvements were realized. However, the topic remains challenging.
One reason hand pose estimation is difficult is that ground-truth labels are hard to come by. For example, we could attach reflective markers to a user’s hand and then estimate the hand pose via established optical tracking systems. While this results in acceptable pose estimations, it unfortunately renders the resulting images useless because of the visible markers.
Visible markers in the images are problematic because they are likely a very strong cue for any neural network deployed for the recognition task. Hence, the network will rely heavily on markers being present in the input images and will thus not generalize well to in-the-wild images (where people usually don’t wear reflective markers).
In this thesis we want to investigate whether we can train a neural network that removes markers from the images. This can be seen as an image infilling task, at which neural architectures have recently excelled. However, even if the infilling is successful, it is not clear how the filled-in images affect a downstream pose estimation pipeline. For example, a network might remove the markers in a way that still leaves traces which a pose estimation network can pick up, which in turn can cause that network to perform poorly on images that were never corrected.
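To make the infilling setup concrete: the marker region is masked out and reconstructed from its surroundings. The thesis would use a learned infilling network, but even a simple diffusion-based fill conveys the idea. The sketch below is a toy illustration only; the image, mask, and iteration count are made-up assumptions, not part of the project.

```python
import numpy as np

def inpaint_diffusion(image, mask, iterations=200):
    """Fill masked pixels by repeatedly averaging their 4-neighbours.

    image: 2D float array (grayscale); mask: boolean array, True where
    the marker sits and the pixel must be reconstructed.
    """
    filled = image.copy()
    filled[mask] = filled[~mask].mean()  # crude initialisation
    for _ in range(iterations):
        # average of the four neighbours; only masked pixels are updated
        avg = (np.roll(filled, 1, 0) + np.roll(filled, -1, 0)
               + np.roll(filled, 1, 1) + np.roll(filled, -1, 1)) / 4.0
        filled[mask] = avg[mask]
    return filled

# toy example: a smooth gradient with a bright "marker" blob in the middle
img = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))
marker_mask = np.zeros_like(img, dtype=bool)
marker_mask[12:20, 12:20] = True
corrupted = img.copy()
corrupted[marker_mask] = 1.0  # a reflective marker saturates the pixels

restored = inpaint_diffusion(corrupted, marker_mask)
```

A learned infilling network replaces the neighbour-averaging loop with a model trained on clean hand images, but the interface (image plus marker mask in, completed image out) is the same.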
The task of the student is to 1) use (or potentially re-train) a state-of-the-art image infilling network to produce “clean” hand images, and 2) study how the infilling affects the performance of hand pose estimation on these “clean” images. Depending on time and intermediate results, new architectures for hand pose estimation from cleaned images can be explored. The project also likely involves capturing some data in the motion capture lab. Experience with deep learning frameworks (PyTorch or TensorFlow) is required.
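One natural way to quantify step 2 is to compare a standard joint-error metric, such as mean per-joint position error (MPJPE), between predictions on the original marker images and on the infilled images. The sketch below uses made-up keypoints and constant offsets purely to illustrate the comparison; the variable names and numbers are assumptions, not results.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance
    between predicted and ground-truth joint positions (J x 3 arrays)."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# illustrative numbers only: 21 hand joints, with hypothetical constant
# offsets standing in for a pose estimator's prediction error
gt = np.zeros((21, 3))               # ground-truth joints from optical tracking
pred_on_marker_images = gt + 0.01    # predictions on original (marker) images
pred_on_cleaned_images = gt + 0.02   # predictions on infilled images

# a positive gap would indicate that infilling degrades downstream accuracy
gap = mpjpe(pred_on_cleaned_images, gt) - mpjpe(pred_on_marker_images, gt)
```

Since the optical tracking system supplies ground-truth poses for the captured images, this comparison can be run on the project's own data without extra annotation effort.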