3D Human Body Estimation from Monocular Videos using a Generative Motion Prior
The major challenge of human body estimation is the lack of appropriate 3D labeled data for videos. To overcome this issue, we would like to explore motion priors learned from large collections of motion capture datasets, e.g., AMASS (https://amass.is.tue.mpg.de/) and Adobe Mixamo (https://www.mixamo.com/).
Keywords: 3D human body estimation, motion priors, computer vision, computer graphics, computer animation, machine learning, deep learning.
**Abstract:** Human motion is fundamental to understanding human behavior. Despite progress on single-image 3D pose and shape estimation, existing video-based state-of-the-art methods fail to produce accurate and natural motion sequences due to a lack of ground-truth 3D motion data for training. Moreover, these methods cannot estimate global translation, and they are brittle to occlusion. To address these problems, we propose to leverage a generative motion model as a prior for human motion estimation from videos. We plan to train this model on large-scale motion capture datasets, i.e., AMASS and Adobe Mixamo. These datasets provide accurate annotations for human body pose and shape, ground plane contact, the global translation of the body, and fine-grained action labels, but lack realistic/in-the-wild images. We aim to combine this generative motion model with a video pose estimation method, e.g., VIBE (https://github.com/mkocabas/VIBE), to enable body motion estimation from in-the-wild videos. We expect the final model to predict body pose and shape, translation, and ground contact from an input video.
**Why 3D bodies?** 3D human body estimation is a pathway to (1) real-world interactive AR/VR applications such as Microsoft's HoloLens and Facebook's Oculus Rift, and (2) computer animation, which is used in video games and the movie industry.
**Tasks:** In particular, the student will develop a generative model of human motion, trained and tested on motion capture datasets, e.g., AMASS and Adobe Mixamo. This motion prior will then be integrated into a video human pose estimation method, e.g., VIBE.
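To make the task concrete, the sketch below shows one possible shape such a motion prior could take: a sequence-level VAE over pose sequences in PyTorch. All specifics here are illustrative assumptions, not part of the project description: the pose representation (24 joints in a 6D rotation format, 144 dims per frame), the sequence length, the GRU encoder, and the hyperparameters are all placeholders that the student would design and validate.

```python
import torch
import torch.nn as nn

class MotionVAE(nn.Module):
    """Illustrative sequence-level VAE motion prior (a sketch, not the project's model).

    Assumed pose format: 24 joints x 6D rotation = 144 dims per frame,
    sequences of 16 frames, as one plausible way to preprocess AMASS data.
    """

    def __init__(self, pose_dim=144, seq_len=16, latent_dim=32, hidden=256):
        super().__init__()
        self.seq_len, self.pose_dim = seq_len, pose_dim
        # Encoder: a GRU summarizes the sequence into Gaussian latent parameters.
        self.encoder = nn.GRU(pose_dim, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)
        # Decoder: maps a latent code back to a full pose sequence (MLP for brevity).
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, seq_len * pose_dim),
        )

    def forward(self, x):
        _, h = self.encoder(x)                   # h: (1, B, hidden)
        h = h.squeeze(0)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        recon = self.decoder(z).view(-1, self.seq_len, self.pose_dim)
        return recon, mu, logvar

def vae_loss(x, recon, mu, logvar, kl_weight=1e-3):
    """Standard ELBO terms: reconstruction error plus KL to a unit Gaussian."""
    rec = ((x - recon) ** 2).mean()
    kld = -0.5 * (1.0 + logvar - mu.pow(2) - logvar.exp()).mean()
    return rec + kl_weight * kld

if __name__ == "__main__":
    model = MotionVAE()
    batch = torch.randn(4, 16, 144)  # stand-in for preprocessed mocap sequences
    recon, mu, logvar = model(batch)
    print(recon.shape)  # torch.Size([4, 16, 144])
```

Once trained on mocap data, such a prior could be used, for example, to regularize per-frame estimates from a video method toward the learned motion manifold; the actual coupling to VIBE is part of the project's design space.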
**Requirements:** We are looking for independent and highly motivated students who 1) have taken a recognized deep learning or modern computer vision course (preferably Machine Perception); and 2) are skilled in Python and PyTorch.
**Optional:** Prior knowledge of and experience with generative models, e.g., VAEs, GANs, or Normalizing Flows.
The projects are research-oriented, and we encourage students to submit to top-tier computer vision conferences. We work closely with students during their projects, and a master's thesis is a great introduction to PhD positions in our lab.
**Project starting/ending dates are flexible.**