Many people rely on mobile navigation applications when visiting a new area. Such applications, however, fail to deliver the same service indoors, mostly because GPS signals are strongly attenuated inside buildings. We aim to replace GPS with image-based localization.
Keywords: localization, navigation, Android app development, deep learning, computer vision
Vision-based localization strategies can be broadly divided into two categories: approaches that build and use a 3D model (structure from motion, SLAM), and topological approaches that localize a new image by retrieving its closest match from a database of geotagged reference images. In our project, we work with the second category. To compare query and reference images, global image descriptors are extracted with deep neural networks (VGG16 + NetVLAD). In previous work, we established a framework that chooses a suitable subset of reference images from a (too) large set of potential references and uses temporal information to better match sequences of query and reference images.
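The retrieval step described above can be illustrated with a minimal sketch: each image is reduced to a global descriptor vector, and a query is localized at the geotag of the most similar reference descriptor. The descriptors here are random placeholder vectors and the function name `retrieve_nearest` is hypothetical; in the actual pipeline the vectors would come from a VGG16 + NetVLAD network.

```python
import numpy as np

def retrieve_nearest(query, references):
    """Return the index of the reference descriptor closest to the query.

    query:      (D,) global image descriptor (e.g. a NetVLAD output)
    references: (N, D) descriptors of the geotagged reference images
    Similarity is cosine similarity on L2-normalized descriptors.
    """
    q = query / np.linalg.norm(query)
    refs = references / np.linalg.norm(references, axis=1, keepdims=True)
    sims = refs @ q  # (N,) cosine similarity of the query to every reference
    return int(np.argmax(sims))

# Toy example: 5 reference "images" with random 8-D descriptors.
rng = np.random.default_rng(0)
refs = rng.normal(size=(5, 8))
# A query that is a slightly perturbed copy of reference 3
query = refs[3] + 0.01 * rng.normal(size=8)
assert retrieve_nearest(query, refs) == 3
```

In practice the database can be large, so exhaustive comparison would be replaced by an approximate nearest-neighbor index, and the temporal framework mentioned above would constrain which references are considered for each frame of a query sequence.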
The next step in the project is to incorporate and adapt the existing localization and navigation algorithms into a mobile application. The application will then be evaluated, and potential areas of improvement identified and addressed.
Mobile platforms generally have limited computational resources. It may therefore be necessary to replace the VGG16-based global image descriptors with a more mobile-friendly network architecture such as MobileNet. It may also be interesting to investigate different loss functions for fine-tuning such a network for localization.
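One common choice of loss for fine-tuning retrieval descriptors is a triplet margin loss, which pulls a query descriptor towards an image of the same place and pushes it away from an image of a different place. The sketch below is a NumPy illustration of that idea under this assumption, not the project's confirmed training objective; a real setup would use the differentiable equivalent in TensorFlow or PyTorch.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.1):
    """Triplet margin loss on L2-normalized descriptors.

    anchor:   descriptor of the query image
    positive: descriptor of an image of the same place
    negative: descriptor of an image of a different place
    The loss is zero once the positive is closer than the negative
    by at least `margin`.
    """
    def dist(a, b):
        a = a / np.linalg.norm(a)
        b = b / np.linalg.norm(b)
        return np.linalg.norm(a - b)

    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)
```

For a well-separated triplet (positive near the anchor, negative far away) the loss is zero, so only triplets that violate the margin drive the fine-tuning of the descriptor network.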
The following skills are an advantage when working on this project:
- Experience with a deep learning framework such as TensorFlow or PyTorch
- Experience with Android programming
- Experience with MATLAB and/or Python
- Experience with OpenCV
The goal of the project is to produce, evaluate and improve a running prototype of an indoor navigation mobile application.
Please send an e-mail with your CV and grade transcript to: jthoma@vision.ee.ethz.ch