Horizon CDT Research Highlights

Research Highlights

Relative Geometry-Aware Siamese Neural Network for 6DOF Camera Relocalization

  Qing Li (2015 cohort)   www.nottingham.ac.uk/~psxql2


Worldwide, approximately 285 million people are visually impaired: about 39 million are completely blind and 246 million have low vision [1]. A localization system is of significant importance to visually impaired people in their daily lives, helping them determine where they are and navigate their surroundings. Thanks to the development of GPS, self-positioning is solved accurately and efficiently outdoors. Indoors, however, GPS signals are greatly weakened or even blocked, so they cannot be used for positioning [2]. Many image-based methods have been proposed to complement GPS. They provide position and orientation information based on either image retrieval [3], [4], [5], [6], [7] or 3D model reconstruction [8]. However, these methods face many challenges, including high storage overheads, low computational efficiency and image variations, especially for large scenes. Recently, rapid progress in machine learning, particularly deep learning, has produced a number of deep learning-based methods [9], [10], [11], [12]. These methods perform well on the aforementioned challenges, but their accuracy is not as good as that of traditional methods. Another severe problem of deep learning-based methods is that they fail to distinguish between two different locations that contain similar objects or scenes.

Research Questions

  1. How can users be localized accurately and in real time from a single input image?
  2. How can the storage requirement be reduced without significantly degrading localization accuracy?
  3. How can locations with a similar appearance be distinguished?


In this project, we present a novel relative geometry-aware Siamese neural network, which explicitly exploits the relative geometry constraints between images to regularize the network. We improve the localization accuracy and enhance the ability of the network to distinguish locations with similar images. This is achieved with three key ideas: 1) We design a novel Siamese neural network that explicitly learns the global poses of a pair of images, and we constrain the estimated global poses with the actual relative pose between the two images. 2) We perform multi-task learning to estimate the absolute and relative poses simultaneously, ensuring that the predicted poses are correct both globally and locally. 3) We employ metric learning and design an adaptive metric distance loss to learn feature representations that can distinguish the poses of visually similar images taken at different locations, thus improving the overall pose estimation accuracy.
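The relative geometry constraint in ideas 1) and 2) can be illustrated with a small sketch. It is not the project's implementation: the pose representation (a translation vector plus a quaternion in w, x, y, z order), the simplified definition of the relative pose, the function names and the weights beta and rel_weight are all assumptions made for illustration. The adaptive metric distance loss of idea 3) is not covered.

```python
import math

# Hypothetical sketch: a pose is (translation, quaternion), with the
# translation as (x, y, z) and the quaternion as (w, x, y, z).

def quat_normalize(q):
    """Scale a quaternion to unit length."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)

def quat_conjugate(q):
    """Conjugate of a quaternion (the inverse rotation for unit quaternions)."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def quat_multiply(a, b):
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def l2(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def relative_pose(pose_a, pose_b):
    """Relative pose from image A to image B (assumed, simplified definition):
    translation difference and the rotation q_b * conj(q_a)."""
    (t_a, q_a), (t_b, q_b) = pose_a, pose_b
    t_rel = tuple(b - a for a, b in zip(t_a, t_b))
    q_rel = quat_multiply(quat_normalize(q_b), quat_conjugate(quat_normalize(q_a)))
    return t_rel, q_rel

def pose_error(pred, target, beta):
    """Translation error plus beta times quaternion error.
    Note: plain L2 on quaternions ignores the q / -q sign ambiguity."""
    (t_p, q_p), (t_t, q_t) = pred, target
    return l2(t_p, t_t) + beta * l2(quat_normalize(q_p), quat_normalize(q_t))

def pair_loss(pred_a, pred_b, true_a, true_b, beta=1.0, rel_weight=1.0):
    """Absolute pose errors of both branches plus a relative-geometry term
    that compares the relative pose implied by the two predictions with the
    actual relative pose of the image pair."""
    absolute = pose_error(pred_a, true_a, beta) + pose_error(pred_b, true_b, beta)
    relative = pose_error(
        relative_pose(pred_a, pred_b), relative_pose(true_a, true_b), beta
    )
    return absolute + rel_weight * relative
```

In this sketch the relative term is zero whenever the two predictions are mutually consistent with the true relative pose, even if both are offset by the same global error, so it penalizes pairwise inconsistency while the absolute terms penalize global error.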


  1. World Health Organization (2011). Visual impairment and blindness. Retrieved from http://www.who.int/mediacentre/factsheets/fs282/en/.
  2. Ifthekhar, M. S., Saha, N., & Jang, Y. M. (2014). Neural Network Based Indoor Positioning Technique in Optical Camera Communication System. Paper presented at the Proceedings of the International Conference on Indoor Positioning and Indoor Navigation.
  3. Murillo, A. C., & Kosecka, J. (2009, September). Experiments in place recognition using gist panoramas. In 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops (pp. 2196-2203). IEEE.
  4. Sattler, T., Weyand, T., Leibe, B., & Kobbelt, L. (2012, September). Image Retrieval for Image-Based Localization Revisited. In BMVC (Vol. 1, No. 2, p. 4).
  5. Ulrich, I., & Nourbakhsh, I. (2000). Appearance-based place recognition for topological localization. In Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No. 00CH37065) (Vol. 2, pp. 1023-1029). IEEE.
  6. Wolf, J., Burgard, W., & Burkhardt, H. (2005). Robust vision-based localization by combining an image-retrieval system with Monte Carlo localization. IEEE Transactions on Robotics, 21(2), 208-216.
  7. Wolf, J., Burgard, W., & Burkhardt, H. (2002). Robust vision-based localization for mobile robots using an image retrieval system based on invariant features. In Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No. 02CH37292) (Vol. 1, pp. 359-365). IEEE.
  8. Kukelova, Z., Bujnak, M., & Pajdla, T. (2013). Real-time solution to the absolute pose problem with unknown radial distortion and focal length. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2816-2823).
  9. Weyand, T., Kostrikov, I., & Philbin, J. (2016, October). PlaNet - photo geolocation with convolutional neural networks. In European Conference on Computer Vision (pp. 37-55). Springer, Cham.
  10. Kendall, A., Grimes, M., & Cipolla, R. (2015). PoseNet: A convolutional network for real-time 6-DOF camera relocalization. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2938-2946).
  11. Kendall, A., & Cipolla, R. (2016, May). Modelling uncertainty in deep learning for camera relocalization. In 2016 IEEE International Conference on Robotics and Automation (ICRA) (pp. 4762-4769). IEEE.
  12. Melekhov, I., Ylioinas, J., Kannala, J., & Rahtu, E. (2017). Image-based localization using hourglass networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 879-886).

This author is supported by the Horizon Centre for Doctoral Training at the University of Nottingham (RCUK Grant No. EP/L015463/1) and Shenzhen University.