Horizon CDT Research Highlights

Image-based indoor localization system for visually impaired people

  Qing Li (2015 cohort)   www.nottingham.ac.uk/~psxql2


Worldwide, approximately 285 million people are visually impaired: about 39 million are blind and 246 million have low vision [1]. A localization system is of significant importance to visually impaired people in their daily lives, helping them determine where they are and then navigate through their surroundings. Thanks to the development of GPS, self-positioning outdoors can be done accurately and efficiently. Indoors, however, GPS signals are greatly weakened or even blocked, so they cannot be used for positioning [2]. Computer vision technologies enable blind people to detect and decode location information without having to search for and touch signs. The approach relies on memorizing the visual features of a location, but an image-based indoor localization system, rather than the human brain, performs the remembering and recall. With a handheld camera connected to the system, visually impaired people can position themselves as sighted people do. Another advantage of computer vision technologies is that they provide not only accurate position information but also orientation [3], which is essential for navigation.

Related work

Visual landmarks can be divided into two categories: natural landmarks and artificial landmarks. Artificial landmarks are purposefully designed to be salient in the environment, and they have many advantages. They are easy to detect precisely because they are manufactured according to known rules, and these rules keep them robust to variations in illumination, viewpoint and scale. The landmark position can also be encoded in the landmark's appearance. However, deploying artificial landmarks changes the building's decoration, which might not be feasible for economic reasons or because of the owners’ tastes. Natural landmarks avoid altering indoor surfaces by exploiting physical objects or scenes already present in the environment. Common objects such as doors, elevators and fire extinguishers, and locations of interest such as corners and turns, make good natural landmarks: they remain unchanged over relatively long periods and are common in indoor environments. Many methods have been proposed to detect natural landmarks. Some are based on handcrafted features devised to make use of color, gradient and geometric information. In [4], planar and quadrangular objects are treated as landmarks and detected using geometric rules. In [5], indoor objects such as doors, elevators and cabinets are recognized by checking whether detected lines and corners satisfy indoor object shape constraints. SIFT [6] features are used to perform landmark recognition in [7] and [8]. In [10], SURF [9] features and line segments are used to detect landmarks, and the method performs well in detecting doors, stairs and tags in the environment. In this project, we view landmark detection as a classification problem. We choose as landmarks not only indoor objects but also indoor scenes at locations of interest such as corners and turns.
Unlike previous approaches that recognize indoor objects with handcrafted features, a convolutional neural network (CNN) is trained to detect both the indoor objects and the locations of interest together. The CNN is able to learn the key features that distinguish the target objects. These features are not derived from a single feature space but from a combination of color, gradient and geometric information. In addition, with a proper training dataset, this approach is robust to landmark variation caused by illumination changes and other deformations. The CNN is selected for its good performance in image classification [11] and indoor scene recognition [12], where CNN-based methods have outperformed approaches based on handcrafted features.
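To make the "landmark detection as classification" idea concrete, the following is a minimal sketch of the final decision step only, not the project's actual network. It assumes a trained CNN has already produced one raw score per landmark class; the class list, the function names and the 0.6 confidence threshold are illustrative assumptions, and the scores are made up.

```python
import math

# Hypothetical landmark classes, taken from the examples named in the text.
LANDMARK_CLASSES = ["door", "elevator", "fire_extinguisher", "corner", "turn"]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_landmark(scores, min_confidence=0.6):
    """Return (label, probability) for the most likely class.

    The label is None when the best probability is below the assumed
    confidence threshold, i.e. no landmark is reported for this frame.
    """
    if len(scores) != len(LANDMARK_CLASSES):
        raise ValueError("expected one score per landmark class")
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = LANDMARK_CLASSES[best] if probs[best] >= min_confidence else None
    return label, probs[best]


# Made-up scores standing in for the output of a trained network.
label, prob = classify_landmark([4.0, 0.5, 0.2, 1.0, 0.3])
print(label, round(prob, 3))
```

Rejecting low-confidence frames is one simple way to avoid reporting a landmark when the view is ambiguous; the threshold would need tuning on real data.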

Research Questions

  1. How can similar-looking indoor scenes be distinguished in images or videos?
  2. How can distracting objects, such as moving pedestrians and chairs, be filtered out?
  3. How can accurate, continuous localization be achieved in real time from video?

Aims and objectives

My project aims to develop an indoor localization system for the visually impaired that relies on the camera embedded in a wearable device. The system, which is installed on the wearable device, takes recorded video and an indoor floor plan as input. It has three main components: user relocalization, continuous localization, and obstacle detection with threat alarming. The three modules correspond to three objectives. User relocalization addresses the scene ambiguity problem that occurs in indoor environments. The continuous localization module exploits adjacent video frames to track the user's trajectory. Obstacle detection and threat alarming deals with the dynamic environment and the threats it poses.


Landmarks have proven useful for localization. They are visually stable and can be exploited to relocalize the user. Current deep learning based object detection methods make it possible to recognize landmarks with high accuracy. For continuous localization, adjacent images can be used to compute the change in pose. Instead of traditional methods, which require large amounts of storage and long computation times, CNN-based pose regression will be used to reduce the storage requirement. Several types of obstacles will be defined; obstacle types will be recognized by CNN-based classification, and their distance to the user will be estimated by a CNN-based regression method.
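As an illustration of how pose changes between adjacent frames can be accumulated into a trajectory, here is a minimal sketch. It assumes a planar pose (x, y, heading) on the floor plan and that each per-frame change (dx, dy, dtheta) is expressed in the user's current frame; in the project these changes would come from the pose regression step, whereas here they are hard-coded. The 2D simplification and the function names are assumptions made for the example.

```python
import math


def compose_pose(pose, delta):
    """Apply a relative motion (dx, dy, dtheta), given in the user's
    current frame, to a global pose (x, y, theta) on the floor plan."""
    x, y, theta = pose
    dx, dy, dtheta = delta
    # Rotate the local displacement into the floor-plan frame.
    new_x = x + dx * math.cos(theta) - dy * math.sin(theta)
    new_y = y + dx * math.sin(theta) + dy * math.cos(theta)
    # Wrap the heading into [-pi, pi].
    heading = theta + dtheta
    new_theta = math.atan2(math.sin(heading), math.cos(heading))
    return (new_x, new_y, new_theta)


def track(start_pose, deltas):
    """Accumulate per-frame pose changes into a list of global poses."""
    trajectory = [start_pose]
    for delta in deltas:
        trajectory.append(compose_pose(trajectory[-1], delta))
    return trajectory


# Hard-coded changes: one step forward, one step forward with a
# left turn of 90 degrees, then one step forward in the new direction.
deltas = [(1.0, 0.0, 0.0), (1.0, 0.0, math.pi / 2), (1.0, 0.0, 0.0)]
for pose in track((0.0, 0.0, 0.0), deltas):
    print(tuple(round(v, 3) for v in pose))
```

Because each pose is built on the previous one, errors in the per-frame estimates accumulate over time, which is one reason a separate relocalization step based on landmarks is useful.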


  1. World Health Organization. (2011). Visual impairment and blindness. Retrieved from http://www.who.int/mediacentre/factsheets/fs282/en/.

  2. Ifthekhar, M. S., Saha, N., & Jang, Y. M. (2014). Neural Network Based Indoor Positioning Technique in Optical Camera Communication System. Paper presented at the Proceedings of the International Conference on Indoor Positioning and Indoor Navigation.

  3. Kohoutek, T. K., Mautz, R., & Donaubauer, A. (2010). Real-time indoor positioning using range imaging sensors. Paper presented at the SPIE Photonics Europe.

  4. Hayet, J. B., Lerasle, F., & Devy, M. (2002). A visual landmark framework for indoor mobile robot navigation. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (Vol. 4, pp. 3942-3947). IEEE.

  5. Tian, Y., Yang, X., Yi, C., & Arditi, A. (2013). Toward a computer vision-based wayfinding aid for blind persons to access unfamiliar indoor environments. Machine Vision & Applications, 24(3), 521-535.

  6. Lowe, D. (1999). Object recognition from scale-invariant keypoints. ICCV.

  7. Chen, K. C., & Tsai, W. H. (2010). Vision-based autonomous vehicle guidance for indoor security patrolling by a sift-based vehicle-localization technique. IEEE Transactions on Vehicular Technology, 59(7), 3261-3271.

  8. Bai, Y., Jia, W., Zhang, H., Mao, Z. H., & Sun, M. (2014). Landmark-based indoor positioning for visually impaired individuals. International Conference on Signal Processing (Vol. 2014, p. 678).

  9. Bay, H., Ess, A., Tuytelaars, T., & Gool, L. V. (2008). Speeded-up robust features (surf). Computer Vision & Image Understanding, 110(3), 346-359.

  10. Serrão, M., Rodrigues, J. M. F., Rodrigues, J. I., & Buf, J. M. H. D. (2012). Indoor localization and navigation for blind persons using visual landmarks and a gis. Procedia Computer Science, 14(4), 65-73.

  11. Girshick, R., Donahue, J., Darrell, T., Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the CVPR, pp. 580–587.

  12. Zhou, B., Garcia, A. L., Xiao, J., et al. (2014). Learning deep features for scene recognition using Places database.

This author is supported by the Horizon Centre for Doctoral Training at the University of Nottingham (RCUK Grant No. EP/L015463/1) and Shenzhen University.