Horizon CDT Research Highlights


Automatic Semantic Image Understanding Based on Deep Learning and Random Forest

  Fei Yang (2015 cohort)   www.nottingham.ac.uk/~psxfy

Introduction

Object detection and image labeling are basic research problems in computer vision. Semantic image understanding is a further step: making the computer understand the logical meaning of the objects within an image. Much work has been done on object detection, feature engineering, and classifiers for image classification and labeling, all of which rely on large amounts of human-labeled training data and substantial computing power. Hardware advances will keep giving us faster machines, but there is still a long way to go in algorithm design before machine recognition performance reaches human level. Understanding image content is harder still.

Research

Our research is in computer vision. We use machine learning, in particular deep learning, to develop new algorithms and implement them in a new system, contributing to artificial intelligence. The focus is semantic image understanding that runs efficiently in real time and brings the computer closer to human performance. It builds on research into deep learning features, feature-based classification, image labeling and image annotation. Feature detection and feature extraction, an important part of the work, are based on deep learning methods. The structure of the deep learning network is also part of our research: by modifying the neural network, we can adapt the model to different application scenarios and so achieve better performance. Good features and highly accurate classification are the basis for image annotation and semantic analysis.
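As an illustration of how classification output can feed image annotation, the following minimal Python sketch turns per-class probabilities into a short list of tags. The function name `annotate`, the threshold of 0.3, the limit of three tags and the example probabilities are all hypothetical choices made for this sketch; they are not taken from the system described here.

```python
def annotate(class_probabilities, threshold=0.3, max_tags=3):
    """Return up to max_tags labels whose probability reaches threshold,
    ordered from most to least likely. All defaults are illustrative."""
    ranked = sorted(class_probabilities.items(),
                    key=lambda item: item[1], reverse=True)
    return [label for label, p in ranked[:max_tags] if p >= threshold]


# Hypothetical classifier output for one image.
probabilities = {"dog": 0.55, "grass": 0.35, "ball": 0.08, "sky": 0.02}
tags = annotate(probabilities)
print(tags)  # ['dog', 'grass']
```

The better the features and the classifier, the more reliable the probabilities, and the fewer wrong tags survive the threshold.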

Methodology

Deep learning features are good at representing the intrinsic characteristics of an image, and a random forest is good at producing class probabilities. Semantic image understanding is based on assigning semantic tags and calculating semantic distances. Here we discuss the image-to-semantic direction of semantic interpretation. Features extracted from a deep learning network are used to represent the image, and a well-designed network yields a good representation. The central problem is therefore how to modify and redesign the architecture of the deep learning network so that it performs better on particular problems. To do this, we need GPU computing [43], especially CUDA programming on NVIDIA GPUs [52], to provide the computing power for the network, and deep learning tools such as Caffe [35] and Theano [46] to design the network structure.
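The combination of feature vectors, random-forest class probabilities and a semantic distance can be sketched in a few lines of Python. This is a toy under stated assumptions, not the method used in this research: the two-dimensional "features" are made-up stand-ins for deep learning features, each tree is simplified to a depth-1 decision stump, and cosine distance between tag-probability vectors is only one assumed way to define a semantic distance. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def train_stump(samples, labels, rng):
    """Fit a depth-1 tree: pick one random feature, then the threshold
    on that feature with the fewest training errors."""
    feature = rng.randrange(len(samples[0]))
    best = None
    for threshold in sorted({s[feature] for s in samples}):
        left = [y for s, y in zip(samples, labels) if s[feature] <= threshold]
        right = [y for s, y in zip(samples, labels) if s[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label


def train_forest(samples, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(samples)) for _ in samples]
        forest.append(train_stump([samples[i] for i in idx],
                                  [labels[i] for i in idx], rng))
    return forest


def predict_proba(forest, x):
    """Class probability = fraction of trees voting for that class."""
    votes = Counter(left if x[f] <= t else right for f, t, left, right in forest)
    return {label: count / len(forest) for label, count in votes.items()}


def semantic_distance(p, q):
    """Cosine distance between two tag-probability dictionaries."""
    tags = sorted(set(p) | set(q))
    a = [p.get(tag, 0.0) for tag in tags]
    b = [q.get(tag, 0.0) for tag in tags]
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 1.0
    return 1.0 - sum(x * y for x, y in zip(a, b)) / norm


# Made-up 2-D feature vectors standing in for deep learning features.
features = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15),
            (0.1, 0.9), (0.2, 0.8), (0.15, 0.85)]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

forest = train_forest(features, labels)
proba_a = predict_proba(forest, (0.95, 0.05))  # cat-like image
proba_b = predict_proba(forest, (0.05, 0.95))  # dog-like image
print(proba_a, proba_b, semantic_distance(proba_a, proba_b))
```

In practice the stumps would be replaced by full decision trees from an established library, and the feature vectors would come from a layer of the trained network.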

References

  1. Geoffrey E. Hinton, Simon Osindero, and Yee Whye Teh. A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, MIT Press. (2006)

  2. Yu Kai, Jia Lei, Chen Yuqiang, and Xu Wei. Deep Learning: Yesterday, Today and Tomorrow. Journal of Computer Research and Development. (2013)

  3. Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence. (2013)

  4. Andrew Ng. Invited talk on deep learning. (2014) Video available: https://www.youtube.com/watch?v=W15K9PegQt0?

  5. Hao Fu, Qian Zhang, and Guoping Qiu. Random Forest for Image Annotation. ECCV. (2012)

  6. Ming Liang and Xiaolin Hu. Recurrent Convolutional Neural Network for Object Recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2015)

  7. Rajat Raina, Anand Madhavan, and Andrew Y. Ng. Large-scale Deep Unsupervised Learning using Graphics Processors. Proceedings of the 26th Annual International Conference on Machine Learning, ACM, pp. 873-880. (2009)

This work was carried out at the International Doctoral Innovation Centre (IDIC). The authors acknowledge financial support from Ningbo Education Bureau, Ningbo Science and Technology Bureau, China's MOST, the University of Nottingham, and Shenzhen University. The work is also partially supported by EPSRC grant no. EP/L015463/1.