In recent years, the machine learning community has shown increasing interest in deep learning: modeling high-level abstractions in data with architectures composed of multiple non-linear operations. Several state-of-the-art results in computer vision have been achieved with deep convolutional neural networks (CNNs). However, training a deep CNN requires large amounts of data, which are not always available in practice, and training a very large CNN from scratch is difficult. My research focuses on applications and new models of deep CNNs in computer vision, and on using CNNs trained for one vision problem to help solve other computer vision problems, following the idea of transfer learning.
Basics of CNNs
Unlike traditional fully connected artificial neural networks, CNNs restrict the connections between hidden units and input units: they are locally connected, so each hidden unit connects to only part of the input. CNN architectures make the explicit assumption that the inputs are images, and the neurons in a CNN are arranged in three dimensions: height, width, and depth. Each layer of a CNN transforms one volume of activations into another. Usually, convolutional (CONV), pooling (POOL), and fully connected (FC) layers are stacked to form a full CNN architecture.
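To make the volume-to-volume view concrete, the following minimal sketch traces the activation volume and the parameter count through a small CONV/POOL/FC stack. The input size, filter counts, and kernel sizes are illustrative assumptions and do not correspond to any network discussed here; the parameter count for CONV layers also assumes the usual weight sharing across spatial positions.

```python
def conv_output(h, w, d, num_filters, kernel, stride=1, pad=0):
    """Return (output volume, parameter count) for a CONV layer."""
    out_h = (h - kernel + 2 * pad) // stride + 1
    out_w = (w - kernel + 2 * pad) // stride + 1
    # Local connectivity: each filter sees only a kernel x kernel x d patch.
    # Assumption: weights are shared across positions (standard convolution).
    params = num_filters * (kernel * kernel * d + 1)  # +1 for the bias
    return (out_h, out_w, num_filters), params


def pool_output(h, w, d, size=2, stride=2):
    """POOL layers shrink height and width and have no learnable parameters."""
    return ((h - size) // stride + 1, (w - size) // stride + 1, d), 0


def fc_output(h, w, d, units):
    """FC layers connect every input activation to every output unit."""
    return (1, 1, units), units * (h * w * d + 1)


shape = (32, 32, 3)  # assumed input: a 32x32 RGB image (height, width, depth)
layers = [
    ("CONV", lambda s: conv_output(*s, num_filters=16, kernel=5, pad=2)),
    ("POOL", lambda s: pool_output(*s)),
    ("CONV", lambda s: conv_output(*s, num_filters=32, kernel=5, pad=2)),
    ("POOL", lambda s: pool_output(*s)),
    ("FC", lambda s: fc_output(*s, units=10)),
]

summary = []
for name, layer in layers:
    shape, params = layer(shape)
    summary.append((name, shape, params))
    print(f"{name:4s} -> volume {shape}, parameters {params}")

# For comparison: a fully connected layer producing the same 32x32x16 volume
# as the first CONV layer would need vastly more parameters.
_, dense_params = fc_output(32, 32, 3, units=32 * 32 * 16)
print("fully connected equivalent of first CONV layer:", dense_params)
```

Under these assumptions the first CONV layer needs 1,216 parameters, whereas a fully connected layer producing the same output volume would need about 50 million, which is the practical motivation for restricting connections.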
LeNet -- The first successful application of CNNs, developed by LeCun et al. (1998) and used to read zip codes and digits.
AlexNet -- Created by Krizhevsky et al. (2012), the winner of the ImageNet ILSVRC challenge in 2012; it popularized CNNs in computer vision.
ZFNet -- The ILSVRC 2013 winner, introduced by Zeiler and Fergus (2014). It is based on AlexNet with tweaked architecture hyperparameters, in particular larger middle convolutional layers.
GoogLeNet -- The ILSVRC 2014 winner, created by Szegedy et al. (2014); their model dramatically reduces the number of parameters.
VGGNet -- The runner-up in ILSVRC 2014, introduced by Simonyan and Zisserman (2014). It is the preferred choice for extracting features from images, despite its slightly worse classification performance.
Razavian et al. (2014) extracted features from the OverFeat network (Sermanet et al., 2013) and used them as a generic representation to tackle different recognition tasks with a linear SVM classifier. Donahue et al. (2013) reported similar results and released DeCAF, an open-source implementation of deep convolutional activation features. Yosinski et al. (2014) studied the transferability of deep CNN features and reported some unintuitive findings on layer co-adaptation.
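The fixed-feature recipe described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the pipeline of any cited paper: `frozen_features` is a hypothetical stand-in for the activations of a pre-trained network, the two-dimensional "images" are made-up data, and the linear SVM is trained with plain subgradient descent on the hinge loss rather than with a dedicated solver.

```python
def frozen_features(x):
    """Hypothetical stand-in for a pre-trained CNN layer; never updated."""
    weights = [(0.8, -0.3), (-0.5, 0.9), (0.4, 0.4)]  # assumed fixed weights
    return [max(0.0, a * x[0] + b * x[1]) for a, b in weights]  # ReLU units


def score(w, b, f):
    """Linear decision function on top of the extracted features."""
    return sum(wi * fi for wi, fi in zip(w, f)) + b


def train_linear_svm(feats, labels, epochs=200, lr=0.05, reg=0.01):
    """Subgradient descent on the L2-regularized hinge loss."""
    w, b = [0.0] * len(feats[0]), 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            violated = y * score(w, b, f) < 1.0
            w = [wi * (1.0 - lr * reg) for wi in w]  # weight decay step
            if violated:  # push the margin of this sample toward 1
                w = [wi + lr * y * fi for wi, fi in zip(w, f)]
                b += lr * y
    return w, b


# Made-up two-class data for the new task (labels are +1 / -1).
images = [(2.0, 0.0), (1.5, 0.5), (2.5, 0.2), (1.8, -0.3),
          (0.0, 2.0), (0.5, 1.5), (0.2, 2.5), (-0.3, 1.8)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

# Step 1: run every image through the frozen network once.
feats = [frozen_features(x) for x in images]
# Step 2: train only the linear classifier on those features.
w, b = train_linear_svm(feats, labels)

predictions = [1 if score(w, b, f) >= 0 else -1 for f in feats]
new_pred = 1 if score(w, b, frozen_features((2.2, 0.1))) >= 0 else -1
print("training predictions match labels:", predictions == labels)
```

Only `w` and `b` are learned here; the feature extractor is evaluated but never modified, which is what distinguishes this approach from fine-tuning.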
Although there are some insights into transfer learning from deep convolutional networks, several questions remain to be answered.
- When using CNNs as fixed feature extractors, which features should be selected for a new computer vision task? And which algorithm should be applied to the new task?
- Another possible approach to transfer learning is to fine-tune pre-trained deep CNNs directly on new datasets by continuing backpropagation. The performance and trade-offs of this approach should be explored further.
- There is a lack of theoretical analysis to support the generalization of transfer learning.
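The difference between the fixed-feature and fine-tuning options raised in the questions above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the "pre-trained" base is a tiny tanh layer with made-up weights, the task is a four-sample regression, and the per-layer learning rates are arbitrary. Setting `lr_base` to zero freezes the base layer; a small positive value continues backpropagation into it.

```python
import math


def forward(base, head, x):
    """Base layer (tanh units) followed by a linear head."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in base]
    return hidden, sum(v * h for v, h in zip(head, hidden))


def mean_loss(base, head, xs, ys):
    """Mean squared-error loss over the dataset."""
    total = sum(0.5 * (forward(base, head, x)[1] - y) ** 2
                for x, y in zip(xs, ys))
    return total / len(xs)


def train(pretrained, xs, ys, lr_base, lr_head=0.1, steps=200):
    """Full-batch gradient descent with separate learning rates per layer."""
    base = [row[:] for row in pretrained]  # copy so the original is untouched
    head = [0.0] * len(base)               # new task-specific layer
    n = len(xs)
    for _ in range(steps):
        g_base = [[0.0] * len(row) for row in base]
        g_head = [0.0] * len(head)
        for x, y in zip(xs, ys):
            hidden, out = forward(base, head, x)
            err = out - y
            for j, h in enumerate(hidden):
                g_head[j] += err * h / n
                # Backpropagate through the head and the tanh nonlinearity.
                back = err * head[j] * (1.0 - h * h) / n
                for i, xi in enumerate(x):
                    g_base[j][i] += back * xi
        head = [v - lr_head * g for v, g in zip(head, g_head)]
        base = [[w - lr_base * g for w, g in zip(row, grow)]
                for row, grow in zip(base, g_base)]
    return base, head


pretrained = [[0.5, -0.4], [-0.3, 0.6]]  # made-up "pre-trained" weights
xs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (-1.0, 0.5)]
ys = [1.0, -1.0, 0.2, -0.8]

initial = mean_loss(pretrained, [0.0, 0.0], xs, ys)
frozen_base, frozen_head = train(pretrained, xs, ys, lr_base=0.0)
tuned_base, tuned_head = train(pretrained, xs, ys, lr_base=0.01)

print("initial loss:    ", initial)
print("frozen base loss:", mean_loss(frozen_base, frozen_head, xs, ys))
print("fine-tuned loss: ", mean_loss(tuned_base, tuned_head, xs, ys))
```

Using a smaller learning rate for the base than for the new head is one common way to limit how far the pre-trained weights drift; how to choose it is part of the trade-off noted above.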
References
- Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E. and Darrell, T. (2013) 'DeCAF: A deep convolutional activation feature for generic visual recognition', arXiv preprint arXiv:1310.1531.
- Krizhevsky, A., Sutskever, I. and Hinton, G. E. (2012) 'ImageNet classification with deep convolutional neural networks', Advances in Neural Information Processing Systems, pp. 1097-1105.
- LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P. (1998) 'Gradient-based learning applied to document recognition', Proceedings of the IEEE, 86(11), pp. 2278-2324.
- Razavian, A. S., Azizpour, H., Sullivan, J. and Carlsson, S. (2014) 'CNN features off-the-shelf: an astounding baseline for recognition', 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 512-519.
- Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R. and LeCun, Y. (2013) 'OverFeat: Integrated recognition, localization and detection using convolutional networks', arXiv preprint arXiv:1312.6229.
- Simonyan, K. and Zisserman, A. (2014) 'Very deep convolutional networks for large-scale image recognition', arXiv preprint arXiv:1409.1556.
- Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V. and Rabinovich, A. (2014) 'Going deeper with convolutions', arXiv preprint arXiv:1409.4842.
- Yosinski, J., Clune, J., Bengio, Y. and Lipson, H. (2014) 'How transferable are features in deep neural networks?', Advances in Neural Information Processing Systems, pp. 3320-3328.
- Zeiler, M. D. and Fergus, R. (2014) 'Visualizing and understanding convolutional networks', Computer Vision–ECCV 2014: Springer, pp. 818-833.