Horizon CDT Research Highlights

Histopathology Image Analysis

  Jingxin Liu (2013 cohort)

Introduction and Motivation

Histopathology is the medical specialty concerned with diagnosing disease by examining tissue. In traditional pathology, histopathologists use optical or electron microscopes to examine glass slides containing thin sections of human tissue [1], for example to diagnose colorectal or prostate cancer. Histopathological analysis considers numerous structures (such as nuclei, stroma, and cytoplasm) and tissues (for example, epithelium, glands, and lymphatic vessels) distributed across the images. Manual analysis remains the primary way to identify cancerous tissue, and it depends heavily on the experience of the histopathologist [2]. Such manual work has obvious disadvantages: it is time-consuming and prone to error.

With the recent advent of whole-slide digital scanners, histopathology tissue samples can now be digitized and stored as images. Over the past few decades, rapid progress in computer science and improvements in image processing algorithms have promoted computer-assisted approaches to the analysis of microscopic data [3]. As a result, computerised histopathological image analysis has become achievable; it is an interdisciplinary field combining computer vision, machine learning, and pathology.

Computer-assisted diagnosis (CAD) has begun to be developed for disease detection, diagnosis, and prognosis prediction to complement the opinion of the pathologist. CAD promises significant benefits in the delivery of patient care. For example:

  • Manual qualitative analysis can be transformed into objective quantitative analysis. Serious errors arising from subjective judgement would be significantly reduced, since different users working under the same constraints would obtain consistent results.
  • CAD can automatically provide reliable quantitative measurements, reducing time and cost while increasing the accuracy of diagnosis.
  • Previous slides can be stored conveniently and accessed easily when they are needed.

Histopathology Image Processing and Analysis

After fixation and tissue processing, the tissue can be dyed with stains for visualization under the microscope [2]. Different stains produce different colours and label different tissue components. Three major types of stain are Hematoxylin and Eosin (H&E), Immunohistochemistry (IHC), and Immunofluorescence (IF) (see Figure 1).

Figure 1: Image examples for (a) Hematoxylin and Eosin, (b) Immunohistochemistry, and (c) Immunofluorescence.

The stained tissue can then be digitized for CAD. A typical CAD system comprises image preprocessing, structure segmentation, feature extraction, feature dimension reduction, and feature-based classification [2]. The first three steps use image processing techniques, while the last two rely on machine learning. In practice the order of these procedures may change, and some applications focus on one or two procedures and omit the rest. For instance, [4] introduces a new approach for segmenting overlapping cell nuclei without classification or analysis.

As in other computer vision applications, image preprocessing is important, and it can also be used to reduce the computational cost [2]. Image enhancement techniques (e.g. contrast stretching, histogram equalization) adjust the images so that they are more suitable for later analysis. Image segmentation partitions the tissue according to its structures and extracts objects of interest, which can then be used for disease identification or further classification. Segmentation is one of the most important procedures in the whole CAD system, as its performance directly affects the quality of feature extraction and the accuracy of feature-based classification. Segmentation methods can be classified into two groups: colour-based and morphology-based. However, these methods may become inaccurate when detecting overlapping nuclei, which remains a major challenge. Furthermore, different stains call for different methods. The analysis of gland architecture can also reflect the cancer stage, and it has become an important aspect of cancer detection. In recent years, machine learning techniques have been proposed to increase accuracy and robustness, including supervised algorithms such as the support vector machine (SVM) [5], decision trees, hidden Markov models (HMM), and random fields [6], and unsupervised algorithms such as K-means and fuzzy c-means [2].
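The two enhancement techniques named above, contrast stretching and histogram equalization, can be illustrated with a minimal sketch. It assumes a greyscale image stored as a list of rows of integer intensities in the range 0 to 255; the function names are hypothetical and are not taken from any cited system.

```python
def contrast_stretch(image, out_min=0, out_max=255):
    """Linearly rescale intensities so they span [out_min, out_max]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image: nothing to stretch
        return [row[:] for row in image]
    scale = (out_max - out_min) / (hi - lo)
    return [[round(out_min + (p - lo) * scale) for p in row] for row in image]


def equalize_histogram(image, levels=256):
    """Remap intensities so that their cumulative histogram is roughly linear."""
    flat = [p for row in image for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    # Cumulative distribution of intensities.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = min(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:  # only one intensity level is present
        return [row[:] for row in image]
    # Lookup table mapping each old intensity to its equalized value.
    lut = [max(0, round((c - cdf_min) / (n - cdf_min) * (levels - 1)))
           for c in cdf]
    return [[lut[p] for p in row] for row in image]


low_contrast = [[50, 60], [70, 80]]
stretched = contrast_stretch(low_contrast)
equalized = equalize_histogram(low_contrast)
```

Contrast stretching only rescales the intensity range, whereas equalization redistributes intensities according to how often they occur, so the two can give different results on images with skewed histograms.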

After the procedures discussed above have been applied, the objects of interest can be highlighted for pathologists. However, if the computer is to identify the disease or grade the cancer, feature extraction, feature dimension reduction, and feature-based classification must also be implemented. Researchers have presented various feature extraction and description algorithms, such as SIFT, HOG, LBP, and MROGH. Finally, classification algorithms can be applied to the data once redundant, correlated features have been removed or combined.
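As an illustration of one of the descriptors listed above, the following is a minimal sketch of the basic 3x3 local binary pattern (LBP). It assumes a greyscale image given as a list of rows of integers; the function names and the clockwise bit ordering are choices made for this example, and practical systems often use rotation-invariant or multi-scale variants instead.

```python
def lbp_code(image, r, c):
    """8-neighbour LBP code for the interior pixel at row r, column c."""
    center = image[r][c]
    # Neighbours are visited clockwise, starting at the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A neighbour at least as bright as the centre sets its bit.
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


patch = [[9, 9, 9],
         [1, 5, 1],
         [1, 1, 1]]
texture_feature = lbp_histogram(patch)
```

The resulting histogram is the feature vector: it summarizes local texture and can be passed on to dimension reduction and classification.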


The procedures of CAD have been discussed in the previous section. Because the field combines computer vision, machine learning, and pathology, many computer vision and machine learning algorithms are essential alongside image processing methods. These techniques can be divided into two groups. The first comprises image processing techniques: colour model transformation and multicolour separation are used for detection, and colour normalization is used to reduce the effects of variation in staining and scanning on histopathological images. Texture is also a very important characteristic, so techniques such as edge detection, histogram-based methods, and the watershed transform are necessary. Image preprocessing and feature extraction also belong to this group.
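Colour normalization can be approximated by matching the per-channel mean and standard deviation of an image to those of a reference image. The sketch below is a deliberately simplified version that works directly on RGB values; it assumes pixels are given as a flat list of (R, G, B) integer tuples, and the function names are hypothetical. Published stain normalization methods typically operate in other colour spaces or on separated stain channels.

```python
from statistics import fmean, pstdev


def channel_stats(pixels):
    """Per-channel (mean, standard deviation) for a list of (R, G, B) pixels."""
    channels = list(zip(*pixels))
    return [(fmean(ch), pstdev(ch)) for ch in channels]


def normalize_colours(pixels, reference):
    """Match each channel's mean and spread to those of a reference image."""
    src = channel_stats(pixels)
    ref = channel_stats(reference)
    out = []
    for px in pixels:
        new_px = []
        for value, (s_mean, s_std), (r_mean, r_std) in zip(px, src, ref):
            # Rescale around the mean; a constant channel is only shifted.
            scale = r_std / s_std if s_std > 0 else 1.0
            mapped = (value - s_mean) * scale + r_mean
            # Clamp to the valid 8-bit range.
            new_px.append(min(255, max(0, round(mapped))))
        out.append(tuple(new_px))
    return out


pale_slide = [(100, 100, 100), (120, 120, 120)]
reference_slide = [(150, 50, 200), (190, 90, 240)]
normalized = normalize_colours(pale_slide, reference_slide)
```

After normalization, slides stained or scanned under different conditions share similar colour statistics, which tends to make colour-based segmentation thresholds more transferable between slides.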

The second group comprises machine learning techniques. Dimensionality reduction aims to reduce the feature dimensionality according to some criterion; Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA) are the three most commonly used methods. Support Vector Machines (SVM), the C4.5 decision tree, and naive Bayes have been introduced for stained colour detection. In addition, random fields and deep learning can be used for gland structure detection.
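To make the idea of dimensionality reduction concrete, here is a minimal sketch of PCA that finds only the first principal component by power iteration on the sample covariance matrix. It assumes the feature vectors are rows of a list of lists; the function names are hypothetical, and the fixed starting vector is assumed not to be orthogonal to the dominant direction. A numerical library would normally be used in practice.

```python
def center_and_covariance(data):
    """Return the mean-centred data and its sample covariance matrix."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return centered, cov


def first_principal_component(cov, iterations=200):
    """Dominant eigenvector of the covariance matrix via power iteration."""
    d = len(cov)
    v = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-zero start
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:  # start vector was orthogonal, or data has no variance
            break
        v = [x / norm for x in w]
    return v


def project(centered, component):
    """Reduce each centred feature vector to one coordinate."""
    return [sum(a * b for a, b in zip(row, component)) for row in centered]


features = [[1, 2], [2, 4], [3, 6], [4, 8]]
centered, cov = center_and_covariance(features)
pc1 = first_principal_component(cov)
reduced = project(centered, pc1)
```

Here the two features are perfectly correlated, so a single coordinate along the first component retains all of the variance; this is the sense in which redundant features are combined before classification.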


  1. A. Mescher, Junqueira's Basic Histology: Text and Atlas, Thirteenth Edition. Basic Histology, McGraw-Hill Education, 2013.
  2. L. He, L. R. Long, S. Antani, and G. Thoma, "Computer assisted diagnosis in histopathology," Sequence and Genome Analysis: Methods and Applications, pp. 271-287, 2010.
  3. L. Ibáñez, W. Schroeder, L. Ng, and J. Cates, The ITK Software Guide. Kitware, 2003.
  4. J. Shu, H. Fu, G. Qiu, P. Kaye, and M. Ilyas, "Segmenting overlapping cell nuclei in digital histopathology images," in Engineering in Medicine and Biology Society (EMBC), 2013 35th Annual International Conference of the IEEE, pp. 5445-5448, IEEE, 2013.
  5. P. Foggia, G. Percannella, P. Soda, and M. Vento, "Benchmarking HEp-2 cells classification methods," 2013.
  6. H. Fu, G. Qiu, M. Ilyas, and J. Shu, "Glandvision: A novel polar space random field model for glandular biological structure detection," in Proceedings of the British Machine Vision Conference, pp. 42.1-42.12, BMVA Press, 2012.

This work was carried out at the International Doctoral Innovation Centre (IDIC). The authors acknowledge the financial support from Ningbo Education Bureau, Ningbo Science and Technology Bureau, China's MOST, and the University of Nottingham. The work is also partially supported by EPSRC grant no EP/G037574/1.