Background

Who am I? Identity is a puzzling thing, something we learn about only slowly over the course of our lives. What exactly constitutes a person’s identity is not fully understood, but personality is known to be part of it. Over the past decades, personality has been conceptualized from various theoretical perspectives and at different levels of breadth or abstraction [1, 2]. Traditionally, evaluating a person’s personality requires extensive involvement from experienced psychologists and an understanding of the individual’s psychological testing records, history, self-reports, and interview assessments [3]. This is often a lengthy procedure, and the relevant data or experts may not always be accessible. As a result, there is an increasing demand for shorter and simpler personality measurements [4]. Meanwhile, the growing number of video channels on the internet makes it possible to capture a myriad of spontaneous non-verbal cues from people’s physical appearance [5]. It is therefore interesting to investigate whether an automatic system can be built that incrementally learns non-verbal cues from facial expressions and audio signals and predicts different personality traits.

Research questions

The main research question of this project is how non-verbal information from video and audio can be used to predict personality using only machine learning-based systems. More specifically, we want to explore:

the relationship between facial expressions (facial action units (AUs)) and each personality trait;
how to construct a set of useful features for personality prediction;
which machine learning models are most useful for our task and how to combine them;
how to extend this technique to medical applications.

Methodology

In this project, several state-of-the-art machine learning techniques will be used, including Deep Learning [6], Cooperative Learning [7], Bi-directional Long Short-Term Memory Neural Networks [8], and Probabilistic Graphical Networks, among others. In addition, feature selection methods will be used to select the audio and video features that correlate with each personality trait and with depression.
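As a rough illustration of the feature selection step, the sketch below ranks candidate features by the absolute value of their Pearson correlation with a trait score and keeps the top k. This is only one possible choice; the project description does not name a specific selection method. The feature names (`AU12_mean`, `AU04_mean`, `pitch_mean`), the `extraversion` score, and the helper functions are hypothetical, and the data are synthetic.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)


def select_features(features, target, k):
    """Return the names of the k features most correlated with target.

    features: dict mapping feature name -> list of per-subject values
    target:   list of per-subject trait scores
    Ranking uses the absolute correlation, so strong negative
    relationships are kept as well.
    """
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    return ranked[:k]


# Synthetic illustration only: these values are not real measurements.
rng = random.Random(0)
n_subjects = 200
au12 = [rng.random() for _ in range(n_subjects)]   # hypothetical AU feature
au04 = [rng.random() for _ in range(n_subjects)]   # hypothetical AU feature
pitch = [rng.random() for _ in range(n_subjects)]  # hypothetical audio feature

# Hypothetical trait score driven mostly by au12, partly by pitch, plus noise.
extraversion = [
    0.8 * a + 0.4 * p + 0.05 * rng.gauss(0, 1) for a, p in zip(au12, pitch)
]

features = {"AU12_mean": au12, "AU04_mean": au04, "pitch_mean": pitch}
selected = select_features(features, extraversion, k=2)
print(selected)
```

A correlation filter of this kind only captures linear, per-feature relationships; combinations of AUs, which the project also aims to study, would need a method that scores feature subsets jointly.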

Contributions to knowledge

The expected contributions of this PhD project can be summarized as follows:

Explore the relationship between each personality trait and each combination of AUs.
Create an algorithm to predict people’s personalities from facial expressions and speech signals using state-of-the-art machine learning techniques.
Explore the relationship between depression and each combination of AUs.
Create an algorithm to diagnose depression from facial expressions and speech signals using state-of-the-art machine learning techniques.
Collect a video and audio database for further automatic personality prediction research.

Highlighted Research Output

A fully automatic depression analysis system based on the Fourier transform, which can detect people's depression status from non-verbal facial behaviours. (https://ieeexplore.ieee.org/document/8976305)
An automatic face-based personality trait analysis system using deep-learned person-specific representations.
Two AI challenges (AVEC 2019 and EmoPain 2020) held at international conferences (USA and Argentina).

References

[1] O. P. John, S. E. Hampson, and L. R. Goldberg, "The basic level in personality-trait hierarchies: studies of trait use and accessibility in different contexts," Journal of personality and social psychology, vol. 60, p. 348, 1991.

[2] P. S. Macadam and K. A. Dettwyler, Breastfeeding: biocultural perspectives: Transaction Publishers, 1995.

[3] T. Yingthawornsuk, H. K. Keskinpala, D. M. Wilkes, R. G. Shiavi, and R. M. Salomon, "Direct acoustic feature using iterative EM algorithm and spectral energy for classifying suicidal speech," in INTERSPEECH, 2007, pp. 766-769.

[4] B. Rammstedt and O. P. John, "Measuring personality in one minute or less: A 10-item short version of the Big Five Inventory in English and German," Journal of research in Personality, vol. 41, pp. 203-212, 2007.

[5] L. E. Buffardi and W. K. Campbell, "Narcissism and social networking web sites," Personality and social psychology bulletin, vol. 34, pp. 1303-1314, 2008.

[6] S. Jaiswal and M. Valstar, "Deep learning the dynamic appearance and shape of facial action units," in Applications of Computer Vision (WACV), 2016 IEEE Winter Conference on, 2016, pp. 1-8.

[7] Z. Zhang et al., "Cooperative learning and its application to emotion recognition from speech," IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), vol. 23, no. 1, pp. 115-126, 2015.

[8] A. Graves and J. Schmidhuber, "Framewise phoneme classification with bidirectional LSTM and other neural network architectures," Neural Networks, vol. 18, no. 5, pp. 602-610, 2005.

Research Highlights