Horizon CDT Research Highlights

A Sociotechnical Evaluation of Differentially Private Risk Assessment Models in the Consumer Credit

  Ana Rita Pena (2019 cohort)   www.linkedin.com/in/ana-rita-pena-2b02a3205

Machine learning (ML) algorithms are being adopted to automate a wide variety of tasks, from credit loan decisions to health diagnostics. In credit risk assessment specifically, advances in ML, together with the greater importance placed on risk assessment since the 2008 financial crisis, have led to a rise in the use of ML algorithms and alternative data sources. These implementations are intended to make assessment more accurate and efficient, and by drawing on different data sources they can also score people who were previously excluded from the credit industry. However, as these technologies become ever more complex, there has been a growing demand for transparency regarding the models. If companies are required to share their models (either with the regulator or with the wider public), they still have a duty to protect their consumers' privacy. One way to provide this guarantee is to implement a differentially private machine learning model. Differential Privacy is a state-of-the-art Privacy Enhancing Technology that allows aggregated information to be gathered without risking individuals' privacy; however, it comes at the cost of a privacy-accuracy trade-off.
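
The privacy-accuracy trade-off can be illustrated with the Laplace mechanism, one standard way of achieving ε-differential privacy (it is not necessarily the mechanism used inside the models studied here). A mechanism is ε-differentially private if adding or removing any one person's record changes the probability of any output by at most a factor of e^ε. For a counting query, such as how many applicants in a dataset defaulted, one record changes the answer by at most 1, so adding Laplace noise with scale 1/ε is enough. The minimal Python sketch below uses hypothetical function names and toy data; it shows that a smaller ε (stronger privacy) produces a larger expected error.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centred on 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    # A counting query has sensitivity 1: adding or removing one record
    # changes the true count by at most 1, so the Laplace noise scale is 1/epsilon.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

def mean_abs_error(records, predicate, epsilon: float, trials: int = 5000, seed: int = 0) -> float:
    # Average distance between the private answer and the true answer,
    # estimated over many independent releases.
    rng = random.Random(seed)
    true_count = sum(1 for r in records if predicate(r))
    total = sum(
        abs(private_count(records, predicate, epsilon, rng) - true_count)
        for _ in range(trials)
    )
    return total / trials

if __name__ == "__main__":
    # Hypothetical toy data: 200 applicants, every 7th one flagged as a default.
    applicants = [{"defaulted": i % 7 == 0} for i in range(200)]

    def is_default(r):
        return r["defaulted"]

    for eps in (0.1, 1.0, 10.0):
        err = mean_abs_error(applicants, is_default, eps)
        print(f"epsilon={eps}: mean absolute error ~ {err:.2f}")
```

Because the expected absolute value of Laplace noise equals its scale, the mean absolute error should come out close to 1/ε (roughly 10, 1 and 0.1 for the three settings): each tenfold gain in privacy costs roughly a tenfold loss in accuracy for this query.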

General Research Question: What is the impact of the potential implementation of Differential Privacy in credit risk assessment on consumer credit applications?

My approach to this research question takes into consideration all the stakeholders involved while remaining user-, consumer- and human-centred, as consumers are the most affected and least powerful stakeholders. To begin answering the question above, the following studies were designed, each with its own research questions, to build knowledge of the industry, its impact and the technology.

Attitudes and Experiences with Loan Applications 

In this study we aim to understand participants’ sensemaking of their experiences when applying for loans, as well as their attitudes towards automation, data sharing and the fairness of the process. In this context, automation encompasses everything from the statistical and ML methods used for decision making, to data gathering across different information systems, to the automation of customer service and of the application process itself (for example, short online forms).

Our study focuses specifically on the UK consumer credit industry. This contribution differs from the existing literature on algorithmic sensemaking because, in this setting, users have little agency in the process. It also addresses the lack of users’ perspectives on the role of technology in financial services.

UK Consumer Credit Industry Stakeholder Consultation 

This interview-based study, with participants who work or have worked within or with the UK consumer credit industry, aimed to ground informal knowledge about how the industry works in data gathered from participants. The interview was divided into two parts. The first aimed to gain a better understanding of the consumer credit ecosystem, including the roles and inner workings of the different stakeholders and the interactions between them. It also explored how new technology is implemented in the industry: Which stakeholders are involved, and how? What are the power differences between stakeholders, and how do they affect technology implementation? Which external factors are at play?

The second part of the interview aimed to understand the importance of privacy in the industry, its current privacy practices and future directions, and to gather stakeholders’ attitudes towards Differential Privacy and the potential impacts of its implementation in the industry.

Differentially Private Decision Tree based Models: exploratory inquiry  

The Differentially Private Decision Tree based Model study is exploratory in nature. It consists of implementing different differentially private (DP) models on three credit-related open-source datasets and comparing each algorithm’s effect on overall accuracy and on subgroup accuracy.

As the literature has shown a disparate accuracy loss across different subgroups of the training dataset when a private version of Stochastic Gradient Descent is used, this study aims to investigate whether a similar disparate accuracy drop occurs in different DP algorithms based on decision trees, which are commonly used in practice and in the credit industry. A Smooth Random Forest and different configurations of a Differentially Private Gradient Boosting Machine (GBM) were trained on three different datasets and compared with a differentially private logistic regression and a non-private GBM (implemented with the LightGBM library).
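
The subgroup comparison described above can be sketched as follows. This is a minimal illustration, assuming binary predictions from a non-private baseline and from a private model on the same test set, plus a subgroup label for each applicant; it does not reproduce the study's actual pipeline, and the function names, group labels and toy data are hypothetical. It computes accuracy per subgroup, the per-group accuracy drop caused by privacy, and the gap between the most and least affected groups, which is one simple way to quantify a disparate accuracy drop.

```python
from collections import defaultdict

def subgroup_accuracy(y_true, y_pred, groups):
    # Accuracy computed separately for each subgroup label.
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}

def accuracy_drop_by_group(y_true, pred_baseline, pred_private, groups):
    # Accuracy lost per subgroup when moving from the non-private baseline
    # to the private model (positive values mean the private model is worse).
    base = subgroup_accuracy(y_true, pred_baseline, groups)
    priv = subgroup_accuracy(y_true, pred_private, groups)
    return {g: base[g] - priv[g] for g in base}

def disparity(drops):
    # Gap between the most and least affected subgroups;
    # 0 means the accuracy cost of privacy is shared equally.
    return max(drops.values()) - min(drops.values())

if __name__ == "__main__":
    # Hypothetical toy labels for 8 applicants in two subgroups, A and B.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
    pred_baseline = [1, 0, 1, 1, 0, 0, 1, 1]  # e.g. a non-private GBM
    pred_private = [1, 0, 0, 1, 1, 0, 0, 1]   # e.g. a DP tree-based model
    drops = accuracy_drop_by_group(y_true, pred_baseline, pred_private, groups)
    print(drops)             # {'A': 0.25, 'B': 0.5}
    print(disparity(drops))  # 0.25
```

In a real comparison the two prediction lists would come from trained models such as the LightGBM baseline and the DP variants. Reporting the per-group drop, rather than overall accuracy alone, is what reveals whether privacy costs some subgroups more than others.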

Consumer Exploration of Differentially Private Sociotechnical Credit Imaginaries 

The Differentially Private Consumer Credit Imaginaries study consists of a focus group activity that aims to understand how users/consumers perceive the implementation of Differential Privacy in different scenarios, generated through a board-game-style activity and the discussion that followed.

The study involved an in-person focus group in which a board-game-style activity served as a starting point for discussing participants’ attitudes towards Differential Privacy in credit.

This author is supported by the Horizon Centre for Doctoral Training at the University of Nottingham (UKRI Grant No. EP/S023305/1).