Horizon CDT Research Highlights

Research Highlights

Evolution of interaction networks in developing countries for social intervention.

  Maddy Ellis (2016 cohort)   www.nottingham.ac.uk/~psxme6

The latest comprehensive data on global poverty, from 2013, estimate that 767 million people live below the poverty line. This is 10.7% of the world's population, or almost 11 people in every 100. [1] Although the number of people in poverty fell globally between 2012 and 2013, poverty in Africa is still widespread, and poverty levels there are high relative to all other regions of the world. [1] Poverty is threatening the lives and well-being of an unacceptable proportion of our population.

The data indicating these poverty levels in Africa have historically been poor: in 1990 only 20 countries had data allowing poverty to be measured. [2] Household surveys initially provided some insight into wealth distribution; however, these surveys omit a significant proportion of the poorest people, making them a poor indicator of poverty. [3] Since then DHS (Demographic and Health Surveys) and income and expenditure surveys have been introduced, which has drastically improved the data situation in these developing countries. However, there are still massive deficiencies in this data: such surveys are often too infrequent and take too long to have much value. [4] Survey data is labour and cost intensive to obtain and is therefore scarce. [5] The lack of reliable data describing local poverty in developing countries restrains the impact of local policy makers, governments and aid organizations. [4], [5] Accurate estimates of population characteristics such as poverty are critical to development. [6] Researchers have also raised serious concerns about the reliability of quantitative data in developing countries; national statistics on economic production, for example, may be off by as much as 50% in Africa. [7]

Satellite data provides a more time-efficient approach to investigating the poverty of different areas than traditional surveying. [8] High-resolution satellite imagery is now increasingly inexpensive and reliable. [5] The increase in satellite data availability has contributed to the study of geo-spatial information, with broad applications across many areas including the distribution of poverty. [4] [9] Poverty-stricken regions, however, are also the ones most likely to have less internal funding, more civil wars, poor infrastructure and inadequate government resourcing for research such as surveys and satellite data collection. Hence there are still vast gaps in the collection of reliable data which could be used to describe poverty. [4]

There are, however, increasingly new sources of data on individuals, such as mobile phone and internet records, which are enabling new approaches to demographic profiling and opening an exciting field of potential analysis. [10] Data from a communication network of mobile phones and business landlines, for example, were used to show that communication diversity is a strong indicator of the economic health of communities in the UK. [11] In developing countries there are fewer sources of big data; however, mobile phone use is becoming increasingly ubiquitous in these regions and is providing a fruitful source of data. [6] In regions where resources such as time, labour and money are scarce for such research, this approach offers a method for gathering information on individuals at a fraction of the cost of traditional methods such as surveys and satellite images. [6] The diversity of individuals' relationships is a key indicator of social and economic life. Until recently this was not widely quantifiable, but we now have data on networks of people's behaviours which allow us to draw conclusions at a population level. [11]

The ubiquitous use of mobile phones in developing countries creates data with a lot of potential to help organizations that are currently struggling to identify the poorest parts of regions. As this is a new form of data, however, there is a lot of work to be done in finding ways to assess it for useful results. [12] There is remarkably little known about the demographics of call detail record (CDR) data in developing countries. [14] Such data will serve as a basis for this study of evolving interactions across unprivileged communities. The geographic distribution of poverty and wealth can be used to make decisions about resource allocation and has a high potential impact. [13]

This research will review, and aim to evolve and extend, existing methods for analysing interaction network data, such as block models and point processes. Initially, it aims to address demographic studies in Africa using CDR data available from the University of Nottingham N-LAB, which has obtained samples of all customer records from the second largest cell phone company in Tanzania. The stochastic block model is a useful tool for exploring the community structure within network data. The model produces subsets, or communities, which are defined by the pattern of connections between the nodes in the network. [16] It has previously been applied in fields such as disease transmission and demographics. [15] The block model provides a platform for learning more about the state and demographics of social network structures.
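
As a rough illustration of the idea that communities are defined by their pattern of connections, the sketch below samples a small network from a two-block stochastic block model and then recovers the connection probability between each pair of blocks. It is a minimal sketch only: the block labels "A" and "B", the network size and the probabilities are hypothetical, and the block memberships are assumed to be known, whereas a full analysis would also have to infer them from the data.

```python
import random
from collections import defaultdict


def sample_sbm(block_of, p_within, p_between, rng):
    """Sample undirected edges from a simple two-parameter stochastic block model."""
    nodes = sorted(block_of)
    edges = set()
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            # The edge probability depends only on the blocks of the two nodes.
            p = p_within if block_of[u] == block_of[v] else p_between
            if rng.random() < p:
                edges.add((u, v))
    return edges


def estimate_block_densities(block_of, edges):
    """Observed edge fraction for each pair of blocks, given known block labels."""
    sizes = defaultdict(int)
    for block in block_of.values():
        sizes[block] += 1
    counts = defaultdict(int)
    for u, v in edges:
        counts[tuple(sorted((block_of[u], block_of[v])))] += 1
    blocks = sorted(sizes)
    densities = {}
    for i, a in enumerate(blocks):
        for b in blocks[i:]:
            # Number of node pairs that could be connected for this block pair.
            if a == b:
                possible = sizes[a] * (sizes[a] - 1) // 2
            else:
                possible = sizes[a] * sizes[b]
            densities[(a, b)] = counts[(a, b)] / possible if possible else 0.0
    return densities


# Hypothetical example: 60 nodes split evenly into blocks "A" and "B".
rng = random.Random(0)
block_of = {n: ("A" if n < 30 else "B") for n in range(60)}
edges = sample_sbm(block_of, p_within=0.3, p_between=0.02, rng=rng)
densities = estimate_block_densities(block_of, edges)
```

In this toy setting the estimated within-block densities come out much higher than the between-block density, which is the signature a block model uses to separate communities.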

In addition, point process modelling uses mathematical objects to represent temporally structured events. [19] Temporally structured phenomena analysed in this way can refer to interactions across communities, individuals or entities. [17] Event series are continuous, irregular and often highly sparse, and the application areas of such modelling are diverse, including earthquake forecasting and health and financial event prediction. [18] Events such as the arrival of aid resources and finances can be modelled in this way to learn more about the network of communities in developing regions and countries.
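
To make the notion of an irregular event series concrete, the following sketch simulates event times from a non-homogeneous Poisson process using the standard thinning method. The intensity function, with its daily cycle, and all numeric values are hypothetical choices for illustration and are not taken from the CDR data.

```python
import math
import random


def simulate_nhpp(rate, rate_max, t_end, rng):
    """Simulate event times on [0, t_end) by thinning a homogeneous Poisson process.

    `rate(t)` is the time-varying intensity and must never exceed `rate_max`.
    """
    events = []
    t = 0.0
    while True:
        # Candidate events arrive at the constant upper-bound rate.
        t += rng.expovariate(rate_max)
        if t >= t_end:
            return events
        # Keep each candidate with probability rate(t) / rate_max.
        if rng.random() < rate(t) / rate_max:
            events.append(t)


def daily_rate(t):
    """Hypothetical intensity in events per hour, with a 24-hour cycle."""
    return 2.0 + 1.5 * math.sin(2.0 * math.pi * t / 24.0)


rng = random.Random(1)
# Ten simulated days; 3.5 is the maximum value that daily_rate can take.
events = simulate_nhpp(daily_rate, rate_max=3.5, t_end=240.0, rng=rng)
```

Fitting such a model to real records runs in the opposite direction, estimating the intensity from observed event times, but the simulation shows the kind of continuous, unevenly spaced data involved.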

Existing work on stochastic block models and point processes could shed some light on the problem, and could be expanded and combined to analyse the dynamic data of the social network.

Alongside the network analysis elements of this project I will be collecting ground truth data in Dar es Salaam to validate different approaches and underpin this project and other projects within N/Lab. The data collection takes two forms: a grid survey and a street survey. During the street survey we will be asking facilitators to go into subwards and engage with 10 participants per subward, asking them to complete our survey. The survey includes 3 sections. The first is a section on bias, checking things such as age, gender and other personal information on our participants to encourage fair results. The next section covers the indicators of vulnerability and poverty. This section includes questions on characteristics of the subward such as education, road quality, household arrangements and employment levels. Each indicator investigated by the survey is asked about via a number of questions worded in different ways, to create results with higher confidence. Finally there is a participant confidence section. This includes questions about the participant's knowledge of the area, such as "Do you live in this area?", "How long have you lived in this area?" and "Do you work in this area?" The answers to these questions give each participant a confidence score from 1 to 12, which we will then use to weight the value of their results in our analysis. The goal of this survey is to get a fine-grained demographic picture of Dar es Salaam. The facilitators of this survey are a group of translators and researchers at the University in Dar who work in collaboration with my professional partner HOT, the Humanitarian OpenStreetMap Team.
There has been a lot of preparation for this survey. First I needed to research the literature on three different topics: how data on poverty is traditionally surveyed in developing countries and what the limits and flaws of these methods are; what the indicating factors of poverty are that need to be reflected in the survey; and what vulnerability is. The survey will be a fine-grained demographic illustration of indicating factors of poverty in Dar es Salaam at a subward level. There are 550 subwards in Dar es Salaam and we will be collecting at least 10 results per subward for the street survey. Hence we will be gathering at least 5,500 results (each containing over 30 answers about the area) to paint a picture of the ground truths in the region from the street survey alone.
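
One simple way the participant confidence score could be used to weight results is a confidence-weighted average per subward, sketched below. This is an assumption made for illustration: the exact weighting scheme is not fixed here, and the subward names, indicator values and the function `weighted_subward_scores` are hypothetical.

```python
from collections import defaultdict


def weighted_subward_scores(responses):
    """Confidence-weighted mean of an indicator value for each subward.

    `responses` is an iterable of (subward, confidence, value) tuples, where
    confidence is the participant's score from 1 to 12.
    """
    totals = defaultdict(float)
    weights = defaultdict(float)
    for subward, confidence, value in responses:
        if not 1 <= confidence <= 12:
            raise ValueError(f"confidence out of range: {confidence}")
        # Assumption: the confidence score is used directly as a linear weight.
        totals[subward] += confidence * value
        weights[subward] += confidence
    return {subward: totals[subward] / weights[subward] for subward in totals}


# Hypothetical responses: (subward, confidence score, indicator value).
responses = [
    ("subward_1", 12, 4.0),
    ("subward_1", 3, 1.0),
    ("subward_2", 6, 2.0),
    ("subward_2", 6, 4.0),
]
scores = weighted_subward_scores(responses)
```

In this example the answer from the participant with confidence 12 dominates the result for the first subward, while the two equally confident participants in the second subward contribute equally.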

The goal of the grid survey is to create a map of the affluence of the subward regions of Dar es Salaam. Unlike the street survey, this survey focuses only on the affluence of the regions, without collecting details on any other demographics. The grid survey was completed using an online questionnaire platform, with an interface similar to traditional Q-sort and pile sort methods. Traditional card sort methods such as pile sort and Q-sort can be closed or open. They require participants to partition a selection of cards into categories; when closed, the number of categories is decided before the survey. Open pile sorts allow the participants to also determine the number of categories they want to cluster the cards into. Pile sorting can be moved onto online platforms, allowing participants to drag their cards on screen. This is a great method for bridging the gap between quantitative and qualitative research, allowing participants to show their opinions in a way that is easily analyzed numerically using similarity matrices and standardization grids. The participants for this survey were local experts. We define local experts as people who are familiar with the areas, hence I have been building relationships with taxi companies and local researchers in the universities to find knowledgeable participants. Another way of increasing the confidence in our results is an initial filter. When the participants started the workshop they were first presented with a map of Dar es Salaam and asked to highlight the areas with which they are familiar. This automatically filters out any regions they do not know, so that they are only asked to sort areas where they know what the area looks like, where it is in the city and what amenities are available. The participants were then asked not to sort these subwards into piles, as with the Q-sort method, but to shuffle sort the subwards.
For example, given 4 subwards on screen, the participant sorts these subwards into order based on their level of affluence. This will produce a number of partially ordered lists from participants. We will then create an algorithm to combine these lists into an overall spread of the poverty levels of the subwards. The subwards were shown in random order to each participant to prevent bias from participants becoming more tired towards the end of the survey.
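
The combining algorithm has not yet been designed, so the sketch below only shows one possible approach under stated assumptions: each subward is scored by the fraction of pairwise comparisons it wins across all partial lists, and subwards are then ordered by that score. The subward labels and the function `aggregate_partial_rankings` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def aggregate_partial_rankings(rankings):
    """Combine partial rankings using pairwise win fractions.

    Each ranking lists a few subwards from most to least affluent. Returns the
    overall order (most affluent first) and the score of each subward.
    """
    wins = defaultdict(int)
    comparisons = defaultdict(int)
    for ranking in rankings:
        # combinations() preserves list order, so `higher` was ranked above `lower`.
        for higher, lower in combinations(ranking, 2):
            wins[higher] += 1
            comparisons[higher] += 1
            comparisons[lower] += 1
    score = {s: wins[s] / comparisons[s] for s in comparisons}
    # Ties are broken alphabetically so the output is deterministic.
    order = sorted(score, key=lambda s: (-score[s], s))
    return order, score


# Hypothetical partial lists from three participants, 4 subwards each.
rankings = [
    ["A", "B", "C", "D"],
    ["B", "A", "D", "E"],
    ["A", "C", "E", "D"],
]
order, score = aggregate_partial_rankings(rankings)
```

For these three lists the combined order is A, B, C, E, D. A scheme of this kind treats every comparison equally; weighting comparisons by participant familiarity would be a natural extension.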

Multidisciplinary Statement.

A mathematical and analytical approach to humanitarian and socio-technical issues with the use of digital technologies. This topic involves the combination of multiple disciplines including mathematics, computing, computational sociology, geospatial and human factors and the digital economy.

Horizon Relevance.

This PhD will link into some of Horizon's key focus areas: Global Impacts, Data Science, Large Data, Digital Economy and Public Engagement.


[1] Timm Bönke, Soumya Chattopadhyay, Shaohua Chen, Will Durbin, María Eugenia Genoni, Aparajita Goyal, Christoph Lakner, Terra Lawson-Remer, Maura K. Leary, Renzo Massari, Jose Montes, David Newhouse, Stace Nicholson, Espen Beer Prydz, Maika Schmidt, José Cuesta, Mario Negre, and Ani Silwal. Poverty and shared prosperity 2016: Taking on inequality. Technical report, World Bank, 2016.

[2] Finn Tarp, Channing Arndt, and Andy McKay. Growth and poverty in Sub-Saharan Africa. Oxford University Press, United States of America, 198 Madison Avenue, New York, NY 10016, 2016.

[3] Roy Carr-Hill. Improving population and poverty estimates with citizen surveys: Evidence from east africa. World Development, 93:249 – 259, 2017

[4] Stefano Ermon, George Azzari, Marshall Burke, Anthony Perez, Swetava Ganguli, and David Lobell. Semi-supervised multitask learning on multispectral satellite images using Wasserstein generative adversarial networks (GANs) for predicting poverty. Technical report, Stanford University, 2016.

[5] Michael Xie, Neal Jean, Marshall Burke, David B. Lobell, and Stefano Ermon. Transfer learning from deep features for remote sensing and poverty mapping. CoRR, abs/1510.00098, 2015

[6] Joshua Blumenstock, Gabriel Cadamuro, and Robert On. Predicting poverty and wealth from mobile phone metadata. Science, 350(6264):1073–1076, 2015

[7] M. Jerven. Poor Numbers: How We Are Misled by African Development Statistics and What to Do About It. Cornell Univ. Press, 2013.

[8] Gary R. Watmough, Peter M. Atkinson, Arupjyoti Saikia, and Craig W. Hutton. Understanding the evidence base for poverty environment relationships using remotely sensed satellite data: An example from Assam, India. World Development, 78:188–203, 2016.

[9] Neal Jean, Marshall Burke, Michael Xie, W. Matthew Davis, David B. Lobell, and Stefano Ermon. Combining satellite imagery and machine learning to predict poverty. Science, 353(6301):790–794, 2016.

[10] Gary King. Ensuring the data-rich future of the social sciences. Science, 331(6018):719–721, 2011.

[11] Nathan Eagle, Michael Macy, and Rob Claxton. Network diversity and economic development. Science, 328(5981):1029–1031, 2010.

[12] Chris Smith-Clarke and Licia Capra. Beyond the baseline: Establishing the value in mobile phone based poverty estimates. In Proceedings of the 25th International Conference on World Wide Web, WWW ’16, pages 425–434, Republic and Canton of Geneva, Switzerland, 2016. International World Wide Web Conferences Steering Committee.

[13] Gary S. Fields. Changes in poverty and inequality in developing countries. The World Bank Research Observer, 4(2):167–185, 1989.

[14] J. E. Blumenstock, D. Gillick, N. (2010). Who's calling? Demographics of mobile phone use in Rwanda.

[15] Carrington, P. J., Scott, J., and Wasserman, S. (2005). Models and methods in social network analysis, volume 28. Cambridge University Press.

[16] Nowicki, K. and Snijders, T. A. B. (2001). Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96(455):1077–1087

[17] Kingman, J. F. C. (1993). Poisson processes. Wiley Online Library.

[18] Goulding, J., Preston, S., and Smith, G. (2016). Event series prediction via non-homogeneous Poisson process modelling. In Data Mining (ICDM), 2016 IEEE 16th International Conference on, pages 161–170. IEEE.

[19] Doob, J. L. (1953). Stochastic processes, volume 7. Wiley New York.

This author is supported by the Horizon Centre for Doctoral Training at the University of Nottingham (RCUK Grant No. EP/L015463/1) and the Humanitarian OpenStreetMap Team.