A “new” type of data for development

AidData’s next-generation measures of important development outcomes use machine learning to help policymakers overcome data gaps.

November 13, 2020

John Custer, Seth Goodman

This map shows the predicted conflict fatality risk for 2019, ranging from low (green) to high (red). Green points show locations with no deaths from known conflict events, while red points show locations with greater than 0 deaths from known conflict events. View the full map at: https://www.aiddata.org/outcome-measures

Let's say you're a development agency official and you have this question: "Is it safe to send aid workers into an area that urgently needs aid but is known to be prone to conflict?" You need to answer this question if you’re to allocate resources in the next 12 months, proactively assess risk levels, and calibrate your contingency plans. If you overestimate risk to your aid workers, communities that need help won't get it. If you underestimate risk, you could be putting people in harm's way.

What kind of information might you have at your disposal? You’ll likely have reports from your embassy’s security office and non-governmental organizations working in the region, but it can still be hard to draw accurate forecasts from these sources, especially when projecting 6 or even 12 months out. How would your decision-making workflow change if you could supplement your normal sources of data with a map that forecasted, with 80% accuracy, the likelihood of fatalities on the ground from a conflict event in the next 6-12 months?

This kind of map is what’s called a "predicted surface," and AidData is working to create and share these predicted surfaces, and the data behind them, to show how the poverty or security situation for a geographic area changes over time. In partnership with USAID, we are developing an innovative approach that uses machine learning algorithms to find features from daytime satellite imagery that are predictive of conflict deaths in Nigeria. This is an experimental approach that can be applied to other countries and contexts. Our first wave of research and findings from this work was recently published in Transactions in GIS.

Predicted surfaces can be generated for many types of development outcome measures—such as poverty, child mortality, or deforestation—where direct data may be lacking. In partnership with the Millennium Challenge Corporation, we are applying a similar machine learning-based approach to generate more accurate estimates of changes in average household wealth from satellite imagery. In a broader geospatial impact evaluation of MCC programs to improve rural road networks in Tanzania and Ghana, we use these estimates to compare changes in poverty over time between regions around road improvement projects and control locations that were not near improved roads.
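To make the comparison described above concrete, here is a minimal sketch of a difference-in-changes calculation. It is an illustration only, not necessarily the estimator used in the MCC evaluation. The function names and the wealth values are made up for the example, and a real evaluation would also account for confounding factors and statistical uncertainty.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def difference_in_changes(treated_before, treated_after,
                          control_before, control_after):
    """Change in the treated group minus change in the control group.

    Each argument is a list of estimated wealth values (hypothetical here)
    for locations in that group and period.
    """
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Toy, invented wealth index values; not real data.
near_roads_before = [1.0, 2.0, 3.0]
near_roads_after = [2.0, 3.0, 4.0]
control_before = [1.0, 2.0, 3.0]
control_after = [1.5, 2.5, 3.5]

effect = difference_in_changes(near_roads_before, near_roads_after,
                               control_before, control_after)
print(effect)  # 0.5
```

In this toy case both groups improve, so the quantity of interest is how much more the locations near improved roads changed than the control locations did.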


These new types of outcome measures are produced using a type of machine learning algorithm known as a convolutional neural network (ConvNet, or CNN). ConvNets are a kind of "computer vision" algorithm that learns which features, or patterns, in a set of images correspond to the labels assigned to those images.
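For readers curious about what detecting a "feature" means in practice, the sketch below shows the basic operation inside a ConvNet: sliding a small filter over an image and recording how strongly each patch matches it. This is a simplified illustration in plain Python, not AidData's model. The tiny image and the hand-written edge filter are assumptions for the example; a real ConvNet learns many such filters from the labeled training data.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1).

    Returns the response map. This is the cross-correlation that deep
    learning libraries commonly refer to as "convolution".
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy 4x6 "image": dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

# Hand-written filter that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

response = convolve2d(image, kernel)
print(response)  # [[0, 3, 3, 0], [0, 3, 3, 0]]
```

The response is zero over uniform areas and high where the filter straddles the boundary, which is the sense in which a filter "finds" a pattern in an image.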

While this approach is common in other image recognition applications—such as recognizing handwriting, or telling cats and dogs apart—a lack of training data and more complex research questions have made it challenging to extend machine learning approaches to problems faced by international development practitioners and policymakers.

Geospatial data presents a unique opportunity for the application of ConvNets: widely available satellite imagery can be labeled with existing data, such as survey results, by matching each record's location to the imagery that covers it. Instead of detecting patterns associated with cats or dogs, ConvNets using satellite imagery can detect patterns in the landscape such as urban areas, crop fields, infrastructure, or other features that may be associated with poverty or conflict. Once those features are identified, limited on-the-ground survey data can be extrapolated using satellite imagery to additional geographic areas or points in time.
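As a rough illustration of how imagery can be labeled by location, the sketch below assigns survey records to the grid tile that contains them and averages the values in each tile. The grid origin, tile size, function names, and survey values are all hypothetical, and the sketch assumes a simple north-up grid in degrees. Real pipelines typically rely on a geospatial library and the imagery's own georeferencing.

```python
from collections import defaultdict


def tile_index(lon, lat, west, north, tile_size):
    """Return (row, col) of the grid tile containing a point.

    Rows count down from the northern edge; columns count right from
    the western edge. All units are degrees.
    """
    col = int((lon - west) // tile_size)
    row = int((north - lat) // tile_size)
    return row, col


def label_tiles(surveys, west, north, tile_size):
    """Average the survey values that fall inside each tile.

    `surveys` is an iterable of (lon, lat, value) tuples.
    Returns {(row, col): mean value}, the training labels for those tiles.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lon, lat, value in surveys:
        key = tile_index(lon, lat, west, north, tile_size)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Invented survey records: (longitude, latitude, wealth index).
surveys = [
    (3.2, 13.9, 1.0),
    (3.4, 13.6, 3.0),
    (4.1, 12.2, 5.0),
]

labels = label_tiles(surveys, west=3.0, north=14.0, tile_size=0.5)
print(labels)  # {(0, 0): 2.0, (3, 2): 5.0}
```

Tiles with a label can be used to train a model, and the trained model can then be applied to tiles where no survey was conducted.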

Predicted surface data and machine learning methods such as ConvNets do not replace existing data sources such as surveys; they supplement them by filling in crucial data gaps. Likewise, machine learning algorithms do not replace subject matter expertise or human judgment; they enhance them. This “augmented intelligence” approach is especially useful when an expert in the field can analyze the predictions to assess trends, or when project evaluators are missing a key indicator for an outcome of interest at a critical point in time, such as baseline data at the start of a project.

In many ways, the development of machine learning-based outcome measures is a natural extension of earlier methodological innovations at AidData, such as geospatial impact evaluations (GIEs). We’ve employed GIEs to show how geospatial data can be used for resource allocation and to conduct rigorous evaluations of development project impacts. Our GIEs have incorporated geospatial and other forms of data for a wide variety of evaluations, from analyzing agriculture in Afghanistan to the governance of communes in Cambodia and child mortality rates in the Democratic Republic of Congo.

For these projects, the impetus to innovate came from a common set of challenges faced by many development practitioners. Survey data and other outcome measures are valuable, but often in short supply and extremely limited even when available. What’s more, large data collection efforts don't always provide a good return on investment and can divert resources from important public service delivery to data collection. To innovate around some of these barriers in our GIEs, AidData has in the past used proxy measures, such as measuring economic activity by how light or dark an area appears in nighttime satellite imagery or pinpointing landscape features in daytime satellite imagery that correspond with economic wellbeing or environmental health.

With the use of ConvNets to produce outcome measures that otherwise wouldn’t be available, we can expand the kinds of questions that can be answered using GIEs and other evaluation methods. With support from the Cloudera Foundation, we are working to incorporate new data such as predicted surfaces into GeoQuery, our free tool that lets users find and integrate geospatial data for the specific geographic area they need. We are also working with Microsoft AI for Earth to make the algorithms behind our machine learning-generated outcome measures available on a cloud platform, so that development organizations and researchers in Africa and around the world have the computational resources to apply pre-trained neural networks to their specific problem sets.

John Custer is AidData's Deputy Director of Communications and Data Analytics.

Seth Goodman is a Research Scientist at AidData.