AidData uses AI to code 2.7 million projects to the Sustainable Development Goals

Analysis of a first-of-its-kind dataset shows many goals are neglected, with low-income countries falling behind.

September 27, 2023

Sariah Harmer

The Sustainable Development Goals are presented during a celebration for the International Day of Rural Women. Photo by Happiraphael via Wikimedia, licensed under CC BY-SA 4.0.

“Business-as-usual approaches will not achieve the Sustainable Development Goals by 2030 or even 2050.” - The 2023 Global Sustainable Development Report (GSDR)

Last week, as the United Nations General Assembly convened the 2023 SDG Summit, the world looked to the future and to the past. Eight years have passed since the official “Agenda 2030 for Sustainable Development” was announced. With the halfway mark now behind us, the top agenda item is how to renew interest and meaningfully accelerate efforts to achieve the 17 ambitious goals set in 2015. If we are to collectively achieve the Sustainable Development Goals (SDGs) by 2030, something must change, but without clear, comparable data, it’s difficult to identify areas of genuine progress or backsliding.

The largest knowledge gap is around financing: who is spending how much on which activities to accomplish what goals? To answer this, AidData used cutting-edge machine learning and natural language processing tools to develop an “SDG Autocoder,” which produced a dataset of over 2.7 million development projects mapped to individual SDG targets, the largest of its kind ever produced. The results are analyzed in a new policy report published last week, “Financing Agenda 2030: Are donors missing the mark on the Sustainable Development Goals?” Below, we preview some findings and explain how AidData was able to assemble this novel data source.

Explore interactive graphics and additional findings from our new report, Financing Agenda 2030

Using machine learning to link development projects to the SDGs, at a granular level

While official targets to measure progress toward the SDGs were set in 2017, accounting for the financial flows that back them faces three main challenges.

First, while the Creditor Reporting System (CRS) of the Organisation for Economic Co-operation and Development (OECD) tracks official development assistance and private philanthropy, it predates the Sustainable Development Goals. As a result, its purpose codes do not correspond neatly with the SDGs.

Second, under the current reporting system, donors are not required to report whether projects contribute to the goals, let alone which specific targets under each goal they address. It has recently become more common for donors to report whether projects aim for specific SDG targets, but this reporting remains voluntary and relies on donor discretion. In 2021, only 44% of projects received an SDG code, and no projects from 2017 or earlier received one. As a result, the OECD CRS data is inconsistent across donors and over time, which limits analysis.

Third, in order to assess whether or not the goals kick-started meaningful change in funding levels and allocation, researchers need a way to compare SDG projects to those with similar attributes under the auspices of the previous set of global goals, the Millennium Development Goals. Taking a comprehensive picture requires sorting, analyzing, and labeling millions of projects representing the global undertakings of over two decades—an extremely time- and labor-intensive task, impossible for humans to do alone.

While advances in machine learning and natural language processing make it easier than ever to parse large amounts of text quickly, those tools require structured datasets from which to learn. These training datasets must contain enough information to cover the breadth of all 17 SDGs, from “No Poverty” to “Gender Equality” to “Life Below Water.” With a dataset of around 100,000 projects hand-coded across two years by a faculty-student team of AidData researchers and 36 William & Mary student research assistants, AidData was uniquely positioned to fill this gap.

Using the hand-coded dataset, AidData trained three machine learning models, built with the fastText, Keras, and scikit-learn libraries, and developed an autocoder that compares the results of the different models to determine the set of attributes to assign to each project.
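The article does not describe how the autocoder reconciles the three models' outputs. As a rough illustration only, the sketch below assumes a simple majority vote: a label is kept for a project when at least two of the three models predict it. The function name, the `min_votes` parameter, and the example labels are all hypothetical and are not taken from AidData's methodology.

```python
from collections import Counter


def combine_predictions(model_labels, min_votes=2):
    """Return the labels predicted by at least `min_votes` models, sorted.

    `model_labels` maps a model name to the labels that model predicted
    for a single project. Majority voting is an assumption made for this
    sketch, not AidData's documented method.
    """
    votes = Counter()
    for labels in model_labels.values():
        # Count each model at most once per label.
        votes.update(set(labels))
    return sorted(label for label, count in votes.items() if count >= min_votes)


# Hypothetical SDG target labels from three models for one project.
example = {
    "fasttext": ["3.3", "3.8"],
    "keras": ["3.3", "1.3"],
    "sklearn": ["3.3", "3.8", "17.9"],
}
agreed = combine_predictions(example)
print(agreed)  # ['3.3', '3.8']
```

Raising `min_votes` to 3 would keep only labels on which all three models agree, trading coverage for precision.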

“The scope of the data coded to the SDGs is vast,” said Bryan Burgess, an AidData Program Manager who led the development of the SDG Autocoder and is lead author of the report. “So far, we’ve coded 2.7 million projects. It took two years to produce AidData’s original SDGs-coded dataset, which covered only about 100,000 projects from 2014-2016. By comparison, researchers using our SDG Autocoder can code ten years of data in a single day, with an accuracy rate of 85-95%.” As with all AidData research, the forthcoming dataset and SDG Autocoder methodology will be made freely available as a public good.

Ahead of the release of the dataset, AidData has now published a policy report on Financing Agenda 2030. It draws on the data to answer three key questions about progress—or lack thereof—in financing the Sustainable Development Goals:

How did donors deploy limited resources and attention to the breadth of the SDGs agenda?
How resilient are the SDGs as a unifying agenda for development in the face of global shocks like COVID-19?
To what extent does financing for the SDGs vary by geography, and which countries are on track to getting the funds required versus those at risk of being left behind?

“Cross-comparable project-level data like this is key to help measure progress, crowd in funding to priority areas, and allow policymakers to make more informed decisions,” said Samantha Custer, AidData’s Director of Policy Analysis and a co-author of the report. “This information helps us move beyond what donors say they want to achieve, and more closely scrutinize whether and how they spend their money in ways that advance the SDGs,” she continued.

Donor financing is missing the mark

The researchers find that several goals attracted significantly more attention overall. Health, governance, and economic growth brought in the most financing, while inequality and environmental agendas were consistently under-resourced. “In 2015, when the SDGs were adopted, all but about a dozen donors already allocated more than 50% of their funding to projects that could be classified as SDG-related,” said John Custer, AidData’s Deputy Director of Communications and Data Analytics and a co-author of the report. “In other words, the SDGs largely represented what donors were already funding in 2015. Goals that received less attention before the SDGs were adopted—namely those related to environment, equity, and equality—did not receive proportional increases in funding after the SDGs were adopted,” he continued.

In analyzing the financing of the SDGs, AidData examined how the shock of the COVID-19 pandemic affected overall development spending. The authors found that countries shifted money away from infrastructure and economic growth but towards health, poverty, and governance.

However, not all countries had equal access to aid. The gap between the financing provided and what local needs require is widening. The report finds that donors are increasingly focused on middle-income countries and are spending less on low-income countries. Because SDG 1 aims to eliminate extreme poverty, it is unlikely to be achieved unless that pattern shifts.

Geographically, different regions tended to attract different kinds of thematic funding. African countries attracted more financing related to health and agriculture. Health spending in sub-Saharan Africa accounted for a significant portion of overall health spending, tipping SDG 3 (Good Health and Well-being) towards the top thematic financing spot. On the other hand, South and Central Asia attracted more funding towards SDG 11, Sustainable Cities and Communities, and SDG 7, Affordable and Clean Energy.

In contrast to other thematic areas, climate funding was more evenly distributed across regions but tended to be concentrated in larger countries. An interesting example of this trend can be seen in the Western Hemisphere, which received the greatest share of funding dedicated to climate action. A single country, Mexico, received over a third of the region’s funding ($2 billion of the $5.8 billion spent in the Western Hemisphere), while regional funds directed to the Caribbean and Central America, a region particularly vulnerable to climate change, amounted to just under half a billion ($479.8 million). Even then, money appears to be directed towards larger countries, while the small island nations facing the most immediate climate risk received less; for example, only $756,000 was directed towards Saint Kitts and Nevis.

Because the SDGs largely continued trends from the MDGs and did not drastically change behavior, simply shifting resources between sectors will not be enough. An overall increase in development financing is needed to fill the gap.

The UN’s SDG Summit last week represents an important step towards SDG 17, Partnerships for the Goals, but true commitment needs more than nominal agreement. As the halfway point towards Agenda 2030 passes by, it is increasingly important to coordinate our efforts, identify areas for improvement, and put observations into action.

“This dataset allows us to more easily and comprehensively monitor, evaluate, and compare project portfolios across donors in a unified fashion,” said Samantha Custer. “We hope that the report, datasets, and methods will be a useful tool for leaders in the Global South and Global North to assess current circumstances, identify areas for modification and improvement, and make informed choices about where investments can do the most good to advance the SDGs in practice for a better future,” she added.

Sariah Harmer was previously a Communications Associate at AidData.