AidData’s promise to users of its data products

November 9, 2015

Brooke Russell, Scott Stewart

Today marks an important milestone for AidData: we’re going public with our Data Management Plan (DMP). Why is this significant? It’s a public commitment from AidData to our users on what they can reasonably expect of us in terms of when and how we will collect, clean, standardize, and update data. The DMP allows users to ‘look under the hood’ of AidData’s data production processes to see what we do between the initial sourcing of data and final publication. We are committed to open data, and an important part of that commitment is being transparent about how we collect and produce data.

Why are we doing this? The DMP is the public-facing piece of a much larger, behind-the-scenes effort to enhance and streamline data collection, production, publication, usability and uptake. Several years ago, AidData transitioned away from a model of updating one central database and towards the regular production of a broad suite of data products. We made this change to more effectively address “revealed demand” from our users. However, many users find the distinctions between these data products, their sources, and how to use them confusing. The DMP seeks to correct this problem; it details the full range of data products that AidData generates and the underlying processes used for each product.

Our Data Team handles hundreds of requests each year (most come in via data@aiddata.org), and we hope that the DMP addresses many of the questions that we hear on a consistent basis. The DMP brings more transparency to our sourcing, standardization, and quality assurance processes, and the ways in which we add new value to our data products (e.g. through geocoding and activity coding).

Developing and publishing a data management plan is one of several steps we are undertaking to streamline our work and make our data production and quality assurance processes more efficient. We’ve also developed a new internal system that allows us to upload data, execute all of our value addition activities (such as geocoding and activity coding), and publish directly to the core database available at aiddata.org. This new technology allows us to generate high-quality data products more quickly than before. So far in 2015 alone, we have released nine new geocoded datasets and updated three datasets from 2014. We have also imported new data through 2013 for most donors and have increased the number of activity-coded projects to over 951,000, representing over 63% of the core dataset.

To better understand the various elements of AidData’s data production and quality assurance processes, we invite you to dive into our Data Management Plan. Here is the quick, helicopter tour of what you will find:

Sources

(Where we find them, how we standardize them, and what products they are used in)

Donor Systems: Official data made available by the donor agencies themselves (including non-DAC bilateral donors and multilateral donors)
Recipient Systems: Official data from the Aid Information Management Systems (AIMS) of partner countries
Reporting Systems: Data from official reporting systems, such as the OECD’s Creditor Reporting System
Pure Aggregates: Data available at the country-year level from various official sources that provide a broader view of the total resource envelope available for developing countries

Products

(What they are, what data is included, and how to access them)

Portal: Datasets uploaded and made searchable through the interface at www.aiddata.org, including the following:
Core, Project-level Data: This is the main repository of official, project-level data from over 90 bilateral and multilateral institutions, standardized into one dataset tracking over $7.1 trillion in international development finance. It is searchable by going to aiddata.org/dashboard and clicking “Advanced Search”
Pure Aggregate Data: Aggregate data that provides a broader view of the total resource envelope available for developing countries, including variables such as remittance inflows and outflows and FDI inflows and outflows. It is searchable by going to aiddata.org/dashboard and clicking “Aggregate Search”
Core Research Releases: A snapshot of the core, project-level data available on the Portal with a flat table structure, made available at regular intervals and posted to http://aiddata.org/aiddata-research-releases
Geocoded Research Releases: Data acquired from a partner country’s AIMS with fully geocoded locations and posted as static datasets to aiddata.org/geocoded-datasets. You can use the portal to visualize and search some of this data by going to aiddata.org/maps
IATI Research Releases: Data from our other products transformed into the IATI standard. Users can export core data from the Portal in the IATI format or access it at the IATI registry at http://dashboard.iatistandard.org/publisher/aiddata.html

Data Policies and Dictionaries

The DMP also includes several overarching documents such as a dictionary for all data fields used in any standard AidData product and a glossary of common AidData terms and concepts.

We are excited to put Version 1.0 of our Data Management Plan in the public domain, and we hope that it allows our users to take a closer look at what is happening behind the curtain here at AidData. We regard the Data Management Plan as a living set of documents that will be updated and adapted as the organization and the scope and breadth of our data grow, so be sure to send your feedback on how we can improve future versions to data@aiddata.org.

Scott Stewart and Brooke Russell are members of AidData’s Data Team and the lead authors of Version 1.0 of AidData’s Data Management Plan.

Brooke Escobar is Interim Director of AidData's Tracking Underreported Financial Flows (TUFF) Unit.