How to help improve our global Chinese official finance data


Get involved in improving the data

While AidData’s Global Chinese Official Finance, Version 1.0 dataset was built by our own staff and researchers, it has benefited from the input of dozens of independent contributors. We created this public resource anticipating that others with knowledge of specific Chinese official finance activities could help improve the accuracy, scope and depth of the database over time.

We welcome you to help us improve this data by identifying errors and omissions, and by suggesting alternative sources of information. If you have additional information for a project, or believe our current information is incorrect, please let us know! Email us your comment or suggestion at china@aiddata.org, along with any additional sources of information you would like to submit for our consideration.

How we measure data improvement

"Health of Record" methodology:

The purpose of our "Health of Record" methodology is to rate the completeness and verifiability of each project record. The methodology produces two scores: a source triangulation score and a field completeness score. Our team uses these scores to prioritize project records that require further investigation and validation; external users can also use them to isolate and analyze project records with varying levels of data quality.

The public disclosure of these data quality scores is part of a larger effort at AidData to be as transparent as possible about the data we produce through our Tracking Underreported Financial Flows (TUFF) methodology. For more information on how you can help improve the “health” of a particular project record, please see the section above on improving the data.

Source Triangulation Score:

This score, which varies from 0 to 20 (with higher scores representing better-sourced project records), is designed to capture the diversity and quality of the sources and source types used to construct individual project records. These sources include not only those codified in the TUFF methodology (e.g. media reports, government documents, and scholarly articles) but also sources obtained through ground-truthing efforts.

Base Score: The base score is determined by the number of media reports used to source a project. It is informed by the actual distribution of sources in the database.

Projects receive 1 point for each additional media report (2 and above).
Points are capped at 4 because additional media sources add diminishing value (they often repeat the same information).
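A minimal Python sketch of the base score follows. It assumes that "each additional media report (2 and above)" means the first report earns nothing and every report from the second onward earns 1 point, up to the cap of 4. The function name `base_score` is hypothetical; under the alternative reading (every report counts once a record has two or more), only the `additional` line would change.

```python
MEDIA_POINTS_CAP = 4  # cap stated in the methodology


def base_score(num_media_reports: int) -> int:
    """Base score from the number of media reports attached to a record.

    Assumption: the first report earns 0 points and each report from the
    second onward earns 1 point, capped at MEDIA_POINTS_CAP.
    """
    if num_media_reports < 0:
        raise ValueError("num_media_reports must be non-negative")
    additional = max(0, num_media_reports - 1)  # reports beyond the first
    return min(additional, MEDIA_POINTS_CAP)


for n in (0, 1, 2, 3, 5, 12):
    print(n, base_score(n))
```

Under this reading a record needs five media reports to reach the cap; further reports leave the score unchanged.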

Value Added Score: This score awards extra points to project records that draw on other, more credible source types. Points are awarded once for each source type that informs a project record: a record does not receive additional points for more than one source within the same category, because this score assesses the diversity of source types attached to a record.

Official government sources (donor/recipient): 3
Other official sources (non-donor/non-recipient): 3
Implementing agency source: 2
Academic journal articles and other academic sources: 2
NGO/civil society/advocacy source: 1
Social media, including unofficial blogs: 1
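Because only distinct source types count, the value-added score can be sketched as a lookup over a set of types. The category keys below are hypothetical labels for the six categories listed above; the point values are taken directly from that list, so the maximum is 3 + 3 + 2 + 2 + 1 + 1 = 12.

```python
# Hypothetical category keys; point values are those listed in the methodology.
VALUE_ADDED_POINTS = {
    "official_donor_recipient": 3,
    "official_other": 3,
    "implementing_agency": 2,
    "academic": 2,
    "ngo_civil_society": 1,
    "social_media": 1,
}


def value_added_score(source_types) -> int:
    """Sum the points for each distinct source type attached to a record.

    Several sources of the same type count only once. Types not in the
    table (e.g. media reports, which feed the base score) add nothing.
    """
    distinct = set(source_types)
    return sum(VALUE_ADDED_POINTS.get(t, 0) for t in distinct)


sources = [
    "official_donor_recipient",
    "official_donor_recipient",  # second source of the same type: no extra points
    "academic",
    "media_report",              # counted in the base score instead
]
print(value_added_score(sources))  # 3 + 2 = 5
```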

Bonus Points: Additional points are awarded for ground-truthed or sky-truthed projects. Evidence of such verification takes the form of multimedia content uploaded to the project record's page.

Successfully ground-truthed: 4 points
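The three components add up to the 0 to 20 range stated above: at most 4 from the base score, 12 from the value-added score and 4 from the ground-truthing bonus. The sketch below only combines component values that have already been computed; the function name is hypothetical, and because no point value is listed for sky-truthing, it is left out.

```python
GROUND_TRUTH_BONUS = 4  # "Successfully ground-truthed: 4 points"


def triangulation_score(base: int, value_added: int, ground_truthed: bool) -> int:
    """Combine the components into the 0-20 source triangulation score.

    Sky-truthing is not modeled because no point value is given for it.
    """
    if not 0 <= base <= 4:
        raise ValueError("base score must be between 0 and 4")
    if not 0 <= value_added <= 12:
        raise ValueError("value-added score must be between 0 and 12")
    bonus = GROUND_TRUTH_BONUS if ground_truthed else 0
    return base + value_added + bonus


# Maximum: 4 (media cap) + 12 (all six source types) + 4 (ground-truthed) = 20
print(triangulation_score(4, 12, True))   # 20
print(triangulation_score(1, 5, False))   # 6
```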

Field Completeness Score:

This score assesses a project record’s level of completeness ("completeness" being defined as having all of its fields populated). It varies from 0 to 9; higher values represent project records with more populated fields. We prioritize the presence of 7 key fields defined below; a project record’s completeness score is reduced by 1 point for each of these fields that is missing information. Additionally, a project record earns an extra point when any "high-value" field (defined below) is populated. To ensure that the field completeness score only takes positive values, all project records start with a base value of 8 before deductions begin. The theoretical maximum of this score is therefore 9.

High-value fields:

Transaction Amount: Projects with missing financial amounts receive a 1-point deduction.
Commitment Year: Projects that lack a commitment year or are tagged “year uncertain” receive a 1-point deduction.
Flow Class: “Vague” records receive a 1-point deduction.
Flow Type: Vague-TBD/unset records receive a 1-point deduction.
Sector: Unallocated/unspecified projects receive a 1-point deduction.
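The five high-value deductions can be sketched as a function over a single record. The field names and sentinel values used here ("Vague", "Vague-TBD", "Unallocated") are hypothetical stand-ins for however the dataset actually encodes them, and treating a missing value the same as a vague one is also an assumption.

```python
def high_value_deductions(record: dict) -> int:
    """Count 1-point deductions across the five high-value fields.

    Field names and sentinel values are hypothetical; a missing (None)
    value is assumed to be treated like a vague one.
    """
    deductions = 0
    # Transaction Amount: missing financial amount
    if record.get("transaction_amount") is None:
        deductions += 1
    # Commitment Year: missing, or tagged "year uncertain"
    if record.get("commitment_year") is None or record.get("year_uncertain", False):
        deductions += 1
    # Flow Class: "Vague" (or unset)
    if record.get("flow_class") in (None, "Vague"):
        deductions += 1
    # Flow Type: Vague-TBD or unset
    if record.get("flow_type") in (None, "Vague-TBD"):
        deductions += 1
    # Sector: unallocated or unspecified
    if record.get("sector") in (None, "Unallocated"):
        deductions += 1
    return deductions


example = {
    "transaction_amount": 120_000_000,
    "commitment_year": 2009,
    "year_uncertain": True,   # deduction
    "flow_class": "ODA-like",
    "flow_type": "Loan",
    "sector": "Unallocated",  # deduction
}
print(high_value_deductions(example))  # 2
```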

Status:

To identify records that merit an additional round of searches for new information, the completeness score takes project status into account. It is reasonable to assume that completed or cancelled projects will not receive additional media coverage, whereas pipeline, implementing, or suspended projects could.

Projects that are marked as completed or cancelled will receive 1 point since we can be confident that additional information will not be forthcoming.
Projects that are marked as pipeline or implementation receive 0 points.
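The status rule reduces to a small lookup. In this sketch the function name is hypothetical, and giving "suspended" (or any status not listed above) 0 points is an assumption based on the note that suspended projects could still receive coverage.

```python
def status_points(status: str) -> int:
    """Return 1 for completed or cancelled records, otherwise 0.

    Assumption: "suspended" and any unlisted status score 0, like
    pipeline and implementation records.
    """
    return 1 if status.strip().lower() in {"completed", "cancelled"} else 0


for s in ("Completed", "Cancelled", "Pipeline", "Implementation", "Suspended"):
    print(s, status_points(s))
```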

Other fields:

Implementing/Accountable Agency: Projects without an implementing or accountable agency also lose a point.
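Putting the field completeness rules together, a minimal sketch could look like the following. It rests on assumptions that the prose above leaves open: the text refers to 7 key fields but names only six deductible ones (the five high-value fields plus the implementing/accountable agency), so only those six are used; the extra point is taken from the Status rules; and the result is clamped to the stated 0 to 9 range. All field names are hypothetical.

```python
BASE_VALUE = 8
MAX_SCORE = 9

# The six deductible fields named in the methodology (hypothetical names).
DEDUCTIBLE_FIELDS = frozenset({
    "transaction_amount", "commitment_year", "flow_class",
    "flow_type", "sector", "implementing_agency",
})


def field_completeness_score(missing: set, status_point: int) -> int:
    """Start at 8, subtract 1 per missing deductible field, add the status point."""
    unknown = set(missing) - DEDUCTIBLE_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    if status_point not in (0, 1):
        raise ValueError("status_point must be 0 or 1")
    score = BASE_VALUE - len(missing) + status_point
    # Keep the result inside the stated 0-9 range.
    return max(0, min(score, MAX_SCORE))


print(field_completeness_score(set(), 1))                            # 9
print(field_completeness_score({"sector", "implementing_agency"}, 0))  # 6
```

With only six deductible fields the lowest reachable value is 8 - 6 = 2, so the lower clamp never binds in this sketch; it is kept only to mirror the stated range.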
