The largest update yet to GeoQuery, AidData’s free spatial data platform

New data on Chinese development finance, and data at monthly intervals for frequently-used datasets.

November 15, 2021
Sasan Faraj, Seth Goodman

Sasan Faraj '23 is majoring in Data Science with a concentration in Algorithms at William & Mary. He previously served as a Research Assistant on the Partnerships and Communications Team from 2019 to 2020, and his current internship focuses on using machine learning methods to predict wealth at the subnational level in the Philippines using open-source geospatial data.

AidData’s GeoQuery platform has just received its largest update since its launch in 2017. The latest additions include entirely new datasets on population and Chinese development finance, extensions of our most popular datasets through 2020, new options for monthly (not just yearly) data, and improved administrative boundaries.

GeoQuery is a free, ground-breaking tool that provides effortless access to open-source geospatial data. It allows anyone to find and aggregate dozens of high-quality datasets for a geographic area of interest into a single spreadsheet. Users can select from 77 satellite, economic, health, conflict, and other datasets that span decades and are available for over 200 countries and territories. 

GeoQuery and related work at AidData are supported by partnerships with organizations including the Patrick J. McGovern Foundation, the William & Flora Hewlett Foundation, and USAID. “In our work with nonprofit organizations we see a high demand for geospatial data to support a wide range of use cases, for example to guide humanitarian activities or conservation projects. We are excited to see how improved access to easy-to-use geospatial data can help nonprofits and social impact organizations tackle pressing challenges both local and global,” notes Claudia Juech, Vice President of Data and Society for the Patrick J. McGovern Foundation. “These exciting updates to GeoQuery will allow more people, not just data and analytics experts, to build technical capacity and data-use cultures within their organizations.”

Since 2017, GeoQuery has served a broad community of thousands of researchers, analysts, and decision makers in need of a fast, reliable, and user-friendly source of research-ready geospatial data. By handling the quality control, preprocessing, and merging of terabytes of geospatial datasets, GeoQuery allows users to focus on their work, rather than finding and preparing data. Through GeoQuery, users can request data ranging from nighttime light to carbon dioxide concentrations to forest cover loss for anywhere in the world and get results in a simple spreadsheet format in minutes.
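For readers curious about what "aggregating" a gridded dataset to a geographic area involves, the following is a minimal Python sketch of the general idea (often called zonal statistics), not GeoQuery's actual processing code. The grid values, the unit labels `A` and `B`, and the function name `zonal_means` are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Toy 3x4 raster of values (made-up numbers, e.g. nighttime light intensity).
RASTER = [
    [1.0, 2.0, 8.0, 9.0],
    [3.0, 4.0, 7.0, 6.0],
    [0.0, 0.0, 5.0, 7.0],
]

# Same-shaped grid assigning each cell to a hypothetical boundary unit.
# None marks cells that fall outside every unit.
ZONES = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
    [None, None, "B", "B"],
]

def zonal_means(raster, zones):
    """Return {unit: mean of the raster cells assigned to that unit}."""
    cells = defaultdict(list)
    for value_row, zone_row in zip(raster, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is not None:
                cells[zone].append(value)
    return {zone: mean(values) for zone, values in cells.items()}

if __name__ == "__main__":
    # One output row per boundary unit, as in a simple spreadsheet.
    for unit, value in sorted(zonal_means(RASTER, ZONES).items()):
        print(unit, value)
```

In practice, boundaries rarely line up with grid cells this neatly, so real pipelines also have to decide how to treat cells that a boundary only partly covers.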

To date, GeoQuery has fulfilled over 25,000 requests for geospatial data from nearly 5,500 users at more than 1,000 organizations, ranging from academic institutions and government agencies to NGOs and private companies, in both the Global North and Global South. Average monthly requests trended steadily upward from 2017 to 2020. The full count for 2021 is not yet available, but it is on track to exceed the 2020 total.

Citations of GeoQuery show that users apply its data to a wide range of topics, particularly in low- and middle-income countries. Examples range from researchers who used annual precipitation data from GeoQuery to predict famine intensity in Mali to World Bank Group analysts who used nighttime lights data in Myanmar to create an economic forecast. Three of the countries most commonly included in requests are Nigeria, the Republic of Congo, and Afghanistan. (Due to the ongoing situation in Afghanistan and requests from our data partners, data for Afghanistan is temporarily unavailable through GeoQuery in order to protect the safety of individuals involved.)

For the full list of over 70 academic citations, see the Google Scholar page, and keep an eye out for AidData’s forthcoming review of all publications supported by GeoQuery. With the current update, GeoQuery aims to enhance its support for research and evaluation efforts such as these and others around the world. 

New and improved datasets

One of the highlights of this update is the availability of data through 2020 for many of our most frequently requested datasets. We are also excited to include a number of additions that users frequently requested. These include monthly data for many datasets; several new datasets, such as more granular population estimates from WorldPop and LandScan; and easy access to the latest geospatial data from AidData's Global Chinese Development Finance Dataset, Version 2.0, the most comprehensive dataset on China’s overseas lending activities. The release of this dataset and an associated research report in late September 2021 was covered in more than 300 media stories worldwide, including by BBC TV and World Service, The Economist, The Financial Times, The Wall Street Journal, The Guardian, Times of India, South China Morning Post, and Le Monde, and resulted in extended high-level policy debate in Pakistan and Indonesia.

The new and updated datasets cover a broad range of geospatial measurements, including nighttime lights, temperature, precipitation, population, land cover, NDVI (a measure of plant greenness), carbon dioxide concentrations, forest loss, and Chinese development projects. Most updates not only extend the time coverage to additional years but also include newer versions of the underlying source data. For example, the yearly nighttime lights data from VIIRS in GeoQuery is now based on an updated version 2 product, which incorporates a new and improved data processing methodology. We also now offer a related supplemental dataset: the count of cloud-free observations used to produce the nighttime lights dataset.

The majority of the updated datasets, which were previously only available at an annual level, can now be accessed at monthly time steps. As one of the most frequently requested improvements to GeoQuery, the inclusion of monthly data broadens the scope of research possible through GeoQuery, allowing users to make more precise analyses. 
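As an illustration of what a monthly time step makes possible, here is a minimal Python sketch that rolls monthly values up to annual means and also picks out the peak month within a year, something annual data alone cannot show. The CSV layout and the column names (`unit_id`, `month`, `precip_mm`) are hypothetical and chosen for this example; they are not GeoQuery's actual export format.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical monthly extract: one row per boundary unit per month.
# Column names are illustrative, not GeoQuery's actual export schema.
SAMPLE_CSV = """unit_id,month,precip_mm
A,2019-01,10.0
A,2019-02,30.0
A,2020-01,20.0
A,2020-02,40.0
B,2019-01,5.0
B,2019-02,15.0
"""

def annual_means(csv_text):
    """Return {(unit_id, year): mean of that unit's monthly values in that year}."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = row["month"].split("-")[0]
        groups[(row["unit_id"], year)].append(float(row["precip_mm"]))
    return {key: mean(values) for key, values in groups.items()}

def peak_month(csv_text, unit_id, year):
    """Return the month (YYYY-MM) with the highest value for one unit and year."""
    rows = [
        row for row in csv.DictReader(io.StringIO(csv_text))
        if row["unit_id"] == unit_id and row["month"].startswith(year + "-")
    ]
    return max(rows, key=lambda row: float(row["precip_mm"]))["month"]

if __name__ == "__main__":
    for key, value in sorted(annual_means(SAMPLE_CSV).items()):
        print(key, value)
    print(peak_month(SAMPLE_CSV, "A", "2020"))
```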

The table below provides the full list of updated datasets, improved temporal coverage, and availability of monthly data.

In addition to the updated datasets, GeoQuery also now offers several completely new datasets. These include annual MODIS nighttime land surface temperatures and population estimates from both WorldPop and LandScan, which provide more refined population estimates over time than previously available through GeoQuery. 

One of the most exciting new datasets offered through GeoQuery is the commitment values of geocoded Chinese projects from AidData's Global Chinese Development Finance Dataset, Version 2.0. The latest version of this dataset, which tracks Chinese development projects around the world, is the most comprehensive dataset of its kind and offers unprecedented views into Chinese projects in developing countries. The data available through GeoQuery provides commitment values (USD 2017) for over 3,000 projects with precise geospatial locations identified. It is offered as four datasets: (1) commitments across all sectors, plus commitments specifically for (2) energy, (3) transport, and (4) industrial projects.

The full list of new datasets available in GeoQuery can be seen in the table below.

Updated and new boundaries

The administrative and geographic boundary data offered through GeoQuery is a critical component of our data pipeline, and the quality of that data impacts downstream research, analysis, and decision-making. Given this importance, we are pleased to now offer the latest data from geoBoundaries as our primary boundary data. 

geoBoundaries is a product of William & Mary’s geoLab and provides open-source political boundary data for 200 countries and territories. Because multiple administrative levels (e.g., provinces, districts, counties, municipalities) are available for a single country, this adds up to more than 1 million boundaries available in GeoQuery.

The latest release of geoBoundaries, version 4, incorporates over 500 updates since version 3 and includes more accurate boundaries across the world, higher measurement precision, and a fully open API that will improve future updates. geoBoundaries is available completely free of charge and is entirely open source. For ease of access, the underlying boundary files used to process a GeoQuery request are automatically included along with the dataset results.

In addition to these improvements to core boundary data, we also now include in GeoQuery the boundaries of over 3,000 Chinese development project sites, along with varying buffer sizes (0.5 km, 2.5 km, and 5 km) around the projects. With this, researchers can now easily explore trends over time around these project sites, such as changes in nighttime lights, CO2 concentrations, and more.
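To give a sense of what these buffer distances mean in practice, the sketch below checks which of the three buffer sizes a nearby location would fall within. It is a simplified illustration, not GeoQuery's method: it assumes a project site represented by a single point rather than a full boundary, uses the haversine great-circle formula on a spherical Earth, and the coordinates and function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088   # mean Earth radius
BUFFERS_KM = (0.5, 2.5, 5.0)  # buffer sizes named in the text

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def smallest_buffer(site, point):
    """Return the smallest buffer (km) around `site` containing `point`, or None."""
    distance = haversine_km(site[0], site[1], point[0], point[1])
    for radius in BUFFERS_KM:
        if distance <= radius:
            return radius
    return None

if __name__ == "__main__":
    site = (0.0, 0.0)  # hypothetical project site (lat, lon)
    for point in [(0.0, 0.001), (0.0, 0.01), (0.0, 0.04), (0.0, 0.1)]:
        print(point, smallest_buffer(site, point))
```

Comparing values inside successive buffers is one simple way to see whether a change near a site fades with distance.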

Looking ahead

GeoQuery is a key component of AidData’s long-term commitment to support researchers, policymakers, and practitioners around the world, and in particular in developing countries, with better data and evidence to support improved decision-making. By providing geospatial data in a free and easy-to-use format, we hope to lower the barrier to leveraging geospatial data to derive new insights that help solve the world’s toughest development challenges. 

As part of this update, we have also made major improvements to the underlying open-source code used to process the geospatial data made available through GeoQuery. While free and accessible data is a critical piece of supporting our users, we also believe in making our methods and data transparent and replicable. 

The code used to prepare data for GeoQuery, available on GitHub, has been improved for every dataset included in this update to make it easier to read and run across environments with differing levels of data storage and processing power. We hope that this will further reduce barriers to others accessing and using this geospatial data. At AidData, these improvements will help make future data processing pipelines easier to build and enable us to provide more frequent updates for GeoQuery.

Interested in partnering with AidData to leverage data from GeoQuery or explore new geospatial research? Are you using GeoQuery to conduct research, inform decision making, or make insightful visualizations? Do you have requests or recommendations for datasets, boundaries, or other features you would like to see in GeoQuery? We would love to hear from you! Get in touch with us by email at geo@aiddata.org.

Seth Goodman is a Research Scientist at AidData.