peanutButter - A web application to produce rapid-response gridded population estimates from building footprints

WorldPop Research Group
University of Southampton

23 February 2021

Overview

The peanutButter web application allows you to produce gridded population estimates using your own estimates of people per building. This is a quick and simple approach that utilizes high resolution maps of building footprints, assuming that the same number of people live in each building across entire regions or settlement types (e.g. urban and rural). WorldPop partners with Maxar Technologies, Ecopia.AI, and the Bill and Melinda Gates Foundation for recent maps of buildings extracted from Maxar's Vivid Imagery Mosaic (Ecopia.AI and Maxar Technologies 2020; Dooley et al 2020, 2021).

Applying a user-defined estimate of 5.9 people per building to create gridded population estimates from building footprints in this urban area (darker red = more people).

Applying a user-defined estimate of 5.9 people per building to create gridded population estimates from building footprints in this urban area (darker red = more people).

The peanutButter application has two tools that can be used to produce gridded population estimates: aggregate and disaggregate. The "aggregate"" tool will apply your estimates of people per building in urban and rural areas to every building and then aggregate buildings to estimate population sizes for every ~100 m grid cell. The "disaggregate" tool starts with user-defined population totals for administrative units (or other areas) and disaggregates the population evenly among buildings.

The prefered approach will depend on the information that you have available:

Advantages

Disadvantages

Code for the peanutButter R package is openly available from WorldPop on GitHub: https://github.com/wpgp/peanutButter.

Quick Start: Aggregate Tool

  1. Use the sliders to explore population parameters until you find a combination that produces reasonable estimates (shown in the 'Results' table) of total population, urban population, and rural population for the country as a whole.

  2. Adjust the age-sex sliders to select the demographic group(s) that you would like your gridded population estimates to represent. Note: this will not change the population totals shown in the 'Results' table.

  3. Use the "Gridded Population Estimates" button to generate a gridded population map (a geotiff raster) that is produced by applying your population parameters to building footprints in each approximately 100 m grid cell across the country.

  4. (optional) Use the "Settings" and/or "Source Files" button(s) to save the input data.

Quick Start: Disaggregate Tool

  1. Upload a GeoJson file containing polygons and their associated population totals. Click the "Browse" button to select your GeoJson file and use the drop down list to select the column with your population totals. Ensure that you have the correct country selected.

  2. Adjust the age-sex sliders to select the demographic group(s) that you would like your gridded population estimates to represent.

  3. Use the "Gridded Population Estimates" button to generate a gridded population map (a geotiff raster) that is produced by disaggregating a polygon's population total evenly across the buildings inside the polygon.

  4. (optional) Use the "Source Files" button to save the input data.

More Info: Aggregate Tool

Method

The aggregate tool requires you to provide estimates of three population characteristics for urban and rural settlement types using expert opinion:

  1. Mean number of people per housing unit
  2. Mean number of housing units per building
  3. Proportion of buildings that are residential

These three parameters will be multiplied together to estimate people per building footprint.

Areas have been pre-classified as 'urban' based on two simple rules that are consistent across all countries. A ~100 meter grid cell is classified as 'urban' if it is part of a grouping of contiguous cells with: 1) 1,500 or more cells containing at least one building, and; 2) 5,000 or more total buildings (Dooley et al 2020). All grid cells containing building centroids that do not meet the 'urban' criteria, are classified as 'rural'.

The population is estimated using the following formula:

Population = Pop_Urban + Pop_Rural

Pop_Urban = Buildings_Urban x PropResidential_Urban x UnitsPerBuilding_Urban x PeoplePerUnit_Urban

Pop_Rural = Buildings_Rural x PropResidential_Rural x UnitsPerBuilding_Rural x PeoplePerUnit_Rural

You can supply your own inputs (i.e. raster of building count, raster of urban/rural classification) to produce gridded population estimates using the "aggregator" function of the peanutButter R package.

Population Parameters

Mean people per housing unit
This parameter defines the average number of people in a housing unit and it can be defined for urban and rural areas separately. References for determining these values could include:

Mean housing units per building
This parameter defines the average number of housing units per residential building. This number would be expected to be near 1 for rural areas and slightly higher in urban areas.

Proportion residential buildings
This parameter defines the proportion of building footprints that represent residential structures out of the total counts of building footprints from Dooley and Tatem (2020). Some buildings are non-residential like factories, shops, barns, and sheds. In addition, there are some erroneous building footprints that do not represent buildings and others that may represent more than one building.

Default Values

We provide moderately informative default values for population parameters, but we strongly urge users to modify the defaults based on a thorough investigation of available data sources. Our process for selecting default values was:

  1. We defined mean people per housing unit based on UN 2019 or UN 2017. We assumed that household sizes were equal in urban and rural areas. If no estimate was availabe, we assumed an average of 5 people per housing unit.

  2. Assume there was an average of 1.1 housing units per residential building in urban areas and 1 housing unit per residential building in rural areas.

  3. With the other parameters fixed at their default values, we defined the proportion of building footprints that are residential so that the national population total matched the estimate for the year 2020 from the United Nations World Population Prospects (2019) using the medium-variant projections.

  4. For the advanced controls, the default value for average number of people per hectare of building footprint area was defined so that the total populations in urban and rural areas matched those derived from the default values described above.

More Info: Disaggregate Tool

Method

The disaggregate tool produces gridded population estimates by disaggregating user-defined population totals evenly among buildings within user-defined polygon boundaries (e.g. administrative boundaries). There are three steps involved in this process:

  1. Calculate the total number of buildings in each polygon using the source data from Dooley and Tatem (2020) that were derived from high resolution building footprints (Ecopia.AI and Maxar Technologies 2020).

  2. Calculate the average number of people per building by dividing the user-provided population total for each polygon by the total number of buildings in that polygon.

  3. Produce gridded estimates by multiplying the average people per building by the total number of buildings in each ~100 m grid cell.

User-provided polygons will be ignored if they are too small to overlap the centroid of at least one 100-m grid cell from the building raster (see Source Data). If this happens, the total population of the gridded population estimates (i.e. summing all cells) will exclude the populations from these small polygons.

If there are zero buildings within any user-provided polygons, then the gridded population estimates will contain NA for these grid cells.

Areas of your polygons that exceed the national boundaries used in the building footprints (see Source Data) will not contain gridded population estimates.

You can supply your own inputs (i.e. raster of building counts) to produce gridded population estimates using the "disaggregator" function of the peanutButter R package.

Age-sex

Gridded population estimates for specific demographic group(s) are produced the same way for the top-down and bottom-up methods using ~100 m gridded estimates of the proportion of population in each age-sex group (WorldPop et al. 2018). There are 2 steps involved:

  1. Sum the individual age-sex proportions for each grid cell across the age-sex groups selected using the sliders.

  2. Multiply the resulting grid of proportions by the gridded estimates of total population from the bottom-up or top-down method.

To produce gridded population estimates for specific age-sex groups using your own inputs (i.e. population raster, or region-specific age-sex proportions), you can use the "demographic" function from the peanutButter R package. For an example of the age-sex information needed, please refer to the 'agesex' files downloaded when clicking on the "source files" button for any country.

Advanced Controls

Advanced controls allow you to screen out the smallest and largest buildings prior to calculating populations and to distribute population numbers based on building area rather than building counts (the default). These advanced settings apply to the "Aggregate" tab and the "Disaggregate" tab.

Size Thresholds for Buildings

You can choose to assume that no people live in the buildings with the smallest and/or largest building footprints.

Some building footprints are so small that they are unlikely to be residential structures. These building footprints could be sheds, outbuildings, animal pens, or other small structures. You can use the slider to define a threshold for the minimum building footprint area that you would like to assign population numbers to. The peanutButter app will then remove building footprints smaller than your threshold prior to calculating populations.

On the other hand, there are some building footprints that are so large that they are unlikely to be residential structures. These building footprints could be stadiums, industrial buildings, or even solar panel fields. You can use the slider to define a threshold for the maximum building footprint area that you would like to assign population numbers to. The peanutButter app will then remove building footprints larger than your threshold prior to calculating populations.

Unit of Analysis

The population can be estimated based on the count of buildings or the total area of buildings. If you change the unit of analysis to "area" rather than "count" then the controls on the "Aggregate" tab will change. The area-based controls require you to define the average number of people per hectare of building footprint area, whereas the count-based controls require you to define the average number of people per building.

The area-based method will assign a larger portion of the population to very large buildings, whereas the point-based method will assign a larger portion of the population to very small buildings. Because of this, the results of the area-based method may be sensitive to the maximum size threshold that you define for residential buildings and the point-based method may be sensitive to the minimum size threshold that you define for residential buildings.

Source Data

In the peanutButter app, there are several source datasets working behind the scenes that you can download using the "Download Source" button.

There are two data sets describing building patterns (Dooley et al 2020) that were derived from building footprints (Ecopia.AI and Maxar Technologies 2020):

  1. The count of buildings in each ~100 m grid cell across the country,
  2. A classification of each ~100 m grid cell as urban or rural.

There are also two source datasets that provide the proportion of population in each demographic group for every ~100 m grid cell (WorldPop et al 2018, Pezullo et al 2017, Carioli et al 2019). The age-sex source files include:

  1. A spreadsheet with age-sex proportions for each region,
  2. A region ID for every 100 m grid cell.

Year of Population Estimates

The time period associated with gridded population estimates produced by peanutButter depends on several factors. The spatial distribution of the population across the country depends on the spatial distribution of building footprints. The age-sex breakdown depends on the source demographic data. The population totals depend on the population data that you provide.

Building footprints
The satellite imagery used to create the building footprints is from Maxar's Vivid imagery mosaics. The exact date depends on the specific satellite image used in the mosaic at a given location. The average mosaic image is less than 20 months old (Ecopia.AI and Maxar Technologies 2020). When you select a country, a footnote will provide the percentage coverage of building footprints in the country that were mapped based on satellite imagery from 2019, 2018, 2017, 2016, and 2015 or earlier. You can also click the "Source Files" button to download a map (geotiff raster) of imagery years for building footprints across the country.

Demographic data
The demographic data (WorldPop et al. 2018) describing age-sex proportions and their spatial distribution are based on projections that represent the year 2020, but they do not account for potential effects of population displacement and migration.

Population data
Aggregate tool: The time-point represented by the gridded population estimates is influenced by the date of your settings for "people per housing unit", "housing units per residential building", and "proportion of buildings that are residential". The date reflected by our default population totals is 2020. For more information, refer to our sources described in the sub-section "Default Values" from [More Info: Bottom-up].

Disaggregate tool: The time-point of the gridded population estimates is influenced by the date of the population totals in your uploaded geojson file.

Version History

Version 1.0.0
10 June 2021. Testing period ended and progressed to non-beta version 1 release.

Version 0.3.2
Patch on 22 February 2021 to update the building footprints for 21 countries: BDI, BEN, BFA, BWA, CAF, CMR, COG, DJI, ESH, GHA, GIN, GMB, LSO, MWI, NAM, NER, SSD, TCD, TZA, UGA, and ZMB.

Version 0.3.1
Patch on 30 January 2021 to update the WorldPop logo and to fix a glitch in the source data for Eritrea.

Version 0.3.0
Minor version update on 14 September 2020 to include advanced controls for filtering out buildings based on their area and to distribute population based on building area or building count. Modified default values to match population totals for the year 2020 from the United Nations World Population Prospects (2019; medium-variant projections).

Version 0.2.1
Patch on 9 July 2020 to include building footprints from additional countries for a total of 51 countries.

Version 0.2.0
Minor version update on 3 June 2020 to change terminology from "bottom-up" to "aggregate" and from "top-down" to "disaggregate".

Version 0.1.0
Initial beta release on 27 May 2020.

Contributing

The peanutButter R package was developed by the WorldPop Research Group within the Department of Geography and Environmental Science at the University of Southampton. Funding was provided by the Bill and Melinda Gates Foundation (INV-002697). Ecopia.AI and Maxar Technologies (2020) provided high resolution building footprints based on recent satellite imagery. Gridded age-sex data were provided by the WorldPop Global High Resolution Population Denominators Project led by Alessandro Sorichetta with funding from the Bill and Melinda Gates Foundation (OPP1134076). Development of the peanutButter R package was led by Doug Leasure. Claire Dooley developed the source rasters of building counts and urban/rural settlements. Maksym Bondarenko maintains WorldPop's Shiny server. Professor Andy Tatem provides oversight of the WorldPop Research Group.

License

You are free to redistribute and modify the peanutButter software under the terms of the GNU General Public License v3.0 (GNU GPLv3). The source code is openly available on GitHub.

The source data for each country is available from the "Source Files" button in the application and the license terms for those datasets are included in the documentation with each download.

References

Carioli A, Tejedor-Garavito N, Nilsen K, Pezzulo C, Hanspal S, Hilber T, Nieves JJ, Hornby G, Kerr D, Chamberlain H, Ves N, Bondarenko M, Lloyd C, Pistolesi L, Yetman G, Adamo S, Mills J, Gaughan A, Stevens F, James W, Linard C, Sorichetta A, Tatem AJ. 2019. Adolescent fertility: A multi-temporal sub-national perspective across low and middle-income countries. Population Association of America. http://paa2019.populationassociation.org/uploads/192031

Ecopia.AI and Maxar Technologies. 2020. Digitize Africa. Ecopia.AI and Maxar Technologies.

Dooley CA, Boo G, Leasure DR, and Tatem AJ. 2020. Gridded maps of building patterns throughout sub-Saharan Africa, version 1.1. WorldPop, University of Southampton. Source of building footprints: Ecopia Vector Maps Powered by Maxar Satellite Imagery (C) 2020. doi:10.5258/SOTON/WP00677

Dooley CA, Leasure DR, Boo G, and Tatem AJ. 2021. Gridded maps of building patterns throughout sub-Saharan Africa, version 2.0. WorldPop, University of Southampton. Source of building footprints: Ecopia Vector Maps Powered by Maxar Satellite Imagery (C) 2020. doi:10.5258/SOTON/WP00712

Pezzulo C, Hornby GM, Sorichetta A, Gaughan AE, Linard C, Bird TJ, Kerr D, Lloyd CT, Tatem AJ. 2017. Sub-national mapping of population pyramids and dependency ratios in Africa and Asia. Sci. Data 4:170089 doi:10.1038/sdata.2017.89

WorldPop (www.worldpop.org - School of Geography and Environmental Science, University of Southampton; Department of Geography and Geosciences, University of Louisville; Departement de Geographie, Universite de Namur) and Center for International Earth Science Information Network (CIESIN), Columbia University (2018). Global High Resolution Population Denominators Project - Funded by the Bill and Melinda Gates Foundation (OPP1134076).