IPUMS API

ipumspy provides a framework for users to submit extract requests and download IPUMS data via the IPUMS API.

API Assets

The IPUMS API provides two asset types:

IPUMS extract endpoints can be used to submit extract requests for processing and download completed extract files.
IPUMS metadata endpoints can be used to discover and explore available IPUMS data as well as retrieve codes, names, and other extract parameters necessary to form extract requests.

Supported IPUMS Collections

IPUMS consists of multiple collections that provide different data products. These collections fall into one of two categories:

Microdata collections distribute data for individual survey units, like people or households.
Aggregate data collections distribute summary tables of aggregate statistics for particular geographic units, and may also provide corresponding GIS mapping files.

Not all IPUMS collections are currently supported by the IPUMS API. The table below summarizes the available features for all collections currently supported by the API:

Supported data collections
IPUMS data collection	Data type	Collection ID	Request & download data	Browse metadata
IPUMS USA	Microdata	`usa`	X
IPUMS CPS	Microdata	`cps`	X
IPUMS International	Microdata	`ipumsi`	X
IPUMS ATUS	Microdata	`atus`	X
IPUMS AHTUS	Microdata	`ahtus`	X
IPUMS MTUS	Microdata	`mtus`	X
IPUMS NHIS	Microdata	`nhis`	X
IPUMS MEPS	Microdata	`meps`	X
IPUMS NHGIS	Aggregate data	`nhgis`	X	X
IPUMS IHGIS	Aggregate data	`ihgis`	X	X

Note that ipumspy may not support all of the functionality currently offered by the IPUMS API. See the API documentation for more information about its latest features.

Get an API Key

Before you can interact with the IPUMS API, you’ll need to make sure you’ve obtained and set up your API key.

You can then initialize an API client using your key. (The following assumes your key is stored in the IPUMS_API_KEY environment variable, as described in the IPUMS API key setup instructions.)

import os
from pathlib import Path
from ipumspy import IpumsApiClient, MicrodataExtract, save_extract_as_json

IPUMS_API_KEY = os.environ.get("IPUMS_API_KEY")

ipums = IpumsApiClient(IPUMS_API_KEY)
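os.environ.get() returns None when the variable is missing, so a missing or misspelled key is not caught at this point. A minimal fail-fast alternative might look like the following. get_ipums_api_key() is a hypothetical helper, not part of ipumspy.

```python
import os


def get_ipums_api_key(var_name: str = "IPUMS_API_KEY") -> str:
    """Return the API key stored in the environment, failing fast if it is missing.

    Hypothetical helper: not part of ipumspy.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        # os.environ.get() would return None for a missing variable;
        # raise here instead of passing an empty or None key to the client.
        raise RuntimeError(
            f"Environment variable {var_name} is not set; "
            "store your IPUMS API key there before creating a client."
        )
    return key
```

You could then create the client with ipums = IpumsApiClient(get_ipums_api_key()).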

Extract Objects

To request IPUMS data via the IPUMS API, you need to first create an extract request object, which contains the parameters that define the content, format, and layout for the data you’d like to download.

IPUMS extract requests can be constructed and submitted to the IPUMS API using either

The MicrodataExtract class (for microdata collections)
The AggregateDataExtract class (for aggregate data collections)

For instance, the following defines a simple IPUMS USA extract request for the AGE, SEX, RACE, STATEFIP, and MARST variables from the 2018 and 2019 American Community Survey (ACS):

extract = MicrodataExtract(
    collection="usa",
    description="Sample USA extract",
    samples=["us2018a", "us2019a"],
    variables=["AGE", "SEX", "RACE", "STATEFIP", "MARST"],
)
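Which of the two extract classes to use depends only on the collection’s data type, so a small lookup built from the supported collections table above can pick the right one. This is a hypothetical helper (extract_class_name() is not part of ipumspy), and it returns the class name as a string so the sketch runs without ipumspy installed.

```python
# Collection IDs grouped by data type, copied from the supported collections table.
MICRODATA_COLLECTIONS = {"usa", "cps", "ipumsi", "atus", "ahtus", "mtus", "nhis", "meps"}
AGGREGATE_COLLECTIONS = {"nhgis", "ihgis"}


def extract_class_name(collection: str) -> str:
    """Name the extract class that matches a collection ID (hypothetical helper)."""
    collection = collection.lower()
    if collection in MICRODATA_COLLECTIONS:
        return "MicrodataExtract"
    if collection in AGGREGATE_COLLECTIONS:
        return "AggregateDataExtract"
    raise ValueError(f"{collection!r} is not a collection supported by the IPUMS API")
```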

IPUMS Metadata

Microdata Collections

Currently, comprehensive IPUMS API metadata is only available for the aggregate data collections, IPUMS NHGIS and IPUMS IHGIS. For microdata collections, only sample information is available. You can obtain a dictionary of sample codes with get_all_sample_info().
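Because the sample codes come back as a plain dictionary, ordinary dictionary filtering is enough to narrow them down. The sketch below assumes get_all_sample_info() takes a collection ID and maps sample codes to descriptions. find_samples() is a hypothetical helper, and example_info is an illustrative stand-in for the API response, not real output.

```python
def find_samples(sample_info: dict, text: str) -> dict:
    """Keep only samples whose description contains `text` (case-insensitive)."""
    needle = text.lower()
    return {
        sample_id: description
        for sample_id, description in sample_info.items()
        if needle in description.lower()
    }


# Illustrative stand-in for ipums.get_all_sample_info("usa"); the real
# dictionary comes from the API. The last entry is purely hypothetical.
example_info = {
    "us2018a": "2018 ACS",
    "us2019a": "2019 ACS",
    "hypothetical_sample": "Hypothetical census sample",
}
acs_samples = find_samples(example_info, "acs")
```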

Aggregate Data Collections

You can use the IPUMS API metadata endpoints to identify the codes you can use to include particular data sources in your extract request.

The IPUMS API provides access to two different types of metadata. The first provides a listing of all available data sources of a given type (see the table below for supported types). These records can be accessed with get_metadata_catalog().

This method returns a generator of metadata pages, allowing you to iterate through and search for particular data sources. For instance, to identify all available IPUMS NHGIS data tables that contain data referring to “Urban Population”, we could do the following:

urb_dts = []

# Identify all data tables referring to "Urban Population"
for page in ipums.get_metadata_catalog("nhgis", metadata_type="data_tables"):
    for dt in page["data"]:
        if "Urban Population" in dt["description"]:
            urb_dts.append(dt)
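The loop above can be wrapped in a reusable generator that matches case-insensitively and skips records that lack the searched field. This is a sketch, assuming each page is a dictionary whose "data" key holds a list of record dictionaries, as in the loop above. search_catalog() is a hypothetical name.

```python
from typing import Iterable, Iterator


def search_catalog(pages: Iterable[dict], text: str, field: str = "description") -> Iterator[dict]:
    """Yield catalog records whose `field` contains `text`, case-insensitively."""
    needle = text.lower()
    for page in pages:
        for record in page.get("data", []):
            # Missing or None values are treated as empty strings instead of raising.
            value = record.get(field) or ""
            if needle in value.lower():
                yield record
```

For example, urb_dts = list(search_catalog(ipums.get_metadata_catalog("nhgis", metadata_type="data_tables"), "urban population")). Because matching ignores case, this may return more records than the exact-case check above.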

The IPUMS API also provides access to detailed metadata about individual data sources. Request this metadata by using an IpumsMetadata object to indicate the individual data source for which to retrieve metadata. For instance, to request metadata for IPUMS NHGIS time series table “A00”:

from ipumspy.api.metadata import TimeSeriesTableMetadata

tst = TimeSeriesTableMetadata("nhgis", "A00")

Submit the request to the IPUMS API with get_metadata(). The returned object will contain the metadata obtained for the requested data source:

ipums.get_metadata(tst)

tst.description
#> 'Total Population'

The following table summarizes the currently available metadata endpoints. Endpoints listed in the Metadata type column can be used with the indicated collection in get_metadata_catalog(). Classes listed in the Detailed metadata class column can be used to obtain detailed metadata for individual data sources of that type.

Supported metadata endpoints
Collection	Metadata type	Detailed metadata class
NHGIS	`datasets`	`NhgisDatasetMetadata`
NHGIS	`data_tables`	`NhgisDataTableMetadata`
NHGIS	`time_series_tables`	`TimeSeriesTableMetadata`
NHGIS	`shapefiles`

IHGIS	`datasets`	`IhgisDatasetMetadata`
IHGIS	`data_tables`	`IhgisDataTableMetadata`
IHGIS	`tabulation_geographies`
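The table above can also be encoded as a lookup so that an unsupported collection and metadata type pair is rejected before any request is sent. check_metadata_type() and METADATA_ENDPOINTS are hypothetical names. Class names are kept as strings, and None marks the endpoints for which the table lists no detailed metadata class.

```python
from typing import Optional

# (collection, metadata_type) -> detailed metadata class name, from the table above.
METADATA_ENDPOINTS = {
    ("nhgis", "datasets"): "NhgisDatasetMetadata",
    ("nhgis", "data_tables"): "NhgisDataTableMetadata",
    ("nhgis", "time_series_tables"): "TimeSeriesTableMetadata",
    ("nhgis", "shapefiles"): None,
    ("ihgis", "datasets"): "IhgisDatasetMetadata",
    ("ihgis", "data_tables"): "IhgisDataTableMetadata",
    ("ihgis", "tabulation_geographies"): None,
}


def check_metadata_type(collection: str, metadata_type: str) -> Optional[str]:
    """Raise ValueError for unsupported pairs; otherwise return the class name or None."""
    key = (collection.lower(), metadata_type)
    if key not in METADATA_ENDPOINTS:
        supported = sorted(t for c, t in METADATA_ENDPOINTS if c == key[0])
        raise ValueError(
            f"{metadata_type!r} is not a supported metadata type for {collection!r}; "
            f"supported types: {supported or 'none'}"
        )
    return METADATA_ENDPOINTS[key]
```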

Submit an Extract Request

Once you’ve created an extract object, you can submit it to the IPUMS servers for processing:

ipums.submit_extract(extract)

If the extract is successfully submitted, it will receive an ID number:

print(extract.extract_id)
#> 1

You can use this extract ID number along with the data collection name to check on or download your extract later if you lose track of the original extract object.
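One simple way to keep track of extract IDs is to append each collection and ID pair to a small JSON file right after submission. A minimal sketch follows. record_extract(), load_extracts(), and the log file are hypothetical, not part of ipumspy.

```python
import json
from pathlib import Path


def record_extract(collection: str, extract_id: int, log_path: Path) -> None:
    """Append a (collection, extract_id) pair to a JSON log file."""
    entries = json.loads(log_path.read_text()) if log_path.exists() else []
    entries.append({"collection": collection, "extract_id": extract_id})
    log_path.write_text(json.dumps(entries, indent=2))


def load_extracts(log_path: Path) -> list:
    """Return the recorded pairs, oldest first (empty if nothing was recorded)."""
    return json.loads(log_path.read_text()) if log_path.exists() else []
```

For instance, call record_extract("usa", extract.extract_id, Path("ipums_extracts.json")) after submit_extract(), and later pass a recorded entry to ipums.download_extract(extract=entry["extract_id"], collection=entry["collection"]).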

Download an Extract

It may take some time for the IPUMS servers to process your extract request. You can check the current status of a request:

print(ipums.extract_status(extract))
#> started

Instead of repeatedly checking the status, you can explicitly wait for the extract to complete before attempting to download it:

ipums.wait_for_extract(extract)

At this point, you can safely download the extract:

DOWNLOAD_DIR = Path("<your_download_dir>")
ipums.download_extract(extract, download_dir=DOWNLOAD_DIR)
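wait_for_extract() takes care of polling for you. For illustration only, the idea behind such a wait can be sketched as a loop that re-checks the status with exponentially growing pauses. wait_until_done() is hypothetical and is not ipumspy’s implementation. The "failed" and "canceled" status names are assumptions; only "completed" and "started" appear in this guide.

```python
import time
from typing import Callable


def wait_until_done(
    get_status: Callable[[], str],
    done_states=("completed",),
    failed_states=("failed", "canceled"),  # assumed status names
    initial_delay: float = 1.0,
    max_delay: float = 60.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() with exponential backoff until a terminal state is reached."""
    delay = initial_delay
    waited = 0.0
    while True:
        status = get_status()
        if status in done_states:
            return status
        if status in failed_states:
            raise RuntimeError(f"extract ended in state {status!r}")
        if waited >= timeout:
            raise TimeoutError(f"still {status!r} after {waited:.0f} seconds")
        sleep(delay)
        waited += delay
        # Double the pause each round, but never beyond max_delay.
        delay = min(delay * 2, max_delay)
```

For example, wait_until_done(lambda: ipums.extract_status(extract)). Passing sleep as a parameter keeps the loop testable without real waiting.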

Extract Status

If you lose track of the extract object for any reason, you may check the status and download the extract using only the name of the collection and the extract_id.

# Check the extract status
extract_status = ipums.extract_status(extract=1, collection="usa")
print(f"extract is {extract_status}")
#> extract is started

You can also wait for and download an extract using this unique identifier:

ipums.wait_for_extract(extract=1, collection="usa")
ipums.download_extract(extract=1, collection="usa")

Expired Extracts

While IPUMS retains all of a user’s extract definitions, the extract data and syntax files are purged from the IPUMS cache after a certain period; these extracts are said to be “expired”. Importantly, an extract whose data and syntax files have been removed is still considered to have been completed, so extract_status() will return “completed.”

# Extract number 1 has expired, but status listed as completed
extract_status = ipums.extract_status(extract=1, collection="usa")

print(extract_status)
#> completed

You can confirm whether an extract has expired with the following:

is_expired = ipums.extract_is_expired(extract=1, collection="usa")

print(is_expired)
#> True

For extracts that have expired, the data collection name and extract ID number can be used to re-create and re-submit the old extract.

Attention

Note that re-creating and “re-submitting” an expired extract results in a new extract with its own unique ID number!

# Create a MicrodataExtract object from the expired extract definition
renewed_extract = ipums.get_extract_by_id(collection="usa", extract_id=1)

# Submit the renewed extract to re-generate the data and syntax files
resubmitted_extract = ipums.submit_extract(renewed_extract)

print(resubmitted_extract.extract_id)
#> 2
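The checks above can be combined into one routine that downloads an extract if its files are still available and otherwise re-submits it first. This is a sketch that uses only the client methods shown in this guide. download_or_resubmit() is a hypothetical helper. It returns the ID of the extract that was actually downloaded, because re-submission creates a new ID.

```python
from pathlib import Path


def download_or_resubmit(client, collection: str, extract_id: int, download_dir: Path) -> int:
    """Download an extract, re-submitting it first if its files have expired."""
    if client.extract_is_expired(extract=extract_id, collection=collection):
        # Expired extracts still report "completed", so expiry is checked explicitly.
        renewed = client.get_extract_by_id(collection=collection, extract_id=extract_id)
        resubmitted = client.submit_extract(renewed)
        client.wait_for_extract(resubmitted)
        client.download_extract(resubmitted, download_dir=download_dir)
        return resubmitted.extract_id
    client.wait_for_extract(extract=extract_id, collection=collection)
    client.download_extract(extract=extract_id, collection=collection, download_dir=download_dir)
    return extract_id
```

For example, download_or_resubmit(ipums, "usa", 1, DOWNLOAD_DIR).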

Extract Histories

ipumspy offers two ways to peruse your extract history for a given IPUMS data collection.

get_previous_extracts() can be used to retrieve your most recent extracts for a given collection. By default, it retrieves your previous 10 extracts, but you can adjust the limit argument to retrieve more or fewer records:

from ipumspy import IpumsApiClient

ipums = IpumsApiClient("YOUR_API_KEY")

# get my 10 most-recent USA extracts
recent_extracts = ipums.get_previous_extracts("usa")

# get my 20 most-recent CPS extracts
more_recent_extracts = ipums.get_previous_extracts("cps", limit=20)

Alternatively, the get_extract_history() generator makes it easy to filter your extract history to pull out extracts with certain features (e.g., variables or file formats). By default, this generator returns pages of the maximum possible size, 500 extract definitions per page. Use the page_size argument to request smaller pages.

Here, we filter our history to identify all our CPS extracts containing the STATEFIP variable:

from ipumspy import extract_from_dict

extracts_with_state = []

# Get pages with 100 CPS extracts per page
for page in ipums.get_extract_history("cps", page_size=100):
    for ext in page["data"]:
        extract_obj = extract_from_dict(ext["extractDefinition"])
        if "STATEFIP" in [var.name for var in extract_obj.variables]:
            extracts_with_state.append(extract_obj)
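Building an extract object for every definition is not strictly needed just to test for a variable. The sketch below filters the raw definitions first and leaves conversion with extract_from_dict() to the matches. It assumes a microdata definition’s "variables" entry is keyed by variable name (the membership test also works if it is a list of names). filter_history() and has_variable() are hypothetical helpers.

```python
from typing import Callable, Iterable, Iterator


def filter_history(pages: Iterable[dict], keep: Callable[[dict], bool]) -> Iterator[dict]:
    """Yield raw extract definitions from history pages that satisfy `keep`."""
    for page in pages:
        for record in page.get("data", []):
            definition = record.get("extractDefinition", {})
            if keep(definition):
                yield definition


def has_variable(name: str) -> Callable[[dict], bool]:
    """Build a predicate checking a microdata definition for a variable.

    Assumes "variables" is keyed by variable name (a list of names also works).
    """
    return lambda definition: name in definition.get("variables", {})
```

For example, state_defs = list(filter_history(ipums.get_extract_history("cps", page_size=100), has_variable("STATEFIP"))), followed by [extract_from_dict(d) for d in state_defs] for the matches only.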

Browsing your extract history is a good way to identify previous extracts and re-submit them.

Tip

Specifying a memorable extract description when defining an extract object can make it easier to identify the extract in your history in the future.
