IPUMS API#
ipumspy provides a framework for users to submit extract requests and download IPUMS data via the IPUMS API.
API Assets#
The IPUMS API provides two asset types:
IPUMS extract endpoints can be used to submit extract requests for processing and download completed extract files.
IPUMS metadata endpoints can be used to discover and explore available IPUMS data as well as retrieve codes, names, and other extract parameters necessary to form extract requests.
Supported IPUMS Collections#
IPUMS consists of multiple collections that provide different data products. These collections fall into one of two categories:
Microdata collections distribute data for individual survey units, like people or households.
Aggregate data collections distribute summary tables of aggregate statistics for particular geographic units, and may also provide corresponding GIS mapping files.
Not all IPUMS collections are currently supported by the IPUMS API. The table below summarizes the available features for all collections currently supported by the API:
IPUMS data collection | Data type | Collection ID | Request & download data | Browse metadata |
---|---|---|---|---|
 | Microdata | | X | |
 | Microdata | | X | |
 | Microdata | | X | |
 | Microdata | | X | |
 | Microdata | | X | |
 | Microdata | | X | |
 | Microdata | | X | |
 | Microdata | | X | |
 | Aggregate data | | X | X |
Note that ipumspy may not necessarily support all the functionality currently supported by the IPUMS API. See the API documentation for more information about its latest features.
Get an API Key#
Before you can interact with the IPUMS API, you’ll need to make sure you’ve obtained and set up your API key.
You can then initialize an API client using your key. (The following assumes your key is stored in the IPUMS_API_KEY environment variable as described in the link above.)
import os
from pathlib import Path
from ipumspy import IpumsApiClient, MicrodataExtract, save_extract_as_json
IPUMS_API_KEY = os.environ.get("IPUMS_API_KEY")
ipums = IpumsApiClient(IPUMS_API_KEY)
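Because os.environ.get returns None when the variable is unset, a missing key would only surface later, when a request fails. The sketch below shows one way to fail early instead. It assumes a hypothetical helper named require_api_key, which is not part of ipumspy, and uses a stand-in mapping so that it runs without a real key.

```python
import os


def require_api_key(env=None, name="IPUMS_API_KEY"):
    """Return the API key stored under `name`, or raise if it is missing or blank.

    Hypothetical helper for illustration; it is not an ipumspy function.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {name} environment variable before creating a client"
        )
    return key


# Usage with a stand-in mapping instead of the real environment
print(require_api_key({"IPUMS_API_KEY": "abc123"}))
```

Calling require_api_key() with no arguments reads the real environment, and the returned key can be passed to IpumsApiClient as in the snippet above.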
Extract Objects#
To request IPUMS data via the IPUMS API, you need to first create an extract request object, which contains the parameters that define the content, format, and layout for the data you’d like to download.
IPUMS extract requests can be constructed and submitted to the IPUMS API using either:
The MicrodataExtract class (for microdata collections)
The AggregateDataExtract class (for aggregate data collections)
For instance, the following defines a simple IPUMS USA extract request for the AGE, SEX, RACE, STATEFIP, and MARST variables from the 2018 and 2019 American Community Survey (ACS):
extract = MicrodataExtract(
    collection="usa",
    description="Sample USA extract",
    samples=["us2018a", "us2019a"],
    variables=["AGE", "SEX", "RACE", "STATEFIP", "MARST"],
)
See also
The available extract definition options vary across collections. See the microdata extracts and aggregate data extracts pages for more information about the available extract parameters for each type.
IPUMS Metadata#
Microdata Collections#
Currently, comprehensive IPUMS API metadata is only available for IPUMS NHGIS. For microdata collections, only sample information is available. You can obtain a dictionary of sample codes with get_all_sample_info().
Aggregate Data Collections#
You can use the IPUMS API metadata endpoints to identify the codes you can use to include particular data sources in your extract request.
The IPUMS API provides access to two different types of metadata. The first provides a listing of all available data sources of a given type (see the table below for supported types). These records can be accessed with get_metadata_catalog().
This method returns a generator of metadata pages, allowing you to iterate through and search for particular data sources. For instance, to identify all available IPUMS NHGIS data tables that contain data referring to “Urban Population”, we could do the following:
urb_dts = []
# Identify all data tables referring to "Urban Population"
for page in ipums.get_metadata_catalog("nhgis", metadata_type="data_tables"):
    for dt in page["data"]:
        if "Urban Population" in dt["description"]:
            urb_dts.append(dt)
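The same search pattern can be wrapped in a reusable function. The following is a minimal sketch, assuming a hypothetical helper named search_catalog (not part of ipumspy) and made-up stand-in records so that it runs without an API key. It relies only on the page["data"] and "description" fields used in the loop above, and it matches case-insensitively, which is a design choice of this sketch rather than API behavior.

```python
def search_catalog(pages, text, field="description"):
    """Collect catalog records whose `field` contains `text`, ignoring case.

    `pages` is any iterable of page dicts that each hold a "data" list.
    Hypothetical helper for illustration; it is not an ipumspy function.
    """
    needle = text.lower()
    matches = []
    for page in pages:
        for record in page["data"]:
            if needle in str(record.get(field, "")).lower():
                matches.append(record)
    return matches


# Stand-in pages with made-up records, not real NHGIS metadata
fake_pages = [
    {"data": [{"name": "T1", "description": "Urban Population"}]},
    {"data": [{"name": "T2", "description": "Total Housing Units"}]},
]
print([r["name"] for r in search_catalog(fake_pages, "urban population")])
```

With a real client, the generator returned by get_metadata_catalog() could be passed as pages in place of fake_pages.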
The IPUMS API also provides access to detailed metadata about individual data sources. Request this metadata by using an IpumsMetadata object to indicate the individual data source for which to retrieve metadata. For instance, to request metadata for IPUMS NHGIS time series table “A00”:
tst = TimeSeriesTableMetadata("nhgis", "A00")
Submit the request to the IPUMS API with get_metadata(). The returned object will contain the metadata obtained for the requested data source:
ipums.get_metadata(tst)
tst.description
#> 'Total Population'
The following table summarizes the currently available metadata endpoints:
Metadata type | Supported collections | Detailed metadata class analog |
---|---|---|
 | IPUMS NHGIS | |
 | IPUMS NHGIS | |
 | IPUMS NHGIS | |
 | IPUMS NHGIS | |
Submit an Extract Request#
Once you’ve created an extract object, you can submit it to the IPUMS servers for processing:
ipums.submit_extract(extract)
If the extract is successfully submitted, it will receive an ID number:
print(extract.extract_id)
#> 1
You can use this extract ID number along with the data collection name to check on or download your extract later if you lose track of the original extract object.
Download an Extract#
It may take some time for the IPUMS servers to process your extract request. You can check the current status of a request:
print(ipums.extract_status(extract))
#> started
Instead of repeatedly checking the status, you can explicitly wait for the extract to complete before attempting to download it:
ipums.wait_for_extract(extract)
At this point, you can safely download the extract:
DOWNLOAD_DIR = Path("<your_download_dir>")
ipums.download_extract(extract, download_dir=DOWNLOAD_DIR)
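To make the idea of waiting concrete, the sketch below polls a status function until it reports "completed" or a time limit is reached. It is illustrative only and is not how ipumspy implements wait_for_extract(). The helper name poll_until_completed and its interval, timeout, and sleep parameters are assumptions of this example, and the status source is a stand-in so the code runs offline.

```python
import time


def poll_until_completed(get_status, interval=1.0, timeout=60.0, sleep=time.sleep):
    """Call `get_status()` until it returns "completed" or `timeout` seconds pass.

    Hypothetical helper; any status other than "completed" is treated as
    still in progress, which is a simplification.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status == "completed":
            return status
        if waited >= timeout:
            raise TimeoutError(f"still '{status}' after {waited:.0f} seconds")
        sleep(interval)
        waited += interval


# Stand-in status source: reports "started" twice, then "completed"
statuses = iter(["started", "started", "completed"])
print(poll_until_completed(lambda: next(statuses), interval=0.01))
```

With a real client, get_status could be lambda: ipums.extract_status(extract), although wait_for_extract() already covers this case.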
Extract Status#
If you lose track of the extract object for any reason, you may check the status and download the extract using only the name of the collection and the extract_id.
# Check the extract status
extract_status = ipums.extract_status(extract=1, collection="usa")
print(f"extract is {extract_status}")
#> extract is started
You can also wait for and download an extract using this unique identifier:
ipums.wait_for_extract(extract=1, collection="usa")
ipums.download_extract(extract=1, collection="usa")
Expired Extracts#
While IPUMS retains all of a user’s extract definitions, the extract data and syntax files are purged from the IPUMS cache after a certain period. These extracts are said to be “expired”. Importantly, an extract whose data and syntax files have been removed is still considered completed, and extract_status() will return “completed”.
# Extract number 1 has expired, but status listed as completed
extract_status = ipums.extract_status(extract=1, collection="usa")
print(extract_status)
#> completed
You can confirm whether an extract has expired with the following:
is_expired = ipums.extract_is_expired(extract=1, collection="usa")
print(is_expired)
#> True
For extracts that have expired, the data collection name and extract ID number can be used to re-create and re-submit the old extract.
Attention
Note that re-creating and “re-submitting” an expired extract results in a new extract with its own unique ID number!
# Create a MicrodataExtract object from the expired extract definition
renewed_extract = ipums.get_extract_by_id(collection="usa", extract_id=1)
# Submit the renewed extract to re-generate the data and syntax files
resubmitted_extract = ipums.submit_extract(renewed_extract)
print(resubmitted_extract.extract_id)
#> 2
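Since a "completed" status alone does not say whether the files are still available, the status and the expiry flag need to be read together. A minimal sketch of that decision follows. The next_action function and its "wait", "download", and "resubmit" labels are hypothetical names for this illustration, and treating every status other than "completed" as "wait" is a simplification.

```python
def next_action(status, is_expired):
    """Map an extract's status and expiry flag to a suggested next step.

    Hypothetical helper for illustration; it is not an ipumspy function.
    """
    if status != "completed":
        return "wait"
    return "resubmit" if is_expired else "download"


# Walk through the three situations described in this section
for status, expired in [("started", False), ("completed", False), ("completed", True)]:
    print(status, expired, "->", next_action(status, expired))
```

In practice the two inputs would come from extract_status() and extract_is_expired(), as shown in the snippets above.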
Extract Histories#
ipumspy offers two ways to peruse your extract history for a given IPUMS data collection.
get_previous_extracts() can be used to retrieve your most recent extracts for a given collection. By default, it retrieves your previous 10 extracts, but you can adjust the limit argument to retrieve more or fewer records:
from ipumspy import IpumsApiClient
ipums = IpumsApiClient("YOUR_API_KEY")
# get my 10 most-recent USA extracts
recent_extracts = ipums.get_previous_extracts("usa")
# get my 20 most-recent CPS extracts
more_recent_extracts = ipums.get_previous_extracts("cps", limit=20)
Alternatively, the get_extract_history() generator makes it easy to filter your extract history to pull out extracts with certain features (e.g., variables or file formats). By default, this generator returns pages of the maximum possible size, 500 extract definitions per page. Page size can be set to a lower number using the page_size argument.
Here, we filter our history to identify all our CPS extracts containing the STATEFIP variable:
extracts_with_state = []
# Get pages with 100 CPS extracts per page
for page in ipums.get_extract_history("cps", page_size=100):
    for ext in page["data"]:
        extract_obj = extract_from_dict(ext["extractDefinition"])
        if "STATEFIP" in [var.name for var in extract_obj.variables]:
            extracts_with_state.append(extract_obj)
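The filtering loop can be generalized so that the variable name and the conversion step are parameters. This is a sketch under stated assumptions: extracts_with_variable and fake_to_extract are hypothetical names, and the stand-in definitions below use a made-up shape rather than the real API format. Only the page["data"], "extractDefinition", and var.name access patterns are taken from the loop above.

```python
from types import SimpleNamespace


def extracts_with_variable(pages, variable, to_extract):
    """Return converted extracts whose variables include `variable`.

    `to_extract` turns a raw definition into an object with a `.variables`
    list; in the loop above that role is played by extract_from_dict.
    """
    found = []
    for page in pages:
        for record in page["data"]:
            extract_obj = to_extract(record["extractDefinition"])
            if any(var.name == variable for var in extract_obj.variables):
                found.append(extract_obj)
    return found


def fake_to_extract(definition):
    """Stand-in converter for the made-up definitions below."""
    variables = [SimpleNamespace(name=n) for n in definition["vars"]]
    return SimpleNamespace(description=definition["description"], variables=variables)


# Stand-in history page, not real API output
fake_pages = [
    {"data": [
        {"extractDefinition": {"description": "a", "vars": ["AGE", "STATEFIP"]}},
        {"extractDefinition": {"description": "b", "vars": ["SEX"]}},
    ]},
]
matches = extracts_with_variable(fake_pages, "STATEFIP", fake_to_extract)
print([m.description for m in matches])
```

Passing the converter as an argument keeps the filter independent of the API client, which makes it easy to exercise with stand-in data before running it against a real extract history.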
Browsing your extract history is a good way to identify previous extracts and re-submit them.
Tip
Specifying a memorable extract description when defining an extract object can make it easier to identify the extract in your history in the future.