Getting Started#
Installation#
This package requires that you have at least Python 3.9 installed.
Install with pip:
pip install ipumspy
Install with conda:
conda install -c conda-forge ipumspy
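To confirm that your environment meets the requirements above, a quick check like the following can help. This is a minimal sketch using only the standard library; the helper names `python_is_supported` and `ipumspy_is_installed` are illustrative and not part of ipumspy.

```python
import importlib.util
import sys

# Minimum version stated in the installation notes above.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info):
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def ipumspy_is_installed():
    """Return True if the ipumspy package can be found on the import path."""
    return importlib.util.find_spec("ipumspy") is not None


print("Python version OK:", python_is_supported())
print("ipumspy installed:", ipumspy_is_installed())
```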
Read an IPUMS extract#
For microdata collections, ipumspy provides methods to parse DDI XML codebooks and load data files into
pandas DataFrame objects. Both fixed-width and CSV files are supported.
For example:
from ipumspy import readers
# Replace the placeholder strings with the paths to your own files
ddi_codebook = readers.read_ipums_ddi("<path/to/ddi/xml/file>")
ipums_df = readers.read_microdata(ddi_codebook, "<path/to/data/file>")
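To see why the codebook is passed to the reader, it may help to look at what a fixed-width file involves: each line is an unbroken string of characters, and only a description of the column positions makes it readable. The sketch below illustrates the idea with the standard library alone. It is not how ipumspy is implemented, and the column positions in `COLUMNS` are hypothetical values made up for this example.

```python
# Hypothetical column layout: (name, start, end) with 0-based, end-exclusive
# positions. In a real extract this information comes from the DDI codebook.
COLUMNS = [("AGE", 0, 3), ("SEX", 3, 4)]


def parse_fixed_width(lines, columns=COLUMNS):
    """Split each fixed-width line into a dict of integer fields."""
    records = []
    for line in lines:
        record = {name: int(line[start:end]) for name, start, end in columns}
        records.append(record)
    return records


# Two made-up records: age 34 / sex 1, and age 7 / sex 2
sample_lines = ["0341", "0072"]
print(parse_fixed_width(sample_lines))
# → [{'AGE': 34, 'SEX': 1}, {'AGE': 7, 'SEX': 2}]
```

In practice, `readers.read_microdata` takes care of this step, so you do not need to specify column positions yourself.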
IPUMS API Wrappers for Python#
ipumspy provides an easy-to-use Python wrapper for IPUMS API endpoints.
Get an API Key#
To interact with the IPUMS API, you’ll need to register for access with the IPUMS project you’ll be using. If you have not yet registered, you can find the link to register for each project at the top of its website, which can be accessed from the IPUMS homepage.
Once you’re registered, you’ll be able to create an API key.
Attention
For security reasons, we recommend storing your key in an environment variable rather than including it in your code.
The Conda documentation provides
instructions for saving environment variables
in conda environments for different operating systems. The example code on this page assumes that the
API key is stored in an environment variable called IPUMS_API_KEY.
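If the environment variable is missing, `os.environ.get` returns `None`, and the resulting failure may only show up later when a request is made. One way to fail early is a small helper like the one below. This is a sketch; `get_api_key` is a hypothetical helper name, not an ipumspy function.

```python
import os


def get_api_key(env_var="IPUMS_API_KEY"):
    """Return the API key stored in the environment, or raise if it is absent."""
    key = os.environ.get(env_var)
    if not key:
        # Fail with a clear message instead of passing None along
        raise RuntimeError(
            f"Environment variable {env_var} is not set; "
            "create an API key and store it in this variable first."
        )
    return key
```

The returned string can then be passed to the API client in place of a hard-coded key.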
A Simple Example#
To request IPUMS data via the API, initialize an API client using your API key:
import os
from pathlib import Path
from ipumspy import IpumsApiClient, MicrodataExtract, readers, ddi
# This assumes you have set up an environment variable called
# "IPUMS_API_KEY" to store your IPUMS API key
IPUMS_API_KEY = os.environ.get("IPUMS_API_KEY")
ipums = IpumsApiClient(IPUMS_API_KEY)
Next, create an extract definition that contains the specifications for the data you wish to request and download. For instance, we can request 2012 Puerto Rico Community Survey data for age and sex from IPUMS USA with the following:
# Create an extract definition
extract = MicrodataExtract(
    collection="usa",
    description="Sample USA extract",
    samples=["us2012b"],
    variables=["AGE", "SEX"],
)
See also
The IPUMS API client page contains more detailed information on supported data collections and available extract definition parameters.
Submit the extract to the IPUMS servers. After waiting for the extract to finish processing, you can download the data:
# Submit the extract request
ipums.submit_extract(extract)
print(f"Extract submitted with id {extract.extract_id}")
#> Extract submitted with id 1
# Wait for the extract to finish
ipums.wait_for_extract(extract)
# Download the extract
DOWNLOAD_DIR = Path("<your_download_dir>")
ipums.download_extract(extract, download_dir=DOWNLOAD_DIR)
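The submit, wait, and download steps can be grouped into a single function for reuse. The sketch below assumes only the three client methods shown above; `run_extract` is a hypothetical helper name, and creating the download directory beforehand is a choice made for this example rather than something the client is known to require.

```python
from pathlib import Path


def run_extract(client, extract, download_dir):
    """Submit an extract, wait for it to finish, and download it."""
    download_dir = Path(download_dir)
    # Assumption of this sketch: the directory may not exist yet, so create it
    download_dir.mkdir(parents=True, exist_ok=True)

    client.submit_extract(extract)
    client.wait_for_extract(extract)
    client.download_extract(extract, download_dir=download_dir)
    return download_dir
```

With the objects defined earlier, this would be called as `run_extract(ipums, extract, DOWNLOAD_DIR)`.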
For microdata collections, you can load your data using the ipumspy readers described above:
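# Get the DDI
ddi_file = list(DOWNLOAD_DIR.glob("*.xml"))[0]
# Use a name other than "ddi" so the imported ddi module is not shadowed
ddi_codebook = readers.read_ipums_ddi(ddi_file)
# Get the data
ipums_df = readers.read_microdata(ddi_codebook, DOWNLOAD_DIR / ddi_codebook.file_description.filename)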
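Taking the first element of `glob("*.xml")` works when the download directory holds exactly one extract. If the directory is reused, it can contain several codebooks, or none, and the first match may not be the one you want. A stricter lookup might look like the sketch below, where `find_ddi_file` is a hypothetical helper and not part of ipumspy.

```python
from pathlib import Path


def find_ddi_file(download_dir):
    """Return the single .xml file in download_dir, or raise if ambiguous."""
    xml_files = sorted(Path(download_dir).glob("*.xml"))
    if not xml_files:
        raise FileNotFoundError(f"No .xml codebook found in {download_dir}")
    if len(xml_files) > 1:
        names = ", ".join(p.name for p in xml_files)
        raise ValueError(f"Expected one .xml codebook, found several: {names}")
    return xml_files[0]
```

The returned path can be passed to `readers.read_ipums_ddi` in the same way as `ddi_file` above.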
Data from aggregate data collections can be loaded with other Python libraries. See Reading IPUMS Aggregate Data Extracts for examples.
For additional information about the IPUMS API as well as technical documentation, visit the IPUMS developer portal.