Getting Started#

Installation#

This package requires that you have at least Python 3.8 installed.

Install with pip:

pip install ipumspy

Install with conda:

conda install -c conda-forge ipumspy
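
To confirm that the active interpreter meets the Python 3.8 requirement before installing, a quick check along these lines may help. The helper name meets_minimum_version is illustrative and not part of ipumspy.

```python
import sys

MINIMUM_VERSION = (3, 8)  # minimum stated in the installation notes above


def meets_minimum_version(version_info=None, minimum=MINIMUM_VERSION):
    """Return True if the given (or running) interpreter is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); patch releases do not matter here.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    print("Python version OK:", meets_minimum_version())
```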

Read an IPUMS extract#

The following code parses the DDI XML codebook and the data file of an IPUMS extract and returns a pandas DataFrame. Both fixed-width and CSV data files are supported.

from ipumspy import readers, ddi

# Replace the placeholder paths with the locations of your own files
ddi_codebook = readers.read_ipums_ddi("ddi/xml/file/path")
ipums_df = readers.read_microdata(ddi_codebook, "data/file/path")
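
For intuition about why the codebook is needed to read fixed-width data: a fixed-width record has no delimiters, so the column positions recorded in the codebook determine where each variable starts and ends. The sketch below illustrates that idea with the standard library only. The column layout, the helper names, and the sample records are invented for illustration, and this is not a description of how ipumspy itself is implemented.

```python
# Hypothetical layout: (variable name, start offset, width). In a real extract
# these positions come from the DDI codebook, not from hand-written constants.
TOY_LAYOUT = [
    ("YEAR", 0, 4),
    ("AGE", 4, 3),
    ("SEX", 7, 1),
]


def parse_fixed_width_line(line, layout):
    """Slice one fixed-width record into a dict of integer values."""
    return {name: int(line[start:start + width]) for name, start, width in layout}


def parse_fixed_width(lines, layout):
    """Parse every non-empty record in an iterable of lines."""
    return [parse_fixed_width_line(line, layout) for line in lines if line.strip()]


toy_records = ["20120341", "20120672"]  # invented records, not real IPUMS data
rows = parse_fixed_width(toy_records, TOY_LAYOUT)
print(rows)
```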

IPUMS API Wrappers for Python#

ipumspy provides an easy-to-use Python wrapper for IPUMS API endpoints.

Quick Start#

Before you begin, create a user account for your data collection of interest and generate an API key. This quick start example uses IPUMS USA. Note that not all IPUMS data collections are available via the API. For an up-to-date list of available collections and links to sample and variable information, see the IPUMS data collections metadata resources.

from pathlib import Path

from ipumspy import IpumsApiClient, UsaExtract, readers, ddi

# Placeholders: substitute your own API key and download directory
IPUMS_API_KEY = "your_api_key"
DOWNLOAD_DIR = Path("your_download_dir")

ipums = IpumsApiClient(IPUMS_API_KEY)

For security reasons, store your IPUMS API key in an environment variable rather than including it in your code.
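
One way to follow this advice is sketched below. The environment variable name IPUMS_API_KEY and the helper load_api_key are assumptions chosen for illustration; ipumspy does not require any particular variable name.

```python
import os


def load_api_key(env_var="IPUMS_API_KEY", environ=None):
    """Return the API key stored in `env_var`, or raise if it is unset or empty."""
    if environ is None:
        environ = os.environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before creating the client."
        )
    return key
```

With the variable set in your shell, the client from the example above could then be created with IpumsApiClient(load_api_key()), which keeps the key out of version control.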

To define an IPUMS USA extract, you need to pass a list of sample IDs and a list of IPUMS USA variable names.

IPUMS USA sample IDs can be found on the IPUMS USA website.

IPUMS USA variables can be browsed via the IPUMS USA extract web UI.

Source variables can be requested using their short or long form variable names. Short form source variable names can be viewed by clicking Display Options on the Select Data page and selecting the short option under Source variable names.

# Submit an API extract request
extract = UsaExtract(
    ["us2012b"],
    ["AGE", "SEX"],
)
ipums.submit_extract(extract)
print(f"Extract submitted with id {extract.extract_id}")

# wait for the extract to finish
ipums.wait_for_extract(extract)

# Download the extract
ipums.download_extract(extract, download_dir=DOWNLOAD_DIR)

# Get the DDI (named ddi_codebook so it does not shadow the imported ddi module)
ddi_file = list(DOWNLOAD_DIR.glob("*.xml"))[0]
ddi_codebook = readers.read_ipums_ddi(ddi_file)

# Get the data
ipums_df = readers.read_microdata(ddi_codebook, DOWNLOAD_DIR / ddi_codebook.file_description.filename)
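
Taking the first match of glob("*.xml") works when the download directory holds a single extract. If the directory may contain several codebooks, one option is to pick the most recently modified one and to fail with a clear message when none is found. The helper latest_ddi_file below is a hypothetical convenience, not part of ipumspy, and it assumes that the newest file is the one you want.

```python
from pathlib import Path


def latest_ddi_file(download_dir):
    """Return the most recently modified .xml file in `download_dir`."""
    candidates = sorted(
        Path(download_dir).glob("*.xml"),
        key=lambda path: path.stat().st_mtime,  # oldest first, newest last
    )
    if not candidates:
        raise FileNotFoundError(f"No DDI .xml file found in {download_dir}")
    return candidates[-1]
```

In the example above, this would replace the glob line with ddi_file = latest_ddi_file(DOWNLOAD_DIR).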

If you lose track of the extract object for any reason, you may check the status and download the extract using only the name of the collection and the extract_id.

# check the extract status
# extract_id is the numeric ID of your extract and collection_name is the
# name of its collection (for example, "usa"); set both before running this
extract_status = ipums.extract_status(extract=extract_id, collection=collection_name)
print(f"extract {extract_id} is {extract_status}")

# when the extract status is "completed", then download
ipums.download_extract(extract=extract_id, collection=collection_name)
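
The status check and the download can be combined into a simple polling loop. The sketch below takes the status lookup as a callable so that it runs without network access. The function name wait_until_completed, the polling interval, and the timeout are illustrative choices; only the "completed" status string is taken from the text above. When you still have the extract object, ipums.wait_for_extract(extract) shown earlier does the waiting for you.

```python
import time


def wait_until_completed(get_status, poll_seconds=10.0, timeout_seconds=3600.0,
                         sleep=time.sleep):
    """Call `get_status()` repeatedly until it returns "completed".

    Returns the number of status checks made. Raises TimeoutError if the
    status is still not "completed" after `timeout_seconds` of waiting.
    """
    waited = 0.0
    checks = 0
    while True:
        checks += 1
        if get_status() == "completed":
            return checks
        if waited >= timeout_seconds:
            raise TimeoutError("extract did not complete within the timeout")
        sleep(poll_seconds)
        waited += poll_seconds


# Possible usage with the client from the quick start (requires a valid API key):
# wait_until_completed(
#     lambda: ipums.extract_status(extract=extract_id, collection=collection_name)
# )
# ipums.download_extract(extract=extract_id, collection=collection_name)
```

This sketch treats every status other than "completed" as "keep waiting", so an extract that can never complete is only reported when the timeout expires.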

Specifying an Extract as a File#

A goal of ipumspy is to make it easier to share IPUMS extracts with other researchers. For instance, we envision including an ipums.yml file with your analysis code, which would allow other researchers to download exactly the extract that you use in your own analysis.

To pull the extract specified above, create a file called ipums.yml that contains the following:

description: Simple IPUMS extract
collection: usa
api_version: beta
samples:
  - us2012b
variables:
  - AGE
  - SEX

Then you can run the following code:

import yaml
from ipumspy import extract_from_dict

with open("ipums.yml") as infile:
    extract = extract_from_dict(yaml.safe_load(infile))
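
For reference, yaml.safe_load turns YAML mappings into Python dictionaries and YAML sequences into lists, so the file above loads into a structure like the one sketched below. The missing_keys helper and the keys it checks are illustrative only; they mirror the example file and are not a statement of what extract_from_dict requires.

```python
# The dictionary that yaml.safe_load would produce for the ipums.yml above.
extract_definition = {
    "description": "Simple IPUMS extract",
    "collection": "usa",
    "api_version": "beta",
    "samples": ["us2012b"],
    "variables": ["AGE", "SEX"],
}

# Keys used in the example file (an illustrative checklist, not an official schema).
EXAMPLE_KEYS = ("description", "collection", "samples", "variables")


def missing_keys(definition, expected=EXAMPLE_KEYS):
    """Return the expected keys that are absent from `definition`, in order."""
    return [key for key in expected if key not in definition]


print(missing_keys(extract_definition))
```

A check like this can catch a misspelled key in a shared ipums.yml before the definition is handed to the library.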

Alternatively, you can use the CLI.

For more information on the IPUMS API, visit the IPUMS developer portal.