IPUMS-py can be used to read extracts made via the IPUMS web interface into Python. This page discusses how to request an IPUMS extract via the API using IPUMS-py.
An extract is defined by:
A data collection name
A list of IPUMS sample IDs from that collection
A list of IPUMS variable names from that collection
IPUMS metadata is not currently accessible via API. Sample IDs and IPUMS variable names can be browsed on each data collection’s website. See the table below for data collection abbreviations and links to sample IDs and variable browsing. Note that not all IPUMS data collections are currently available via API; the table will be filled in as new collections become accessible.
IPUMS data collection
Each IPUMS data collection that is accessible via API (currently just IPUMS USA and IPUMS CPS) has its own extract class. Using this class to create your extract object obviates the need to specify a data collection.
extract = UsaExtract(
    ["us2012b"],
    ["AGE", "SEX"],
)
instantiates a UsaExtract object for the IPUMS USA data collection that includes the us2012b (2012 PRCS) sample, and the variables AGE and SEX.
Users also have the option to specify a data format and an extract description when creating an extract object.
extract = UsaExtract(
    ["us2012b"],
    ["AGE", "SEX"],
    data_format="csv",
    description="My first IPUMS USA extract!",
)
Once an extract object has been created, the extract must be submitted to the API.
from pathlib import Path

from ipumspy import IpumsApiClient, UsaExtract

# replace the placeholders with your own API key and download directory
IPUMS_API_KEY = your_api_key
DOWNLOAD_DIR = Path(your_download_dir)

ipums = IpumsApiClient(IPUMS_API_KEY)

# define your extract
extract = UsaExtract(
    ["us2012b"],
    ["AGE", "SEX"],
)

# submit your extract
ipums.submit_extract(extract)
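Writing the API key directly into a script makes it easy to leak. One common alternative, sketched here, is to read the key from an environment variable. The variable name IPUMS_API_KEY and the helper load_api_key are assumptions made for this example; they are not part of IPUMS-py.

```python
import os


def load_api_key(env=None, name="IPUMS_API_KEY"):
    """Return the API key stored in an environment variable.

    `name` is a hypothetical variable name chosen for this sketch;
    IPUMS-py itself does not read it.
    """
    # Allow a plain dict to be passed in place of os.environ (handy for testing)
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first")
    return key
```

The returned string can then be passed to the client in place of the literal key, as in IpumsApiClient(load_api_key()).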
Once an extract has been submitted, an extract ID number will be assigned to it.
extract.extract_id
returns the extract ID number assigned by the IPUMS extract system. In the case of your first extract, this code will return 1.
You can use this extract ID number along with the data collection name to check on or download your extract later if you lose track of the original extract object.
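Since the collection name and the extract ID are enough to find an extract again, it can be convenient to write them to disk right after submission. The sketch below does this with a small JSON file. The helper names save_extract_ref and load_extract_ref and the file layout are hypothetical and are not part of IPUMS-py.

```python
import json
from pathlib import Path


def save_extract_ref(path, collection, extract_id):
    """Write the collection name and extract ID to a small JSON file."""
    # The ID is stored as a string, matching the extract="1" form used on this page
    record = {"collection": collection, "extract_id": str(extract_id)}
    Path(path).write_text(json.dumps(record))


def load_extract_ref(path):
    """Read back the (collection, extract_id) pair saved earlier."""
    record = json.loads(Path(path).read_text())
    return record["collection"], record["extract_id"]
```

For example, save_extract_ref("last_extract.json", "usa", extract.extract_id) could be called after submission, and the pair returned by load_extract_ref("last_extract.json") passed later as the collection and extract arguments of the client methods.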
After your extract has been submitted, you can check its status using the client's extract_status() method.
While IPUMS retains all of a user’s extract definitions, the extract data and syntax files are purged from the IPUMS cache after a certain period. Importantly, an extract whose data and syntax files have been purged is still considered to have been completed, and extract_status() will return “completed.”
# extract number 1 has been purged
ipums.extract_status(collection="usa", extract="1")
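For a freshly submitted extract, a script usually has to wait until the status becomes "completed" before doing anything else. A minimal polling sketch follows. The helper name wait_until_completed, the polling interval, the timeout, and the treatment of a "failed" status as terminal are all assumptions of this example. The helper takes any callable that returns a status string, so it does not depend on a particular client signature.

```python
import time


def wait_until_completed(get_status, poll_seconds=30.0, timeout_seconds=3600.0):
    """Poll `get_status()` until it returns "completed".

    `get_status` is any zero-argument callable returning a status string,
    e.g. `lambda: ipums.extract_status(collection="usa", extract="1")`.
    Treating "failed" as a terminal status is an assumption of this sketch.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status == "completed":
            return status
        if status == "failed":
            raise RuntimeError("extract failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"extract still {status!r} after {timeout_seconds} s"
            )
        # Wait before asking again so the API is not hit in a tight loop
        time.sleep(poll_seconds)
```

Keep in mind that a "completed" status alone does not guarantee the files are still available, since purged extracts also report "completed".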
If an extract has had its files purged, the data collection name and extract ID number can be used to resubmit the old extract. Note that resubmitting a purged extract results in a new extract with its own unique ID number!
resubmitted_extract = ipums.resubmit_purged_extract(collection="usa", extract="1")
resubmitted_extract.extract_id
Not all features available through the IPUMS extract web UI are currently supported for extracts made via API. For a list of currently unsupported features, see the developer documentation. This list will be updated as more features become available.