Change Log

All notable changes to this project are documented on this page. This project adheres to Semantic Versioning.



  • Bug Fixes

    • Updated the minimum required version for pyYAML





  • Breaking Changes

    • This release marks the beginning of support for IPUMS API version 2. ipumspy no longer supports requests to version 1 or version beta of the IPUMS API, so extract definitions created and saved to files with previous versions of ipumspy can no longer be submitted as-is to the IPUMS API using this library. To use these definitions with v0.3.0 of ipumspy and IPUMS API version 2, change the data_format key to dataFormat and the data_structure key to dataStructure. More information on versioning of the IPUMS API and breaking changes in version 2 can be found at the IPUMS developer portal.

    • The resubmit_purged_extract() method has been removed; use submit_extract() instead.

    • The extract_was_purged() method has been renamed to extract_is_expired().

    • The CollectionInformation class has been removed. To retrieve information about available samples in a collection, use get_all_sample_info().

    • The define_extract_from_ddi() method has been removed.

    • The retrieve_previous_extracts() method has been renamed to get_previous_extracts().
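
The key renames described in the first breaking change above can be applied to a saved extract definition with a few lines of Python. The following is a minimal sketch, not part of ipumspy: the helper name migrate_extract_definition and the sample definition are hypothetical, and it assumes the saved definition is a flat dictionary whose only incompatible keys are the two named above.

```python
import json

# Key renames described in the breaking-changes note
# (key used by older ipumspy versions -> key expected for IPUMS API version 2).
KEY_RENAMES = {
    "data_format": "dataFormat",
    "data_structure": "dataStructure",
}


def migrate_extract_definition(definition: dict) -> dict:
    """Return a new dict with the old top-level keys renamed.

    Hypothetical helper, not part of ipumspy. Only top-level keys are
    renamed; every other key is kept and values are not modified.
    """
    return {KEY_RENAMES.get(key, key): value for key, value in definition.items()}


# Illustrative definition; the values are made up for this example.
old_definition = {
    "description": "My saved extract",
    "data_format": "fixed_width",
    "data_structure": {"rectangular": {"on": "P"}},
}

new_definition = migrate_extract_definition(old_definition)
print(json.dumps(new_definition, indent=2))
```

For a definition stored on disk, the same function could be applied between json.load() and json.dump(). Other differences between API versions, if any, are not handled here; the IPUMS developer portal remains the reference for the full list of changes.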

  • New Features

    • Support for IPUMS API version 2 features!

      • Added attach_characteristics()

      • Added select_cases()

      • Added add_data_quality_flags()

      • Added optional data_quality_flags keyword argument to IPUMS extract classes to include all available data quality flags for variables in the extract

      • Added optional select_case_who keyword argument to IPUMS extract classes to specify that the extract should include all individuals in households that contain a person with the specified select_cases() characteristics.

      • Added support for requesting hierarchical extracts: {"hierarchical": {}} is now an acceptable value for data_structure

      • Added IpumsiExtract class to support IPUMS International extract requests

      • Added get_extract_history() generator to allow for perusal of extract histories

    • Added get_extract_by_id(), which creates a new (unsubmitted) extract object from an IPUMS collection and a previously submitted extract ID number

    • Added support for reading hierarchical extract files in read_hierarchical_microdata()

  • Bug Fixes



  • New minimum Python version: Python 3.8

  • Officially support Python 3.11



  • Officially support Python 3.10



  • Update requirement to beautifulsoup4 instead of bs4



  • New minimum Python version: Python 3.7.1

  • Added support for IPUMS CPS extracts with CpsExtract

  • Added CollectionInformation class to access collection-level information about IPUMS data.

  • Added ability to download Stata, SPSS, SAS, and R command files along with data files using download_extract().

  • Added extract_to_dict() and extract_from_dict() methods to enable easy exporting of extract objects to dictionary objects and creation of extract objects from dictionaries.

  • Added define_extract_from_ddi() method to re-create an IPUMS extract object from a DDI codebook.

  • Added convenience method save_extract_as_json() to save IPUMS extract definition to json file.

  • Added convenience method define_extract_from_json() to read an IPUMS extract definition from a json file.

  • Added IpumsExtractNotSubmitted() exception. This will be raised when attempting to retrieve an extract id or download link from an extract that has not been submitted to the IPUMS extract engine.

  • Added get_all_types() method to access all types of ddi codebook variables in an easy way.

  • Added parameter string_pyarrow to get_all_types() method. If this parameter is set to True and used in conjunction with parameter type_format="pandas_type" or type_format="pandas_type_efficient", then the string column dtype (pandas.StringDtype()) is overridden with pandas.StringDtype(storage="pyarrow"). Useful for users who want to convert an IPUMS extract in csv format to parquet format. The dictionary returned by this method can then be used in the dtype argument of read_microdata() or read_microdata_chunked().

  • Added pandas_type_efficient(). This type format is more efficient than pandas_type and is a sort of mix between pandas_type and numpy_type. Integer and float variables are coded as numpy.float64, string as pandas.StringDtype().



  • This is the initial version of ipumspy.