ipumspy.readers.read_hierarchical_microdata(ddi, filename=None, encoding=None, subset=None, dtype=None, as_dict=True, **kwargs)

Read in microdata as specified by the Codebook. Both .dat and .csv file types are supported.

  • ddi (Codebook) – The codebook representing the data

  • filename (Union[str, Path, IOBase, None]) – The path to the data file. If not provided, the file name is taken from ddi and is assumed to be relative to the current working directory

  • encoding (Optional[str]) – The encoding of the data file. If not provided, the encoding is read from ddi

  • subset (Optional[List[str]]) – A list of variable names to keep. If None, all variables are kept

  • dtype (Optional[dict]) – A dictionary with variable names as keys and variable types as values. Only takes effect when the pd.read_fwf or pd.read_csv engine is used. If None, pd.read_fwf or pd.read_csv uses the type given by ddi.data_description.pandas_type for every variable. See ipumspy.ddi.VariableDescription for more detail on ddi.data_description.pandas_type. If the file is a .csv and dtype is not None, pandas converts the column types once, during the pd.read_csv call. When the file format is .dat or .csv and dtype is None, two conversions occur: one on load, and one when returning the data frame.

  • as_dict (Optional[bool]) – A flag to indicate whether to return a single data frame or a dictionary with one data frame per record type in the extract data file. Set to True to receive a dictionary of data frames.

  • kwargs – keyword args to be passed to the engine (pd.read_fwf, pd.read_csv, or pd.read_parquet depending on the file type)

Return type:

Union[DataFrame, Dict]


pandas data frame or a dictionary of pandas data frames
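
To make the two return shapes concrete, the following is a minimal standard-library sketch, not ipumspy's implementation. It splits the rows of a toy hierarchical extract by a record-type column, which is conceptually what as_dict=True does when it returns one table per record type. The column name RECTYPE, the record-type values "H" and "P", the sample data, and the helper split_by_record_type are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# Toy hierarchical extract: household ("H") and person ("P") records interleaved.
# Column names and values are invented for illustration only.
RAW = """RECTYPE,SERIAL,HHINCOME,AGE
H,1,52000,
P,1,,34
P,1,,36
H,2,18000,
P,2,,71
"""


def split_by_record_type(text, rectype_col="RECTYPE"):
    """Group rows by record type, dropping fields that are empty in a given row."""
    tables = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        rectype = row[rectype_col]
        # Keep only the fields that carry a value for this record.
        tables[rectype].append({k: v for k, v in row.items() if v != ""})
    return dict(tables)


tables = split_by_record_type(RAW)
print(sorted(tables))      # record types found in the toy extract
print(len(tables["H"]))    # number of household records
print(len(tables["P"]))    # number of person records
```

Keeping one table per record type avoids the mostly empty columns that appear when household and person records share a single rectangular table, which is the trade-off the as_dict flag exposes.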