- ipumspy.readers.read_microdata_chunked(ddi, filename=None, encoding=None, subset=None, chunksize=None, dtype=None, **kwargs)
Read in microdata in chunks, as specified by the Codebook. Because these files are often large, you may wish to filter them or read them in chunks. For example, the following reads the data 1,000 rows at a time and keeps only the rows for Rhode Island (STATEFIP 44):
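
    import pandas as pd
    from ipumspy.readers import read_microdata_chunked

    iter_microdata = read_microdata_chunked(ddi, chunksize=1000)  # ddi: Codebook from read_ipums_ddi
    df = pd.concat([chunk[chunk["STATEFIP"] == 44] for chunk in iter_microdata])
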
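Because the keyword arguments are forwarded to pd.read_fwf, the same filter-then-concatenate pattern can be reproduced with plain pandas, which makes it easy to try without an IPUMS extract. The sketch below is an illustration under stated assumptions, not ipumspy's implementation: the two-column layout, the sample records, and the chunk size of 2 are all made up, whereas real column positions come from the DDI codebook.

```python
import io

import pandas as pd

# Hypothetical fixed-width records: STATEFIP in columns 0-1, AGE in columns 2-4.
raw = "44025\n06031\n44067\n36012\n44040\n"

# With chunksize set, read_fwf returns an iterator of DataFrames instead of one frame.
reader = pd.read_fwf(
    io.StringIO(raw),
    colspecs=[(0, 2), (2, 5)],
    names=["STATEFIP", "AGE"],
    header=None,
    chunksize=2,
)

# Filter each chunk as it arrives, then stitch the surviving rows together.
rhode_island = pd.concat(
    [chunk[chunk["STATEFIP"] == 44] for chunk in reader],
    ignore_index=True,
)
print(rhode_island)
```

Only the filtered rows from each chunk are kept, so peak memory depends on the chunk size and the size of the filtered result rather than on the full file.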
- Parameters
ddi (ipumspy.ddi.Codebook) – The codebook representing the data
dtype (Optional[dict]) – A dictionary with variable names as keys and variable types as values. Has an effect only with the pd.read_fwf or pd.read_csv engine. If None, pd.read_fwf or pd.read_csv uses the type given by ddi.data_description.pandas_type for all variables. See ipumspy.ddi.VariableDescription for more detail on ddi.data_description.pandas_type. If the file is a CSV and dtype is not None, pandas converts the column types once, during the pd.read_csv call. When the file format is .dat or .csv and dtype is None, two conversions occur: one on load, and one when returning the data frame.
kwargs – keyword args to be passed to pd.read_fwf
- Returns
An iterator of data frames
- Return type
Iterator[pandas.DataFrame]
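The dtype parameter distinguishes converting column types once, while pandas parses the file, from converting them twice (infer on load, then cast afterwards). The plain-pandas sketch below contrasts the two paths on a small in-memory CSV; the column names and the target types in target_types are illustrative assumptions, not values taken from a real codebook.

```python
import io

import pandas as pd

csv_text = "STATEFIP,AGE\n44,25\n06,31\n"
target_types = {"STATEFIP": "int64", "AGE": "float64"}  # hypothetical dtype mapping

# One conversion: pandas applies the requested types while parsing.
direct = pd.read_csv(io.StringIO(csv_text), dtype=target_types)

# Two conversions: pandas infers types on load, then the frame is cast again.
inferred = pd.read_csv(io.StringIO(csv_text))
cast_later = inferred.astype(target_types)

print(direct.dtypes)
```

Both paths end with the same frame; the first simply skips the intermediate inferred types. With ipumspy, the same mapping would be passed as read_microdata_chunked(ddi, dtype=target_types, chunksize=1000).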