ipumspy.ddi.Codebook.get_all_types
- Codebook.get_all_types(type_format, string_pyarrow=False)
Retrieve all column types
- Parameters:
type_format (str) – type format. Should be one of [“numpy_type”, “pandas_type”, “pandas_type_efficient”, “python_type”, “vartype”].
string_pyarrow (bool) – only has an effect when True and type_format is one of [“pandas_type”, “pandas_type_efficient”]. In that case, string columns that would be typed as pd.StringDtype() are typed as pd.StringDtype(storage='pyarrow') instead.
- Return type:
dict
- Returns:
A dict mapping column names to column dtypes.
Examples
An example of usage with the pandas.read_csv engine:
>>> from ipumspy import readers
>>> ddi_codebook = readers.read_ipums_ddi('extract_ddi.xml')
>>> dataframe_dtypes = ddi_codebook.get_all_types(type_format='pandas_type', string_pyarrow=False)
>>> df = readers.read_microdata(ddi=ddi_codebook, filename="extract.csv", dtype=dataframe_dtypes)
And an example of the use case with string_pyarrow set to True:
>>> from ipumspy import readers
>>> ddi_codebook = readers.read_ipums_ddi('extract_ddi.xml')
>>> dataframe_dtypes = ddi_codebook.get_all_types(type_format='pandas_type', string_pyarrow=True)
>>> # No particular impact for reading from csv.
>>> df = readers.read_microdata(ddi=ddi_codebook, filename="extract.csv", dtype=dataframe_dtypes)
>>> # The benefit of using string_pyarrow: converting to parquet. The writing time is reduced.
>>> df.to_parquet("extract.parquet")
>>> # Also, loading data from the derived extract.parquet will be faster than if the csv file
>>> # had been converted using string_pyarrow=False
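As a further illustration of what a column-name-to-type mapping can be used for, here is a minimal sketch that does not call ipumspy at all. It assumes that type_format="python_type" yields plain Python types such as int, float, and str. The column_types dict, its column names, and the cast_rows helper are hypothetical stand-ins, not part of the library.

```python
import csv
import io

# Hypothetical stand-in for ddi_codebook.get_all_types(type_format="python_type").
# The column names and types are invented for illustration only.
column_types = {"YEAR": int, "PERWT": float, "STATEFIP": str}

# Small in-memory CSV standing in for an extract file.
raw = io.StringIO("YEAR,PERWT,STATEFIP\n2019,101.5,27\n2020,98.0,06\n")


def cast_rows(handle, types):
    """Yield each CSV row as a dict with values cast according to `types`."""
    for row in csv.DictReader(handle):
        yield {name: types[name](value) for name, value in row.items()}


rows = list(cast_rows(raw, column_types))
print(rows[0])
```

Keeping a code-like column such as the hypothetical STATEFIP typed as str preserves leading zeros ("06"), which is one reason to apply an explicit type mapping instead of relying on type inference.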