piaso.preprocessing package
- piaso.preprocessing.getCrossCategories(df, col1, col2, delimiter='@', iterate_by_second_column=True)
Generates a new categorical column from the cross combinations of two specified columns in a DataFrame, respecting existing categorical orders if present. The order in which the combinations are generated can be controlled, and a custom delimiter can be used to join the column values.
- Parameters:
df (pd.DataFrame) – The DataFrame containing the columns to be combined.
col1 (str) – Name of the first column to combine.
col2 (str) – Name of the second column to combine.
delimiter (str, optional) – Delimiter used to join the column values. Defaults to '@'.
iterate_by_second_column (bool, optional) – If set to True, the function iterates by the values of the second column first when generating the combined categories. Defaults to True.
- Returns:
A pandas Categorical holding the combined values of the two columns, with a defined category order.
- Return type:
pd.Categorical
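The behaviour described above can be illustrated with a small standalone sketch. This is not PIASO's source code: `cross_categories_sketch` is a hypothetical stand-in, and it assumes that "iterating by the second column first" means the second column drives the outer loop when the category order is built, and that non-categorical columns fall back to sorted unique values. Check the actual output of `getCrossCategories` on your own data before relying on either assumption.

```python
import pandas as pd


def cross_categories_sketch(df, col1, col2, delimiter="@",
                            iterate_by_second_column=True):
    """Hypothetical illustration only; not the PIASO implementation."""

    def levels(series):
        # Keep an existing categorical order; otherwise use sorted unique
        # values (an assumption made for this sketch).
        if isinstance(series.dtype, pd.CategoricalDtype):
            return list(series.cat.categories)
        return sorted(series.unique())

    levels1, levels2 = levels(df[col1]), levels(df[col2])

    if iterate_by_second_column:
        # Assumed meaning: col2 is the outer loop, col1 the inner loop.
        order = [f"{a}{delimiter}{b}" for b in levels2 for a in levels1]
    else:
        order = [f"{a}{delimiter}{b}" for a in levels1 for b in levels2]

    # Join the two columns row by row, then attach the category order.
    combined = df[col1].astype(str) + delimiter + df[col2].astype(str)
    return pd.Categorical(combined, categories=order, ordered=True)


df = pd.DataFrame({
    "cell_type": pd.Categorical(["T", "B", "T"], categories=["T", "B"]),
    "condition": ["ctrl", "stim", "stim"],
})
result = cross_categories_sketch(df, "cell_type", "condition")
print(list(result.categories))
```

In this sketch the categorical order of `cell_type` ("T" before "B") is preserved inside each `condition` group, which is the kind of ordering that is convenient for grouped plots.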
- piaso.preprocessing.table(values, rank: bool = False, ascending: bool = False, as_dataframe: bool = False)
Returns the counts of unique values in the given list.
- Parameters:
values (list) – A list of values for which the counts are to be calculated.
rank (bool, optional) – If True, the results are sorted by count. Default is False.
ascending (bool, optional) – If True and rank is True, the results are sorted in ascending order. If False and rank is True, the results are sorted in descending order. Default is False.
as_dataframe (bool, optional) – If True, the result is returned as a pandas DataFrame with columns 'value' and 'count'. If False, the result is returned as a dictionary. Default is False.
- Returns:
A dictionary (or DataFrame, if as_dataframe is True) containing the counts of unique values. If rank is True, the dictionary is sorted by count.
- Return type:
dict or pandas.DataFrame
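The counting and ranking semantics described for `table` can be sketched with the standard library alone. `table_sketch` below is a hypothetical stand-in, not PIASO's implementation; it leaves out the `as_dataframe` option, and it assumes that unranked results keep first-seen order, which the documentation above does not state.

```python
from collections import Counter


def table_sketch(values, rank=False, ascending=False):
    """Hypothetical illustration only; not the PIASO implementation."""
    counts = Counter(values)  # counts each unique value, in first-seen order
    if not rank:
        return dict(counts)
    # Sort by count; descending unless ascending=True.
    ordered = sorted(counts.items(), key=lambda item: item[1],
                     reverse=not ascending)
    return dict(ordered)


labels = ["B", "T", "T", "NK", "T", "B"]
unranked = table_sketch(labels)
ranked = table_sketch(labels, rank=True)
ranked_ascending = table_sketch(labels, rank=True, ascending=True)
print(ranked)  # {'T': 3, 'B': 2, 'NK': 1}
```

Note that `ascending` only has an effect when `rank` is True, matching the parameter description above.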