piaso.preprocessing package

piaso.preprocessing.getCrossCategories(df, col1, col2, delimiter='@', iterate_by_second_column=True)

Generates a new categorical column from the cross combinations of two columns in a DataFrame, preserving each column's existing categorical order if present. The iteration order of the combinations is configurable, and the column values are joined with a customizable delimiter.

Parameters:
  • df (pd.DataFrame) – The DataFrame containing the columns to be combined.

  • col1 (str) – Name of the first column to combine.

  • col2 (str) – Name of the second column to combine.

  • delimiter (str, optional) – Delimiter used to join the column values. Defaults to '@'.

  • iterate_by_second_column (bool, optional) – If set to True, the function iterates by the values of the second column first when generating the combined categories. Defaults to True.

Returns:

A pandas Categorical of the combined column values, with a defined category order.

Return type:

pd.Categorical
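
The effect of iterate_by_second_column on category order can be illustrated with a plain-Python sketch. It does not call piaso or pandas, and the helper cross_category_order and the sample levels are hypothetical. It assumes that "iterating by the second column first" means the second column's levels form the outer loop; the actual implementation may differ in detail.

```python
from itertools import product


def cross_category_order(levels1, levels2, delimiter="@",
                         iterate_by_second_column=True):
    """Hypothetical helper: list combined labels in the assumed iteration order."""
    if iterate_by_second_column:
        # Assumption: the second column's levels form the outer loop,
        # the first column's levels the inner loop.
        return [f"{a}{delimiter}{b}" for b, a in product(levels2, levels1)]
    # Otherwise the first column's levels form the outer loop.
    return [f"{a}{delimiter}{b}" for a, b in product(levels1, levels2)]


# Assumed existing category orders of the two columns (illustrative values).
cell_types = ["Neuron", "Astro"]
conditions = ["ctrl", "stim"]

by_second = cross_category_order(cell_types, conditions)
by_first = cross_category_order(cell_types, conditions,
                                iterate_by_second_column=False)

print(by_second)  # ['Neuron@ctrl', 'Astro@ctrl', 'Neuron@stim', 'Astro@stim']
print(by_first)   # ['Neuron@ctrl', 'Neuron@stim', 'Astro@ctrl', 'Astro@stim']
```

In both cases each label is written as col1 value, delimiter, col2 value; only the order of the categories changes.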

piaso.preprocessing.table(values, rank: bool = False, ascending: bool = False, as_dataframe: bool = False)

Returns the counts of unique values in the given list.

Parameters:
  • values (list) – A list of values for which the counts are to be calculated.

  • rank (bool, optional) – If True, the results are sorted by count. Default is False.

  • ascending (bool, optional) – Sort direction, used only when rank is True: ascending order if True, descending order if False. Default is False.

  • as_dataframe (bool, optional) – If True, the result is returned as a pandas DataFrame with columns 'value' and 'count'. If False, the result is returned as a dictionary. Default is False.

Returns:

A dictionary (or DataFrame, if as_dataframe is True) containing the counts of unique values. If rank is True, the result is sorted by count.

Return type:

dict or pandas.DataFrame
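
The counting and ranking behaviour described above can be approximated with the standard library. This is a minimal sketch, not piaso's implementation: count_values is a hypothetical stand-in for the dictionary-returning mode, the as_dataframe option is omitted, and the order of the unranked result (first-seen order here) is an assumption.

```python
from collections import Counter


def count_values(values, rank=False, ascending=False):
    """Hypothetical stand-in for the dict-returning mode of table()."""
    counts = Counter(values)  # keys keep first-seen order
    if not rank:
        return dict(counts)
    # Sort (value, count) pairs by count; descending unless ascending=True.
    ordered = sorted(counts.items(), key=lambda kv: kv[1],
                     reverse=not ascending)
    return dict(ordered)


labels = ["B", "A", "B", "C", "B", "A"]

unranked = count_values(labels)
ranked_desc = count_values(labels, rank=True)
ranked_asc = count_values(labels, rank=True, ascending=True)

print(unranked)     # {'B': 3, 'A': 2, 'C': 1}
print(ranked_desc)  # {'B': 3, 'A': 2, 'C': 1}
print(ranked_asc)   # {'C': 1, 'A': 2, 'B': 3}
```

As in the parameter description, ascending has no effect in this sketch unless rank is True.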