Writing pandas aggregate code more cleanly
Example 1)
Running a loop over every function you want to aggregate with
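code:bad_sample.py
import pandas as pd
# path, key and agg_functions are assumed to be defined beforehand
df = pd.read_csv(path)
for agg_function in agg_functions:
    # one full groupby pass per function, each producing a separate result
    result = df.groupby(key).agg(agg_function)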
Just put the functions in a list and pass the list to agg
code:good_sample.py
import pandas as pd
df = pd.read_csv(path)
# agg_functions is a list such as ['mean', 'max']; one groupby pass covers them all
result = df.groupby(key).agg(agg_functions)
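To make the list form concrete, here is a minimal sketch on toy data. The DataFrame contents and the column names `shop` and `sales` are made up for illustration and stand in for the CSV read above.

```python
import pandas as pd

# Hypothetical toy data standing in for pd.read_csv(path)
df = pd.DataFrame({
    "shop": ["a", "a", "b", "b"],
    "sales": [10, 20, 30, 50],
})

key = "shop"
agg_functions = ["mean", "max", "min"]  # example list of function names

# One groupby pass; every non-key column gets one sub-column per function,
# so the result has MultiIndex columns like ("sales", "mean")
result = df.groupby(key).agg(agg_functions)
print(result)
```

Because the result columns are a two-level MultiIndex of (column, function), they usually need flattening before a merge, which is what Example 2 is about.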
Example 2)
For every combination of groupby key * target column * aggregate function, I want to rename the resulting column, and then pull out those column names
code:bad_sample.py
import pandas as pd
df = pd.read_csv(path)
key = 'hogehoge'
target_col = 'hugahuga'
# agg_functions is assumed to be a list of function names defined beforehand
group_aggs = df.groupby(key)[target_col].agg(agg_functions)
group_aggs.columns = [f'{key}_{target_col}_{agg_func}' for agg_func in group_aggs.columns]
Pass the functions as a list, then rename the columns in one list comprehension
code:good_sample.py
import pandas as pd
df = pd.read_csv(path)
key = 'hogehoge'
target_col = 'hugahuga'
# agg_functions is assumed to be a list of function names defined beforehand
group_aggs = df.groupby(key)[target_col].agg(agg_functions)
group_aggs.columns = [f'{key}_{target_col}_{agg_func}' for agg_func in group_aggs.columns]
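As an alternative to renaming after the fact, the named aggregation feature linked in the references can fix the output names up front. The following is only a sketch under assumptions: the toy data is invented, and `agg_functions` is assumed to hold string function names (callables would need something like `func.__name__` in the f-string).

```python
import pandas as pd

# Hypothetical toy data standing in for pd.read_csv(path)
df = pd.DataFrame({
    "hogehoge": ["a", "a", "b", "b"],
    "hugahuga": [1.0, 3.0, 5.0, 9.0],
})

key = "hogehoge"
target_col = "hugahuga"
agg_functions = ["mean", "max"]  # example list; string names only

# Map each desired output column name to a NamedAgg(column, aggfunc) pair
named_aggs = {
    f"{key}_{target_col}_{func}": pd.NamedAgg(column=target_col, aggfunc=func)
    for func in agg_functions
}

# Keyword arguments become the output column names directly
group_aggs = df.groupby(key).agg(**named_aggs)

# The new column names are available as a plain list
new_cols = list(group_aggs.columns)
print(new_cols)
```

Building the dict first means the list of generated names can also be taken from `named_aggs.keys()` before any aggregation runs.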
References
https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html?highlight=namedagg#named-aggregation
https://github.com/pfnet-research/xfeat/blob/master/xfeat/helper.py