polars.DataFrame.group_by()

You can also group by multiple conditions at once

code:py

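import polars as pl

# `df` is assumed to be defined already, with name / birthdate / height columns
result = df.group_by(
    # key 1: decade of birth, e.g. 1985 -> 1980
    (pl.col("birthdate").dt.year() // 10 * 10).alias("decade"),
    # key 2: whether height is below 1.7
    (pl.col("height") < 1.7).alias("short?"),
).agg(pl.col("name"))
print(result)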

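The aggregated column comes back as a list type (list[str] here), because pl.col("name") without an aggregation function collects each group's values into a list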

code:_

shape: (3, 3)
┌────────┬────────┬─────────────────────────────────┐
│ decade ┆ short? ┆ name                            │
│ ---    ┆ ---    ┆ ---                             │
│ i32    ┆ bool   ┆ list[str]                       │
╞════════╪════════╪═════════════════════════════════╡
│ 1990   ┆ true   ┆ ["Alice Archer"]                │
│ 1980   ┆ false  ┆ ["Ben Brown", "Daniel Donovan"… │
│ 1980   ┆ true   ┆ ["Chloe Cooper"]                │
└────────┴────────┴─────────────────────────────────┘

polars: GroupBy.agg() also accepts multiple aggregation expressions

code:py

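import polars as pl

# `df` is assumed to be defined already, with birthdate / weight / height columns
result = df.group_by(
    (pl.col("birthdate").dt.year() // 10 * 10).alias("decade"),
    (pl.col("height") < 1.7).alias("short?"),
).agg(
    # number of rows in each group
    pl.len(),
    # maximum height within each group
    pl.col("height").max().alias("tallest"),
    # per-group means, output as avg_weight and avg_height
    pl.col("weight", "height").mean().name.prefix("avg_"),
)
print(result)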

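pl.len() returns the number of rows in each group

It is also impressive that .max and .mean just work per group here mrsekut.icon

This is possible precisely because Polars Expressions and Polars Contexts are separated: an expression only describes a computation, and the context (here, agg after group_by) decides which data it runs on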

code:_

shape: (3, 6)
┌────────┬────────┬─────┬─────────┬────────────┬────────────┐
│ decade ┆ short? ┆ len ┆ tallest ┆ avg_weight ┆ avg_height │
│ ---    ┆ ---    ┆ --- ┆ ---     ┆ ---        ┆ ---        │
│ i32    ┆ bool   ┆ u32 ┆ f64     ┆ f64        ┆ f64        │
╞════════╪════════╪═════╪═════════╪════════════╪════════════╡
│ 1980   ┆ true   ┆ 1   ┆ 1.65    ┆ 53.6       ┆ 1.65       │
│ 1980   ┆ false  ┆ 2   ┆ 1.77    ┆ 77.8       ┆ 1.76       │
│ 1990   ┆ true   ┆ 1   ┆ 1.56    ┆ 57.9       ┆ 1.56       │
└────────┴────────┴─────┴─────────┴────────────┴────────────┘