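Understanding window functions through simple examples

Two things need to be understood: the mental model of window functions themselves, and the syntax of the tool you are using

Compared to SQL, Polars' syntax is considerably simpler mrsekut.icon

So for understanding just the concept of window functions, the Polars examples seem like the better choice

These notes follow the Polars guide

Window functions - Polars user guide

Download the Pokemon CSV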

code:py
 import polars as pl
 # Load the Pokemon dataset used in the Polars user guide
 df = pl.read_csv(
     "https://gist.githubusercontent.com/ritchie46/cac6b337ea52281aa23c049250a4ff03/raw/89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv"
 )
 df.head()

table:result
 #	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
 i64	str	str	str	i64	i64	i64	i64	i64	i64	i64	i64	bool
 1	"Bulbasaur"	"Grass"	"Poison"	318	45	49	49	65	65	45	1	false
 2	"Ivysaur"	"Grass"	"Poison"	405	60	62	63	80	80	60	1	false
 3	"Venusaur"	"Grass"	"Poison"	525	80	82	83	100	100	80	1	false
 3	"VenusaurMega Venusaur"	"Grass"	"Poison"	625	80	100	123	122	120	80	1	false
 4	"Charmander"	"Fire"	null	309	39	52	43	60	50	65	1	false

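Compute and display the average Attack per Type 1

The straightforward approach is to aggregate with group by, but here we display the result without aggregating, keeping the number of rows unchanged mrsekut.icon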
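As a rough mental model of that difference, here is a plain-Python sketch (not how Polars is implemented; the toy rows and the `group_means` helper are made up for illustration). A group-by aggregation returns one row per group, while a window-style computation attaches the same per-group value to every original row:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy rows: (Type 1, Attack). Not taken from the real dataset.
rows = [("Grass", 40), ("Fire", 50), ("Grass", 60), ("Fire", 70), ("Water", 48)]


def group_means(rows):
    """Return the mean Attack per type: one entry per group."""
    groups = defaultdict(list)
    for type1, attack in rows:
        groups[type1].append(attack)
    return {type1: mean(values) for type1, values in groups.items()}


# Group-by aggregation: one output row per group (3 rows here).
aggregated = sorted(group_means(rows).items())

# Window-style: the same per-group mean, but one output row per input row (5 rows).
means = group_means(rows)
windowed = [(type1, means[type1]) for type1, _ in rows]

print(aggregated)
print(windowed)
```

The second result has the same length and order as the input, which is the property these notes rely on.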

For example, the code below computes the average Attack over the whole dataset and outputs it as-is on every row

code:py
 out = (
     df.select(
         pl.col("Type 1"),
         # No over(): the mean is taken over the whole column
         pl.col("Attack")
         .mean()
         .alias("avg_attack"),
     )
 )
 out

As a result, every row shows the same value

table:result
 Type 1	avg_attack
 str	f64
 "Grass"	75.349693
 "Fire"	75.349693

Here we want the average per Type 1, so we use a window function

In Polars, window functions are written with pl.Expr.over()

code:py
 out = (
     df.select(
         pl.col("Type 1"),
         pl.col("Attack")
         .mean()
         .over("Type 1")  # adding this is all it takes
         .alias("avg_attack_by_type"),
     )
 )
 out

table:result
 Type 1	avg_attack_by_type
 str	f64
 "Grass"	72.923077
 "Fire"	88.642857
 …	…
 "Fire"	88.642857
 "Dragon"	94.0
 "Psychic"	53.875

It can also be used for per-group operations

For example, you can sort within each group

For instance, let's sort Speed within each Type 1

First, this sorts by Speed across the whole dataset

code:py
 out = (
     df.select(
         pl.col("Type 1"),
         pl.col("Speed").sort_by("Speed"),
     )
 )

Adding pl.Expr.over() sorts within each Type 1

code:py
 out = (
     df.select(
         pl.col("Type 1"),
         pl.col("Speed").sort_by("Speed").over("Type 1"),
     )
 )
 out

table:result
 Type 1	Speed
 str	i64
 "Grass"	30
 "Grass"	40
 "Fire"	60
 …	…
 "Fire"	105
 "Dragon"	50
 "Dragon"	70
 "Dragon"	80
 "Psychic"	150