seaborn - pokutuna

seaborn

seaborn: statistical data visualization — seaborn 0.12.2 documentation

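Articles

【Python】seabornで綺麗なグラフ作成を！たった1行で書けます | Smart-Hint (clean plots with seaborn in a single line)

seabornの２種の神器、relplotとcatplot | やましなぶろぐ (seaborn's two essential tools: relplot and catplot)

Pythonデータ可視化に使えるseaborn 25メソッドデータ分析 - Qiita (25 seaborn methods for data visualization; has good examples)

seabornの細かい見た目調整をあきらめない Python - Qiita (not giving up on fine-grained appearance tweaks in seaborn)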

Basic usage

https://gyazo.com/97e07b335bfaa9d92f01a8612496dcdd

Basically, use only the functions at the top of the diagram (the figure-level ones) and choose the plot type with kind=

relplot / displot / catplot

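All three take kind= to choose the plot type and col= to lay facets out side by side

data, x, y, col, and hue are the arguments I specify most often

Splitting along the three dimensions col, row, and hue draws multiple plots at once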
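A minimal sketch of splitting by col, row, and hue in one call. The toy DataFrame and its column names (x, y, site, run, group) are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import pandas as pd
import seaborn as sns

# Hypothetical toy data: two numeric columns and three categorical ones
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 1, 2, 3, 4],
    "y": [2, 4, 5, 8, 1, 3, 4, 6],
    "site": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "run": ["r1", "r1", "r2", "r2", "r1", "r1", "r2", "r2"],
    "group": ["p", "q", "p", "q", "p", "q", "p", "q"],
})

# One facet per (run, site) pair, colored by group
g = sns.relplot(
    data=df, x="x", y="y", kind="scatter",
    col="site", row="run", hue="group",
)
```

The same arguments should carry over to displot and catplot; only kind= changes.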

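For relationships between two numeric variables, use relplot

seaborn.relplot — seaborn 0.12.2 documentation

For a category against a value, use catplot

seaborn.catplot — seaborn 0.12.2 documentation

For distributions, use displot

seaborn.displot — seaborn 0.12.2 documentation

I often want one histogram per column of a wide table, so I reshape it to long format with melt first → long format vs. wide format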

code:facet_hist.py

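import pandas as pd
import seaborn as sns

sns.displot(
    # reshape to long format: one row per (id, label, variable, value)
    pd.melt(df, id_vars=['id', 'label'], value_vars=['foo', 'bar', 'baz']),
    kind='hist',
    col='variable',
    hue='label',  # use row= instead to add rows rather than overlay
    x='value',
    bins=30,
    kde=True,
    # log_scale=True,
    # facet_kws={'sharey': False, 'sharex': False},
)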
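To make the reshape in facet_hist.py concrete, here is a small sketch of what melt produces. The values are invented; the column names follow the snippet above.

```python
import pandas as pd

# Hypothetical wide table: one row per id, one column per measurement
wide_df = pd.DataFrame({
    "id": [1, 2],
    "label": ["a", "b"],
    "foo": [0.1, 0.2],
    "bar": [1.0, 2.0],
    "baz": [10.0, 20.0],
})

# Long format: the measurement name goes to "variable", its value to "value"
long_df = pd.melt(wide_df, id_vars=["id", "label"], value_vars=["foo", "bar", "baz"])
print(long_df)
```

The default output columns variable and value are what col='variable' and x='value' refer to.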

These plus heatmap cover most needs

seaborn.heatmap

relational

sns.lineplot

sns.scatterplot

categorical

seaborn.histplot

Histogram

hue='col' overlays one histogram per value of col

multiple='stack' stacks them

multiple='dodge' places them side by side

multiple='fill' stacks them normalized to 100%

seaborn.boxplot — seaborn 0.12.2 documentation

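showfliers=False hides outliers

whis=1.5 sets how far the whiskers reach; points beyond them count as outliers. The default is 1.5 × IQR

Passing a tuple specifies percentiles instead; whis=(0, 100) makes the whiskers cover the full range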
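The 1.5 × IQR rule worked out by hand with NumPy on invented numbers, as a rough sketch of the matplotlib convention rather than seaborn's own code.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])  # toy data
whis = 1.5

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - whis * iqr, q3 + whis * iqr

# Whiskers stop at the most extreme data points still inside the bounds
inside = values[(values >= lower) & (values <= upper)]
whisker_low, whisker_high = inside.min(), inside.max()

# Everything outside the bounds is drawn as an outlier (flier)
outliers = values[(values < lower) | (values > upper)].tolist()
print(whisker_low, whisker_high, outliers)
```

With whis=(0, 100) the bounds become the 0th and 100th percentiles, so no point can fall outside them.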

seaborn.boxenplot — seaborn 0.12.2 documentation

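A boxplot with more quantile divisions

seaborn.violinplot — seaborn 0.12.2 documentation

Better than boxplot because multiple peaks are visible

Quartiles + kernel density estimate

Suggestion: half violinplot half histogram · Issue 2152 · mwaskom/seaborn

I think this would be a good feature, though

Other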

seaborn.heatmap

Heatmap. Probably only meaningful when every column of the DataFrame shares the same value range?

sns.heatmap(df.corr())
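A correlation matrix is one case where every cell shares a fixed range, since values lie in [-1, 1]. A sketch with made-up columns, pinning the color scale to that range:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import pandas as pd
import seaborn as sns

# Hypothetical data: b rises with a, c falls with a
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1]})
corr = df.corr()

# Fixing vmin/vmax keeps 0 at the center of a diverging palette
ax = sns.heatmap(corr, vmin=-1, vmax=1, cmap="coolwarm", annot=True)
```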

seaborn.jointplot

Scatter plot plus a marginal plot on each axis

sns.jointplot(data=df, x='x', y='y') is the basic form

hue='col' colors the points by col

Maybe I could routinely use this in place of relplot?

seaborn.regplot

Adds a regression line; I probably won't use it much

seaborn.lmplot

Overlays regplot per category; I'd probably use this one more

Just pass hue=

seaborn.pairplot

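Plots every pair of columns to show how they relate

A shorthand for the common uses of seaborn.PairGrid — seaborn 0.12.2 documentation

sns.pairplot(df, corner=True) is probably what I'd use most, since the lower triangle is enough

sns.pairplot(df, kind='reg') draws regression lines over the scatter plots; it only adds information, so I tend to use it

diag_kind='kde' as well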

FacetGrid

seaborn.FacetGrid

Lays out multiple plots: FacetGrid declares the grouping (like a group by), and map specifies what is actually drawn

code:facetgrid.py

g = sns.FacetGrid(tips, col="time", row="sex")

g.map(sns.scatterplot, "total_bill", "tip")

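To stop sharing axes between subplots

sharex=False, sharey=False

When you want tick labels on every subplot but the same range (value domain) across all of them

Turning sharing off with sharex=False, sharey=False and passing in a precomputed common range is the easy way

sns.FacetGrid(..., xlim=(0, 10), ylim=(0, 100))

For heatmap, compute vmin and vmax and pass them in (as in facet_heatmap.py below)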

Drawing heatmaps with FacetGrid

Pivot inside the function passed to map_dataframe

code:facet_heatmap.py

g = sns.FacetGrid(df, col="category")

def plot_heatmap(data, **kwargs):
    pivot = data.pivot(index="x", columns="y", values="value")
    sns.heatmap(pivot, **kwargs)

# map_dataframe is called once per facet with that facet's subset of the DataFrame
g.map_dataframe(
    plot_heatmap,
    annot=True,  # show the values
    cbar=False,
    # sharing min..max across facets is easier than hand-building a colorbar axis
    vmin=df['value'].min(),
    vmax=df['value'].max(),
)

Widen the spacing between plots a little

matplotlib.pyplot.subplots_adjust — Matplotlib 3.1.0 documentation

g.figure.subplots_adjust(wspace=0.3, hspace=0.3)

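The unit is a fraction of the average axes width and height (so wspace=0.5 is half the width of a subplot)

With 1 it looks like this

https://gyazo.com/5d96791f4c0a7c35af7647a78c8c2798

When axis labels or titles overlap, I think this is shorter and less error-prone than fiddling with xticklabels and the like

The urge to set square=True throws the layout off, so giving up on it is also a reasonable choice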

Moving the legend outside the plot

plt.legend(loc="upper left", bbox_to_anchor=(1, 1))

Or

code:legend.py

ax = sns.hogeplot(...)  # hogeplot is a placeholder for any axes-level plot function
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1))
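A runnable version of the same idea with scatterplot standing in for the placeholder. The data is invented.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import pandas as pd
import seaborn as sns

# Hypothetical data with a two-level grouping column
df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [4, 3, 2, 1],
    "group": ["a", "a", "b", "b"],
})

ax = sns.scatterplot(data=df, x="x", y="y", hue="group")

# Anchor the legend's upper-left corner at the axes' upper-right corner
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1))
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
```

When saving, bbox_inches="tight" in savefig is one way to keep the outside legend from being cropped.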

Adjusting the size

Reluctantly fall back to matplotlib; at least it is easy to adjust

code:size.py

import matplotlib.pyplot as plt

plt.figure(figsize=(8, 4.5))

sns.scatterplot(...)
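A sketch of two ways to set the size, with made-up data. Axes-level functions draw into an Axes you create, while figure-level functions such as relplot take height= (inches per facet) and aspect= (width / height) instead of figsize.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [4, 3, 2, 1]})  # toy data

# Axes-level: create the figure yourself and pass the Axes in
fig, ax = plt.subplots(figsize=(8, 4.5))
sns.scatterplot(data=df, x="x", y="y", ax=ax)

# Figure-level: each facet is height inches tall and height * aspect wide
g = sns.relplot(data=df, x="x", y="y", height=3, aspect=1.5)
```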

Overlapping labels

Usually the x-axis labels

code:rotate_labels.py

import seaborn as sns
import matplotlib.pyplot as plt

# axes-level plots: go through matplotlib
sns.barplot(x='category', y='value', data=df)
plt.xticks(rotation=90)

# figure-level plots (FacetGrid)
g = sns.relplot(data=df, x="day", y="total_bill", kind="line")
g.tick_params(axis="x", labelrotation=90)  # probably the most reasonable way

Color palettes

seaborn.set_palette — seaborn 0.12.2 documentation

sns.set_palette sets it globally

Or pass palette= or cmap= to each plot as needed

Choosing color palettes — seaborn 0.13.2 documentation

Seabornのカラーパレットの選び方 - Qiita (how to choose a seaborn color palette)

Continuous data

rocket

https://gyazo.com/cbd59645fd60c110e64f5b3eacd9955d

mako

https://gyazo.com/440d0d63ba47c33b37f491cc370d2283

viridis

https://gyazo.com/3e1e72f2572f49860e9dd6778a9bf12f

Sometimes a single hue is simply easier to read

seagreen

https://gyazo.com/70bdd6dcd28f473b798a8375dbd2418d

Blues

https://gyazo.com/5e48a2ebef0ed3729aad4260e60ca62c

Correlations: diverging from the center toward both ends

coolwarm

https://gyazo.com/cd59228d98329ba82dd35b4dea607bfb

Spectral

https://gyazo.com/fb27ac2808dd396b9a314331f7fa224a

icefire

https://gyazo.com/fa4f1ab61f8917e4118887ef5fecf31c

Append _r to the name to reverse a palette
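A small check of the _r suffix, sampling four colors from rocket and from rocket_r. The count of four is arbitrary.

```python
import seaborn as sns

pal = sns.color_palette("rocket", 4)
pal_r = sns.color_palette("rocket_r", 4)

# The first reversed color should match the last forward color
print(pal[-1])
print(pal_r[0])
```

The same name works wherever a palette or colormap name is accepted, for example cmap="rocket_r" in heatmap.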

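Visualizing many classes

When the palette runs out of colors

Build a palette with as many colors as there are elements using Collection of perceptually accurate colormaps — colorcet v3.1.0 and pass it in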

How to make a color map with many unique colors in seaborn - Stack Overflow

code:colorcet.py

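import colorcet as cc
import seaborn as sns

# one distinct color per label, drawn from colorcet's Glasbey palette
palette = sns.color_palette(cc.glasbey, n_colors=len(set(labels)))
sns.scatterplot(x=embedding_2d[:, 0], y=embedding_2d[:, 1], hue=labels, palette=palette)

This is less for inspecting individual elements than for visualizing clustering results...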

https://gyazo.com/f1b212abdb2d178f3fd1ab56ff76459d

Or cycle through color × marker combinations

code:markers.py

import math

colors = sns.color_palette("tab10", 10)
markers = ["o", "^", "D", "X", "s", "H", "P", "h"]
# move to the next marker each time the 10 colors have been used up
markers_map = {label: markers[math.floor(i / len(colors)) % len(markers)] for i, label in enumerate(set(labels))}
# override the marker for one specific error value
# markers_map[-1] = "*"
sns.scatterplot(
    x=embedding_2d[:, 0],
    y=embedding_2d[:, 1],
    hue=labels,
    style=labels,
    markers=markers_map,
    palette=colors,
)

https://gyazo.com/fbcb0f75785b16c08120b99f56914148

Appearance

Properties of Mark objects — seaborn 0.12.2 documentation

seaborn.set_style — seaborn 0.12.2 documentation

sns.set_style('whitegrid') is probably what I use most

It also affects plain matplotlib output

seaborn.set_theme — seaborn 0.12.2 documentation

markers

https://seaborn.pydata.org/tutorial/properties.html#marker

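seaborn: scatterplot の markers に指定できる文字 - Qiita (characters accepted by scatterplot's markers)

hue from multiple columns

Just concatenate the columns into a new one and pass that

python - Multiple Columns for HUE parameter in Seaborn violinplot - Stack Overflow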
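A sketch of the concatenation approach. The column names (day, sex, smoker, value) and the " / " separator are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import pandas as pd
import seaborn as sns

# Hypothetical data with two categorical columns to combine
df = pd.DataFrame({
    "day": ["Sat", "Sat", "Sun", "Sun", "Sat", "Sun"],
    "sex": ["F", "M", "F", "M", "F", "M"],
    "smoker": ["yes", "no", "no", "yes", "no", "no"],
    "value": [10.0, 12.5, 9.0, 14.0, 11.0, 13.0],
})

# Build one combined category column and use it as hue
df["sex_smoker"] = df["sex"] + " / " + df["smoker"]
ax = sns.boxplot(data=df, x="day", y="value", hue="sex_smoker")
```

If the columns are not strings, casting with astype(str) before concatenating should work the same way.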

Time formatting tends to get tedious, so look into how to handle it

Plain DataFrame.plot is sometimes the better option

histplot

discrete=True

The x-axis tick labels land at the center of each bar

shrink=0.8 or similar leaves a gap between neighboring bars

stat='probability' makes the bar heights sum to 1.0

seaborn.histplot — seaborn 0.13.0 documentation

python - Seaborn Plot Distribution with histogram with stat = density or probability? - Stack Overflow

#Python