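Essential pandas functionality
Reindexing
An important method on pandas objects is reindex. It creates a new object whose data is rearranged to match a new index; labels that were not present in the original index get missing values (NaN).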
code: Python
import numpy as np
import pandas as pd
code: Python
obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
obj
--------------------------------------------------------------------------
d 4.5
b 7.2
a -5.3
c 3.6
dtype: float64
--------------------------------------------------------------------------
code: Python
obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
obj2
--------------------------------------------------------------------------
a -5.3
b 7.2
c 3.6
d 4.5
e NaN
dtype: float64
--------------------------------------------------------------------------
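If NaN is not what you want for the new labels, reindex also accepts a fill_value argument. The following is a minimal sketch that reuses the values shown for obj above; the variable names s and filled are only for illustration.

```python
import pandas as pd

# Same values and labels as the obj output shown above
s = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# Labels missing from the original index get fill_value instead of NaN
filled = s.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)
print(filled)

# reindex returns a new object; the original still has four entries
print(len(s), len(filled))
```

Because reindex never modifies the original, the result has to be assigned to a name if it is needed later.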
code: Python
obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])
obj3
--------------------------------------------------------------------------
0 blue
2 purple
4 yellow
dtype: object
--------------------------------------------------------------------------
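reindex has a method option for filling in values. ffill fills forward: each new label takes the value of the closest existing label before it.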
code: Python
obj3.reindex(range(6), method='ffill')
--------------------------------------------------------------------------
0 blue
1 blue
2 purple
3 purple
4 yellow
5 yellow
dtype: object
--------------------------------------------------------------------------
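bfill fills backward: each new label takes the value of the closest existing label after it. When no later label exists, the value stays NaN.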
code: Python
obj3.reindex(range(6), method='bfill')
--------------------------------------------------------------------------
0 blue
1 purple
2 purple
3 yellow
4 yellow
5 NaN
dtype: object
--------------------------------------------------------------------------
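The fill can also be bounded. As a sketch, assuming a Series with wider gaps than obj3 (the name colors and its index are made up for this example), the limit argument caps how many consecutive new labels a value is carried across.

```python
import pandas as pd

# Hypothetical data with gaps of three labels between entries
colors = pd.Series(['blue', 'purple', 'yellow'], index=[0, 4, 8])

# Forward-fill, but carry each value across at most one new label
limited = colors.reindex(range(10), method='ffill', limit=1)
print(limited)
```

Labels further than one step from an existing label stay NaN, which makes large gaps in the data visible instead of silently filled.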
code: Python
frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])
frame
--------------------------------------------------------------------------
Ohio Texas California
a 0 1 2
c 3 4 5
d 6 7 8
--------------------------------------------------------------------------
code: Python
states = ['Texas', 'Utah', 'California']
frame.reindex(columns=states)
--------------------------------------------------------------------------
Texas Utah California
a 1 NaN 2
c 4 NaN 5
d 7 NaN 8
--------------------------------------------------------------------------
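Rows and columns can be reindexed in the same call. The sketch below rebuilds the frame shown above and passes both index and columns; the name result is only for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])

# Reindex both axes at once; row 'b' and column 'Utah' are new,
# so every cell in them is NaN
result = frame.reindex(index=['a', 'b', 'c', 'd'],
                       columns=['Texas', 'Utah', 'California'])
print(result)
```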
Dropping entries from an axis
The drop method returns a new object with the specified entries removed from an axis.
code: Python
obj = pd.Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])
obj
--------------------------------------------------------------------------
a 0.0
b 1.0
c 2.0
d 3.0
e 4.0
dtype: float64
--------------------------------------------------------------------------
code: Python
new_obj = obj.drop('c')
new_obj
--------------------------------------------------------------------------
a 0.0
b 1.0
d 3.0
e 4.0
dtype: float64
--------------------------------------------------------------------------
code: Python
obj.drop(['d', 'c'])
--------------------------------------------------------------------------
a 0.0
b 1.0
e 4.0
dtype: float64
--------------------------------------------------------------------------
code: Python
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])
data
--------------------------------------------------------------------------
one two three four
Ohio 0 1 2 3
Colorado 4 5 6 7
Utah 8 9 10 11
New York 12 13 14 15
--------------------------------------------------------------------------
code: Python
data.drop(['Colorado', 'Ohio'])
--------------------------------------------------------------------------
one two three four
Utah 8 9 10 11
New York 12 13 14 15
--------------------------------------------------------------------------
code: Python
# data.drop('two', axis=1) would drop only the 'two' column
data.drop(['two', 'four'], axis=1)
--------------------------------------------------------------------------
one three
Ohio 0 2
Colorado 4 6
Utah 8 10
New York 12 14
--------------------------------------------------------------------------
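Instead of passing axis, drop also accepts index and columns keywords, which can be easier to read. A small sketch, rebuilding the data frame shown above (the result names are only for illustration):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# columns=... is equivalent to passing the labels with axis=1
by_keyword = data.drop(columns=['two', 'four'])
by_axis = data.drop(['two', 'four'], axis=1)
print(by_keyword.equals(by_axis))

# index=... removes rows; the original data is left untouched
rows_removed = data.drop(index=['Colorado', 'Ohio'])
print(rows_removed)
```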
To drop in place, modifying the object itself instead of returning a new one, pass inplace=True.
code: Python
obj.drop('c', inplace=True)
obj
--------------------------------------------------------------------------
a 0.0
b 1.0
d 3.0
e 4.0
dtype: float64
--------------------------------------------------------------------------
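One thing to watch for with inplace=True is that the call itself returns None, so its result should not be assigned. A minimal sketch using the same Series values as above:

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])

# With inplace=True the Series is modified and the call returns None
result = obj.drop('c', inplace=True)
print(result)
print(list(obj.index))
```

Since the dropped data is gone afterwards, reassigning with obj = obj.drop('c') is often the safer habit.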
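Selecting data with loc and iloc
The loc and iloc indexers let you select a subset of the rows and columns of a DataFrame with NumPy-like notation, specifying each axis. Use loc to select by axis labels and iloc to select by integer position.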
code: Python
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])
data
--------------------------------------------------------------------------
one two three four
Ohio 0 1 2 3
Colorado 4 5 6 7
Utah 8 9 10 11
New York 12 13 14 15
--------------------------------------------------------------------------
code: Python
# Assumed step (the original line appears to be missing): set every value
# below 5 to 0. The outputs that follow are consistent with this change.
data[data < 5] = 0
data.loc['Colorado', ['two', 'three']]
--------------------------------------------------------------------------
two 5
three 6
Name: Colorado, dtype: int32
--------------------------------------------------------------------------
code: Python
data.iloc[2, [3, 0, 1]]
--------------------------------------------------------------------------
four 11
one 8
two 9
Name: Utah, dtype: int32
--------------------------------------------------------------------------
code: Python
data.loc[:'Utah', 'two']
--------------------------------------------------------------------------
Ohio 0
Colorado 5
Utah 9
Name: two, dtype: int32
--------------------------------------------------------------------------
code: Python
data.iloc[:, :3][data.three > 5]
--------------------------------------------------------------------------
one two three
Colorado 0 5 6
Utah 8 9 10
New York 12 13 14
--------------------------------------------------------------------------
code: Python
data.iloc[2]
--------------------------------------------------------------------------
one 8
two 9
three 10
four 11
Name: Utah, dtype: int32
--------------------------------------------------------------------------
code: Python
data.iloc[[1, 2], [3, 0, 1]]
--------------------------------------------------------------------------
four one two
Colorado 7 0 5
Utah 11 8 9
--------------------------------------------------------------------------
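One difference between the two indexers is easy to miss: label slices with loc include the end label, while position slices with iloc exclude the end position, as in ordinary Python slicing. A short sketch on a freshly built frame (the result names are only for illustration):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Label slice: both 'Utah' and 'two' are included
by_label = data.loc['Ohio':'Utah', 'one':'two']
print(by_label)

# Position slice: row 2 and column 2 are excluded
by_position = data.iloc[0:2, 0:2]
print(by_position)
```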
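Arithmetic methods and fill values
The arithmetic methods include add (radd), sub (rsub), div (rdiv), floordiv (rfloordiv), mul (rmul), and pow (rpow). Each r-prefixed method is the same operation with the operands swapped. These methods accept a fill_value that is used in place of a value missing from one of the two objects.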
code: Python
df1 = pd.DataFrame(np.arange(12.).reshape((3, 4)),
columns=list('abcd'))
df1
--------------------------------------------------------------------------
a b c d
0 0.0 1.0 2.0 3.0
1 4.0 5.0 6.0 7.0
2 8.0 9.0 10.0 11.0
--------------------------------------------------------------------------
code: Python
df2 = pd.DataFrame(np.arange(20.).reshape((4, 5)),
columns=list('abcde'))
df2
--------------------------------------------------------------------------
a b c d e
0 0.0 1.0 2.0 3.0 4.0
1 5.0 6.0 7.0 8.0 9.0
2 10.0 11.0 12.0 13.0 14.0
3 15.0 16.0 17.0 18.0 19.0
--------------------------------------------------------------------------
code: Python
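# Assumed step (the original line appears to be missing): the 5.0 in row 1,
# column b of the output implies this entry of df2 was NaN
df2.loc[1, 'b'] = np.nan
df1.add(df2, fill_value=0)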
--------------------------------------------------------------------------
a b c d e
0 0.0 2.0 4.0 6.0 4.0
1 9.0 5.0 13.0 15.0 9.0
2 18.0 20.0 22.0 24.0 14.0
3 15.0 16.0 17.0 18.0 19.0
--------------------------------------------------------------------------
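To make the r-prefixed methods concrete, here is a small sketch showing that rdiv and rsub match the operator forms with the DataFrame on the right-hand side. The frame starts at 1 instead of 0 only to avoid dividing by zero.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1., 13.).reshape((3, 4)), columns=list('abcd'))

# df.rdiv(1) computes 1 / df, not df / 1
reciprocal = df.rdiv(1)
print(reciprocal.equals(1 / df))

# df.rsub(1) computes 1 - df, not df - 1
flipped = df.rsub(1)
print(flipped.equals(1 - df))
```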
Operations between DataFrame and Series
code: Python
frame = pd.DataFrame(np.arange(12.).reshape((4, 3)),
                     columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
frame
--------------------------------------------------------------------------
b d e
Utah 0.0 1.0 2.0
Ohio 3.0 4.0 5.0
Texas 6.0 7.0 8.0
Oregon 9.0 10.0 11.0
--------------------------------------------------------------------------
code: Python
series = frame.iloc[0]
series
--------------------------------------------------------------------------
b 0.0
d 1.0
e 2.0
Name: Utah, dtype: float64
--------------------------------------------------------------------------
code: Python
frame - series
--------------------------------------------------------------------------
b d e
Utah 0.0 0.0 0.0
Ohio 3.0 3.0 3.0
Texas 6.0 6.0 6.0
Oregon 9.0 9.0 9.0
--------------------------------------------------------------------------
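The subtraction above matches the Series index against the DataFrame columns and repeats the operation for every row. To match on the row labels instead, one option is the sub method with axis='index'. A sketch under that assumption, reusing the frame shown above (column and result are illustrative names):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)),
                     columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])

# Take one column; its index holds the row labels
column = frame['d']

# Match on the row index and subtract the column from every column
result = frame.sub(column, axis='index')
print(result)
```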
Function application and mapping
NumPy ufuncs (functions that operate element-wise on arrays) also work on pandas objects.
code: Python
frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
frame
--------------------------------------------------------------------------
b d e
Utah -0.204708 0.478943 -0.519439
Ohio -0.555730 1.965781 1.393406
Texas 0.092908 0.281746 0.769023
Oregon 1.246435 1.007189 -1.296221
--------------------------------------------------------------------------
code: Python
np.abs(frame)
--------------------------------------------------------------------------
b d e
Utah 0.204708 0.478943 0.519439
Ohio 0.555730 1.965781 1.393406
Texas 0.092908 0.281746 0.769023
Oregon 1.246435 1.007189 1.296221
--------------------------------------------------------------------------
code: Python
f = lambda x: x.max() - x.min()
frame.apply(f)
--------------------------------------------------------------------------
b 1.802165
d 1.684034
e 2.689627
dtype: float64
--------------------------------------------------------------------------
code: Python
frame.apply(f, axis='columns')
--------------------------------------------------------------------------
Utah 0.998382
Ohio 2.521511
Texas 0.676115
Oregon 2.542656
dtype: float64
--------------------------------------------------------------------------
code: Python
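def f(x):
    # Return the minimum and maximum of each column as a labelled Series
    return pd.Series([x.min(), x.max()], index=['min', 'max'])
frame.apply(f)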
--------------------------------------------------------------------------
b d e
min -0.555730 0.281746 -1.296221
max 1.246435 1.965781 1.393406
--------------------------------------------------------------------------
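Because the frames above hold random numbers, the direction in which apply works can be hard to check by eye. The sketch below uses a small hand-written frame (the values are made up for illustration) so the results can be verified mentally.

```python
import pandas as pd

frame = pd.DataFrame([[1., 5., 2.],
                      [4., 0., 9.]],
                     columns=list('bde'), index=['Utah', 'Ohio'])

spread = lambda x: x.max() - x.min()

# Default: the function receives each column, one result per column
per_column = frame.apply(spread)
print(per_column)

# axis='columns': the function receives each row, one result per row
per_row = frame.apply(spread, axis='columns')
print(per_row)
```

For simple reductions like this one, the built-in methods give the same answer, for example frame.max() - frame.min() for the per-column case.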
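To apply a function to each element, use the applymap method. In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.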
code: Python
format = lambda x: '%.2f' % x
frame.applymap(format)
--------------------------------------------------------------------------
b d e
Utah -0.20 0.48 -0.52
Ohio -0.56 1.97 1.39
Texas 0.09 0.28 0.77
Oregon 1.25 1.01 -1.30
--------------------------------------------------------------------------
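For a single column, the Series map method applies a function element-wise in the same way. A minimal sketch with two of the values shown above hard-coded so the result is reproducible (fmt and formatted are illustrative names):

```python
import pandas as pd

frame = pd.DataFrame({'b': [-0.204708, 1.246435],
                      'e': [-0.519439, -1.296221]},
                     index=['Utah', 'Oregon'])

fmt = lambda x: '%.2f' % x

# Series.map calls fmt on every element of column 'e'
formatted = frame['e'].map(fmt)
print(formatted)
```

Note that the result holds strings, not floats, so it is meant for display and not for further arithmetic.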
Sorting and ranking
To sort lexicographically by row or column index, use the sort_index method, which returns a new, sorted object.
code: Python
obj = pd.Series(range(4), index=['d', 'a', 'b', 'c'])
obj.sort_index()
--------------------------------------------------------------------------
a 1
b 2
c 3
d 0
dtype: int64
--------------------------------------------------------------------------
code: Python
frame = pd.DataFrame(np.arange(8).reshape((2, 4)),
                     index=['three', 'one'],
                     columns=['d', 'a', 'b', 'c'])
frame.sort_index()
--------------------------------------------------------------------------
d a b c
one 4 5 6 7
three 0 1 2 3
--------------------------------------------------------------------------
code: Python
frame.sort_index(axis=1)
--------------------------------------------------------------------------
a b c d
three 1 2 3 0
one 5 6 7 4
--------------------------------------------------------------------------
code: Python
frame.sort_index(axis=1, ascending=False)
--------------------------------------------------------------------------
d c b a
three 0 3 2 1
one 4 7 6 5
--------------------------------------------------------------------------
To sort by the values in a particular column, pass the column name to the by option of sort_values.
code: Python
frame = pd.DataFrame({'a': [0, 1, 0, 1], 'b': [4, 7, -3, 2]})
frame.sort_values(by='b')
--------------------------------------------------------------------------
a b
2 0 -3
3 1 2
0 0 4
1 1 7
--------------------------------------------------------------------------
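The by option also accepts a list of column names, in which case later columns break ties in earlier ones. A sketch using the same values as the frame above:

```python
import pandas as pd

frame = pd.DataFrame({'a': [0, 1, 0, 1], 'b': [4, 7, -3, 2]})

# Sort by 'a' first, then by 'b' within each value of 'a'
by_two = frame.sort_values(by=['a', 'b'])
print(by_two)

# ascending=False reverses the order
descending = frame.sort_values(by='b', ascending=False)
print(descending)
```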
code: Python
obj = pd.Series([7, -5, 7, 4, 2, 0, 4])
obj
--------------------------------------------------------------------------
0 7
1 -5
2 7
3 4
4 2
5 0
6 4
dtype: int64
--------------------------------------------------------------------------
code: Python
obj.rank()
--------------------------------------------------------------------------
0 6.5
1 1.0
2 6.5
3 4.5
4 3.0
5 2.0
6 4.5
dtype: float64
--------------------------------------------------------------------------
code: Python
obj.rank(method='first')
--------------------------------------------------------------------------
0 6.0
1 1.0
2 7.0
3 4.0
4 3.0
5 2.0
6 5.0
dtype: float64
--------------------------------------------------------------------------
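By default, rank gives tied values the mean of their ranks, and method='first' orders ties by their position in the data. The sketch below tries a few other tie-breaking methods on the same values; the result names are only for illustration.

```python
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

# 'min': every member of a tied group gets the lowest rank of the group
lowest = obj.rank(method='min')

# 'dense': like 'min', but ranks increase by 1 between groups
dense = obj.rank(method='dense')

# Rank in descending order, giving ties the highest rank of the group
desc_max = obj.rank(ascending=False, method='max')

print(pd.DataFrame({'value': obj, 'min': lowest,
                    'dense': dense, 'desc_max': desc_max}))
```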