Processes - tasuwo's notes

Processes

A process is an instance of a program in execution. Most operating systems, Linux included, provide a way to create processes.

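Process creation on Linux

Two functions are central to process creation on Linux: fork and execve. fork is for running the already-running program in multiple processes; execve is for running a different program. Note that execve does not create a process by itself: it replaces the program running in the calling process.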

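fork

fork creates a child process. The child is a copy of the calling process: the caller becomes the parent process and the newly created process becomes the child process. At the moment fork is called, the parent's memory space is duplicated for the child (Linux implements this with copy-on-write, so a page is physically copied only when one side writes to it), and from then on the parent and child run in separate memory spaces. As a result, memory writes and mappings/unmappings performed by one do not affect the other.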

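execve

execve executes a program. It takes the path of a file as an argument; that file must be either a binary executable or a script starting with a line of the form #! interpreter [optional-arg]. execve runs the given file by overwriting the calling process's program and memory space with it. The process itself (and its PID) stays the same, and on success execve does not return.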

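Creating a process

A new program is normally run by combining the two: fork followed by exec. For example, when bash runs the external echo program (as opposed to bash's builtin echo), the steps are as follows.

1. bash calls fork

2. The parent bash's memory is copied to become the memory of the child bash process

3-a. The parent bash process continues its own processing

3-b. The child bash process calls exec and overwrites its own memory space with echo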

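Processes and load

Load average

The load average is the number of tasks that are running or waiting (TASK_RUNNING or TASK_UNINTERRUPTIBLE), sampled at a fixed interval driven by the timer interrupt and averaged over time.

TASK_RUNNING: the process is running on a CPU or waiting to be given the CPU

TASK_UNINTERRUPTIBLE: the process is waiting for something such as disk I/O to complete and cannot be interrupted

How to check the load average: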

code:shell

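# From the left: load averages over the last 1, 5, and 15 minutes
# 4th field: <currently runnable tasks>/<total number of tasks>
# 5th field: PID of the most recently created process
ec2-user@ ~$ cat /proc/loadavg
0.00 0.00 0.00 1/89 2710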

https://www.centos.org/docs/5/html/5.1/Deployment_Guide/s2-proc-loadavg.html

https://access.redhat.com/documentation/ja-jp/red_hat_enterprise_linux/6/html/deployment_guide/s2-proc-loadavg

The top command lets you monitor the system in real time. The load average appears at the upper right, at the end of the first line.

code:shell

top - 13:46:45 up 16 min,  1 user,  load average: 0.00, 0.00, 0.00
Tasks:  75 total,   1 running,  50 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1009424k total,   164300k used,   845124k free,     9384k buffers
Swap:        0k total,        0k used,        0k free,   101920k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    1 root      20   0 19692 2592 2272 S  0.0  0.3   0:01.10 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0     0    0    0 I  0.0  0.0   0:00.00 kworker/0:0
    4 root       0 -20     0    0    0 I  0.0  0.0   0:00.00 kworker/0:0H
    5 root      20   0     0    0    0 I  0.0  0.0   0:00.00 kworker/u30:0

uptime shows much the same thing. In both cases, the values are the load averages measured over the last 1, 5, and 15 minutes.

http://man7.org/linux/man-pages/man1/top.1.html

Processes in the Linux kernel code

Let's read the Linux kernel code. It differs quite a bit between versions, but for now I'll read the latest one. For example, the kernel version referenced in サーバ/インフラを支える技術 was 2.6.23. This time I read the 4.18 code.

Process descriptor

Each process keeps its information in a task_struct structure, known as the process descriptor. task_struct is defined in include/linux/sched.h.

https://elixir.bootlin.com/linux/v4.18.3/source/include/linux/sched.h

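This header also defines the bitmask for task states. There are separate sets of flags for the runnability state (task->state) and the exit state (task->exit_state).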

code:c

/*
 * Task state bitmask. NOTE! These bits are also
 * encoded in fs/proc/array.c: get_task_state().
 *
 * We have two separate sets of flags: task->state
 * is about runnability, while task->exit_state are
 * about the task exiting. Confusing, but this way
 * modifying one set can't modify the other one by
 * mistake.
 */

/* Used in tsk->state: */
#define TASK_RUNNING            0x0000
#define TASK_INTERRUPTIBLE      0x0001
#define TASK_UNINTERRUPTIBLE    0x0002
#define __TASK_STOPPED          0x0004
#define __TASK_TRACED           0x0008
/* Used in tsk->exit_state: */
#define EXIT_DEAD               0x0010
#define EXIT_ZOMBIE             0x0020
#define EXIT_TRACE              (EXIT_ZOMBIE | EXIT_DEAD)
/* Used in tsk->state again: */
#define TASK_PARKED             0x0040
#define TASK_DEAD               0x0080
#define TASK_WAKEKILL           0x0100
#define TASK_WAKING             0x0200
#define TASK_NOLOAD             0x0400
#define TASK_NEW                0x0800
#define TASK_STATE_MAX          0x1000

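Load average

Where this is defined has moved around between versions; in 4.18 it lives in kernel/sched/loadavg.c.

According to the comment at the top of the file, the basic load average calculation works as follows. For each CPU, the kernel counts the tasks in TASK_RUNNING and TASK_UNINTERRUPTIBLE (more precisely, it reads nr_running and nr_uninterruptible from each CPU's run queue), treats their total as the number of active tasks (nr_active), and folds it into the global array avenrun. In other words, the global load average is an exponentially decaying average of the sum of the TASK_RUNNING and TASK_UNINTERRUPTIBLE task counts.

for_each_possible_cpu() is expensive on machines with a large number of CPUs, so nr_active has to be computed in a distributed manner.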

code:c

nr_active = 0;
for_each_possible_cpu(cpu)
        nr_active += cpu_of(cpu)->nr_running + cpu_of(cpu)->nr_uninterruptible;

avenrun[n] = avenrun[0] * exp_n + nr_active * (1 - exp_n)

code:c

// SPDX-License-Identifier: GPL-2.0
/*
 * kernel/sched/loadavg.c
 *
 * This file contains the magic bits required to compute the global loadavg
 * figure. Its a silly number but people think its important. We go through
 * great pains to make it work on big machines and tickless kernels.
 */
#include "sched.h"

/*
 * Global load-average calculations
 *
 * We take a distributed and async approach to calculating the global load-avg
 * in order to minimize overhead.
 *
 * The global load average is an exponentially decaying average of nr_running +
 * nr_uninterruptible.
 *
 * Once every LOAD_FREQ:
 *
 *   nr_active = 0;
 *   for_each_possible_cpu(cpu)
 *      nr_active += cpu_of(cpu)->nr_running + cpu_of(cpu)->nr_uninterruptible;
 *
 *   avenrun[n] = avenrun[0] * exp_n + nr_active * (1 - exp_n)
 *
 * Due to a number of reasons the above turns in the mess below:
 *
 *  - for_each_possible_cpu() is prohibitively expensive on machines with
 *    serious number of CPUs, therefore we need to take a distributed approach
 *    to calculating nr_active.
 *
 *        \Sum_i x_i(t) = \Sum_i x_i(t) - x_i(t_0) | x_i(t_0) := 0
 *                      = \Sum_i { \Sum_j=1 x_i(t_j) - x_i(t_j-1) }
 *
 *    So assuming nr_active := 0 when we start out -- true per definition, we
 *    can simply take per-CPU deltas and fold those into a global accumulate
 *    to obtain the same result. See calc_load_fold_active().
 *
 *    Furthermore, in order to avoid synchronizing all per-CPU delta folding
 *    across the machine, we assume 10 ticks is sufficient time for every
 *    CPU to have completed this task.
 *
 *    This places an upper-bound on the IRQ-off latency of the machine. Then
 *    again, being late doesn't loose the delta, just wrecks the sample.
 *
 *  - cpu_rq()->nr_uninterruptible isn't accurately tracked per-CPU because
 *    this would add another cross-CPU cacheline miss and atomic operation
 *    to the wakeup path. Instead we increment on whatever CPU the task ran
 *    when it went into uninterruptible state and decrement on whatever CPU
 *    did the wakeup. This means that only the sum of nr_uninterruptible over
 *    all CPUs yields the correct result.
 *
 *  This covers the NO_HZ=n code, for extra head-aches, see the comment below.
 */

References

https://qiita.com/Kohei909Otsuka/items/26be74de803d195b37bd

matplotlib

Satoru Takeuchi (2018-02-23), ［試して理解］Linuxのしくみ ～実験と図解で学ぶOSとハードウェアの基礎知識

サーバ/インフラを支える技術 ～スケーラビリティ、ハイパフォーマンス、省力運用 (WEB+DB PRESS plusシリーズ)

https://qiita.com/satoru_takeuchi/items/aa626e067be12ac935d0