NIG (National Institute of Genetics) GPU node work log
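First, get an account created.
Connect to the VPN with FortiClient.
All you need is the username and password you were issued.
Log in to the gateway over ssh first, then log in to the assigned node.
This time the node is igt004.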
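The two-hop login above can probably be collapsed into one command with OpenSSH's `-J` (ProxyJump) option. A rough sketch only: `alice` and `gateway.example.org` are placeholders for the issued username and the real gateway host, neither of which appears in this log, and `nig_ssh_cmd` is a made-up helper name. It prints the command rather than running it.

```shell
# Build the one-shot ssh command for gateway -> compute node.
# The user and gateway values passed in below are placeholders (assumptions).
nig_ssh_cmd() {
  user="$1"; gateway="$2"; node="$3"
  # -J tells ssh to reach the node by jumping through the gateway first.
  printf 'ssh -J %s@%s %s@%s\n' "$user" "$gateway" "$user" "$node"
}

# Print the command so it can be checked before actually running it.
nig_ssh_cmd alice gateway.example.org igt004
```

Whether the gateway allows this kind of forwarding is not confirmed here; if it does not, the plain two-step login still works.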
Does it actually have GPUs?
code:sh
# inside igt004
$ nvidia-smi
Thu Jan 25 10:08:40 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla V100-SXM2-16GB Off| 00000000:15:00.0 Off | 0 |
| N/A 34C P0 54W / 300W| 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2-16GB Off| 00000000:16:00.0 Off | 0 |
| N/A 35C P0 55W / 300W| 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2-16GB Off| 00000000:3A:00.0 Off | 0 |
| N/A 33C P0 54W / 300W| 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2-16GB Off| 00000000:3B:00.0 Off | 0 |
| N/A 36C P0 54W / 300W| 0MiB / 16384MiB | 2% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
Wow.
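For a quicker check than reading the whole table, `nvidia-smi` also has a CSV query mode. A small sketch, assuming the query output looks like `Tesla V100-SXM2-16GB, 16384 MiB` (one line per GPU); `summarize_gpus` is a made-up helper that just counts identical lines.

```shell
# Summarize "name, memory" CSV lines (one per GPU) as "<count> x <name> (<memory>)".
summarize_gpus() {
  sort | uniq -c | awk '{
    n = $1                 # count produced by uniq -c
    $1 = ""                # drop the count from the record
    sub(/^ +/, "")         # trim the leading space left behind
    split($0, f, ", ")     # f[1] = GPU name, f[2] = total memory
    printf "%d x %s (%s)\n", n, f[1], f[2]
  }'
}

# On the node itself (needs the NVIDIA driver):
#   nvidia-smi --query-gpu=name,memory.total --format=csv,noheader | summarize_gpus

# Offline demo with four lines shaped like the igt004 GPUs (assumed format):
for i in 1 2 3 4; do echo 'Tesla V100-SXM2-16GB, 16384 MiB'; done | summarize_gpus
```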
Set up rootless Docker.
code:sh
$ mkdir -p /home/$(id -un)/.docker/run_$(hostname)
$ mkdir -p /home/$(id -un)/.config/docker
$ cat <<EOF > /home/$(id -un)/.config/docker/daemon.json
{"data-root":"/data1/rootless-docker-$(id -un)"}
EOF
$ dockerd-rootless.sh --experimental --storage-driver vfs &
...
...
$ docker info
Client: Docker Engine - Community
Version: 24.0.7
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.11.2
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.21.0
Path: /usr/libexec/docker/cli-plugins/docker-compose
...
...
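Since `dockerd-rootless.sh` is started in the background, `docker info` can fail if it runs before the daemon is listening. A minimal sketch of a retry helper for that gap; `wait_for` is a made-up name and the 30-second timeout in the usage line is an arbitrary assumption.

```shell
# Retry a command once per second until it succeeds or the timeout (seconds) expires.
# Returns 0 on success, 1 if the command never succeeded within the timeout.
wait_for() {
  timeout="$1"; shift
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    elapsed=$((elapsed + 1))
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
  done
  return 0
}

# On the node, right after starting dockerd-rootless.sh in the background:
#   wait_for 30 docker info && echo "rootless dockerd is ready"
```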
Bring over the notebook compose file and get going.
code:sh
Bonus
code:sh
$ nvidia-smi
Fri Jan 26 16:14:15 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA H100 80GB HBM3 On | 00000000:18:00.0 Off | 0 |
| N/A 31C P0 114W / 700W| 2323MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA H100 80GB HBM3 On | 00000000:2A:00.0 Off | 0 |
| N/A 34C P0 117W / 700W| 529MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA H100 80GB HBM3 On | 00000000:3A:00.0 Off | 0 |
| N/A 32C P0 112W / 700W| 529MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA H100 80GB HBM3 On | 00000000:5D:00.0 Off | 0 |
| N/A 29C P0 110W / 700W| 529MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 4 NVIDIA H100 80GB HBM3 On | 00000000:9A:00.0 Off | 0 |
| N/A 28C P0 112W / 700W| 529MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 5 NVIDIA H100 80GB HBM3 On | 00000000:AB:00.0 Off | 0 |
| N/A 32C P0 114W / 700W| 529MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 6 NVIDIA H100 80GB HBM3 On | 00000000:BA:00.0 Off | 0 |
| N/A 31C P0 119W / 700W| 529MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 7 NVIDIA H100 80GB HBM3 On | 00000000:DB:00.0 Off | 0 |
| N/A 28C P0 115W / 700W| 529MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 3955878 C python 528MiB |
| 1 N/A N/A 3955878 C python 526MiB |
| 2 N/A N/A 3955878 C python 526MiB |
| 3 N/A N/A 3955878 C python 526MiB |
| 4 N/A N/A 3955878 C python 526MiB |
| 5 N/A N/A 3955878 C python 526MiB |
| 6 N/A N/A 3955878 C python 526MiB |
| 7 N/A N/A 3955878 C python 526MiB |
+---------------------------------------------------------------------------------------+
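The process table above shows the same PID on all eight GPUs. A sketch for totaling that per process from `nvidia-smi`'s CSV query mode, assuming input lines shaped like `3955878, python, 528` (MiB, units stripped); `sum_gpu_mem_by_pid` is a made-up helper name.

```shell
# Sum GPU memory per PID from "pid, process_name, used_memory" CSV lines (MiB, no units).
sum_gpu_mem_by_pid() {
  awk -F', ' '
    { name[$1] = $2; mem[$1] += $3; gpus[$1]++ }
    END {
      # Output order across different PIDs is unspecified in awk.
      for (p in mem)
        printf "%s %s: %d MiB on %d GPU(s)\n", p, name[p], mem[p], gpus[p]
    }'
}

# On the node:
#   nvidia-smi --query-compute-apps=pid,process_name,used_memory \
#     --format=csv,noheader,nounits | sum_gpu_mem_by_pid
```

Fed the eight rows from the table above (528 MiB once, 526 MiB seven times), this adds up to 4210 MiB for PID 3955878.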