Speeding up expression-similarity computation with Nim
2019/9/17
Goal: use Nim to speed up the expression-similarity calculation.
Fork the repository.
Create a develop branch.
Installing Nim
①brew install nim
→ Fails with: Error: Could not remove ghostscript keg! Do so manually sudo rm -rf /usr/local/Cellar/ghostscript/9.26_1
Running that with sudo felt risky, so I looked for another way...
② Follow the official site's instructions.
curl https://nim-lang.org/choosenim/init.sh -sSf | sh
It worked!
code:bash
choosenim-init: Downloading choosenim-0.4.0_macosx_amd64
Prompt: Can choosenim record and send anonymised telemetry data? y/n ... Anonymous aggregate user analytics allow us to prioritise
... fixes and features based on how, where and when people use Nim.
Answer: 
Answer: 
Answer: y
Downloading Nim 0.20.2 from nim-lang.org
Extracting nim-0.20.2.tar.gz
Building Nim 0.20.2
Building tools (nimble, nimgrep, nimpretty, nimsuggest)
Installed component 'nim'
Warning: Binary 'nim' is shadowed by '/usr/local/Cellar/nim/0.20.2/nim/bin/nim'.
Hint: Ensure that '/Users/yamada/.nimble/bin' is before '/usr/local/Cellar/nim/0.20.2/nim/bin' in the PATH env var.
Installed component 'nimble'
Warning: Binary 'nimble' is shadowed by '/usr/local/Cellar/nim/0.20.2/nim/bin/nimble'.
Installed component 'nimgrep'
Warning: Binary 'nimgrep' is shadowed by '/usr/local/Cellar/nim/0.20.2/nim/bin/nimgrep'.
Installed component 'nimpretty'
Warning: Binary 'nimpretty' is shadowed by '/usr/local/Cellar/nim/0.20.2/nim/bin/nimpretty'.
Installed component 'nimsuggest'
Warning: Binary 'nimsuggest' is shadowed by '/usr/local/Cellar/nim/0.20.2/nim/bin/nimsuggest'.
Switched to Nim 0.20.2
choosenim-init: ChooseNim installed in /Users/yamada/.nimble/bin
choosenim-init: You must now ensure that the Nimble bin dir is in your PATH.
choosenim-init: Place the following line in the ~/.profile or ~/.bashrc file.
choosenim-init:     export PATH=/Users/yamada/.nimble/bin:$PATH
$ nim
Compiled at 2019-07-18
Copyright (c) 2006-2019 by Andreas Rumpf
::
Command:
compile, c                compile project with default code generator (C)
doc                       generate the documentation for inputfile
Arguments:
arguments are passed to the program being run (if --run option is selected)
Options:
-p, --path:PATH           add path to search paths
-d, --define:SYMBOL(:VAL)
define a conditional symbol
(Optionally: Define the value for that symbol,
see: "compile time define pragmas")
-u, --undef:SYMBOL        undefine a conditional symbol
-f, --forceBuild:on|off   force rebuilding of all modules
--stackTrace:on|off       turn stack tracing on|off
--lineTrace:on|off        turn line tracing on|off
--threads:on|off          turn support for multi-threading on|off
-x, --checks:on|off       turn all runtime checks on|off
-a, --assertions:on|off   turn assertions on|off
--opt:none|speed|size     optimize not at all or for speed|size
Note: use -d:release for a release build!
--debugger:native         Use native debugger (gdb)
--app:console|gui|lib|staticlib
generate a console app|GUI app|DLL|static library
-r, --run                 run the compiled program with given arguments
--fullhelp                show all command line switches
-h, --help                show this help
-v, --version             show detailed version information
Note, single letter options that take an argument require a colon. E.g. -p:PATH.
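The choosenim warnings above say the Homebrew copy in /usr/local/Cellar/nim/0.20.2/nim/bin shadows the newly installed binaries unless /Users/yamada/.nimble/bin comes first in PATH. As a rough sketch, the ordering can be checked from Python; the helper name `path_order_ok` is hypothetical and not part of choosenim. PATH may also reach the Homebrew binary through a symlinked directory such as /usr/local/bin, so `shutil.which("nim")` from the standard library is the more direct check of which `nim` actually runs.

```python
import os


def path_order_ok(path_value, preferred, shadowing):
    """Return True if `preferred` appears before `shadowing` in a PATH-style string.

    A directory that is not on PATH at all is treated as position infinity,
    so a missing `preferred` directory always yields False.
    """
    # Normalise entries so "/a/b/" and "/a/b" compare equal; skip empty entries.
    entries = [os.path.normpath(p) for p in path_value.split(os.pathsep) if p]

    def position(directory):
        target = os.path.normpath(directory)
        return entries.index(target) if target in entries else float("inf")

    return position(preferred) < position(shadowing)
```

For example, `path_order_ok(os.environ.get("PATH", ""), os.path.expanduser("~/.nimble/bin"), "/usr/local/Cellar/nim/0.20.2/nim/bin")` should return True once the installer's `export PATH=/Users/yamada/.nimble/bin:$PATH` line is in effect.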
Setting up an interactive Nim environment
① Apparently Nim can be used from JupyterLab!
https://gyazo.com/0673420caf3358f19d8ac60136ff6df3
↑ Apparently it is not this one, though.
code:bash
$ brew install zmq
$ pwd
/Users/yamada/refex
(run from this directory)
$ cd Inim
$ nimble install
Launch:
$ jupyter-notebook
<Error>
code:bash
Error: Build failed for package: jupyternim
... Details:
... Execution failed with exit code 1
... Command: "/usr/local/Cellar/nim/0.20.2/nim/bin/nim" c --noBabelPath --path:"/Users/yamada/.nimble/pkgs/zmq-0.2.2"  --path:"/Users/yamada/.nimble/pkgs/hmac-0.1.9"  --path:"/Users/yamada/.nimble/pkgs/nimSHA2-0.1.1"  --path:"/Users/yamada/.nimble/pkgs/sha1-1.1"  --path:"/Users/yamada/.nimble/pkgs/nimSHA2-0.1.1"  --path:"/Users/yamada/.nimble/pkgs/python3-1.3"  -o:"/Users/yamada/refex/jupyternim/jupyternim" "/Users/yamada/refex/jupyternim/src/jupyternim.nim"
... Output: /Users/yamada/refex/jupyternim/src/jupyternim/messages.nim(39, 23) Warning: Deprecated since v0.20.0; use 'sample' instead; rand is deprecated Deprecated ... /Users/yamada/refex/jupyternim/src/jupyternim/messages.nim(40, 24) Warning: Deprecated since v0.20.0; use 'sample' instead; rand is deprecated Deprecated ... /Users/yamada/refex/jupyternim/src/jupyternim/sockets.nim(198, 30) Error: \u not allowed in character literal
https://gyazo.com/da2bb760ab66b18bc0a19ce4e8e96af3
② Atom × Nim
Install the nim and linter plugins.
https://gyazo.com/5f29eeab1dff5aef4ff946b3693b73ff
It worked!
Nim syntax
Oishi-san's Python code
code:python
import csv
import numpy as np
from numpy.linalg import norm
default_file_name = '/Users/oec/Documents/Jupyter/aoe/T-TPM.txt'
output_file_name = 'data/result_final.txt'
def get_index():
    # First token of every line (the row IDs, including the header's first cell)
    with open(default_file_name, 'rb') as input_f:
        names = [row.split()[0].decode('utf-8') for row in input_f]
    return names
def get_col_len():
    # Split only the first (header) line; its length is the column count
    with open(default_file_name, 'rb') as input_f:
        for i, row in enumerate(input_f):
            sample_list = row.split()
            if i >= 0:
                break
    return sample_list
def load_data():
    cols = len(get_col_len())
    # Skip the header row and read from the column after the ID column
    data = np.loadtxt(default_file_name, delimiter='\t', skiprows=1, usecols=range(1, cols))
    return data
def calc():
    data = load_data()
    header = get_index()
    with open(output_file_name, 'w') as f:
        writer = csv.writer(f, delimiter='\t', lineterminator='\n')
        writer.writerow(header)
        for i in range(len(data)):
            v = []
            for j in range(len(data)):
                v1 = data[i]
                v2 = data[j]
                n1 = norm(v1)
                n2 = norm(v2)
                # Cosine similarity between row i and row j
                s = np.dot(v1, v2) / (n1 * n2)
                v.append(s)
            writer.writerow(v)
calc()
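The double loop above evaluates the cosine similarity one pair at a time and recomputes both norms for every pair. A common NumPy alternative, sketched here and not taken from Oishi-san's code, normalises every vector once and gets all pairs from one matrix product. The `axis` parameter is an assumption added so the same helper covers both outputs in the spec: row-vs-row similarity (genes, when genes are rows) and column-vs-column similarity (samples).

```python
import numpy as np


def cosine_similarity_matrix(data, axis=0):
    """All-pairs cosine similarity computed with a single matrix product.

    axis=0: similarity between rows; axis=1: similarity between columns.
    """
    if axis not in (0, 1):
        raise ValueError("axis must be 0 (rows) or 1 (columns)")
    x = np.asarray(data, dtype=np.float64)
    if axis == 1:
        x = x.T
    # Norm of each vector, kept as an (n, 1) column for broadcasting.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    # All-zero vectors would divide by zero; leave them as zero vectors instead.
    norms[norms == 0] = 1.0
    unit = x / norms
    # Dot products of unit vectors are exactly the cosine similarities.
    return unit @ unit.T
```

The result could then be written with `np.savetxt(output_file_name, sim, delimiter='\t')`. One behavioural difference: the original loop produces nan for an all-zero row, while this sketch produces 0. For tens of thousands of genes the full matrix may not fit in memory, in which case it would have to be computed in row blocks.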
Input data
https://gyazo.com/56f6584690c56ec5eeb133db07fe91c7
Columns: samples
Rows: gene IDs (NCBI)
A matrix of about 600 samples × tens of thousands of gene IDs.
・Download
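Because every output is an all-pairs matrix, its size grows with the square of the number of items. A small helper (hypothetical, for illustration only) makes the arithmetic explicit, assuming 8-byte float64 values and taking 20,000 genes as a stand-in for "tens of thousands":

```python
def similarity_matrix_cost(n_items, bytes_per_value=8):
    """Return (bytes for a dense n x n matrix, number of distinct unordered pairs)."""
    dense_bytes = n_items * n_items * bytes_per_value
    distinct_pairs = n_items * (n_items - 1) // 2
    return dense_bytes, distinct_pairs


# ~600 samples (sample x sample) and a hypothetical 20,000 genes (gene x gene)
for n in (600, 20_000):
    dense_bytes, pairs = similarity_matrix_cost(n)
    print(f"n={n}: {dense_bytes / 1e9:.4f} GB dense, {pairs:,} distinct pairs")
```

That works out to about 2.9 MB for the sample × sample matrix versus about 3.2 GB for a 20,000-gene matrix, before any TSV text overhead.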
https://gyazo.com/ad13807058dba80b6a832bef2fe8b611
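Goals
Application specification
1. Take a sample × gene expression-level data file (TPM assumed) as input, compute gene-to-gene and sample-to-sample similarity, and write the results as TSV files.
2. The development language is Nim.
3. (If possible) also prototype an API that takes a gene or sample ID and returns a similarity ranking (sqlite3 assumed; details to be discussed).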
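Item 3 could be prototyped with Python's built-in sqlite3 module before porting it to Nim. This sketch assumes a hypothetical table `similarity(id_a, id_b, score)` that stores every ordered pair except self-pairs, so a ranking becomes one query that can use the primary-key index; the function names are illustrative, not an existing API.

```python
import sqlite3


def build_similarity_db(conn, ids, sim):
    """Store an n x n similarity matrix as (id_a, id_b, score) rows, skipping self-pairs."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS similarity ("
        "id_a TEXT NOT NULL, id_b TEXT NOT NULL, score REAL NOT NULL, "
        "PRIMARY KEY (id_a, id_b))"
    )
    rows = (
        (ids[i], ids[j], float(sim[i][j]))
        for i in range(len(ids))
        for j in range(len(ids))
        if i != j
    )
    conn.executemany("INSERT OR REPLACE INTO similarity VALUES (?, ?, ?)", rows)
    conn.commit()


def top_similar(conn, query_id, limit=10):
    """Return [(other_id, score), ...] for query_id, highest similarity first."""
    cur = conn.execute(
        "SELECT id_b, score FROM similarity WHERE id_a = ? ORDER BY score DESC LIMIT ?",
        (query_id, limit),
    )
    return cur.fetchall()
```

Storing both directions keeps the lookup simple but means n(n−1) rows, which becomes large for a gene × gene table; storing only the top-k neighbours per ID would be one way to shrink it.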
TPM stands for transcripts per million.
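In the commonly used RNA-seq definition, each transcript's read count is divided by its length, and those per-length rates are rescaled so they sum to 1,000,000 within the sample: TPM_i = (c_i / l_i) / Σ_j (c_j / l_j) × 10^6. The input data here is already TPM, so this is only an illustrative sketch of the definition; `counts_to_tpm` is a hypothetical helper.

```python
def counts_to_tpm(counts, lengths):
    """Convert one sample's raw read counts to TPM.

    The length unit (bp or kb) cancels out, so either can be used.
    """
    if len(counts) != len(lengths):
        raise ValueError("counts and lengths must have the same length")
    # Reads per unit length for each transcript.
    rates = [c / l for c, l in zip(counts, lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Rescale so the sample's values sum to one million.
    return [r / total * 1e6 for r in rates]
```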
TPM expresses, for each transcript, how many copies there would be if the sample contained 1,000,000 transcripts in total.
Running Oishi-san's Python code on this data:
code:bash
$ time python luigi/python.py data/RefEx_expression_CAGE_all_human_PRJDB3010.tsv out/test.txt
The file above is the version with its columns and index swapped.
code:bash
$ time python luigi/python.py data/T-TPM-100.txt out/test_100.txt
real	0m3.455s
user	0m2.885s
sys	0m0.338s
$ time python luigi/python.py data/T-TPM.txt out/test.txt
real	89m6.753s
user	88m31.211s
sys	0m22.279s
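The full run took about 89 minutes; most of that is likely the Python-level double loop, which also recomputes both norms for every pair. A middle ground that still writes one TSV row at a time (so the full matrix never has to sit in memory) is to normalise every row once and then produce each output row with a single matrix-vector product. This is a sketch assuming the same numeric array that `load_data()` returns; `write_similarity_rows` is a hypothetical name, and unlike the original loop it writes 0 rather than nan for all-zero rows.

```python
import csv

import numpy as np


def write_similarity_rows(data, out_path):
    """Write the cosine-similarity matrix to a TSV file one row at a time."""
    x = np.asarray(data, dtype=np.float64)
    norms = np.linalg.norm(x, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = x / norms[:, None]  # normalise every row once, up front
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for i in range(unit.shape[0]):
            # One matrix-vector product yields row i of the similarity matrix.
            writer.writerow((unit @ unit[i]).tolist())
```

Timing this against the original on the 100-row file (data/T-TPM-100.txt) before the full run would show whether the change is worth porting to Nim as-is.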