embedding
Embeddings are numerical representations of concepts converted to number sequences, which make it easy for computers to understand the relationships between those concepts. なんらかの概念を数値の並びで表したもの
概念の間の関係性(類似度)をコンピュータで扱えるようになる 似ているものは似ている数値の並びになり、似ていないものはそうでない並びになるようにモデルを作っているはず基素.icon
こういうモデルをどう作るのか?
An embedding is a vector (list) of floating point numbers
The distance between two vectors measures their relatedness. Small distances suggest high relatedness and large distances suggest low relatedness. 利用例
Search (where results are ranked by relevance to a query string) Clustering (where text strings are grouped by similarity)
Recommendations (where items with related text strings are recommended)
Anomaly detection (where outliers with little relatedness are identified)
Diversity measurement (where similarity distributions are analyzed) Classification (where text strings are classified by their most similar label)
テキストをこのモデルに渡すと次のような表現が返ってくる
code:json
"embedding": [
-0.006929283495992422,
-0.005336422007530928,
...
-4.547132266452536e-05,
-0.024047505110502243
],
modelによってサイズは異なるが高次元のベクトルになる
このようなデータを二次元平面に可視化するためにt-SNEを使う