Looking for a vector store

vector store

Criteria

Low cost

Whether it can be self-hosted or is cloud-only

Reasoning

All it has to do is compute inner products between vectors, so ideally it should cost even less than an ordinary DB.
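The core operation really is that small. A minimal brute-force (full-scan) sketch in plain Python, assuming the vectors are already normalized so the inner product equals cosine similarity; the function names are illustrative and not taken from any library. Each query costs N inner products of dimension d, i.e. O(N·d).

```python
import heapq

Vector = list[float]

def dot(a: Vector, b: Vector) -> float:
    """Inner product of two vectors of the same dimension."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    return sum(x * y for x, y in zip(a, b))

def top_k_by_inner_product(query: Vector, vectors: list[Vector], k: int) -> list[tuple[int, float]]:
    """Full scan: score every stored vector and return the k highest (index, score) pairs."""
    scores = ((i, dot(query, v)) for i, v in enumerate(vectors))
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])

stored = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(top_k_by_inner_product([1.0, 0.2], stored, k=2))  # indices 0 then 2
```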

As implementation techniques improve, won't price competition kick in here naturally?

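For example, just partitioning the vectors into buckets by their first or second decimal place should let you find highly similar vectors (approximately) without a full scan.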
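The bucketing idea can be sketched as a grid index: round some coordinates to a fixed number of decimals, use the result as a bucket key, and scan only the query's bucket. This is a hypothetical sketch, not how Chroma or hnswlib work; `GridIndex`, `cell_key`, `decimals`, and `key_dims` are illustrative names. It is approximate (a neighbor just across a bucket boundary is missed), and keying on only the first `key_dims` coordinates is an assumption, since rounding every coordinate of a high-dimensional embedding would put nearly every vector in its own bucket.

```python
import heapq
from collections import defaultdict

Vector = list[float]
CellKey = tuple[float, ...]

def cell_key(v: Vector, decimals: int, key_dims: int) -> CellKey:
    """Hypothetical bucket id: the first `key_dims` coordinates rounded to `decimals` places."""
    return tuple(round(x, decimals) for x in v[:key_dims])

def score(a: Vector, b: Vector) -> float:
    """Inner product used to rank candidates."""
    return sum(x * y for x, y in zip(a, b))

class GridIndex:
    """Naive partitioned index: a query scans only the vectors in its own bucket."""

    def __init__(self, decimals: int = 1, key_dims: int = 2) -> None:
        self.decimals = decimals
        self.key_dims = key_dims
        self.cells: defaultdict[CellKey, list[tuple[int, Vector]]] = defaultdict(list)
        self.count = 0

    def add(self, v: Vector) -> int:
        idx = self.count
        self.cells[cell_key(v, self.decimals, self.key_dims)].append((idx, v))
        self.count += 1
        return idx

    def search(self, query: Vector, k: int) -> list[int]:
        candidates = self.cells.get(cell_key(query, self.decimals, self.key_dims), [])
        if not candidates:  # empty bucket: fall back to scanning everything
            candidates = [item for cell in self.cells.values() for item in cell]
        best = heapq.nlargest(k, candidates, key=lambda item: score(query, item[1]))
        return [idx for idx, _ in best]

index = GridIndex()
for v in ([0.12, 0.31, 0.5], [0.14, 0.29, 0.9], [0.81, 0.57, 0.1]):
    index.add(v)
print(index.search([0.11, 0.33, 0.8], k=5))  # → [1, 0]; vector 2 is in another bucket
```

Production libraries avoid the boundary problem with clustering-based partitions (IVF) or graph indexes such as HNSW.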

That should also save memory and other resources.

Couldn't it even run serverless?

For now, maybe I should just spin up a cheap instance and host it myself.

Seeing how few lines of code Chroma has, I'm starting to think that's right after all.

Chroma looks good, so I'd like to take it for a practice run. miyamonz.icon

The codebase isn't very large, and it seems to do only straightforward things. Looks good.

langchain.js

https://js.langchain.com/docs/modules/indexes/vector_stores/

Chroma

Chroma is an open-source Apache 2.0 embedding database.

https://github.com/chroma-core/chroma

https://docs.trychroma.com/deployment

This template uses a t3.small EC2 instance, which costs about two cents an hour, or $15 for a full month. If you follow these instructions, AWS will bill you accordingly.

Fairly new: about five months old.

It's OSS and looks good, but does it only run in memory?

How am I supposed to load a ridiculously large document into it?

It looks like it can also run as an on-disk database.
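Separately from where the vectors are stored, a large document is usually not embedded whole: it is split into chunks, and each chunk gets its own embedding and entry. A minimal sketch of fixed-size chunking with overlap, assuming character-based windows; `chunk_text`, `chunk_size`, and `overlap` are hypothetical names. LangChain's text splitters (e.g. `RecursiveCharacterTextSplitter`) do a more careful version that prefers to split on separators such as paragraph breaks.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of `chunk_size` characters; consecutive windows share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks

print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # → ['abcd', 'defg', 'ghij']
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.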

HNSWLib

HNSWLib is an in-memory vectorstore that can be saved to a file. It uses HNSWLib.
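"In-memory, but can be saved to a file" can be illustrated with a flat store that round-trips through JSON. This is a hypothetical sketch: `FlatStore` and its methods are illustrative names, and it does a brute-force scan rather than building the HNSW graph that hnswlib uses, so only the persistence pattern carries over.

```python
import heapq
import json

class FlatStore:
    """Brute-force in-memory vector store that can be saved to and loaded from a JSON file."""

    def __init__(self) -> None:
        self.ids: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, doc_id: str, vector: list[float]) -> None:
        self.ids.append(doc_id)
        self.vectors.append(vector)

    def search(self, query: list[float], k: int) -> list[str]:
        """Return the ids of the k vectors with the largest inner product with `query`."""
        best = heapq.nlargest(
            k, zip(self.ids, self.vectors),
            key=lambda pair: sum(x * y for x, y in zip(query, pair[1])),
        )
        return [doc_id for doc_id, _ in best]

    def to_json(self) -> str:
        return json.dumps({"ids": self.ids, "vectors": self.vectors})

    @classmethod
    def from_json(cls, data: str) -> "FlatStore":
        raw = json.loads(data)
        store = cls()
        store.ids, store.vectors = raw["ids"], raw["vectors"]
        return store

    def save(self, path: str) -> None:
        with open(path, "w", encoding="utf-8") as f:
            f.write(self.to_json())

    @classmethod
    def load(cls, path: str) -> "FlatStore":
        with open(path, encoding="utf-8") as f:
            return cls.from_json(f.read())

store = FlatStore()
store.add("a", [1.0, 0.0])
store.add("b", [0.0, 1.0])
restored = FlatStore.from_json(store.to_json())
print(restored.search([0.9, 0.1], k=1))  # → ['a']
```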

https://github.com/nmslib/hnswlib

Pinecone

https://www.pinecone.io/

https://docs.pinecone.io/docs/node-client

From $0.096/hour

Isn't that expensive? That's about $69 a month.
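The $69 figure checks out if you assume an always-on instance for a 30-day month. A small calculation comparing it with the t3.small estimate quoted from the Chroma deployment docs; the 720-hour month is an assumption (a 730-hour month gives about $70 instead), and `monthly_cost` is a hypothetical helper.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: 30-day month (720 h); some billing uses ~730 h

def monthly_cost(hourly_usd: float, hours: int = HOURS_PER_MONTH) -> float:
    """Cost of keeping one instance running around the clock for a month."""
    return hourly_usd * hours

pinecone = monthly_cost(0.096)  # "From $0.096/hour" on the Pinecone pricing page
t3_small = monthly_cost(0.02)   # "about two cents an hour" from the Chroma deployment docs
print(f"Pinecone ~${pinecone:.2f}/month, t3.small ~${t3_small:.2f}/month, ratio ~{pinecone / t3_small:.1f}x")
# → Pinecone ~$69.12/month, t3.small ~$14.40/month, ratio ~4.8x
```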

@Exploringfornow: Just paid $1,000 for a month on @pinecone with no real usage. I'll give Pinecone benefit of the doubt to say it's a poorly designed pricing page. Nonetheless beware of predatory pricing strategies, esp if you're new to building generative AI products.

Pinecone's pricing page:

1)…

https://pbs.twimg.com/media/Fsu2ZNvWcAY8ycJ.png

Scary.

Supabase

postgres

pgvector postgres extension
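pgvector adds distance operators that you use in `ORDER BY ... LIMIT n`: `<->` (Euclidean distance), `<#>` (negative inner product), and `<=>` (cosine distance). All three sort ascending, which is why the inner product is negated. A plain-Python sketch that mirrors what each operator computes, to make the ordering concrete; the function names are hypothetical and this does not talk to Postgres.

```python
import math

Vector = list[float]

def l2_distance(a: Vector, b: Vector) -> float:
    """Like pgvector's `<->`: Euclidean distance."""
    return math.dist(a, b)

def negative_inner_product(a: Vector, b: Vector) -> float:
    """Like pgvector's `<#>`: inner product negated, so smaller means more similar."""
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a: Vector, b: Vector) -> float:
    """Like pgvector's `<=>`: 1 minus cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))

def nearest(query: Vector, rows: list[tuple[str, Vector]], distance, limit: int) -> list[str]:
    """Analogue of `ORDER BY embedding <op> query LIMIT limit` (ascending distance)."""
    ranked = sorted(rows, key=lambda row: distance(query, row[1]))
    return [row_id for row_id, _ in ranked[:limit]]

rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [0.9, 0.1])]
print(nearest([1.0, 0.0], rows, negative_inner_product, limit=2))  # → ['a', 'c']
```

For normalized embeddings the three operators produce the same ranking, so the cheaper inner product is a common choice.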

https://supabase.com/blog/openai-embeddings-postgres-vector

python langchain

https://python.langchain.com/en/latest/modules/indexes/vectorstores.html

AtlasDB

https://atlas.nomic.ai/