qdrant
Qdrant - Vector Database
A collection is a named set of points (vectors with a payload) among which you can search. Vectors within the same collection must have the same dimensionality and be compared by a single metric.
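As a rough sketch, the request for creating such a collection over Qdrant's REST API could be built like this. The collection name `scrapbox_pages` and the vector size 1536 are assumptions made for illustration; nothing is sent over the network.

```python
import json

def create_collection_request(name, size, distance="Cosine"):
    """Return (method, path, body) for creating a collection.

    Every vector in the collection must have `size` dimensions and is
    compared with the single metric given in `distance`.
    """
    body = {"vectors": {"size": size, "distance": distance}}
    return "PUT", f"/collections/{name}", body

def check_dimension(vector, size):
    """Reject a vector whose dimensionality differs from the collection's."""
    if len(vector) != size:
        raise ValueError(f"expected {size} dimensions, got {len(vector)}")

# "scrapbox_pages" and 1536 are hypothetical values.
method, path, body = create_collection_request("scrapbox_pages", 1536)
print(method, path, json.dumps(body))
```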
One of the significant features of Qdrant is the ability to store additional information along with vectors. This information is called payload in Qdrant terminology.
Qdrant allows you to store any information that can be represented using JSON.
If you put the project name into the payload, you could search within a specific project only, or search across projects.
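A minimal sketch of that idea, assuming the payload carries a `project` key: the same search request body is built with or without a payload filter. The collection name and project name are hypothetical, and the body is only constructed, not sent.

```python
def search_request(collection, query_vector, project=None, limit=10):
    """Return (method, path, body) for a vector search.

    With project=None the search runs across all projects; otherwise it
    is restricted to points whose payload has that project name.
    """
    body = {"vector": list(query_vector), "limit": limit, "with_payload": True}
    if project is not None:
        body["filter"] = {
            "must": [{"key": "project", "match": {"value": project}}]
        }
    return "POST", f"/collections/{collection}/points/search", body

# Hypothetical collection and project names, toy 3-dimensional vector.
_, _, cross_search = search_request("scrapbox_pages", [0.1, 0.2, 0.3])
_, _, single_project = search_request("scrapbox_pages", [0.1, 0.2, 0.3], project="project_a")
print("filter" in cross_search, single_project["filter"])
```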
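A point ID is a 64-bit unsigned integer (Qdrant also accepts UUIDs), chosen by the client that PUTs the point.
It is an open question whether to generate it as a UUID or by some other mechanism such as sequential numbers...
If you PUT a point with an ID that already exists, the existing point is overwritten.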
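One possible way to pick IDs, sketched under assumptions: derive a UUID deterministically from the project and page title, so that PUTting the same page again produces the same ID and overwrites the old point. Qdrant accepts UUID strings as point IDs; the namespace URL, the payload keys, and the collection name below are hypothetical.

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/scrapbox")

def point_id(project, title):
    """Same (project, title) always yields the same UUID string."""
    return str(uuid.uuid5(NAMESPACE, f"{project}/{title}"))

def upsert_request(collection, project, title, vector):
    """Return (method, path, body) for PUTting one point."""
    point = {
        "id": point_id(project, title),
        "vector": list(vector),
        "payload": {"project": project, "title": title},
    }
    return "PUT", f"/collections/{collection}/points", {"points": [point]}

first = upsert_request("scrapbox_pages", "project_a", "qdrant", [0.1, 0.2])
second = upsert_request("scrapbox_pages", "project_a", "qdrant", [0.3, 0.4])
# Both requests target the same ID, so the second would overwrite the first.
print(first[2]["points"][0]["id"] == second[2]["points"][0]["id"])
```

Sequential integers would also work, but they need a counter stored somewhere; a deterministic UUID needs no shared state.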
You can specify conditions on payloads and retrieve all IDs that satisfy them.
You could use a similarity score threshold to cut off low-scoring results, but since the appropriate threshold is not known in advance, visualizing the similarity is probably the better approach.
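A sketch of how this might look with the scroll endpoint, assuming a `project` payload key: the request asks for IDs only (no payload, no vector), and a small helper pulls the IDs out of a response. The sample response is hand-written for illustration, not real server output.

```python
def scroll_request(collection, project, limit=100, offset=None):
    """Return (method, path, body) for listing points that match a payload filter."""
    body = {
        "filter": {"must": [{"key": "project", "match": {"value": project}}]},
        "limit": limit,
        "with_payload": False,  # only the IDs are needed
        "with_vector": False,
    }
    if offset is not None:
        body["offset"] = offset  # continue from the previous page
    return "POST", f"/collections/{collection}/points/scroll", body

def ids_from_scroll_response(response):
    """Return (ids, next_offset); next_offset is None on the last page."""
    result = response["result"]
    return [p["id"] for p in result["points"]], result.get("next_page_offset")

# Hand-written sample in the shape of a scroll response.
sample = {"result": {"points": [{"id": 1}, {"id": 5}], "next_page_offset": 7}}
ids, next_offset = ids_from_scroll_response(sample)
print(ids, next_offset)
```

To collect every ID, the request would be repeated with `offset=next_offset` until `next_offset` is `None`.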
"Match Any"
code:json
{
"key": "project",
"match": {
}
}
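For illustration, a "Match Any" condition can be generated from a list of values as sketched below. The project names are placeholders, not values from these notes.

```python
def match_any(key, values):
    """Build a condition that holds if the payload field equals any of `values`."""
    values = list(values)
    if not values:
        raise ValueError("match-any needs at least one value")
    return {"key": key, "match": {"any": values}}

def project_filter(projects):
    """Wrap the condition in a filter restricting a query to the given projects."""
    return {"must": [match_any("project", projects)]}

# "project_a" and "project_b" are hypothetical project names.
print(project_filter(["project_a", "project_b"]))
```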
Full-text search matches are also available.
code:json
{
"key": "description",
"match": {
"text": "good cheap"
}
}
If no full-text index is created for the field, the condition works as an exact substring match.
Vector Storage: In-memory / Memmap
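The difference between the two behaviours can be illustrated with a toy model. This is not Qdrant's implementation: it assumes a whitespace tokenizer and an "all query words must be present" rule for the indexed case, and plain substring containment for the unindexed case.

```python
def substring_match(text, query):
    """Unindexed behaviour (toy model): the query must appear verbatim."""
    return query in text

def all_tokens_match(text, query):
    """Indexed behaviour (toy model): every query word must appear as a token."""
    doc_tokens = set(text.lower().split())
    return all(token in doc_tokens for token in query.lower().split())

doc = "cheap and good quality"
print(substring_match(doc, "good cheap"))   # False: the words are not adjacent in this order
print(all_tokens_match(doc, "good cheap"))  # True: both words occur somewhere

# Text without spaces becomes a single token under a whitespace tokenizer,
# so word-level matching fails where substring matching still succeeds.
japanese = "安くて良い品"
print(all_tokens_match(japanese, "良い"))  # False
print(substring_match(japanese, "良い"))   # True
```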
Payload Storage: InMemory / OnDisk
If the payload is large, it is not practical to put it in memory.
The minimal plan has 1 GB of RAM and this Scrapbox JSON is 32 MB, so let's not worry about it and keep everything in memory for now!
If you tried to put 1,000 cut-and-scanned books into it, it would just barely overflow.
Well, let's start small first.
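The decision above can be written down as a small sketch. The `budget_fraction` heuristic (how much of the RAM the payload is allowed to take) is an assumption introduced here, as is the vector size 1536; `on_disk_payload` is the collection setting that moves payload storage to disk. The vectors and the index also need RAM, which this sketch ignores.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

def create_collection_body(size, distance, payload_bytes, ram_bytes, budget_fraction=0.5):
    """Choose payload storage from a rough size comparison.

    budget_fraction is a hypothetical rule of thumb: the share of RAM
    the payload may occupy before it is moved to disk.
    """
    on_disk = payload_bytes > ram_bytes * budget_fraction
    return {
        "vectors": {"size": size, "distance": distance},
        "on_disk_payload": on_disk,
    }

# 32 MB of JSON against 1 GB of RAM: stays in memory.
small = create_collection_body(1536, "Cosine", 32 * MIB, 1 * GIB)
print(small["on_disk_payload"])  # False
```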
Regarding the full-text index tokenizer: judging from the description, I doubt it will work properly for Japanese.
I'll give it a try.
Partial string matches hit without any particular problem.
Indexing Mechanism
Qdrant currently only uses HNSW as a vector index. HNSW (Hierarchical Navigable Small World Graph) is a graph-based indexing algorithm.
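To give a feel for graph-based search, here is a heavily simplified sketch: a single-layer neighbour graph built by brute force, searched with a greedy walk that always moves to the neighbour closest to the query. Real HNSW adds multiple layers, a candidate list, and an incremental construction procedure, none of which is modelled here; the grid data and function names are made up for the example.

```python
import math

def build_knn_graph(points, k):
    """Link each point to its k nearest neighbours (edges in both directions)."""
    graph = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        nearest = sorted(others, key=lambda j: math.dist(p, points[j]))[:k]
        for j in nearest:
            graph[i].add(j)
            graph[j].add(i)
    return graph

def greedy_search(points, graph, query, entry):
    """Walk the graph, moving to a closer neighbour until none is closer."""
    current = entry
    current_dist = math.dist(points[current], query)
    while True:
        best, best_dist = current, current_dist
        for neighbour in graph[current]:
            d = math.dist(points[neighbour], query)
            if d < best_dist:
                best, best_dist = neighbour, d
        if best == current:
            return current, current_dist  # local minimum reached
        current, current_dist = best, best_dist

# Toy data: a 5 x 5 grid of 2-D points.
points = [(float(x), float(y)) for x in range(5) for y in range(5)]
graph = build_knn_graph(points, k=4)
found, dist = greedy_search(points, graph, query=(3.2, 3.9), entry=0)
print(points[found])  # (3.0, 4.0)
```

The walk only inspects the neighbours of the points it visits, which is why graph indexes can answer queries without comparing against every vector. On less regular data a greedy walk can stop at a local minimum that is not the true nearest neighbour, which is one reason HNSW keeps a list of candidates instead of a single current node.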
https://gyazo.com/66730fadc2491f94187fc5054f38d934
https://gyazo.com/7fa11a66d9efe8c845339a477292308f
https://gyazo.com/edcbeb002bc6bb08531c9ecea269f7f6
...
https://gyazo.com/07bee77dd85fdd095bfb55421382d965
---
This page is auto-translated from /nishio/qdrant using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.