pVectorSearch
A project on [vector search], centered on this Scrapbox.


New Feature Proposal
Pass the Scrapbox page name to the search page in the URL fragment of a GET link.
On the Scrapbox side, create a UserScript that opens that link in a new tab.
The receiving page hits the Scrapbox API from client-side JS, loads the page content into the query, and runs the search.
That gives you "Vector Search Related Pages" from any public project.
Having the whole page content end up in the search history would be annoying, though.
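
The hand-off above could be sketched roughly as follows. This is only an illustration in Python of how the link might be built and read back; the real sender and receiver would be browser JS. SEARCH_PAGE and the "project/title" fragment layout are assumptions, not something the notes specify. The /api/pages/.../text endpoint is, as far as I know, the Scrapbox API route that returns a page as plain text. Note that the fragment is never sent to the server, so only the client sees the page name.

```python
from urllib.parse import quote, unquote, urlsplit

# Hypothetical address of the search page; not given in the notes.
SEARCH_PAGE = "https://example.com/search"


def make_link(project: str, title: str) -> str:
    """What the UserScript would open: project and title in the fragment."""
    return f"{SEARCH_PAGE}#{quote(project, safe='')}/{quote(title, safe='')}"


def parse_link(url: str) -> tuple[str, str]:
    """What the receiver would do: recover project and title from the fragment."""
    project, title = urlsplit(url).fragment.split("/", 1)
    return unquote(project), unquote(title)


def page_text_url(project: str, title: str) -> str:
    """URL the receiver would fetch to get the page text to use as the query."""
    return (
        "https://scrapbox.io/api/pages/"
        f"{quote(project, safe='')}/{quote(title, safe='')}/text"
    )
```

Percent-encoding both parts keeps titles that contain "/" or spaces intact on the round trip.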

Starting from a page with my hand-drawn diagrams, I'd like to search for diagrams only.
I'm starting to think that excluding hits on the same page is a better default.
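
A minimal sketch of that default, assuming each hit is a dict with a "title" key (the real result shape is not described in these notes, and filter_hits is a hypothetical name):

```python
def filter_hits(hits, query_title, exclude_same_page=True):
    """Drop hits that come from the page the query was loaded from.

    Excluding is the default; pass exclude_same_page=False to keep them.
    """
    if not exclude_same_page:
        return list(hits)
    return [hit for hit in hits if hit["title"] != query_title]
```

Keeping it a flag rather than hard-coding the filter leaves room for a UI toggle later.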

New UI proposal
+ selects all and - selects none.
2023/6/26 No, that is impossible to understand without an explanation. Just write ALL and NONE.

2023/6/20
Searching from my phone while on a business trip resulted in an error.
I couldn't deal with it on the spot, so I only checked it just now.
The problem is not reproducible.
It's on the free plan of Vercel, where the logs seem to go back only about an hour, so I can no longer tell what the error was.
It seems to be working fine for now, so I'll let it pass this time.
As a next action, I guess I'd introduce Sentry.

New Feature Proposal
Allow "load this and run a prompt" from search results
digest
roleplay

2023-06-21
client side:
Unexpected token 'A', "An error o"... is not valid JSON
Failed to load resource: the server responded with a status of 504 ()
server side:
[POST] /api/search
Execution Duration / Limit 10.01s / 10s (timed out)
It's simply a timeout.
I guess the next action would be either to improve the error message shown on timeouts or to retry automatically.
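
The JSON error on the client side suggests the 504 body ("An error o...") was parsed as JSON without checking the status first. A rough sketch of both next actions, written in Python only to show the logic (the real client is browser JS). The post callable, SearchError, and the retry counts are all hypothetical:

```python
import json
import time
from typing import Callable, Tuple


class SearchError(Exception):
    """Raised with a message that is safe to show to the user."""


def search_with_retry(
    post: Callable[[str], Tuple[int, str]],
    query: str,
    retries: int = 2,
    wait: float = 1.0,
) -> dict:
    """Call post(query) -> (status, body); parse JSON only on success."""
    for attempt in range(retries + 1):
        status, body = post(query)
        if status == 200:
            return json.loads(body)
        if status == 504 and attempt < retries:
            time.sleep(wait)  # give the server a moment before retrying
            continue
        if status == 504:
            raise SearchError("The search timed out. Please try again later.")
        raise SearchError(f"Search failed with status {status}.")
    raise SearchError("Search was not attempted.")
```

Retrying only helps if the timeout is transient; if the same query always takes more than 10s, the clearer error message is the part that matters.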

2023/6/26
I can see a future where I add my own private sources and then inadvertently share search results that include them.
If the results contain non-public sources, let's not show the share button.
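
The check itself could be as small as this sketch, assuming each hit carries a "project" key and that the set of public projects is known (both are assumptions; can_share is a hypothetical name):

```python
def can_share(hits, public_projects):
    """Show the share button only if every hit comes from a public project."""
    return all(hit["project"] in public_projects for hit in hits)
```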

2023/6/27
Create a mechanism for automatic updates, assuming export privileges.
Fork the current source and create a version specialized for /omoikane
Create it from scratch
The source code is scattered all over the place because I kept piling experiments on top of each other.
Sorry to say, it's terrible.
Things sometimes go missing if I don't organize them.
Copy the export code from the etude-github-actions repository
From the qdrant repository
I hadn't run pip freeze.
$ pip install -r requirements.txt
make_index_from_scrapbox.py
I was able to embed it.
Use Git LFS to store the cache
from_pickle_to_qdrant.py
I was able to send it to Qdrant.
I named the collection omoikane.
I was a little unsure whether to put it together with the other cross-search collection, but I thought keeping it separate would be simpler, with fewer dependencies.
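
For reference, a rough sketch of what the pickle-to-Qdrant step might look like. This is not the actual from_pickle_to_qdrant.py. The cache layout ({text: (vector, title)}), the function names, and the url and api_key parameters are assumptions; the collection is assumed to exist already. Only QdrantClient, PointStruct, and upsert come from the qdrant-client package.

```python
COLLECTION = "omoikane"  # collection name from the notes


def to_points(cache: dict) -> list[dict]:
    """Convert an assumed {text: (vector, title)} cache into point dicts."""
    points = []
    for i, (text, (vector, title)) in enumerate(cache.items()):
        points.append(
            {
                "id": i,
                "vector": list(vector),
                "payload": {"title": title, "text": text},
            }
        )
    return points


def upload(points: list[dict], url: str, api_key: str) -> None:
    """Send the points to an existing Qdrant collection."""
    # Imported here so to_points() works without qdrant-client installed.
    from qdrant_client import QdrantClient
    from qdrant_client.models import PointStruct

    client = QdrantClient(url=url, api_key=api_key)
    client.upsert(
        collection_name=COLLECTION,
        points=[PointStruct(**point) for point in points],
    )
```

Keeping the title in the payload is what lets the search side group or filter hits by page later.
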
UI parts from nishio-vecsearch
Simplify by cutting down on unnecessary functions.
deploy
Oh, shoot, I should have separated the part that loads the data from the server implementation.
Deployment runs in GitHub Actions.
To begin with, Vercel gets confused by the TS code for export.
Oh no.
Should I just separate them?
omoikane-embed to insert vectors and omoikane-vecsearch to search
I've got it.

2023-06-28
In Omoikane Embed, I'm getting a lot of duplicate hits from the same page after adding the ability to chop pages into 100-token chunks.
Reverting would be easy, but that's no fun.
Alternatively, take only one hit per page.
But I had forgotten to commit the chopping code, so it reverted on its own.
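
A small sketch of the two pieces involved, to show why the duplicates appear and how "one hit per page" could work. The tokenizer is left out: chop takes an already tokenized list. The hit shape ("title" and "score" keys, higher score meaning closer) and both function names are assumptions.

```python
def chop(tokens, size=100):
    """Split a token list into consecutive chunks of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def one_hit_per_page(hits):
    """Keep only the best-scoring hit for each page title, best first."""
    best = {}
    for hit in hits:
        title = hit["title"]
        if title not in best or hit["score"] > best[title]["score"]:
            best[title] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)
```

Each chunk is embedded separately, so a long page contributes several vectors and can fill the top results by itself; collapsing by title after the search keeps the chunking while showing each page once.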

---
This page is auto-translated from /nishio/pVectorSearch using DeepL. If something looks interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.