pVectorSearch2024-04-02
Vector search for [/plurality-japanese]
prev
reading
$ pip install -r requirements.txt
ModuleNotFoundError: No module named 'distutils'
Ensure distutils is installed: distutils was part of the standard library up to Python 3.11. It was deprecated in Python 3.10 and removed from the standard library in Python 3.12. If you're using Python 3.12 or later, install setuptools instead, which provides a replacement for package management and distribution.
Ha, I see.
When I built this, it was Python 3.10; now it's 3.12.
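A quick way to check whether the current interpreter can still import distutils. This is only a diagnostic sketch; the helper name has_distutils is made up for illustration.

```python
import importlib.util
import sys


def has_distutils() -> bool:
    """Return True if `import distutils` would succeed in this interpreter."""
    return importlib.util.find_spec("distutils") is not None


print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
print("distutils importable:", has_distutils())
```

On Python 3.12 this typically reports False until setuptools is installed, because setuptools ships its own copy of distutils.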
After various modifications, it worked.
The openai library itself also changed its interface in 1.0.
I also fed the changes back into omoikane-embed-core.
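For reference, the openai 1.0 interface change looks roughly like this for embeddings. This is a sketch, not the code in omoikane-embed-core: the helper name embed_texts and the model name are assumptions, since these notes do not say which model was used. From 1.0 on, calls go through a client object instead of the module-level openai.Embedding.create.

```python
from typing import Any, List


def embed_texts(client: Any, texts: List[str],
                model: str = "text-embedding-ada-002") -> List[List[float]]:
    """Embed texts with an openai>=1.0 style client.

    `client` is expected to be an `openai.OpenAI()` instance.
    The default model name is an assumption, not taken from the notes.
    """
    # openai>=1.0: a method on a client object that returns typed objects.
    response = client.embeddings.create(model=model, input=texts)
    # Before 1.0 this was `openai.Embedding.create(...)` and the result was
    # indexed like a dict: response["data"][i]["embedding"].
    return [item.embedding for item in response.data]


# Usage (needs `pip install "openai>=1.0"` and OPENAI_API_KEY set):
#   from openai import OpenAI
#   vectors = embed_texts(OpenAI(), ["vector search"])
```

Passing the client in as an argument keeps the helper easy to test with a stub object.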
code::
% python make_vecs_from_json/main.py
processing 769 pages
total tasks: 7470, 0.0% was cached
processing 7470 tasks in 150 batches
https://gyazo.com/da4c33d3103a54406c134ebe32bb64be
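The numbers in the log are consistent with a batch size of 50: 7470 uncached tasks make ceil(7470 / 50) = 150 batches. The sketch below shows one way such a cache-then-batch step could work. The function names, the dict-based cache, and the batch size of 50 are assumptions inferred from the log, not the actual code in make_vecs_from_json/main.py.

```python
import math
from typing import Dict, Iterator, List


def split_cached(tasks: List[str], cache: Dict[str, List[float]]) -> List[str]:
    """Return only the tasks whose embedding is not in the cache yet."""
    return [t for t in tasks if t not in cache]


def batches(items: List[str], batch_size: int = 50) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def report(tasks: List[str], cache: Dict[str, List[float]],
           batch_size: int = 50) -> str:
    """Build a two-line summary in the same shape as the log output."""
    todo = split_cached(tasks, cache)
    cached_pct = 100.0 * (len(tasks) - len(todo)) / len(tasks) if tasks else 0.0
    n_batches = math.ceil(len(todo) / batch_size)
    return (f"total tasks: {len(tasks)}, {cached_pct:.1f}% was cached\n"
            f"processing {len(todo)} tasks in {n_batches} batches")


# Hypothetical data shaped like the first run: 7470 tasks, empty cache.
tasks = [f"chunk-{i}" for i in range(7470)]
print(report(tasks, cache={}))
```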
upload
code::
% python upload_vecs/main.py
uploading plurality-japanese.pickle
OK
before/after
https://gyazo.com/dcb1513199fa74784453c96692aec52a https://gyazo.com/f1a444a10303d546657536e1847c8c99
Experiment with blocksize=100
Developing the view in parallel while waiting for the results
result
code::
% python make_vecs_from_json/main.py
processing 769 pages
total tasks: 19866, 13.4% was cached
processing 17205 tasks in 345 batches
% python upload_vecs/main.py
uploading plurality-japanese.pickle
OK
https://gyazo.com/981c5c414d3b13486c9157782bbd9554
https://gyazo.com/662cc912e1027805fdf3f25e9d8f09d2
The run with the smaller chunks cost about $0.36.
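A minimal sketch of chunking the same text at more than one block size. It assumes whitespace-separated words as a stand-in for real tokens, since the actual tokenizer and splitting rules are not shown in these notes. The function names are hypothetical.

```python
from typing import List, Sequence


def chunk_tokens(tokens: List[str], block_size: int) -> List[List[str]]:
    """Split a token list into consecutive blocks of at most block_size tokens."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [tokens[i:i + block_size] for i in range(0, len(tokens), block_size)]


def multi_size_chunks(text: str, block_sizes: Sequence[int] = (500, 100)) -> List[str]:
    """Chunk the text once per block size and return all chunks together."""
    tokens = text.split()  # assumption: whitespace "tokens", not model tokens
    chunks: List[str] = []
    for size in block_sizes:
        chunks.extend(" ".join(block) for block in chunk_tokens(tokens, size))
    return chunks


# A 1200-token page yields 3 chunks at size 500 and 12 chunks at size 100.
page = " ".join(f"w{i}" for i in range(1200))
print(len(multi_size_chunks(page)))  # 15
```

Each extra block size adds its own set of chunks, so the number of embedding tasks grows accordingly.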
view
% npm install
I ran npm audit fix --force and fed the changes back into omoikane-vecsearch.
% npm run dev
and confirmed that search works properly locally.
% git remote rename origin upstream
% git branch -M main
% git push -u origin main
Open the Vercel dashboard
https://gyazo.com/1f5d0e71feb72f20c23be48ea2cf03d2
The build and deploy succeeded, but the project to search has not been configured.
I think it is supposed to be set in a Vercel environment variable.
https://gyazo.com/0193ff3be6131ed5d5346259555e3215
before / after
https://gyazo.com/ccba715dbfd382f9a0e661d556b07a09 https://gyazo.com/7580efcc7f532c7bd99c46670ac371a3
after
https://gyazo.com/952990172f413224ca7e7f329219eac6
hmm
Well, this part can be improved later.
Release!
What we did today from [/plurality-japanese/vector-search-improvement]
✅ Create a separate service for Japanese only
Chunk Improvements
✅ Include chunks of 100 tokens as well as the 500-token chunks used so far
✅ Show only one hit chunk per page
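The "one chunk per page" rule can be sketched as a post-filter on ranked search results. The (page_title, chunk_text, score) tuple layout and the function name are assumptions for illustration; the actual view code in omoikane-vecsearch is not shown here and may do this differently.

```python
from typing import List, Tuple

Hit = Tuple[str, str, float]  # (page_title, chunk_text, similarity score)


def best_hit_per_page(hits: List[Hit], limit: int = 10) -> List[Hit]:
    """Keep only the highest-scoring chunk for each page, best pages first."""
    seen = set()
    result: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h[2], reverse=True):
        if len(result) >= limit:
            break
        page = hit[0]
        if page in seen:
            continue  # a better chunk from this page was already kept
        seen.add(page)
        result.append(hit)
    return result


# Hypothetical hits: Page A matches with both a 500-token and a 100-token chunk.
hits = [
    ("Page A", "500-token chunk", 0.81),
    ("Page A", "100-token chunk", 0.90),
    ("Page B", "100-token chunk", 0.85),
]
for page, text, score in best_hit_per_page(hits):
    print(page, text, score)
```

With chunks of two sizes in the index, the same page is likely to match more than once, so a filter like this keeps the result list from being dominated by a single page.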
Adding Data
✅ 1: First, this Scrapbox
Searches such as "Vector Search" now return matches as well.
2024-04-04
Fixed the issue with GitHub Actions not working.
build
code::
The conflict is caused by:
The user requested protobuf==5.26.1
grpcio-tools 1.62.1 depends on protobuf<5.0dev and >=4.21.6
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
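Why pip rejects this combination can be checked by hand: grpcio-tools 1.62.1 accepts only protobuf >=4.21.6 and <5.0, and 5.26.1 is outside that range. The sketch below does the comparison with plain tuples. It handles only simple dotted numeric versions (real tooling should use the packaging library), and the candidate 4.25.3 is just an example of a version inside the range, not necessarily the one used for the fix.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn '4.21.6' into (4, 21, 6). Only plain numeric versions are supported."""
    return tuple(int(part) for part in version.split("."))


def in_range(version: str, lower: str, upper: str) -> bool:
    """True if lower <= version < upper."""
    return parse(lower) <= parse(version) < parse(upper)


# Range required by grpcio-tools 1.62.1, taken from the error message
# ("<5.0dev" is simplified to "<5.0" here).
LOWER, UPPER = "4.21.6", "5.0"

for candidate in ("5.26.1", "4.25.3"):
    status = "ok" if in_range(candidate, LOWER, UPPER) else "conflict"
    print(f"protobuf=={candidate}: {status}")
```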
https://gyazo.com/7578bc6c498598fb16415cbf4aa82bd3
https://gyazo.com/a8c2b43a31c515aa6007d0e09b9af411 https://gyazo.com/617b9542c44a057e07b0981ac36192c6
https://gyazo.com/c97e93de76deefbf8539d2409d7703f7
https://gyazo.com/dc258ece7db18a91955ce376400f8321
---
This page is auto-translated from /nishio/pVectorSearch2024-04-02 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.