multilingual-e5-small
https://huggingface.co/intfloat/multilingual-e5-small
#E5
This model has 12 layers and the embedding size is 384.
This model is initialized from microsoft/Multilingual-MiniLM-L12-H384 and continually trained on a mixture of multilingual datasets. It supports the 100 languages of XLM-RoBERTa, but low-resource languages may see performance degradation.
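As a quick sanity check, the layer count and embedding size can be read from the model config; a minimal sketch, assuming the Hugging Face transformers library is installed:

```python
from transformers import AutoConfig

# Fetch the config for multilingual-e5-small from the Hugging Face Hub.
config = AutoConfig.from_pretrained("intfloat/multilingual-e5-small")

print(config.num_hidden_layers)  # 12 layers
print(config.hidden_size)        # 384-dimensional embeddings
```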
For all labeled datasets, only the training split is used for fine-tuning.
1. Do I need to add the prefix "query: " and "passage: " to input texts?
Yes, this is how the model was trained; otherwise you will see performance degradation.
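A minimal sketch of the prefixed usage with the sentence-transformers library (the example query/passage pair follows the model card; install sentence-transformers first):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-small")

# Queries get the "query: " prefix, documents the "passage: " prefix.
queries = ["query: how much protein should a female eat"]
passages = [
    "passage: As a general guideline, the CDC's average requirement of "
    "protein for women ages 19 to 70 is 46 grams per day.",
]

# With normalized embeddings, the dot product is a cosine similarity.
q_emb = model.encode(queries, normalize_embeddings=True)
p_emb = model.encode(passages, normalize_embeddings=True)
print(q_emb @ p_emb.T)  # shape (1, 1): query-passage similarity
```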
To read (tsundoku):
Text Embeddings by Weakly-Supervised Contrastive Pre-training
Transformers.js port: https://huggingface.co/Xenova/multilingual-e5-small