multilingual-e5-small
https://huggingface.co/intfloat/multilingual-e5-small
#E5
This model has 12 layers and the embedding size is 384.
This model is initialized from microsoft/Multilingual-MiniLM-L12-H384 and continually trained on a mixture of multilingual datasets. It supports the 100 languages of XLM-RoBERTa, but low-resource languages may see performance degradation.
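As a quick sanity check, the layer count and embedding size can be read from the model config; a minimal sketch, assuming the Hugging Face transformers library is installed:

```python
from transformers import AutoConfig

# Fetch the config for multilingual-e5-small from the Hugging Face Hub.
config = AutoConfig.from_pretrained("intfloat/multilingual-e5-small")

print(config.num_hidden_layers)  # 12 layers
print(config.hidden_size)        # 384-dimensional embeddings
```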
For all labeled datasets, only the training split is used for fine-tuning.
1. Do I need to add the prefix "query: " and "passage: " to input texts?
Yes, this is how the model was trained; otherwise you will see performance degradation.
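A minimal sketch of the prefixed usage with the sentence-transformers library (the example query/passage pair follows the model card; install sentence-transformers first):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-small")

# Queries get the "query: " prefix, documents the "passage: " prefix.
queries = ["query: how much protein should a female eat"]
passages = [
    "passage: As a general guideline, the CDC's average requirement of "
    "protein for women ages 19 to 70 is 46 grams per day.",
]

# With normalized embeddings, the dot product is a cosine similarity.
q_emb = model.encode(queries, normalize_embeddings=True)
p_emb = model.encode(passages, normalize_embeddings=True)
print(q_emb @ p_emb.T)  # shape (1, 1): query-passage similarity
```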
To read (tsundoku):
Text Embeddings by Weakly-Supervised Contrastive Pre-training
Transformers.js port: https://huggingface.co/Xenova/multilingual-e5-small