DF - NISHIO Hirokazu's Scrapbox (Auto-translated from Japanese)

Document Frequency

Affected by [Document Granularity]

As an extreme example, if each word is treated as its own document, DF matches TF
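
The granularity effect can be checked with a small sketch. The function names and the sample sentence are made up for illustration, and DF is taken here as "number of documents containing the term at least once", which is an assumption about the definition in use.

```python
from collections import Counter

def term_frequency(tokens, term):
    # Raw number of occurrences of `term` in one token list.
    return Counter(tokens)[term]

def document_frequency(documents, term):
    # Number of documents (token lists) that contain `term` at least once.
    return sum(1 for doc in documents if term in doc)

# Hypothetical sample text, split on whitespace.
tokens = "the cat sat on the mat near the door".split()

# Coarse granularity: the whole text is a single document.
one_doc = [tokens]

# Extreme granularity: every word is its own document.
word_docs = [[t] for t in tokens]

tf_the = term_frequency(tokens, "the")
df_coarse = document_frequency(one_doc, "the")
df_extreme = document_frequency(word_docs, "the")
```

With the whole text as one document, DF for "the" is capped at 1, while with one word per document it equals the TF of "the" in the text.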

Often set to 1 if the word appears at least once in the document.

For concentration measures, a value based on "appears two or more times" is also used

In other words, a step function is being applied to the occurrence count.

The threshold is on the raw occurrence count, a value that naturally tends to increase as the number of words in the document increases

Wouldn't it be better to divide by the number of words to get the probability of occurrence...
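
A minimal sketch of the two options, assuming documents are plain token lists. The names `df_with_threshold` and `df_with_rate`, the sample documents, and the cutoff values are all hypothetical and chosen only to show the length effect.

```python
def df_with_threshold(documents, term, min_count=1):
    # Step function on the raw count: a document contributes 1
    # if `term` appears at least `min_count` times, else 0.
    return sum(1 for doc in documents if doc.count(term) >= min_count)

def df_with_rate(documents, term, min_rate):
    # Step function on the occurrence rate (count / document length),
    # so long documents are not favored just for being long.
    return sum(
        1 for doc in documents
        if doc and doc.count(term) / len(doc) >= min_rate
    )

# Hypothetical documents: a short one and a long one.
short_doc = ["df", "matters"]            # 1 occurrence in 2 words
long_doc = ["df"] * 2 + ["filler"] * 98  # 2 occurrences in 100 words
docs = [short_doc, long_doc]

at_least_once = df_with_threshold(docs, "df", min_count=1)
at_least_twice = df_with_threshold(docs, "df", min_count=2)
by_rate = df_with_rate(docs, "df", min_rate=0.1)
```

With a count threshold of 2, only the long document is counted, even though the word makes up just 2% of it. With a rate threshold of 0.1, only the short document is counted.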

---

This page is auto-translated from /nishio/DF using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.