tokenizers.trainers.BpeTrainer
Trainer capable of training a BPE model
Parameters (all optional)
special_tokens (List[Union[str, AddedToken]], optional) — A list of special tokens the model should know of.
vocab_size (int, optional) — The size of the final vocabulary, including all tokens and alphabet.
According to the quicktour, the default is 30_000.
min_frequency (int, optional) — The minimum frequency a pair should have in order to be merged.
According to the quicktour, the default is 0.
show_progress (bool, optional) — Whether to show progress bars while training.
initial_alphabet (List[str], optional) — A list of characters to include in the initial alphabet, even if not seen in the training dataset. If the strings contain more than one character, only the first one is kept.
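A minimal sketch of how the parameters above fit together, assuming the `tokenizers` package is installed. The tiny in-memory corpus, the `[UNK]`/`[PAD]` token names, and the specific values (`vocab_size=100`, `min_frequency=2`) are made up for illustration and are not recommendations.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Hypothetical toy corpus; real training would use files or a larger iterator.
corpus = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox is quick",
    "a lazy dog sleeps while the fox jumps",
]

# BPE model with an unknown token, split on whitespace before training.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

trainer = BpeTrainer(
    special_tokens=["[UNK]", "[PAD]"],  # added to the vocabulary first
    vocab_size=100,                     # upper bound on the final vocabulary
    min_frequency=2,                    # a pair must occur at least twice to be merged
    show_progress=False,                # no progress bar for this small run
    initial_alphabet=["Z"],             # "Z" never occurs in the corpus above
)

tokenizer.train_from_iterator(corpus, trainer=trainer)

vocab = tokenizer.get_vocab()
print(tokenizer.get_vocab_size())
print(tokenizer.token_to_id("[UNK]"), tokenizer.token_to_id("[PAD]"))
print("Z" in vocab)
```

Because the corpus is so small, training runs out of mergeable pairs long before reaching `vocab_size`, so the printed vocabulary size comes out well below 100. Special tokens receive the lowest IDs in the order they are listed.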