Post-processing in the Quicktour

https://huggingface.co/docs/tokenizers/quicktour#postprocessing

We might want our tokenizer to automatically add special tokens, like "[CLS]" or "[SEP]". To do this, we use a post-processor.

TemplateProcessing is the most commonly used post-processor. You only have to specify one template for single sentences and one for pairs of sentences, along with the special tokens and their IDs.

tokenizers.processors.TemplateProcessing

Here is how we can set the post-processing to give us the traditional BERT inputs:

The Quicktour shows an example TemplateProcessing configuration for the BERT case.
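A minimal sketch of that BERT-style configuration follows. The `single` and `pair` templates and the `special_tokens` list follow the Quicktour; the tiny WordLevel vocabulary and the sample sentences are assumptions made here so the snippet runs without training a tokenizer. In the templates, `$A` and `$B` stand for the first and second sentence, and a `:1` suffix assigns type ID 1 to that part (the default is 0).

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.processors import TemplateProcessing

# Toy vocabulary (an assumption for illustration, not from the Quicktour).
vocab = {"[UNK]": 0, "[CLS]": 1, "[SEP]": 2, "hello": 3, "world": 4}
tokenizer = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# BERT-style templates: one for single sentences, one for sentence pairs.
tokenizer.post_processor = TemplateProcessing(
    single="[CLS] $A [SEP]",
    pair="[CLS] $A [SEP] $B:1 [SEP]:1",
    special_tokens=[
        ("[CLS]", tokenizer.token_to_id("[CLS]")),
        ("[SEP]", tokenizer.token_to_id("[SEP]")),
    ],
)

single = tokenizer.encode("hello world")
pair = tokenizer.encode("hello", "world")

print(single.tokens)   # ['[CLS]', 'hello', 'world', '[SEP]']
print(pair.tokens)     # ['[CLS]', 'hello', '[SEP]', 'world', '[SEP]']
print(pair.type_ids)   # [0, 0, 0, 1, 1]
```

The IDs passed in `special_tokens` must match the IDs those tokens have in the vocabulary, which is why they are looked up with `token_to_id` instead of being hard-coded.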

If you save your tokenizer with Tokenizer.save, the post-processor will be saved along.

In other words, saving with Tokenizer.save also stores the post-processor.
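The save behaviour can be checked with a short round trip. This is a sketch under assumptions: the toy vocabulary, the temporary file path, and the sample sentence are made up for illustration; `Tokenizer.save` and `Tokenizer.from_file` are the library calls being exercised.

```python
import os
import tempfile

from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.processors import TemplateProcessing

# Toy tokenizer (an assumption for illustration).
vocab = {"[UNK]": 0, "[CLS]": 1, "[SEP]": 2, "hello": 3, "world": 4}
tokenizer = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
tokenizer.post_processor = TemplateProcessing(
    single="[CLS] $A [SEP]",
    pair="[CLS] $A [SEP] $B:1 [SEP]:1",
    special_tokens=[("[CLS]", 1), ("[SEP]", 2)],
)

# Save to a temporary JSON file, then load it back into a new object.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tokenizer.json")
    tokenizer.save(path)
    reloaded = Tokenizer.from_file(path)

# The reloaded tokenizer still adds [CLS] and [SEP] without further setup.
reloaded_tokens = reloaded.encode("hello world").tokens
print(reloaded_tokens)  # ['[CLS]', 'hello', 'world', '[SEP]']
```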