Post-processing in the Quicktour
We might want our tokenizer to automatically add special tokens, like "[CLS]" or "[SEP]". To do this, we use a post-processor.
TemplateProcessing is the most commonly used post-processor. You only have to specify one template for single sentences and one for pairs of sentences, along with the special tokens and their IDs.
Here is how we can set the post-processing to give us the traditional BERT inputs:
Example: a TemplateProcessing configuration for BERT.
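A minimal sketch of the BERT-style setup, assuming the `tokenizers` package is installed. The tiny WordLevel vocabulary and the sample sentences are made up for illustration so that the snippet runs without training. In a real pipeline the tokenizer would be the one trained earlier in the Quicktour. In the templates, `$A` and `$B` stand for the first and second sentence, and the `:1` suffix assigns type ID 1 to that piece.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.processors import TemplateProcessing

# Hypothetical toy vocabulary, used only so the example is self-contained.
vocab = {
    "[UNK]": 0, "[CLS]": 1, "[SEP]": 2,
    "hello": 3, "world": 4, "how": 5, "are": 6, "you": 7,
}
tokenizer = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# BERT-style templates: one for single sentences, one for sentence pairs.
tokenizer.post_processor = TemplateProcessing(
    single="[CLS] $A [SEP]",
    pair="[CLS] $A [SEP] $B:1 [SEP]:1",
    special_tokens=[
        ("[CLS]", tokenizer.token_to_id("[CLS]")),
        ("[SEP]", tokenizer.token_to_id("[SEP]")),
    ],
)

single = tokenizer.encode("hello world")
pair = tokenizer.encode("hello world", "how are you")

print(single.tokens)
print(pair.tokens)
print(pair.type_ids)
```

The special token IDs are looked up with `token_to_id` rather than hard-coded, so the template stays consistent with whatever vocabulary the tokenizer actually has.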
If you save your tokenizer with Tokenizer.save, the post-processor is saved along with it.
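The save behaviour can be checked with a small round trip. This is a sketch under the same assumptions as before: the `tokenizers` package is installed, and the toy vocabulary and the file name `tokenizer.json` are made up for illustration. It saves a tokenizer with `Tokenizer.save`, reloads it with `Tokenizer.from_file`, and encodes a sentence with the reloaded copy.

```python
import os
import tempfile

from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.processors import TemplateProcessing

# Hypothetical toy vocabulary, used only so the example is self-contained.
vocab = {"[UNK]": 0, "[CLS]": 1, "[SEP]": 2, "hello": 3, "world": 4}
tokenizer = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
tokenizer.post_processor = TemplateProcessing(
    single="[CLS] $A [SEP]",
    pair="[CLS] $A [SEP] $B:1 [SEP]:1",
    special_tokens=[("[CLS]", 1), ("[SEP]", 2)],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tokenizer.json")
    tokenizer.save(path)                   # writes model, pre-tokenizer and post-processor
    reloaded = Tokenizer.from_file(path)   # no post-processor is set again by hand

reloaded_tokens = reloaded.encode("hello world").tokens
print(reloaded_tokens)
```

If the reloaded tokenizer still wraps the sentence in "[CLS]" and "[SEP]", the post-processor was restored from the file.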