transformers.BertConfig
https://huggingface.co/docs/transformers/v4.18.0/en/model_doc/bert#transformers.BertConfig
BertConfig inherits from #transformers.PretrainedConfig
https://github.com/huggingface/transformers/blob/v4.18.0/src/transformers/models/bert/configuration_bert.py#L54
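Because BertConfig inherits from PretrainedConfig, Hub-related helpers such as from_pretrained, save_pretrained, and to_dict come from the base class. A minimal sketch, assuming the Hub model ID "bert-base-uncased" is reachable from this environment:

```python
from transformers import BertConfig, PretrainedConfig

# from_pretrained is inherited from PretrainedConfig; it downloads only
# the config.json of the checkpoint, not the model weights.
config = BertConfig.from_pretrained("bert-base-uncased")
assert isinstance(config, PretrainedConfig)

print(config.num_hidden_layers)         # 12
print(config.to_dict()["vocab_size"])   # 30522
```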
vocab_size (int, optional, defaults to 30522) — Vocabulary size of the BERT model. Defines the number of different tokens that can be represented by the input_ids passed when calling BertModel or TFBertModel.
num_hidden_layers (int, optional, defaults to 12) — Number of hidden layers in the Transformer encoder.
num_attention_heads (int, optional, defaults to 12) — Number of attention heads for each attention layer in the Transformer encoder.
max_position_embeddings (int, optional, defaults to 512) — The maximum sequence length that this model might ever be used with. Typically set to a large value just in case (e.g., 512, 1024, or 2048).
type_vocab_size (int, optional, defaults to 2) — The vocabulary size of the token_type_ids passed when calling BertModel or TFBertModel.
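A minimal sketch of building a randomly initialized BertModel from an explicit config. The values below are the documented defaults, and the remaining arguments (hidden_size, intermediate_size, etc.) are left at their defaults as well:

```python
from transformers import BertConfig, BertModel

# All values match the documented defaults, so this is equivalent to BertConfig().
config = BertConfig(
    vocab_size=30522,
    num_hidden_layers=12,
    num_attention_heads=12,
    max_position_embeddings=512,
    type_vocab_size=2,
)

# Instantiating the model from a config does NOT load pretrained weights;
# the encoder is randomly initialized.
model = BertModel(config)
print(model.config.max_position_embeddings)  # 512
```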