transformers.PretrainedConfig
Common attributes present in all config classes are hidden_size, num_attention_heads, and num_hidden_layers. Text models additionally implement vocab_size. Each is documented below, with a usage sketch after the list.
hidden_size (int) — The hidden size of the model.
num_attention_heads (int) — The number of attention heads used in the multi-head attention layers of the model.
num_hidden_layers (int) — The number of blocks in the model.
vocab_size (int) — The number of tokens in the vocabulary, which is also the first dimension of the embeddings matrix (this attribute may be missing for models that don’t have a text modality like ViT).
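As a minimal sketch, these attributes can be read off any loaded configuration. The checkpoint name bert-base-uncased is only an illustrative choice; the values shown in the comments are those of that particular checkpoint.

```python
from transformers import AutoConfig

# Load the configuration of a text model checkpoint
# ("bert-base-uncased" is an illustrative choice).
config = AutoConfig.from_pretrained("bert-base-uncased")

# Common attributes, available on every config class:
print(config.hidden_size)          # 768
print(config.num_attention_heads)  # 12
print(config.num_hidden_layers)    # 12

# vocab_size is present because BERT is a text model; a vision-only
# config such as ViT's would not carry this attribute.
print(config.vocab_size)           # 30522
```

For a model without a text modality, a guarded check such as `getattr(config, "vocab_size", None)` avoids an AttributeError when the attribute is absent.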