transformers.PretrainedConfig
Common attributes present in all config classes are hidden_size, num_attention_heads, and num_hidden_layers. Text models additionally implement vocab_size. Each is documented below, with a usage sketch after the list.
hidden_size (int) — The hidden size of the model.
num_attention_heads (int) — The number of attention heads used in the multi-head attention layers of the model.
num_hidden_layers (int) — The number of blocks in the model.
vocab_size (int) — The number of tokens in the vocabulary, which is also the first dimension of the embeddings matrix (this attribute may be missing for models that don’t have a text modality like ViT).
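As a minimal sketch, these attributes can be read off any loaded configuration. The checkpoint name bert-base-uncased is only an illustrative choice; the values shown in the comments are those of that particular checkpoint.

```python
from transformers import AutoConfig

# Load the configuration of a text model checkpoint
# ("bert-base-uncased" is an illustrative choice).
config = AutoConfig.from_pretrained("bert-base-uncased")

# Common attributes, available on every config class:
print(config.hidden_size)          # 768
print(config.num_attention_heads)  # 12
print(config.num_hidden_layers)    # 12

# vocab_size is present because BERT is a text model; a vision-only
# config such as ViT's would not carry this attribute.
print(config.vocab_size)           # 30522
```

For a model without a text modality, a guarded check such as `getattr(config, "vocab_size", None)` avoids an AttributeError when the attribute is absent.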