ConstantLengthDataset
Iterable dataset that returns constant-length chunks of tokens from a stream of text files.
Before tokenization, the dataset formats each text using a format provided by the user.
seq_length (int, optional, defaults to 1024) — Length of token sequences to return.
num_of_sequences (int, optional, defaults to 1024) — Number of token sequences to keep in buffer.
The dataset is held in self.dataset.
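
The interplay of seq_length and num_of_sequences can be illustrated with a minimal sketch. This is not the library's implementation: the function name constant_length_chunks, the tokenize callable, the formatting_func default, and eos_token_id are hypothetical, and the sketch assumes the buffer is measured in tokens and that an incomplete final chunk is dropped.

```python
from typing import Callable, Iterable, Iterator, List


def constant_length_chunks(
    texts: Iterable[str],
    tokenize: Callable[[str], List[int]],
    formatting_func: Callable[[str], str] = lambda s: s,
    seq_length: int = 1024,
    num_of_sequences: int = 1024,
    eos_token_id: int = 0,  # assumed separator token between texts
) -> Iterator[List[int]]:
    """Yield token lists of exactly seq_length from a stream of texts."""
    # Assumption: the buffer holds roughly num_of_sequences chunks of tokens.
    buffer_limit = seq_length * num_of_sequences
    token_buffer: List[int] = []

    def drain() -> Iterator[List[int]]:
        # Emit every full chunk currently in the buffer, keep the remainder.
        nonlocal token_buffer
        n_full = len(token_buffer) // seq_length
        for i in range(n_full):
            yield token_buffer[i * seq_length:(i + 1) * seq_length]
        token_buffer = token_buffer[n_full * seq_length:]

    for text in texts:
        # Format first, then tokenize, then mark the text boundary.
        token_buffer.extend(tokenize(formatting_func(text)))
        token_buffer.append(eos_token_id)
        if len(token_buffer) >= buffer_limit:
            yield from drain()

    # End of stream: emit remaining full chunks; the incomplete tail is dropped.
    yield from drain()


if __name__ == "__main__":
    toy_tokenize = lambda s: [ord(c) for c in s]  # toy tokenizer: one token per character
    for chunk in constant_length_chunks(
        ["abc", "de", "fghi"], toy_tokenize, seq_length=4, num_of_sequences=1
    ):
        print(chunk)
```

A larger num_of_sequences means more texts are collected before chunks are cut, at the cost of more memory. Every returned chunk has exactly seq_length tokens, and a chunk may span the boundary between two texts.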