CLIP - 西尾泰和の外部脳

CLIP is an image-and-text model that maps text and images into a shared vector space. For applications of the models

Vocabulary size: 49408

BOS=49406

EOS=49407
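The role of the two special ids can be sketched in plain Python: the subword ids are wrapped in BOS and EOS, then right-padded with zeros. This is only an illustration, not the library's implementation. `frame_tokens` is a hypothetical helper, and the context length of 77 is assumed to be the `clip.tokenize` default. The subword ids are copied from the `clip.tokenize` output for "a painting of a cat" in these notes.

```python
# Special token ids taken from these notes.
BOS = 49406
EOS = 49407
# Assumption: 77 is the default context length used by clip.tokenize.
CONTEXT_LENGTH = 77


def frame_tokens(token_ids, context_length=CONTEXT_LENGTH):
    """Wrap subword ids in BOS/EOS and right-pad with zeros (hypothetical helper)."""
    framed = [BOS, *token_ids, EOS]
    if len(framed) > context_length:
        raise ValueError("input is too long for the context length")
    return framed + [0] * (context_length - len(framed))


# Subword ids for "a painting of a cat", copied from the tokenizer output in these notes.
row = frame_tokens([320, 3086, 539, 320, 2368])
print(row[:8])   # [49406, 320, 3086, 539, 320, 2368, 49407, 0]
print(len(row))  # 77
```

Since BOS and EOS are the two largest ids in a vocabulary of 49408, they cannot collide with ordinary subword ids.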

code:python
 >>> import clip
 >>> clip.tokenize("a painting of a cat")
 tensor([[49406, 320, 3086, 539, 320, 2368, 49407, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0]], dtype=torch.int32)

Words that are not in the vocabulary are split into subwords:

code:python
 >>> clip.tokenize("bozuman")
 tensor([[49406, 647, 4091, 786, 49407, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0]], dtype=torch.int32)
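To read such a row back, one can take the ids between BOS and the first EOS. The sketch below does this in plain Python for the two rows as they appear in these notes. `subword_ids` is a hypothetical helper and not part of the `clip` package.

```python
# Special token ids taken from these notes.
BOS = 49406
EOS = 49407


def subword_ids(row):
    """Return the ids strictly between BOS and the first EOS (hypothetical helper)."""
    start = row.index(BOS) + 1
    end = row.index(EOS)
    return row[start:end]


# Rows copied as shown in these notes (27 entries each).
cat_row = [49406, 320, 3086, 539, 320, 2368, 49407] + [0] * 20
bozuman_row = [49406, 647, 4091, 786, 49407] + [0] * 22

print(subword_ids(cat_row))      # [320, 3086, 539, 320, 2368]
print(subword_ids(bozuman_row))  # [647, 4091, 786]
```

The five-word prompt yields five ids, one per word, while the single word "bozuman" yields three, which is the subword split at work.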