Step-Video-T2V - work4ai

https://github.com/stepfun-ai/Step-Video-T2V stepfun-ai/Step-Video-T2V

https://huggingface.co/stepfun-ai/stepvideo-t2v stepfun-ai/stepvideo-t2v

https://huggingface.co/stepfun-ai/stepvideo-t2v-turbo stepfun-ai/stepvideo-t2v-turbo

https://arxiv.org/abs/2502.10248 Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

https://yuewen.cn/videos 跃问 (Yuewen)

A video generation model with 30 billion parameters that generates videos of up to 204 frames at 544×992 pixels

RGB video is compressed 16×16 spatially and 8× temporally (Video-VAE)
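
To get a feel for what these compression ratios mean, here is a rough sketch that computes the latent grid size for the largest video mentioned above. The function name `latent_shape` and the round-up behavior are assumptions for illustration only: 204 is not a multiple of 8, and how the actual model handles the leftover frames is not stated here. Latent channel count is ignored.

```python
import math

def latent_shape(frames, height, width, spatial=16, temporal=8):
    """Return (latent_frames, latent_height, latent_width).

    Rounding up is an assumption; the real VAE may pad or treat
    the first frame differently.
    """
    return (
        math.ceil(frames / temporal),
        math.ceil(height / spatial),
        math.ceil(width / spatial),
    )

# Largest video mentioned on this page: 204 frames at 544x992
shape = latent_shape(204, 544, 992)

# Each latent position stands for 16 * 16 * 8 pixel positions
reduction = 16 * 16 * 8

print(shape, reduction)
```

Under these assumptions the diffusion model works on a grid that has roughly 2048 times fewer positions than the raw pixel video.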

Bilingual Text Encoder

Hunyuan-CLIP × Step-LLM

Video-based DPO
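
For reference, a minimal sketch of the standard DPO objective for one preference pair. This is the generic formulation, not necessarily what Step-Video-T2V does: how the model scores a preferred versus a rejected video is not covered on this page, so the four log-likelihood inputs and the `beta` default are hypothetical placeholders.

```python
import math

def dpo_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """Standard DPO loss for a single (preferred, rejected) pair.

    logp_*     : log-likelihoods under the model being trained (hypothetical inputs)
    ref_logp_* : log-likelihoods under the frozen reference model
    """
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    # -log(sigmoid(margin)) written as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))

# No preference relative to the reference model: loss equals log(2)
neutral = dpo_loss(0.0, 0.0, 0.0, 0.0)

# Model favors the preferred sample more than the reference does: loss drops
improved = dpo_loss(-1.0, -2.0, -1.5, -1.5)
```

The loss falls as the trained model raises the likelihood of the preferred sample relative to the reference model and lowers that of the rejected one.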

30B (≈60 GB of VRAM at FP16/BF16) is honestly laughable (even at FP8 it is still ≈30 GB, so all you can do is laugh) morisoba65536.icon
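
The figures in the comment above come from simple arithmetic: parameter count times bytes per parameter. A small sketch, where `weight_memory_gb` is a hypothetical helper and the estimate covers the weights only (activations, the text encoder, and the VAE would come on top):

```python
# Bytes needed to store one parameter in each format
BYTES_PER_PARAM = {"fp16": 2, "bf16": 2, "fp8": 1}

def weight_memory_gb(num_params, dtype):
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

params = 30_000_000_000  # 30B parameters

for dtype in ("fp16", "bf16", "fp8"):
    print(dtype, weight_memory_gb(params, dtype), "GB")
```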