AI CUDA Engineer

https://sakana.ai/ai-cuda-engineer/The AI CUDA Engineer: Agentic CUDA Kernel Discovery, Optimization and Composition

https://pub.sakana.ai/static/paper.pdf

https://pub.sakana.ai/ai-cuda-engineer/leaderboardAI CUDA Engineer - Kernel Leaderboard 🏆

https://huggingface.co/datasets/SakanaAI/AI-CUDA-Engineer-ArchiveSakanaAI/AI-CUDA-Engineer-Archive

2025/2/22

Combining evolutionary optimization with LLMs is powerful but can also find ways to trick the verification sandbox. We are fortunate to have readers, like @main_horse test our CUDA kernels, to identify that the system had found a way to “cheat”. For example, the system had found a memory exploit in the evaluation code which, in a number of cases, allowed it to avoid checking for correctness. Furthermore, we find the system could also find other novel exploits in the benchmark’s tasks.

We have since made the evaluation and runtime profiling harness more robust to eliminate many of such loopholes. We are in the process of revising our paper, and our results, to reflect and discuss the effects, and mitigation of LLM reward hacking for CUDA kernel optimization.

We deeply apologize for our oversight to our readers. We will provide a revision of this work soon, and discuss our learnings.

2025/2/21

シェイン・グウ(@shanegJP)

AI-Scientistも色々言われてたがAI-CUDA-Engineerは酷い。DeepSeekと違いただ他社のAPIを叩くいわゆるプロンプト論文。なのに直感的なテストすらせず大大と発表。基礎的な評価や検証すらできないとSakanaにプロダクト開発は期待できなさそう。研究の100倍注意かけないとプロダクトには至らない。

https://gyazo.com/ef9b40c623cfc3395d24a72f4509c9c5 https://gyazo.com/380659f5f526708f8360fa322f431180

> 夢アカデミー(@yumeacademy)

> Sakana AI(が賭けるswarmで？GPUプログラミング効率化する？)CUDA Engineerの論文リリース…

> 僅か1時間後に本職Nvidiaエンジニアから間違いだらけと指摘

> 砂状の楼閣作りではなくDeepSeekを見習ってとことん本質を探求していきましょ！

> ※この論文(SakanaAI?)海外の生成AI開発者達からフルボッコなう

> https://gyazo.com/6aa08abeb12801a479e061c6c9c8922f

https://wirelesswire.jp/2025/02/88134/やはりSakanaは釣りだった!?Sakana.aiが発表した論文が海外のAI研究者コミュニティで炎上 – WirelessWire News

𝕜(@kyo_takano)

おそらく本件は

「計算グラフの等価性を出力の等価性によって代理的に評価していたところ、CUDA kernelの予期せぬ挙動で直前の計算結果がすり抜けてしまった」

といったところなので、*事後的にのみ* 検出可能な不備であり、決して意図したものではないと思うよ。

そりゃ発見するの難しいわwogikaze.icon

> 𝕜(@kyo_takano)

> なんだこれ... ミスを犯したからと言って「ジョーク論文」だの「悪質な嘘」だの言っていいわけが無いだろ。そもそも事実誤認が酷すぎる…

> https://gyazo.com/26d41125a1249c35accc286d5d9675fb https://gyazo.com/3a683ad74e9caa0014afd1194d9b6c38

炎上してるw

チェック漏れのまま公開してるので出資してる投資家からは怒られてもしょうがない案件…だが大抵放火するのは金も出してなければ知りもしない第三者という謎morisoba65536.icon

https://note.com/o_ob/n/n2fcc4e927d5aSakana AI の間違いを徹底的に査読してみた (Colabコード付き)

このくらい検証して突っつくのは妥当な範囲に感じる

https://www.itmedia.co.jp/aiplus/articles/2502/20/news128.htmlAIを“AIで”改善 Sakana AIが新技術「AI CUDA Engineer」発表目指すは100万倍の効率化 - ITmedia AI＋

https://gyazo.com/a58a054e4677669ecb2432177c1c2dbb

またAI CUDA Engineerは、一般的な機械学習において、PyTorchネイティブとコンパイルされたPyTorchコードよりも、10～100倍高速化できるCUDAカーネルを安定して発見したという。これにより、機械学習アーキテクチャ全体を最適化したCUDAカーネルに変換することも可能。この結果は、GPUカーネルの性能評価指標「KernelBench」で最高水準の成果を記録した。

The AI Scientist

#sakana.ai