Agents for Software Development - nikkie-memos


https://nlp-colloquium-jp.github.io//schedule/2025-04-02_graham-neubig/

#NLPColloquium #GrahamNeubig #OpenHands

https://docs.google.com/presentation/d/1rzXIAdY8HlBuPyp-VVRPGmbySEz_EHdG2pkyD8rnQrs/edit?usp=sharing

What do developers spend their time on?

Today was a Good Day: The Daily Life of Software Developers

Developers have many tasks besides writing code (communication takes up about a third)

OpenHands: Open-source AI Software Development Agents

The interface between the agent and the environment

OpenHands: An Open Platform for AI Software Developers as Generalist Agents

Figure 2

2.1 event stream
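A minimal sketch of the event-stream idea, assuming it means an append-only history in which the agent's actions and the environment's observations are interleaved, and the agent chooses each next action from the whole history. The classes `RunCommand`, `Finish`, `Observation`, `EventStream` and the function `run_agent` are hypothetical illustrations, not the OpenHands API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


# Hypothetical minimal event types (illustration only, not OpenHands classes).
@dataclass
class RunCommand:
    command: str


@dataclass
class Finish:
    message: str


@dataclass
class Observation:
    content: str


Event = Union[RunCommand, Finish, Observation]


@dataclass
class EventStream:
    """Append-only history of everything the agent did and saw."""
    events: List[Event] = field(default_factory=list)

    def add(self, event: Event) -> None:
        self.events.append(event)


def run_agent(
    agent: Callable[[List[Event]], Event],
    execute: Callable[[RunCommand], Observation],
    max_steps: int = 10,
) -> EventStream:
    stream = EventStream()
    for _ in range(max_steps):
        # The agent picks its next action from the full history so far.
        action = agent(stream.events)
        stream.add(action)
        if isinstance(action, Finish):
            break
        # The environment executes the action; the result becomes an observation.
        stream.add(execute(action))
    return stream
```

Because the agent and the executor only read from or append to one shared list, a finished run can be inspected or replayed step by step.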

A small number of powerful tools (2.1 Actions)

cmd

ipython

BrowserGym（WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?）

BrowseURLAction

BrowseInteractiveAction: interactive browser operations
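Of the tools listed, cmd and ipython are easy to sketch locally. The hypothetical `Sandbox` class below runs shell commands with `subprocess.run` and Python cells with `exec` in a persistent namespace, as a rough stand-in for an IPython kernel; browsing is left out because it needs a real browser. This is an assumption-laden illustration with no isolation at all, whereas a real agent platform would run these actions inside a sandboxed environment.

```python
import contextlib
import io
import subprocess
import traceback


class Sandbox:
    """Hypothetical executor for shell commands and Python cells (no isolation!)."""

    def __init__(self) -> None:
        # Shared namespace so variables persist across cells, like an IPython session.
        self.namespace: dict = {}

    def run_cmd(self, command: str, timeout: float = 30.0) -> str:
        # Run through the shell; stdout + stderr become the observation text.
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
        return result.stdout + result.stderr

    def run_ipython(self, code: str) -> str:
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, self.namespace)
        except Exception:
            # Return the traceback to the agent instead of crashing the loop.
            buffer.write(traceback.format_exc())
        return buffer.getvalue()
```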

Code editing

StrReplaceAction

Supports view, replace, undo, etc. via its arguments
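A minimal sketch of a single editing tool whose first argument selects view, replace, or undo. The class `StrReplaceEditor` and the command names are hypothetical, and the rule that `old_str` must occur exactly once is an assumed safeguard (it keeps an edit from silently landing in the wrong place), not something stated in the talk.

```python
from pathlib import Path
from typing import Dict, List


class StrReplaceEditor:
    """Hypothetical file editor: one tool, behaviour chosen by the `command` argument."""

    def __init__(self) -> None:
        # Per-file stack of previous contents, used by "undo".
        self.history: Dict[str, List[str]] = {}

    def __call__(self, command: str, path: str, old_str: str = "", new_str: str = "") -> str:
        file = Path(path)
        if command == "view":
            # Show the file with 1-based line numbers so the agent can refer to lines.
            lines = file.read_text().splitlines()
            return "\n".join(f"{i}\t{line}" for i, line in enumerate(lines, start=1))
        if command == "str_replace":
            text = file.read_text()
            count = text.count(old_str)
            if count != 1:
                return f"Error: old_str must appear exactly once, found {count}"
            self.history.setdefault(path, []).append(text)
            file.write_text(text.replace(old_str, new_str, 1))
            return "OK"
        if command == "undo":
            if not self.history.get(path):
                return "Error: nothing to undo"
            file.write_text(self.history[path].pop())
            return "OK"
        return f"Error: unknown command {command!r}"
```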

File localization

Three approaches

Agentless: Demystifying LLM-based Software Engineering Agents

Evaluating software agents

Test environments

Evaluating individual capabilities separately

End-to-end

Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering

WebArena: A Realistic Web Environment for Building Autonomous Agents

Runs the websites locally

TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

SWE-bench Verified

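Agents can close 60% of the issues

Caveat: open-source code is also used to train language models, so models may have seen these repositories during training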
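To make "closing 60% of issues" concrete: SWE-bench counts an instance as resolved when, after the agent's patch is applied, the tests listed under `FAIL_TO_PASS` now pass and the tests under `PASS_TO_PASS` still pass. The sketch below computes a resolve rate from per-test outcomes; the `InstanceResult` type is a hypothetical container, not part of the SWE-bench harness. With the 500 instances of SWE-bench Verified, 60% corresponds to 300 resolved issues.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InstanceResult:
    # Test name -> passed after the patch? (hypothetical container)
    fail_to_pass: Dict[str, bool]  # tests the fix is supposed to make pass
    pass_to_pass: Dict[str, bool]  # tests that passed before and must keep passing


def is_resolved(r: InstanceResult) -> bool:
    # Resolved only if the fix works AND nothing that used to pass is broken.
    return all(r.fail_to_pass.values()) and all(r.pass_to_pass.values())


def resolve_rate(results: List[InstanceResult]) -> float:
    return sum(is_resolved(r) for r in results) / len(results)
```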

How capable is each language model? (slide=24)

claude+o1+critic 61%
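The notes do not say exactly how the critic is used. A common pattern, assumed here, is to sample several candidate patches (for example from different models) and let a critic score them, keeping the best one. `best_of_n`, the generator callables, and the critic signature are hypothetical stand-ins.

```python
from typing import Callable, Sequence


def best_of_n(
    issue: str,
    generators: Sequence[Callable[[str, int], str]],
    critic: Callable[[str, str], float],
    samples_per_generator: int = 2,
) -> str:
    # Sample several candidate patches from each generator (e.g. different models/seeds).
    candidates = [
        gen(issue, seed) for gen in generators for seed in range(samples_per_generator)
    ]
    if not candidates:
        raise ValueError("no candidates generated")
    # The critic scores each (issue, patch) pair; keep the highest-scoring patch.
    return max(candidates, key=lambda patch: critic(issue, patch))
```

Combining models this way only helps if the critic ranks candidates better than chance; the generators supply diversity and the critic supplies selection.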

OpenHands LM（Introducing OpenHands LM 32B -- A Strong, Open Coding Agent Model）

How can they be improved?

Training Software Engineering Agents and Verifiers with SWE-Gym

Other topics

CoAct: A Global-Local Hierarchy for Autonomous Agent Collaboration

Interactive Agents to Overcome Ambiguity in Software Engineering

Agent Workflow Memory

We want agents to get better the more they are used
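One way an agent can improve with use is to keep reusable workflows from past successful tasks and recall them for similar new tasks. Agent Workflow Memory derives such workflows from past experience; this sketch swaps that step for something much simpler: it stores the step lists of successful episodes verbatim and retrieves them by word overlap with the new task. `WorkflowMemory` and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Workflow:
    task: str
    steps: List[str]


@dataclass
class WorkflowMemory:
    workflows: List[Workflow] = field(default_factory=list)

    def record(self, task: str, steps: List[str], succeeded: bool) -> None:
        # Keep only successful episodes, so memory grows with useful experience.
        if succeeded:
            self.workflows.append(Workflow(task, list(steps)))

    def retrieve(self, task: str, k: int = 1) -> List[Workflow]:
        # Naive relevance: number of shared lowercase words (a stand-in heuristic).
        query = set(task.lower().split())

        def overlap(w: Workflow) -> int:
            return len(query & set(w.task.lower().split()))

        ranked = sorted(self.workflows, key=overlap, reverse=True)
        return [w for w in ranked[:k] if overlap(w) > 0]
```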

Safe use

https://youtu.be/VU6Qy-7-2HI?si=wkrt-TH3FzwxsUWW