Building effective agents
https://www.anthropic.com/engineering/building-effective-agents
#Engineering_at_Anthropic
Consistently, the most successful implementations use simple, composable patterns rather than complex frameworks.
In this post, we share what we’ve learned from working with our customers and building agents ourselves, and give practical advice for developers on building effective agents.
What are agents?
At Anthropic, we categorize all these variations as agentic systems, but draw an important architectural distinction between workflows and agents:
Workflows are systems where LLMs and tools are orchestrated through predefined code paths.
Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.
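To make the distinction concrete, here is a minimal Python sketch (not the article's code; call_llm and the tool functions are hypothetical stand-ins): the workflow fixes the code path in advance, while the agent loop lets the model decide which tool to call and when to stop.

# Minimal sketch of the workflow/agent distinction.
# call_llm and the tools below are hypothetical placeholders, not real API calls.

def call_llm(prompt: str) -> str:
    """Stand-in for one model call (e.g. via the Messages API)."""
    raise NotImplementedError

# Workflow: the code path is predefined; the LLM only fills in each step.
def translate_then_summarize(document: str) -> str:
    translation = call_llm(f"Translate to English:\n{document}")
    return call_llm(f"Summarize in three bullet points:\n{translation}")

# Agent: the LLM decides, turn by turn, which tool to use and when it is done.
TOOLS = {
    "search_docs": lambda query: f"(search results for {query!r})",
    "read_file": lambda path: f"(contents of {path})",
}

def run_agent(task: str, max_turns: int = 10) -> str:
    history = f"Task: {task}"
    for _ in range(max_turns):
        decision = call_llm(
            f"{history}\nReply 'TOOL <name> <input>' or 'DONE <answer>'."
        )
        if decision.startswith("DONE"):
            return decision[len("DONE"):].strip()
        _, name, arg = decision.split(maxsplit=2)
        history += f"\n{name}({arg!r}) -> {TOOLS[name](arg)}"
    return "Stopped: turn limit reached."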
When (and when not) to use agents
When building applications with LLMs, we recommend finding the simplest solution possible, and only increasing complexity when needed.
Agentic systems often trade latency and cost for better task performance, and you should consider when this tradeoff makes sense.
When more complexity is warranted, workflows offer predictability and consistency for well-defined tasks, whereas agents are the better option when flexibility and model-driven decision-making are needed at scale.
For many applications, however, optimizing single LLM calls with retrieval and in-context examples is usually enough.
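As a rough sketch of that simpler baseline (retrieve, call_llm, and the example are hypothetical placeholders): a single call with retrieved context and one in-context example placed directly in the prompt, no agentic loop.

# Sketch of an optimized single LLM call: retrieval plus an in-context example.
# retrieve() and call_llm() are hypothetical placeholders.

def retrieve(query: str, k: int = 3) -> list[str]:
    """Stand-in for any retrieval step (vector search, keyword search, ...)."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Stand-in for one model call."""
    raise NotImplementedError

def answer(question: str) -> str:
    context = "\n\n".join(retrieve(question))
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        "Example:\nQ: What plan am I on?\nA: You are on the Pro plan (source: billing.md).\n\n"
        f"Q: {question}\nA:"
    )
    return call_llm(prompt)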
When and how to use frameworks
These frameworks make it easy to get started by simplifying standard low-level tasks like calling LLMs, defining and parsing tools, and chaining calls together.
However, they often create extra layers of abstraction that can obscure the underlying prompts and responses, making them harder to debug.
LangGraph
Amazon Bedrock's AI Agent framework
Rivet
Vellum
We suggest that developers start by using LLM APIs directly: many patterns can be implemented in a few lines of code.
If you do use a framework, ensure you understand the underlying code.
Reference implementation: Building Effective Agents Cookbook
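As an illustration of the "a few lines of code" point above, here is a minimal sketch of calling the Anthropic Messages API directly with the Python SDK, with no framework in between (the model name and prompts are placeholders):

# Direct use of the Anthropic Python SDK: two chained calls, no framework layer.
# Model name and prompts are placeholders.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def complete(prompt: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

outline = complete("Outline a short post about agent-computer interfaces.")
draft = complete(f"Write the post from this outline:\n\n{outline}")

Every prompt and response stays as plain text in your own code, so it remains easy to inspect and debug.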
Building blocks, workflows, and agents
Combining and customizing these patterns
Summary
Success in the LLM space is about building the right system for your needs, not the most sophisticated one.
Start with simple prompts, optimize them with comprehensive evaluation, and add multi-step agentic systems only when simpler solutions fall short.
Three core principles:
1. Maintain simplicity in your agent's design.
2. Prioritize transparency by explicitly showing the agent’s planning steps.
3. Carefully craft your agent-computer interface (ACI) through thorough tool documentation and testing.
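A small sketch of principle 2 above (call_llm is again a hypothetical stand-in): ask the model for its plan first and surface that plan to the user before any steps run.

# Sketch of "prioritize transparency": the plan is generated explicitly and
# shown before the agent acts. call_llm is a hypothetical placeholder.

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # stand-in for one model call

def plan_then_act(task: str) -> str:
    plan = call_llm(
        f"Task: {task}\nBefore doing anything, list the steps you will take, one per line."
    )
    print("Agent plan:\n" + plan)  # surfaced to the user / logs, not hidden state
    return call_llm(f"Task: {task}\nNow follow this plan step by step:\n{plan}")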
Appendix 1: Agents in practice
Two domains where agentic systems provide value:
A. Customer support
B. Coding agents
Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet
Appendix 2: Prompt engineering your tools
No matter which agentic system you're building, tools will likely be an important part of your agent.
Claude can now use tools
There are often several ways to specify the same action.
File editing, for example:
Write a diff (the model must know how many lines will change before it writes the new code)
Rewrite the entire file
Structured output, for example:
Write code inside Markdown
Write code inside JSON (requires extra escaping of newlines and quotes)
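Sketched below, the file-edit case as two alternative tool shapes (in the JSON-schema style used for tool definitions; the names and parameters are illustrative, not Anthropic's actual tools):

# Two illustrative specifications of the same "edit a file" action.
# Tool names and parameters are hypothetical.

apply_diff_tool = {
    "name": "apply_diff",
    "description": (
        "Apply a unified diff to a file. Hunk headers must state how many "
        "lines are added and removed, so the model has to count lines "
        "before writing any new code."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Path to the file."},
            "diff": {"type": "string", "description": "Unified diff to apply."},
        },
        "required": ["path", "diff"],
    },
}

rewrite_file_tool = {
    "name": "rewrite_file",
    "description": (
        "Replace the entire contents of a file. No diff syntax or line "
        "counting; the model writes the new file as plain text."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Path to the file."},
            "content": {"type": "string", "description": "New file contents."},
        },
        "required": ["path", "content"],
    },
}

Both describe the same action; the second removes the bookkeeping the model tends to get wrong.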
Our suggestions for deciding on tool formats are the following:
Give the model enough tokens to "think" before it writes itself into a corner.
Keep the format close to what the model has seen naturally occurring in text on the internet.
Make sure there's no formatting "overhead" such as having to keep an accurate count of thousands of lines of code, or string-escaping any code it writes.
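A small demonstration of that escaping overhead, comparing the same two-line snippet written inside Markdown and inside a JSON string:

import json

snippet = 'def greet(name):\n    print(f"Hello, {name}!")'

# Inside Markdown, the code looks exactly as the model would naturally write it.
as_markdown = f"```python\n{snippet}\n```"

# Inside JSON, every newline and quote must be escaped, which is formatting
# work the model has to get exactly right on top of writing correct code.
as_json = json.dumps({"code": snippet})

print(as_markdown)
print(as_json)  # {"code": "def greet(name):\n    print(f\"Hello, {name}!\")"}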
One rule of thumb is to think about how much effort goes into human-computer interfaces (HCI), and plan to invest just as much effort in creating good agent-computer interfaces (ACI).
Is it obvious how to use this tool, based on the description and parameters, or would you need to think carefully about it?
Think about it from the model's perspective.
How can you change parameter names or descriptions to make things more obvious?
Test how the model uses your tools: Run many example inputs in our workbench to see what mistakes the model makes, and iterate.
Poka-yoke (mistake-proofing): https://ja.wikipedia.org/wiki/ポカヨケ
While building our agent for SWE-bench, we actually spent more time optimizing our tools than the overall prompt.
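A hedged sketch of that kind of poka-yoke applied to a tool (a hypothetical read_file tool, not the article's actual SWE-bench tooling): document the constraint in the description and enforce it in the handler, for example by insisting on absolute paths so the model cannot trip over relative ones.

import os

# Hypothetical tool definition and handler showing mistake-proofing in an ACI:
# the description spells out the constraint, and the handler enforces it so a
# relative path fails loudly instead of silently reading the wrong file.

read_file_tool = {
    "name": "read_file",
    "description": (
        "Read a text file. `path` MUST be an absolute path, e.g. "
        "/repo/src/main.py. Relative paths are rejected."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {
                "type": "string",
                "description": "Absolute path to the file to read.",
            }
        },
        "required": ["path"],
    },
}

def handle_read_file(path: str) -> str:
    if not os.path.isabs(path):
        # Return an actionable error the model can recover from on the next turn.
        return f"Error: '{path}' is not an absolute path. Retry with an absolute path."
    with open(path, "r", encoding="utf-8") as f:
        return f.read()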