AgentOps
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
How Rexera’s AI agents drive quality control with LangGraph
AIOps That SRE Invests In: LLM for Developer Initiatives at Pairs
The Steps for Putting AI Agents into Production Probably Look Something Like This
The Shift from Models to Compound AI Systems
Evaluating and Monitoring Generative AI Applications with Weave, with Practical Examples (PDF)
Announcing the OWASP LLM and Gen AI Security Project Initiative for Securing Agentic Applications
Agent-SafetyBench: Evaluating the Safety of LLM Agents
A Collection of AI Agent Design Principles and Implementation Patterns to Reread at the Start of 2025
Building Agentic Workflows with Inngest
Doubts About the "Fully Autonomous" AI Agent Supremacy Argument: Workflow Construction as the Realistic Answer
Why AI agents fail in production—and how to fix it
We Held "AI Agent Catch-up #18 - Guardrails AI"
Secure a generative AI assistant with OWASP Top 10 mitigation
Choosing the Right AI Agent Framework: LangGraph vs CrewAI vs OpenAI Swarm
Magma: A Foundation Model for Multimodal AI Agents
OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning
A-MEM: A Novel Agentic Memory System for LLM Agents that Enables Dynamic Memory Structuring without Relying on Static, Predetermined Memory Operations
AgentSociety: Large-Scale Simulation of LLM-Driven Generative Agents Advances Understanding of Human Behaviors and Society
In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents
Observability in AI Agent Development, W&B Meetup #21
TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems