AgentOps - yuyan

AgentOps

Multi-agent systems

AI agents

GUI agents (Computer use)

MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

https://arxiv.org/abs/2410.07095

https://openai.com/index/mle-bench/

How Rexera’s AI agents drive quality control with LangGraph

https://blog.langchain.dev/customers-rexera/

AIOps that SREs invest in: Pairs' initiatives on LLM for Developer

https://speakerdeck.com/takumiogawa/sregatou-zi-suruaiops-peazuniokerullm-for-developerhenoqu-rizu-mi?slide=4

The steps for putting AI agents into production probably look something like this

https://speakerdeck.com/naotoota/aiezientowoshi-yun-yong-nicheng-serusutetupuhakonnagan-ziziyanakarouka

The Shift from Models to Compound AI Systems

https://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems/

Evaluating and monitoring generative AI applications with Weave, with practical examples (PDF)

https://speakerdeck.com/olachinkei/weavewoyong-itasheng-cheng-aiahurikesiyonnoping-jia-monitarinnkutoshi-jian-li

Announcing the OWASP LLM and Gen AI Security Project Initiative for Securing Agentic Applications

https://genai.owasp.org/2024/12/15/announcing-the-owasp-llm-and-gen-ai-security-project-initiative-for-securing-agentic-applications/

Agent-SafetyBench: Evaluating the Safety of LLM Agents

https://arxiv.org/abs/2412.14470

AI agent design principles and implementation patterns worth rereading at the start of 2025

https://zenn.dev/r_kaga/articles/e0c096d03b5781

Building Agentic Workflows with Inngest

https://weaviate.io/blog/inngest-ai-workflows

Misgivings about the "fully autonomous" AI agent supremacy view: building workflows as the practical answer

https://zenn.dev/pharmax/articles/d1d3695e4114c0

Why AI agents fail in production—and how to fix it

https://hugobowne.substack.com/p/why-ai-agents-fail-in-productionand

We held "AI Agent Catch-up #18 - Guardrails AI"

https://blog.generative-agents.co.jp/entry/2025/01/21/152659

Secure a generative AI assistant with OWASP Top 10 mitigation

https://aws.amazon.com/jp/blogs/machine-learning/secure-a-generative-ai-assistant-with-owasp-top-10-mitigation/

Choosing the Right AI Agent Framework: LangGraph vs CrewAI vs OpenAI Swarm

https://www.relari.ai/blog/ai-agent-framework-comparison-langgraph-crewai-openai-swarm

Magma: A Foundation Model for Multimodal AI Agents

https://arxiv.org/abs/2502.13130

OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning

https://arxiv.org/abs/2502.11271

A-MEM: A Novel Agentic Memory System for LLM Agents that Enables Dynamic Memory Structuring without Relying on Static, Predetermined Memory Operations

https://www.marktechpost.com/2025/03/01/a-mem-a-novel-agentic-memory-system-for-llm-agents-that-enables-dynamic-memory-structuring-without-relying-on-static-predetermined-memory-operations/

https://arxiv.org/abs/2502.12110v1

AgentSociety: Large-Scale Simulation of LLM-Driven Generative Agents Advances Understanding of Human Behaviors and Society

https://arxiv.org/abs/2502.08691

In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents

https://arxiv.org/abs/2503.08026

Observability in AI agent development, W&B Meetup #21

https://wandb.connpass.com/event/355002/presentation/

TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems

https://arxiv.org/abs/2506.04133