AgentOps
LLMOps
MLOps
DataOps
Multi-agent systems
AI agents
GUI agents (Computer use)
LLM evaluation
AI safety
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
https://arxiv.org/abs/2410.07095
https://openai.com/index/mle-bench/
How Rexera’s AI agents drive quality control with LangGraph
https://blog.langchain.dev/customers-rexera/
AIOps as an SRE Investment: LLM for Developer Initiatives at Pairs
https://speakerdeck.com/takumiogawa/sregatou-zi-suruaiops-peazuniokerullm-for-developerhenoqu-rizu-mi?slide=4
The Steps for Getting AI Agents into Production Probably Look Something Like This
https://speakerdeck.com/naotoota/aiezientowoshi-yun-yong-nicheng-serusutetupuhakonnagan-ziziyanakarouka
The Shift from Models to Compound AI Systems
https://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems/
Evaluating and Monitoring Generative AI Applications with Weave, with Practical Examples
https://speakerdeck.com/olachinkei/weavewoyong-itasheng-cheng-aiahurikesiyonnoping-jia-monitarinnkutoshi-jian-li
Announcing the OWASP LLM and Gen AI Security Project Initiative for Securing Agentic Applications
https://genai.owasp.org/2024/12/15/announcing-the-owasp-llm-and-gen-ai-security-project-initiative-for-securing-agentic-applications/
Agent-SafetyBench: Evaluating the Safety of LLM Agents
https://arxiv.org/abs/2412.14470
A Collection of AI Agent Design Principles and Implementation Patterns Worth Rereading at the Start of 2025
https://zenn.dev/r_kaga/articles/e0c096d03b5781
Building Agentic Workflows with Inngest
https://weaviate.io/blog/inngest-ai-workflows
Misgivings About the Idea That "Fully Autonomous" AI Agents Are Supreme: Workflow Construction as a Practical Solution
https://zenn.dev/pharmax/articles/d1d3695e4114c0
Why AI agents fail in production—and how to fix it
https://hugobowne.substack.com/p/why-ai-agents-fail-in-productionand
We Held "AI Agent Catch-up #18 - Guardrails AI"
https://blog.generative-agents.co.jp/entry/2025/01/21/152659
Secure a generative AI assistant with OWASP Top 10 mitigation
https://aws.amazon.com/jp/blogs/machine-learning/secure-a-generative-ai-assistant-with-owasp-top-10-mitigation/
Choosing the Right AI Agent Framework: LangGraph vs CrewAI vs OpenAI Swarm
https://www.relari.ai/blog/ai-agent-framework-comparison-langgraph-crewai-openai-swarm
Magma: A Foundation Model for Multimodal AI Agents
https://arxiv.org/abs/2502.13130
OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning
https://arxiv.org/abs/2502.11271
A-MEM: A Novel Agentic Memory System for LLM Agents that Enables Dynamic Memory Structuring without Relying on Static, Predetermined Memory Operations
https://www.marktechpost.com/2025/03/01/a-mem-a-novel-agentic-memory-system-for-llm-agents-that-enables-dynamic-memory-structuring-without-relying-on-static-predetermined-memory-operations/
https://arxiv.org/abs/2502.12110v1
AgentSociety: Large-Scale Simulation of LLM-Driven Generative Agents Advances Understanding of Human Behaviors and Society
https://arxiv.org/abs/2502.08691
In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents
https://arxiv.org/abs/2503.08026
Observability in AI Agent Development, W&B Meetup #21
https://wandb.connpass.com/event/355002/presentation/?utm_campaign=new_event_links_to_group_member&utm_source=notifications&utm_medium=email&utm_content=detail_btn
TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems
https://arxiv.org/abs/2506.04133