Financial Agent → General Agent · Paper-Level Technical Landscape

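Building on the 6-layer stack in financial_agent_to_general_agent_technical_landscape.md, this document gives, for each key paper, the author / year / venue / core idea / concrete example / implementation link.

About 70 papers in total, each in a standardized entry format.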

Maintained by Paul Weng · 2026-05-19

Paper Entry Format

**[Title]** · Author1, Author2, ..., Year
- Venue: NeurIPS / ICML / Nature / arXiv:xxxx.xxxxx
- Core idea: 2-3 sentences on the technical core
- Example: a concrete case where the method works
- Code: GitHub / official implementation link
- L-anchor: which landscape layer the paper sits in and why it matters

L0 · Financial Agents

L0.1 Customer-Facing Financial Agents

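**BloombergGPT: A Large Language Model for Finance** · Shijie Wu, Ozan Irsoy, Steven Lu, et al. (Bloomberg L.P., 2023)
- Venue: arXiv:2303.17564
- Core idea: A 50B-parameter LLM pretrained on 363B tokens of financial text plus 345B tokens of general text. Shows that domain-specific pretraining can clearly beat general-purpose LLMs on financial NLP tasks (sentiment / NER / QA).
- Example: Reaches about 43% on ConvFinQA, ahead of the open baselines the paper compares against (GPT-NeoX, OPT-66B, BLOOM-176B); built to support work inside the Bloomberg Terminal.
- Code: not open-sourced (internal to Bloomberg)
- L-anchor: L0.1 starting point for finance-domain LLMs; demonstrates the value of domain pretraining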

**FinGPT: Open-Source Financial Large Language Models** · Hongyang Yang, Xiao-Yang Liu, Christina Dan Wang (2023)
- Venue: arXiv:2306.06031
- Core idea: An open-source financial LLM framework that uses LoRA to fine-tune general models (Llama / ChatGLM) for financial tasks. Emphasizes a data-centric pipeline: the data pipeline matters more than the model itself.
- Example: LoRA fine-tuning of Llama 2-7B for sentiment analysis on a consumer-grade GPU, with AUC 10-15% above the general-purpose base model.
- Code: github.com/AI4Finance-Foundation/FinGPT
- L-anchor: L0.1 open-source counterpart; LoRA + financial data = low-cost financial LLM

L0.2 Trading-Decision Agents

**TradingAgents: Multi-Agents LLM Financial Trading Framework** · Yijia Xiao, Edward Sun, Di Luo, Wei Wang (UCLA, 2024)
- Venue: arXiv:2412.20138
- Core idea: Specialized agents (Fundamental / Sentiment / News / Technical Analyst, Bull/Bear Researcher, Trader, Risk Manager, Fund Manager) reach decisions through structured debate. Mirrors the division of roles in a real trading firm.
- Example: One-year backtests on stocks such as NVDA / AAPL / GOOG, with annualized return 30%+ above a baseline LLM trader (single-agent GPT-4).
- Code: github.com/TauricResearch/TradingAgents
- L-anchor: L0.2 multi-agent trading benchmark; the Bull/Bear debate pattern inspired RD-Agent and others

**QuantAgent: Seeking Holy Grail in Trading by Self-Improving Large Language Model** · Saizhuo Wang, Hang Yuan, Lionel M Ni, Jian Guo (HKUST, 2024)
- Venue: arXiv:2402.03755
- Core idea: Two-layer architecture: an inner LLM agent generates trading signals, and an outer LLM agent reflects on and improves them. Targets short-horizon, HFT-style structured-signal decisions.
- Example: 5-minute-horizon prediction on 9 financial instruments (BTC / Nasdaq futures / agricultural commodities), with directional accuracy clearly above a single-pass LLM.
- Code: partially open-sourced
- L-anchor: L0.2 self-improving trading agent; RL-like self-reflection that uses language rather than a reward

**FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design** · Yangyang Yu, Haohang Li, et al. (Stevens Institute, 2023)
- Venue: arXiv:2311.13743
- Core idea: Three memory layers (Working / Episodic / Semantic) plus a persona character. Episodic memory stores past trade decisions; Semantic memory stores abstracted market knowledge.
- Example: Backtests on stocks such as GME, COIN, and TSLA, where cumulative return outperforms a standard GPT-4 agent baseline on several tickers.
- Code: github.com/pipiku915/FinMem-LLM-StockTrading
- L-anchor: L0.2 early attempt at a memory architecture in finance; connects to L3.4 MemGPT

**FinAgent: A Multimodal Foundation Agent for Financial Trading** · Wentao Zhang, Lingxuan Zhao, et al. (Microsoft + NTU, 2024)
- Venue: arXiv:2402.18485
- Core idea: A multimodal foundation agent that jointly processes price charts (images), news text, financial reports, and numerical indicators. Tool-augmented: it can call tools such as chart pattern recognition and event extraction.
- Example: On stock and crypto datasets, annualized return is 36%+ above 7 baselines (including reinforcement learning agents).
- Code: not widely open-sourced
- L-anchor: L0.2 multimodal trading agent; technically inherits GPT-4V visual understanding

L0.3 Alpha Mining / Factor Discovery Agents (your main battleground)

**gplearn** · Trevor Stephens, 2015-ongoing
- Venue: Open-source Python package
- Core idea: A Python implementation of Genetic Programming. Evolves symbolic expressions from tree-based operators (+, -, *, /, log, etc.), using IC or Sharpe as the fitness.
- Example: `est_gp = SymbolicTransformer(generations=20, population_size=2000, ...)` generates 50 candidate alphas, from which the ones with the highest IC are selected.
- Code: github.com/trevorstephens/gplearn
- L-anchor: L0.3 **classical baseline**; all modern alpha mining work compares against it
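
The entry above names IC as a fitness signal. As a minimal sketch of what such a fitness could look like, the snippet below computes a cross-sectional rank IC (Spearman correlation between factor values and forward returns) in plain Python. The names `ranks`, `pearson`, and `rank_ic` are illustrative and are not part of gplearn's API.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied block
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def rank_ic(factor, fwd_returns):
    """Rank IC for one cross-section: Spearman correlation of factor vs. forward return."""
    return pearson(ranks(factor), ranks(fwd_returns))
```

In a search loop this value would typically be averaged over many dates before being used as the fitness of a candidate expression.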

**AutoAlpha: An Efficient Hierarchical Evolutionary Algorithm for Mining Alpha Factors in Quantitative Investment** · Tianping Zhang, Yuanqi Li, et al. (2020)
- Venue: arXiv:2002.08245
- Core idea: Splits GP evolution into two levels, "main-line evaluation" and "sub-line mutation": the main line manages the candidate pool and the sub-line allocates the mutation budget. Markedly improves mining efficiency.
- Example: Mines 200+ alpha factors on the A-share universe, converging in 40% less time than plain GP.
- Code: pseudocode in the paper
- L-anchor: L0.3 GP optimization direction

**Alpha Mining and Enhancing via Warm Start Genetic Programming** · 2024
- Venue: arXiv:2412.00896
- Core idea: Uses an LLM to give GP a warm-start initial population: the LLM first generates candidate alphas that come with an economic story, and GP then refines that population through evolution. Combines the strengths of both.
- Example: In Chinese-market backtests, LLM warm-started GP beats randomly initialized GP by 0.3 in OOS Sharpe.
- L-anchor: L0.3 early LLM + GP hybrid

**AlphaForge: A Framework to Mine and Dynamically Combine Formulaic Alpha Factors** · 2024
- Venue: arXiv:2406.18394
- Core idea: Goes beyond plain GP: a DL-based generator produces formulaic alphas, and a dynamic combination network learns how to combine multiple alphas.
- Example: In CSI300 / CSI500 backtests, dynamically combining 100 candidate alphas gives a clearly higher Sharpe than static weighting.
- L-anchor: L0.3 from expression search to two-level learning over expressions + combination

**AlphaAgent: LLM-Driven Alpha Mining with Regularized Exploration to Counteract Alpha Decay** · 2025
- Venue: arXiv:2502.16789
- Core idea: A three-agent closed loop (Hypothesis Generation → Factor Construction → Evaluation) plus three regularizers against alpha decay: originality enforcement (forcing differences from existing factors), hypothesis alignment, and complexity control.
- Example: 81% hit-ratio improvement on CSI500 + S&P500, with markedly better robustness to alpha decay.
- L-anchor: L0.3 early complete closed loop for LLM-driven alpha mining

**Navigating the Alpha Jungle: An LLM-Powered MCTS Framework for Formulaic Factor Mining** · 2025
- Venue: arXiv:2505.11122
- Core idea: Brings the AlphaProof paradigm to alpha search: the LLM supplies a prior (which sub-expressions look more promising) and MCTS explores the factor expression tree. The backtest reward serves as the node value.
- Example: A depth-8 MCTS search on stock data finds 50+ new alphas with IC > 0.05, a hit rate 30%+ above pure GP.
- L-anchor: L0.3 direct transfer of AlphaProof to alpha mining

**QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining** · 2026
- Venue: arXiv:2602.07085
- Core idea: A self-evolving framework in which the LLM agent generates not single alphas but whole research trajectories (hypothesis → experiment design → factor → evaluation → revision). High-quality trajectories are stored in a trajectory pool and reused as later examples.
- Example: In cross-market transfer tests between CSI300 and S&P500, trajectory-based learning beats single-shot prompting by 0.4+ in OOS Sharpe.
- Code: github.com/QuantaAlpha/QuantaAlpha
- L-anchor: L0.3 open-source, and the most complete framework among LLM-driven alpha mining systems

**AlphaSAGE: GFlowNet-based Alpha Mining** · 2025
- Venue: arXiv:2509.25055
- Core idea: Replaces RL / GP with a GFlowNet, which learns a distribution proportional to reward rather than only the argmax and can therefore explore multiple high-reward modes. This matters a great deal for the long-tail problem in alpha mining.
- Example: In the alpha factor space, GFlowNet trajectories find 3-5x more high-reward modes than PPO.
- L-anchor: L0.3 GFlowNet route; a viable option beyond RL

**FactorMAD: Multi-Agent Debate Framework for Interpretable Stock Alpha Factor Mining** · 2025
- Venue: ACM ICAIF 2025
- Core idea: Two LLM agents refine factors through debate: one proposes factors, the other critiques their economic logic and statistical rigor, and multiple debate rounds converge on an interpretable factor.
- Example: Generated alphas score 40% higher than a single-agent baseline on a human-rated interpretability score.
- L-anchor: L0.3 debate-based mining; note the closed-loop risk of LLM-as-judge

**Alpha-GPT: Human-AI Interactive Alpha Mining** · 2025
- Venue: EMNLP 2025 demo paper
- Core idea: Not fully automatic. The user gets an interactive REPL and types in an idea; the LLM helps formalize it into an alpha expression + config and runs the backtest. Trader-in-the-loop.
- Example: A trader uses Alpha-GPT to iterate to 5 candidate alphas within 30 minutes, with an average IC of 0.04.
- L-anchor: L0.3 interactive mining in product form; connects to L0.2 trading agent UX

**QuantFactor REINFORCE** · 2024
- Venue: arXiv:2409.05144
- Core idea: States explicitly that PPO is unstable for alpha mining because variance is too high and rewards are sparse. Switches to REINFORCE with a variance bound, which suits the sparse-reward setting better.
- Example: Under the same candidate budget, REINFORCE-based search finds 2x as many alphas as PPO.
- L-anchor: L0.3 PPO replacement on the RL route

L0.4 Quant R&D Agents

**RD-Agent(Q): Multi-Agent Framework for Data-Centric Factors and Model Joint Optimization** · Microsoft Research, 2025
- Venue: arXiv:2505.15155
- Core idea: A full-stack quant R&D agent: Research (hypotheses from priors + task decomposition), Development (the Co-STEER code-generation agent), Feedback (real-market backtests), and a MAB (multi-armed bandit) scheduler that adaptively allocates effort across research directions. Factors and models are optimized jointly.
- Example: 2x annualized return versus a classical factor library on the stock market, with fewer factors.
- Code: github.com/microsoft/RD-Agent · docs
- L-anchor: L0.4 benchmark for full quant R&D automation; the strongest threat to, and reference point for, your RQ3

**Beyond Prompting: Autonomous Systematic Factor Investing** · 2025
- Core idea: Emphasizes an "interpretable signal set" over a black box: the agent outputs not only factors but also their economic rationale and OOS validation evidence.
- Example: 18-month OOS Sharpe of 1.2 on U.S. equities, with a paragraph-length economic explanation for every factor.
- L-anchor: L0.4 autonomous + interpretable route

**MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents** · Ruochen Li, Teerth Patel, Qingyun Wang, Xinya Du (UTD + UIUC, 2024)
- Venue: arXiv:2408.14033
- Core idea: Splits ML research into 4 stages (Idea → Experimentation → Implementation → Analysis), each with a dedicated agent. Usable as a general template for quant R&D.
- Example: Autonomously completes the hypothesis-to-paper pipeline on 5 ML benchmarks.
- Code: github.com/du-nlp-lab/MLR-Copilot
- L-anchor: L0.4 → L1 bridge; the ML research agent paradigm

L0.5 Risk / Safety / Reproducibility Agents

**Hubble: Safe and Reproducible LLM Alpha Discovery** · 2025
- Core idea: Safety sandbox (AST-level) + cross-sectional metrics + OOS evidence + HAC standard errors. Argues that reproducibility matters more than best-of-N.
- Example: Runs 10,000 candidate alphas on U.S. equities; fewer than 1% pass the strict OOS test, but the results are stable across random seeds.
- L-anchor: L0.5 safety-first alpha discovery
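
One way an AST-level sandbox can work is to parse each candidate expression and reject any syntax-tree node outside a whitelist before anything is evaluated. The sketch below illustrates that idea only; the whitelist, the operator names (`rank`, `ts_mean`, `log`), and the function `is_safe` are assumptions made for illustration, not Hubble's actual implementation.

```python
import ast

# Node types a formulaic alpha is allowed to contain (illustrative whitelist).
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Call, ast.Name, ast.Load,
    ast.Constant, ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub,
)
# Hypothetical data fields and operators exposed to the expression.
ALLOWED_NAMES = {"close", "volume", "rank", "ts_mean", "log"}

def is_safe(expr: str) -> bool:
    """Return True only if expr parses and uses whitelisted syntax and names."""
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return False  # e.g. attribute access, subscripts, lambdas
        if isinstance(node, ast.Name) and node.id not in ALLOWED_NAMES:
            return False  # unknown identifier such as __import__
        if isinstance(node, ast.Call) and not isinstance(node.func, ast.Name):
            return False  # only direct calls to named operators
    return True
```

Because attribute access is not whitelisted, expressions such as `close.__class__` are rejected before any evaluation step runs.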

**FactorMiner: Self-Evolving Alpha Discovery with Memory** · 2025
- Core idea: Modular evaluation tools + redundancy/correlation control. Each newly discovered alpha is first correlated against the alphas already in memory; anything with correlation > 0.7 is dropped immediately.
- L-anchor: L0.5 memory + redundancy control
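
The redundancy rule in the entry above can be sketched in a few lines. This is an illustration under assumptions: factor values are plain lists over the same dates and assets, the check uses absolute Pearson correlation (the source gives only the 0.7 threshold), and the names `pearson` and `admit` are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def admit(candidate, memory, threshold=0.7):
    """Keep a candidate only if it is not too correlated with any stored alpha."""
    return all(abs(pearson(candidate, kept)) <= threshold for kept in memory)

memory = [[1.0, 2.0, 3.0, 4.0]]                 # alphas already discovered
near_duplicate = [2.0, 4.0, 6.0, 8.5]           # almost a rescaled copy
different = [1.0, -1.0, 1.0, -1.0]              # weakly correlated signal
if admit(different, memory):
    memory.append(different)
```

Using the absolute value treats a sign-flipped copy of an existing alpha as redundant as well, which is a design choice rather than something the entry specifies.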

**CogAlpha: LLM-Driven Code Evolution for Alpha** · 2025
- Core idea: Evolves executable code instead of expressions. The LLM generates Python functions, and evolutionary fitness is fed back to the LLM so it can write the next version.
- Example: Generates 100+ executable code alphas on A-shares, with better robustness across regimes than expression-based alphas.
- L-anchor: L0.5 code-level evolution, close to AlphaEvolve

L1 · Domain Research Agents

**FunSearch: Mathematical Discoveries from Program Search with Large Language Models** · Bernardino Romera-Paredes, et al. (DeepMind, 2023, Nature 2024)
- Venue: Nature volume 625, pages 468–475 (2024)
- Core idea: LLM + evolutionary algorithm + program-level verification. The LLM proposes programs ("how to solve the problem") and the evolutionary loop selects the best ones. The key point is that verification is mechanical: Python runs the program, and it is kept only if the result checks out.
- Example: Finds a better lower bound than previously known for the cap set problem (a decades-old open problem in mathematics), and finds online bin-packing heuristics that beat the standard First Fit and Best Fit baselines.
- Code: github.com/google-deepmind/funsearch
- L-anchor: L1 near-L5 system (per the Chen 2026 survey); the verifiable-novelty template that your RQ inherits directly

**AI Scientist v1: Towards Fully Automated Open-Ended Scientific Discovery** · Chris Lu, et al. (Sakana AI + others, 2024)
- Venue: arXiv:2408.06292
- Core idea: The first system to complete the full "idea → experiment → paper → review" pipeline autonomously, end to end. Includes LLM-based peer review.
- Example: Produces 10 papers across three areas (NanoGPT / 2D Diffusion / Grokking) at an average cost of $15 per paper.
- Code: github.com/SakanaAI/AI-Scientist
- L-anchor: L1 starting point for general ML research agents

**AI Scientist v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search** · Yamada et al. (Sakana, 2025)
- Venue: arXiv:2504.08066
- Core idea: Drops the human-written templates of v1 and introduces Agentic Tree Search, in which an Experiment Manager Agent decides the next step at each tree node. Node = research state; action = edit code / run experiment / change direction / backtrack.
- Example: The first AI-written paper to pass peer review (ICLR 2025 workshop).
- Code: github.com/sakanaai/ai-scientist-v2
- L-anchor: L1 end-to-end autonomous research; the paradigm for your RQ3

**ASI-ARCH: AlphaGo Moment for Model Architecture Discovery** · Liu et al. (GAIR + Shanghai AI Lab, 2025)
- Venue: arXiv:2507.18074
- Core idea: A multi-agent framework (Researcher / Engineer / Analyst) + a Cognition Base (structured knowledge from human architecture papers) + an evolving Database. Runs 1,773 experiments over 20,000 GPU hours and discovers 106 new linear attention architectures that beat Mamba2. Its headline claim is the first Scaling Law for Scientific Discovery.
- Example: Cumulative SOTA architecture count versus total GPU hours is linear over the 1,773-experiment range.
- Code: github.com/GAIR-NLP/ASI-Arch
- L-anchor: L1 architecture discovery + scaling law; your Crypto-Alpha-Bench attempts to replicate the scaling law in finance

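**AlphaProof / AlphaGeometry 2: Olympiad-level Formal Mathematical Reasoning with RL** · DeepMind, 2024-2025
- Venue: Nature volume 643 (2025)
- Core idea: Two systems: AlphaProof (algebra / number theory, using AlphaZero-style MCTS + the Lean kernel) and AlphaGeometry 2 (geometry, where an LLM proposes auxiliary constructions and a symbolic engine computes the deductive closure). Reached silver-medal level at IMO 2024 (4 of 6 problems solved).
- Example: AlphaProof solves IMO 2024 P6 (regarded as the hardest algebra problem; only 5 contestants solved it on site).
- Code: Lean tactics included with the paper
- L-anchor: L1 ultimate form of neuro-symbolic verification; verifier trustworthiness = the Lean kernel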

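**AlphaEvolve: A Gemini-Powered Coding Agent for Designing Advanced Algorithms** · DeepMind, 2025
- Venue: arXiv:2506.13131
- Core idea: An upgraded FunSearch. Evolves long programs rather than small functions; splits work between Gemini Flash and Pro (Flash for fast mutation, Pro for refinement); Pareto optimization over multiple evaluators.
- Example: Reduces 4×4 complex matrix multiplication from the 49 multiplications of Strassen's method to 48 (the first improvement in 56 years); a data-center scheduling heuristic for Google yields sizable savings.
- Code: third-party reproduction OpenEvolve
- L-anchor: L1 FunSearch engineered for scale; the closest thing to production AI for algorithm discovery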

**Coscientist: Autonomous Chemical Research with Large Language Models** · Boiko et al. (CMU + Emerald Cloud Lab, Nature 2023)
- Venue: Nature 624, pages 570–578 (2023)
- Core idea: 4 modules (Planner / Web Searcher / Docs / Code Execution) with an automated hookup to Emerald Cloud Lab laboratory robots. The first chemistry agent to run wet-lab experiments autonomously, end to end.
- Example: Autonomously optimizes palladium-catalyzed cross-coupling reactions, fully automated from literature search to robot execution.
- Code: partially open-sourced
- L-anchor: L1 wet-lab integrated; benchmark for the chemistry domain

**ChemCrow: Augmenting Large-Language Models with Chemistry Tools** · Bran, Cox, Schilter, et al. (EPFL + Rochester, 2023)
- Venue: arXiv:2304.05376
- Core idea: Instead of training a chemistry-specific LLM, equips GPT-4 with 17 chemistry tools (RDKit / WebSearch / synthesis planning APIs) chained together with ReAct.
- Example: Autonomously designs catalysts, plans synthesis routes, and interprets NMR spectra.
- Code: github.com/ur-whitelab/chemcrow-public
- L-anchor: L1 tool-augmented paradigm; shows that a general LLM + specialist tools can beat a domain-tuned LLM

**BioPlanner: Automatic Evaluation of LLMs on Protocol Planning in Biology** · O'Donoghue et al. (Cambridge, 2023)
- Venue: EMNLP 2023
- Core idea: Not an end-to-end agent but a benchmark + planner that evaluates how well LLMs can plan biology protocols.
- L-anchor: L1 biology domain evaluation

**GNoME: Scaling Deep Learning for Materials Discovery** · Merchant et al. (DeepMind + LBNL, Nature 2023)
- Venue: Nature 624, pages 80–85 (2023)
- Core idea: GNN + an iterative active-learning loop. Starting from 48,000 known stable crystals, it alternates generation with DFT validation and ends up with 2.2 million new crystal structures, 380,000 of which are predicted to be stable.
- Example: 736 of the 380,000 new materials have already been synthesized and verified in laboratories.
- Code: partially open-sourced
- L-anchor: L1 materials discovery at scale; shows that AI for Science can produce results in bulk

L2 · Production Agent Systems

**Devin: Autonomous AI Software Engineer** · Cognition Labs, 2024
- Venue: Blog + demo, no formal paper
- Core idea: The first production "AI software engineer": given a GitHub issue, the agent autonomously runs git clone, writes code, runs tests, and opens a PR. Long-horizon planning + browser + IDE control.
- Example: 13.86% solve rate on SWE-bench (2024 launch, far above the then-SOTA 1.96%).
- Code: closed-source SaaS
- L-anchor: L2 starting point for production coding agents

**Claude Code** · Anthropic, 2025
- Venue: Product launch
- Core idea: CLI tool + IDE integration; a codebase-aware coding agent. 72% on SWE-bench Verified (Claude Sonnet 4.6, 2025).
- Example: The user chats directly in the terminal while Claude Code reads the codebase, writes code, and runs tests.
- Code: closed-source
- L-anchor: L2 current SOTA among production coding agents

**SWE-Agent: Agent-Computer Interfaces Enable Software Engineering Language Models** · Yang, Jimenez, et al. (Princeton, 2024)
- Venue: NeurIPS 2024
- Core idea: The key insight is that an LLM agent needs a purpose-built Agent-Computer Interface (ACI) rather than raw bash. Provides file viewer / editor / search tools, each with an optimized prompt.
- Example: 12.5% solved on SWE-bench (2024, first among open-source systems).
- Code: github.com/princeton-nlp/SWE-agent
- L-anchor: L2 the ACI concept, which any production agent needs

**OpenHands (ex-OpenDevin): An Open Platform for AI Software Developers as Generalist Agents** · OpenHands Team, 2024
- Venue: arXiv:2407.16741
- Core idea: An open-source Devin alternative with multiple runtime backends (Docker / Kubernetes / local) and plug-and-play support for different LLMs.
- Code: github.com/All-Hands-AI/OpenHands
- L-anchor: L2 open-source production agent framework

**Anthropic Computer Use** · Anthropic, 2024-10
- Venue: Blog launch
- Core idea: Claude 3.5 Sonnet controls the desktop directly: it reads screenshots and outputs mouse-click coordinates + keyboard input.
- Example: Autonomously completes the full flow "open spreadsheet → fill in data → save → send email".
- L-anchor: L2 starting point for visual GUI agents

**OpenAI Operator** · OpenAI, 2025
- Venue: Product
- Core idea: A browser-based agent dedicated to web tasks. Given a user goal ("book a flight to San Francisco on Friday"), the agent clicks through the site to complete it.
- L-anchor: L2 task-oriented browser agent

**OpenSeeker / OpenSeeker-v2: Democratizing Frontier Search Agents** · GAIR-NLP / SJTU, 2025-2026
- Venue: arXiv:2603.15594 (v1), arXiv:2605.04036 (v2)
- Core idea: Fact-grounded, scalable, controllable QA synthesis (constructed in reverse) + denoised trajectory synthesis. v1 uses 11.7k synthesized samples and plain SFT to beat industrial models retrained with CPT+SFT+RL.
- Example: BrowseComp 29.5% (v1) → 46.0% (v2); BrowseComp-ZH 48.4%, above Tongyi DeepResearch at 46.7%.
- Code: github.com/rui-ye/OpenSeeker
- L-anchor: L2 open-source deep research agent; shows that data quality matters more than training pipeline complexity

**STORM: Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking** · Shao et al. (Stanford, 2024)
- Venue: NAACL 2024
- Core idea: Generates Wikipedia-style long articles: first builds an outline, then explores each section with multi-perspective questions.
- Code: github.com/stanford-oval/storm
- L-anchor: L2 report generation benchmark; maps to stage 4 of the Deep Research pipeline

**GPT Deep Research / Perplexity Pro Deep Research / Gemini Deep Research** · OpenAI / Perplexity / Google, 2024-2025
- Venue: Product launches
- Core idea: The user poses a research question; the agent autonomously runs multiple rounds of web search and synthesizes a long report (10-30 minutes).
- L-anchor: L2 closed-source deep research SOTA

**Tongyi DeepResearch** · Alibaba, 2025
- Venue: Industrial-grade open-source release
- Core idea: A 30B-class deep research agent optimized specifically for the Chinese web.
- L-anchor: L2 Chinese-language deep research

Devin / Claude Code comparison set:
- Cursor / Windsurf / Aider (IDE-integrated)
- Agentless (no agent loop, single-shot SWE)
- AgentCoder / AutoCodeRover

L3 · LLM Agent Patterns & Frameworks

L3.1 Reasoning Patterns

**Chain of Thought (CoT)** · Wei, Wang, Schuurmans, et al. (Google, 2022)
- Venue: NeurIPS 2022, arXiv:2201.11903
- Core idea: Shows step-by-step worked examples in the few-shot prompt so the LLM imitates that reasoning style. Large gains on multi-step arithmetic, commonsense reasoning, and symbolic reasoning tasks.
- Example: "John has 5 apples. He gives 2 to Mary..." → "Step 1: John starts with 5. Step 2: After giving 2, he has 3."
- L-anchor: L3.1 the ancestor of reasoning patterns

**Self-Consistency** · Wang, Wei, et al. (Google, 2022)
- Venue: ICLR 2023, arXiv:2203.11171
- Core idea: Samples N CoT completions for the same prompt and picks the majority answer. Simple but clearly effective.
- L-anchor: L3.1 inference-time aggregation
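
The aggregation step described above is a plain majority vote over sampled answers. A minimal sketch follows; `sample_fn` is a hypothetical stand-in for "sample one chain of thought at non-zero temperature and extract its final answer", so no real model API is assumed.

```python
from collections import Counter

def self_consistent_answer(sample_fn, prompt, n=5):
    """Sample n reasoning paths and return the most frequent final answer."""
    answers = [sample_fn(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Usage with canned answers standing in for five sampled reasoning paths.
canned = iter(["18", "18", "17", "18", "20"])
answer = self_consistent_answer(lambda prompt: next(canned), "toy question", n=5)
```

Only the final answers are compared, so two chains that reason differently but agree on the result count as the same vote.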

**Tree of Thoughts (ToT)** · Yao, Yu, Zhao, et al. (Princeton, 2023)
- Venue: NeurIPS 2023, arXiv:2305.10601
- Core idea: Replaces linear CoT with a tree: each thought is a state, the LLM evaluates several thoughts at once, and the most promising ones are expanded. BFS / DFS / beam search are all options.
- Example: On Game of 24, solve rate goes from 4% to 74% with ToT (GPT-4 base).
- Code: github.com/princeton-nlp/tree-of-thought-llm
- L-anchor: L3.1 tree-structured reasoning

**Graph of Thoughts (GoT)** · Besta et al. (ETH Zurich, 2023)
- Venue: AAAI 2024, arXiv:2308.09687
- Core idea: A generalization of ToT to arbitrary DAG topologies of thoughts (merge / branch / feedback).
- L-anchor: L3.1 the most general form of reasoning topology

**LATS (Language Agent Tree Search)** · Zhou, Yan, et al. (UIUC, 2023)
- Venue: ICML 2024, arXiv:2310.04406
- Core idea: ToT + MCTS + Reflexion combined in one framework. Tree search with a value function (estimated by the LLM) + verbal reflection on failures.
- L-anchor: L3.1 fusion of reasoning trees and RL search

**Reflexion: Language Agents with Verbal Reinforcement Learning** · Shinn, Cassano, et al. (Northeastern, 2023)
- Venue: NeurIPS 2023, arXiv:2303.11366
- Core idea: After a failed trace the LLM writes a short "reflection", and the next attempt includes that reflection in the prompt. In effect, RL done in language.
- Example: 10%+ improvement over ReAct on HumanEval / AlfWorld.
- Code: github.com/noahshinn/reflexion
- L-anchor: L3.1 verbal RL

**Self-Refine: Iterative Refinement with Self-Feedback** · Madaan et al. (CMU + others, 2023)
- Venue: NeurIPS 2023, arXiv:2303.17651
- Core idea: A three-step loop: initial output → critique (by the same LLM) → refine. No training required, pure prompting.
- L-anchor: L3.1 the simplest self-correction paradigm

**Program-Aided Language (PAL)** · Gao, Madaan, et al. (CMU, 2022)
- Venue: ICML 2023, arXiv:2211.10435
- Core idea: Uses code as the reasoning trace: the LLM writes the solving steps in Python, and running the Python produces the answer. Numerical accuracy is clearly higher than with CoT.
- L-anchor: L3.1 code-as-reasoning

L3.2 Planning Patterns

**ReAct: Synergizing Reasoning and Acting in Language Models** · Yao, Zhao, et al. (Princeton + Google, 2022)
- Venue: ICLR 2023, arXiv:2210.03629
- Core idea: At every step the LLM emits a Thought + Action (tool call) and then receives an Observation, in a loop. Reasoning and acting are interleaved rather than separated.
- Example: 15-30% improvement over a chain-of-thought-only baseline on HotpotQA / FEVER / ALFWorld.
- Code: github.com/ysymyth/ReAct
- L-anchor: L3.2 the most basic planning + action pattern, used by almost every modern agent
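
The Thought / Action / Observation loop above can be sketched without any model dependency. In this illustration `llm` is a hypothetical callable that returns a `(thought, action, argument)` tuple, `tools` maps action names to Python functions, and the scripted policy below stands in for a real model; none of this is taken from the ReAct reference code.

```python
def react(question, llm, tools, max_steps=5):
    """Interleave reasoning and tool calls until the policy emits 'finish'."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        thought, action, arg = llm(transcript)
        transcript += f"Thought: {thought}\nAction: {action}[{arg}]\n"
        if action == "finish":
            return arg, transcript
        observation = tools[action](arg)       # run the chosen tool
        transcript += f"Observation: {observation}\n"
    return None, transcript                    # step budget exhausted

# Scripted stand-in for a model: look something up once, then answer.
def scripted_llm(transcript):
    if "Observation:" not in transcript:
        return ("I need to look this up.", "lookup", "capital of France")
    last_obs = transcript.rsplit("Observation: ", 1)[1].strip()
    return ("I have the answer.", "finish", last_obs)

tools = {"lookup": lambda key: {"capital of France": "Paris"}[key]}
answer, trace = react("What is the capital of France?", scripted_llm, tools)
```

The growing transcript is the only state, which is what lets each new Thought condition on every earlier Observation.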

**Plan-and-Execute / Plan-and-Solve** · Wang et al. (Singapore Management University, 2023)
- Venue: ACL 2023, arXiv:2305.04091
- Core idea: Builds the complete plan first, then executes it step by step. Complementary to ReAct: ReAct fits reactive tasks better, while Plan-and-Execute fits tasks with predictable subgoals.
- L-anchor: L3.2 plan-first paradigm

**ReWOO (Reasoning without Observation)** · Xu et al. (THU + Microsoft, 2023)
- Venue: arXiv:2305.18323
- Core idea: Splits ReAct into 3 modules: a Planner that generates the whole plan in one pass, a Worker that executes it, and a Solver that integrates the results. Avoids the cost of calling the LLM at every step.
- L-anchor: L3.2 cost-optimized planning

L3.3 Tool Use & Integration

**Toolformer: Language Models Can Teach Themselves to Use Tools** · Schick et al. (Meta AI, 2023)
- Venue: NeurIPS 2023, arXiv:2302.04761
- Core idea: Self-supervised learning of tool use: the LLM annotates its own corpus with "calling tool X here is useful" and is then fine-tuned on it.
- L-anchor: L3.3 starting point for self-supervised tool use

**OpenAI Function Calling** · OpenAI, 2023-06
- Venue: Product
- Core idea: The LLM is given a set of function schemas (JSON) and outputs structured JSON that can be parsed directly into a function call.
- L-anchor: L3.3 industry standard for structured tool use
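
To make the "schema in, structured call out" idea concrete, here is a small sketch of the application side: a JSON Schema description of one tool and a dispatcher that parses the model's structured output. The tool `get_price`, its price table, and `dispatch` are hypothetical, and the payload shape (a `name` plus a JSON-encoded `arguments` string) is an assumption modeled on the function-calling style rather than a copy of any SDK.

```python
import json

# Hypothetical tool; the tickers and prices are illustrative only.
def get_price(ticker: str) -> float:
    return {"AAPL": 190.0, "NVDA": 120.0}[ticker]

TOOLS = {"get_price": get_price}

# JSON Schema description of the tool, the kind of object sent to the model.
GET_PRICE_SCHEMA = {
    "name": "get_price",
    "description": "Return the latest price for a ticker.",
    "parameters": {
        "type": "object",
        "properties": {"ticker": {"type": "string"}},
        "required": ["ticker"],
    },
}

def dispatch(model_output: str):
    """Parse a structured call emitted by the model and run the matching tool."""
    call = json.loads(model_output)
    args = json.loads(call["arguments"])  # assumed: arguments arrive as a JSON string
    return TOOLS[call["name"]](**args)

# Simulated model output for the request "What is AAPL trading at?"
payload = json.dumps({"name": "get_price", "arguments": json.dumps({"ticker": "AAPL"})})
price = dispatch(payload)
```

A production dispatcher would also validate the arguments against the schema before calling the tool.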

**MCP (Model Context Protocol)** · Anthropic, 2024-11
- Venue: Open spec
- Core idea: "HTTP for AI" for agent-tool communication: servers expose resources / tools / prompts, and clients access them in a standardized way.
- Code: github.com/modelcontextprotocol
- L-anchor: L3.3 tool integration standard

**Agent-to-Agent (A2A)** · Google, 2025
- Venue: Open spec
- Core idea: Cross-vendor agent communication: how agent A discovers and calls agent B. Agent Cards describe capabilities.
- L-anchor: L3.3 multi-agent interop standard

L3.4 Memory Architectures

**MemGPT: Towards LLMs as Operating Systems** · Packer, Wooders, Lin, et al. (Berkeley, 2023)
- Venue: arXiv:2310.08560
- Core idea: Brings the OS memory hierarchy to LLMs: working memory (context) + external memory (persistent storage), with the agent deciding for itself what to page in and out.
- Example: Clearly outperforms fixed-context baselines on multi-document QA and long conversations.
- Code: github.com/cpacker/MemGPT
- L-anchor: L3.4 the classic memory hierarchy

**Voyager: An Open-Ended Embodied Agent with Large Language Models** · Wang et al. (NVIDIA + Caltech, 2023)
- Venue: arXiv:2305.16291
- Core idea: Lifelong learning in Minecraft: the LLM writes code (skills), stores it in a skill library, and retrieves from the library when a new task comes up.
- Example: Explores Minecraft autonomously, collecting 3.3x more unique items and unlocking tech-tree milestones up to 15.3x faster than baselines.
- Code: github.com/MineDojo/Voyager
- L-anchor: L3.4 skill library paradigm

**Generative Agents: Interactive Simulacra of Human Behavior** · Park et al. (Stanford + Google, 2023)
- Venue: UIST 2023, arXiv:2304.03442
- Core idea: 25 LLM agents live in the simulated town of Smallville, each with a memory stream + reflection + planning. Social behavior emerges.
- Code: github.com/joonspk-research/generative_agents
- L-anchor: L3.4 agent simulation + memory

L3.5 Multi-Agent Frameworks

**CAMEL: Communicative Agents for Mind Exploration of Large Language Model Society** · Li et al. (KAUST, 2023)
- Venue: NeurIPS 2023, arXiv:2303.17760
- Core idea: Two LLM agents complete a task through role-play: one plays the "user", the other the "assistant", and they drive the dialogue forward on their own.
- Code: github.com/camel-ai/camel
- L-anchor: L3.5 starting point for role-play multi-agent systems

**AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation** · Wu et al. (Microsoft, 2023)
- Venue: arXiv:2308.08155
- Core idea: A general multi-agent framework in which agents converse freely, with support for group chat and custom conversation patterns.
- Example: Multi-agent collaboration demos covering coding + documentation generation + data analysis.
- Code: github.com/microsoft/autogen
- L-anchor: L3.5 industry benchmark for multi-agent frameworks

**MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework** · Hong et al. (DeepWisdom, 2023)
- Venue: ICLR 2024, arXiv:2308.00352
- Core idea: Bakes a software company's SOP (Standard Operating Procedure) into the multi-agent workflow: fixed Product Manager / Architect / Engineer / QA roles that emit structured artifacts (design doc / API spec / code).
- Code: github.com/geekan/MetaGPT
- L-anchor: L3.5 SOP-based multi-agent

**ChatDev: Communicative Agents for Software Development** · Qian et al. (THU, 2023)
- Venue: ACL 2024, arXiv:2307.07924
- Core idea: The full version of the software-company metaphor: CEO / CTO / Programmer / Tester / Designer.
- Code: github.com/OpenBMB/ChatDev
- L-anchor: L3.5 software-company simulation

**AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors** · Chen et al. (THU + Microsoft, 2023)
- Venue: ICLR 2024, arXiv:2308.10848
- Core idea: A general multi-agent experimentation framework supporting expert recruitment + decision-making + action execution + evaluation.
- Code: github.com/OpenBMB/AgentVerse
- L-anchor: L3.5 simulation env

**LangGraph** · LangChain, 2024
- Venue: Open framework
- Core idea: Describes an agent workflow as a graph data structure (node = step, edge = transition). Supports cycles, conditional routing, and state persistence.
- Code: github.com/langchain-ai/langgraph
- L-anchor: L3.5 graph-based orchestration

**CrewAI** · Open source, 2024
- Core idea: The "crew" metaphor: role + task + tools, in a more lightweight package.
- Code: github.com/crewAIInc/crewAI
- L-anchor: L3.5 lightweight multi-agent

**Swarm** · OpenAI, 2024
- Core idea: A minimal handoff pattern: one agent hands control off to another.
- Code: github.com/openai/swarm
- L-anchor: L3.5 handoff pattern

L4 · Foundation Model Tech Stack

**Attention is All You Need** · Vaswani et al. (Google, 2017)
- Venue: NeurIPS 2017
- Core idea: The Transformer architecture, with self-attention replacing RNNs. The foundation of all modern LLMs.
- L-anchor: L4.1 architectural starting point

**Scaling Laws for Neural Language Models** · Kaplan et al. (OpenAI, 2020)
- Venue: arXiv:2001.08361
- Core idea: Loss follows a power law in compute / data / params. Guided the scale of GPT-3.
- L-anchor: L4.1 scaling classic

**Training Compute-Optimal Large Language Models (Chinchilla)** · Hoffmann et al. (DeepMind, 2022)
- Venue: NeurIPS 2022, arXiv:2203.15556
- Core idea: Corrects Kaplan: data should scale in equal proportion with params. Chinchilla 70B, trained on 1.4T tokens, beats GPT-3 175B.
- L-anchor: L4.1 scaling rule for modern LLMs
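
The numbers in the entry above fit a simple back-of-the-envelope calculation. The sketch below uses two widely quoted approximations that are assumptions here rather than exact statements from the paper: roughly 20 training tokens per parameter at the compute-optimal point, and training FLOPs ≈ 6 × params × tokens. The function names are illustrative.

```python
def chinchilla_tokens(params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal training tokens for a given parameter count."""
    return params * tokens_per_param

def train_flops(params: float, tokens: float) -> float:
    """Rule-of-thumb training cost: about 6 FLOPs per parameter per token."""
    return 6.0 * params * tokens

params = 70e9                            # Chinchilla: 70B parameters
tokens = chinchilla_tokens(params)       # 1.4e12, i.e. the 1.4T tokens cited above
flops = train_flops(params, tokens)      # about 5.9e23 FLOPs
```

Under the same rule of thumb, a 175B-parameter model would call for about 3.5T tokens, far more than GPT-3's roughly 300B.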

**DeepSeek-V3 Technical Report** · DeepSeek, 2024-12
- Venue: arXiv:2412.19437
- Core idea: 671B params / 37B active MoE + MLA (Multi-head Latent Attention) + MTP (Multi-Token Prediction) + FP8 training. Beats GPT-4 on many benchmarks at 1/10 the cost.
- Code: github.com/deepseek-ai/DeepSeek-V3
- L-anchor: L4.1 open-source MoE frontier

**DeepSeek-V4 Technical Report** · DeepSeek, 2026-04
- Venue: Technical report
- Core idea: V4-Pro (1.6T / 49B active) + V4-Flash (284B / 13B). Three innovations: Hybrid Attention (CSA + HCA, 73% fewer FLOPs at 1M context), mHC (Manifold-Constrained Hyper-Connections), and the Muon optimizer. 33T training tokens, with a focus on long documents + agentic traces.
- Example: 1M context, SWE-bench Verified 80.6% (on par with Gemini 3.1 Pro), Terminal Bench 67.9%. Agent workloads cost 60-80% less than competing models.
- L-anchor: L4.1 2026 frontier open model

**InstructGPT (RLHF)** · Ouyang et al. (OpenAI, 2022)
- Venue: NeurIPS 2022
- Core idea: Three-stage alignment: SFT + Reward Model + PPO. The methodological basis of ChatGPT.
- L-anchor: L4.2 RLHF starting point

**Direct Preference Optimization (DPO)** · Rafailov et al. (Stanford, 2023)
- Venue: NeurIPS 2023, arXiv:2305.18290
- Core idea: Derives a closed-form solution that lets an LLM be trained directly on preference data, bypassing the reward model + PPO.
- L-anchor: L4.2 simpler alignment
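
For a single preference pair, the DPO objective reduces to a logistic loss on the difference of policy-versus-reference log-ratios. The sketch below computes that per-pair loss from four sequence log-probabilities; the argument names and the default `beta=0.1` are illustrative choices, and a real training setup would compute these log-probabilities with a model and average over a batch.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # Numerically stable form of -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Policy identical to the reference: zero margin, loss equals log(2).
baseline = dpo_loss(-2.0, -2.0, -2.0, -2.0)
# Policy raises the chosen answer and lowers the rejected one: loss drops.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

The reference log-probabilities act as an anchor, so the loss rewards moving away from the reference in the preferred direction rather than raw likelihood.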

**Constitutional AI** · Bai et al. (Anthropic, 2022)
- Venue: arXiv:2212.08073
- Core idea: Replaces human labelers with principles (a constitution): the AI evaluates for itself whether outputs comply with the principles.
- L-anchor: L4.2 scalable oversight

**DeepSeek-R1: Incentivizing Reasoning in LLMs via RL** · DeepSeek, 2025-01
- Venue: arXiv:2501.12948
- Core idea: Applies GRPO (Group Relative Policy Optimization) to reasoning tasks, using rule-based rewards instead of a learned reward model and no separate critic. R1-Zero shows that reasoning can emerge from pure RL without SFT.
- Example: R1 671B comes close to o1 on AIME / MATH.
- Code: github.com/deepseek-ai/DeepSeek-R1
- L-anchor: L4.2 + L4.3 starting point for open-source reasoning models
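
The "group relative" part of GRPO can be shown in isolation: several answers are sampled for the same prompt, and each answer's advantage is its reward standardized within that group, so no value network is needed. The sketch below covers only this step; using the population standard deviation and the small `eps` are assumptions made for the illustration.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled answers to one math prompt, scored 1 if correct and 0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

If every sample in a group gets the same reward, all advantages are zero and that prompt contributes no gradient signal.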

**OpenAI o1 / o3** · OpenAI, 2024 / 2025
- Venue: Product launches
- Core idea: Inference-time hidden CoT + RL. Pioneered the test-time compute scaling paradigm.
- L-anchor: L4.3 test-time compute paradigm

**Mamba: Linear-Time Sequence Modeling with Selective State Spaces** · Gu, Dao (CMU + Princeton, 2023)
- Venue: arXiv:2312.00752
- Core idea: Selective SSM for sub-quadratic sequence modeling. Challenges the Transformer's dominance on long-sequence tasks.
- Code: github.com/state-spaces/mamba
- L-anchor: L4.1 post-Transformer candidate

**Encoding Recurrence into Transformers** · Huang, Lu, Cai, Qin, Fang, Tian, Guodong Li (HKU, ICLR 2023 Oral)
- Venue: ICLR 2023, OpenReview
- Core idea: REM (Recurrence Encoding Matrix) + RSA (Self-Attention with Recurrence): encodes RNN-style recurrence into attention in a lightweight way. Better sample efficiency than the baseline Transformer.
- Code: github.com/neithen-Lu/encoding_recurrence_into_transformers
- L-anchor: L4.1 architectural prior anchor for RQ1

L5 · Algorithmic Foundations

**Q-Learning** · Watkins, 1989 (PhD thesis, Cambridge)
- Core idea: Off-policy temporal difference control. The ancestor of value-based RL.
- L-anchor: L5.1 RL classic
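
The off-policy temporal-difference update at the heart of Q-learning is Q(s, a) ← Q(s, a) + α · (r + γ · max over a' of Q(s', a') − Q(s, a)). A minimal tabular sketch follows; the dictionary-based table, the state and action names, and the hyperparameter values are illustrative.

```python
def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place and return the new value."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)  # greedy bootstrap
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

actions = ["left", "right"]
q = {}
first = q_update(q, "s0", "right", 1.0, "s1", actions, alpha=0.5, gamma=0.9)
q[("s1", "left")] = 2.0   # pretend a later state has already been learned
second = q_update(q, "s0", "right", 0.0, "s1", actions, alpha=0.5, gamma=0.9)
```

The `max` over next actions is what makes the method off-policy: the target assumes greedy behavior regardless of how the data was collected.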

**Playing Atari with Deep Reinforcement Learning (DQN)** · Mnih et al. (DeepMind, 2013, Nature 2015)
- Venue: Nature 518, pages 529–533 (2015)
- Core idea: Deep Q-Network: CNN + Q-learning. The first large-scale success of deep RL.
- L-anchor: L5.1 deep RL starting point

**Mastering the Game of Go with Deep Neural Networks and Tree Search (AlphaGo)** · Silver et al. (DeepMind, Nature 2016)
- Venue: Nature 529, pages 484–489 (2016)
- Core idea: Policy network + Value network + MCTS. Defeated Lee Sedol.
- L-anchor: L5.1 + L5.2 fusion of NN + MCTS

**Mastering Chess and Shogi by Self-Play with a General RL Algorithm (AlphaZero)** · Silver et al. (DeepMind, 2017)
- Venue: arXiv:1712.01815
- Core idea: A single algorithm + self-play learns Go / Chess / Shogi, with no prior human knowledge beyond the rules.
- L-anchor: L5.1 general game RL

**Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model (MuZero)** · Schrittwieser et al. (DeepMind, 2019, Nature 2020)
- Venue: Nature 588, pages 604–609 (2020)
- Core idea: Needs no knowledge of the game rules: plans with a learned world model + MCTS.
- L-anchor: L5.1 peak of model-based RL

**Proximal Policy Optimization (PPO)** · Schulman et al. (OpenAI, 2017)
- Venue: arXiv:1707.06347
- Core idea: A simplified TRPO that clips the size of each policy update. This is the algorithm used in RLHF / InstructGPT.
- L-anchor: L5.1 industry standard for policy optimization
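
The clipping mentioned above acts on the probability ratio between the new and old policy. For one sample, the clipped surrogate objective is min(ratio · A, clip(ratio, 1 − ε, 1 + ε) · A). The sketch below evaluates it for scalar inputs; the function name is illustrative, and ε = 0.2 is a common default rather than a required value.

```python
def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """Per-sample clipped surrogate objective (to be maximized)."""
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)

# Good action, ratio already large: the gain is capped at 1.2 * A.
capped_gain = ppo_clip_objective(1.5, 1.0)
# Bad action made more likely: the penalty is not clipped away.
full_penalty = ppo_clip_objective(1.5, -1.0)
```

Taking the minimum makes the objective pessimistic: clipping can remove the incentive to move further, but it never hides a loss.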

**Decision Transformer: Reinforcement Learning via Sequence Modeling** · Chen et al. (Berkeley + Meta AI + Google Brain, 2021)
- Venue: NeurIPS 2021, arXiv:2106.01345
- Core idea: Recasts RL as a sequence modeling task: given return-to-go + state, predict the action. Treats the Transformer itself as the RL algorithm.
- L-anchor: L5.1 RL as sequence
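
The conditioning signal named above, return-to-go, is the sum of rewards from the current step to the end of the episode. The sketch below computes it for one trajectory and interleaves it with states and actions into a flat token-like sequence; the function names and the toy trajectory are illustrative, and no Transformer is involved.

```python
def returns_to_go(rewards):
    """Undiscounted suffix sums: rtg[t] = rewards[t] + rewards[t+1] + ..."""
    rtg, running = [], 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]

def to_sequence(states, actions, rewards):
    """Interleave (return-to-go, state, action) triples in time order."""
    seq = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        seq.extend([("rtg", g), ("state", s), ("action", a)])
    return seq

rtg = returns_to_go([1.0, 0.0, 2.0])
sequence = to_sequence(["s0", "s1", "s2"], ["a0", "a1", "a2"], [1.0, 0.0, 2.0])
```

At inference time the first return-to-go is set to a desired target return and decremented by each observed reward.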

**GFlowNet: Generative Flow Networks** · Bengio et al. (Mila, 2021+)
- Venue: NeurIPS 2021, JMLR 2023
- Core idea: Learns not an argmax policy but a distribution proportional to reward. Explores multiple modes.
- L-anchor: L5.2 structured search; the inspiration for AlphaSAGE
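
The property above, sampling in proportion to reward instead of returning a single argmax, can be illustrated without any training. The sketch below only builds the target distribution p(x) ∝ R(x) over a toy candidate set and samples from it; it does not implement GFlowNet training (for example a flow-matching or trajectory-balance loss), and the candidate names and rewards are made up.

```python
import random
from collections import Counter

# Hypothetical candidates with two strong modes and one weak one.
rewards = {"alpha_a": 5.0, "alpha_b": 4.0, "alpha_c": 1.0}

def target_distribution(rewards):
    """Normalize rewards into the distribution a trained GFlowNet aims to match."""
    total = sum(rewards.values())
    return {name: r / total for name, r in rewards.items()}

def sample_proportional(rewards, n, seed=0):
    """Draw n candidates with probability proportional to reward."""
    rng = random.Random(seed)
    names = list(rewards)
    return Counter(rng.choices(names, weights=[rewards[k] for k in names], k=n))

probs = target_distribution(rewards)
counts = sample_proportional(rewards, n=1000)
greedy = max(rewards, key=rewards.get)   # an argmax policy only ever returns this
```

An argmax policy would return `alpha_a` every time, whereas reward-proportional sampling keeps visiting `alpha_b` almost as often, which is the mode coverage the entry refers to.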

**A Survey of Monte Carlo Tree Search Methods** · Browne et al., 2012
- Venue: IEEE TCIAIG
- Core idea: The standard reference for UCT and other MCTS variants.
- L-anchor: L5.2 search classic
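
UCT, named in the entry above, selects the child that maximizes mean value plus an exploration bonus: Q/n + c · sqrt(ln N / n), where n is the child's visit count and N the parent's. The sketch below shows only this selection rule; the dictionary layout for children and the default c = sqrt(2) are illustrative assumptions.

```python
import math

def uct_score(total_value, visits, parent_visits, c=math.sqrt(2)):
    """Mean value plus exploration bonus; unvisited children are tried first."""
    if visits == 0:
        return math.inf
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children, parent_visits):
    """Pick the child with the highest UCT score."""
    return max(children, key=lambda ch: uct_score(ch["value"], ch["visits"], parent_visits))

children = [
    {"name": "a", "value": 5.0, "visits": 10},   # well explored, mean 0.5
    {"name": "b", "value": 1.0, "visits": 1},    # barely explored, mean 1.0
]
chosen = select_child(children, parent_visits=11)
```

The bonus shrinks as a child accumulates visits, so the search gradually shifts from exploration toward the best-performing branch.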

**Sequential Predictive Conformal Inference for Time Series** · Xu, Xie (Georgia Tech, 2022)
- Venue: arXiv:2212.03463
- Core idea: How to do conformal prediction when time-series data are non-exchangeable: split + online recalibration.
- L-anchor: L5.3 conformal for TS; used directly in RQ2

**Temporal Conformal Prediction (TCP)** · 2025
- Venue: arXiv:2507.05470
- Core idea: Distribution-free + ML quantile forecaster + Robbins-Monro calibration.
- L-anchor: L5.3 latest conformal method for TS

End-of-document

This doc covers ~70 papers in total, each with author / year / venue / core idea / example / code / L-anchor. Use it as a search index: whatever paper you are asked about, you can locate it here within 30 seconds.

The companion slide deck is agent_landscape_technical_deck.pptx (next file).