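Agent Research Landscape · From Alpha Auto Search Outward
This document places your alpha auto search work within the broader agent-research landscape, using six concentric layers that zoom out from the innermost layer (your project itself) to the historical lineage of cognitive architectures. Each layer is annotated with: scope / key surveys / representative systems / your work's positioning / citation suggestions for the HKU presentation.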
2026-05-19 · Maintained by Paul Weng
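Concentric layer diagram
L5 · Cognitive Architectures / AGI Frameworks (outermost layer: theoretical roots)
└─ L4 · Agentic AI / RL Foundations
   └─ L3 · LLM-Based Agents (general)
      └─ L2 · AI for Science / AI for Math
         └─ L1 · Autonomous Research Agents
            └─ L0 · Alpha Auto Search (innermost layer: your work)
L0 is your research object itself; L1 is its most direct academic parent; L5 is the most distant intellectual root. In the presentation, state explicitly which layer you ground your work in, so that you are not caught off guard by "why not go bigger or smaller?".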
L0 · Alpha Auto Search (innermost layer / your core work)
Scope: automated discovery and search of alpha factors or trading signals. Search unit = formula / program / NN weights / portfolio.
Key Surveys (2025-2026)
| Title | Authors | Venue | arXiv / DOI |
|---|---|---|---|
| Survey on LLM-based Alpha Mining | — | FITEE 2025 | 10.1631/FITEE.2500386 |
| AlphaEval: Comprehensive Eval for Formula Alpha Mining | Ding et al. | 2025 | 2508.13174 |
Representative Systems
| Tradition | Systems |
|---|---|
| Classical GP | gplearn / AutoAlpha / gpquant / AlphaForge / AlphaSAGE (GFlowNet) / AlphaPROBE |
| DL factor | FactorVAE / HIST / HireVAE / RVRAE / FactorGCL |
| LLM-driven formula | AlphaAgent / Alpha Jungle (LLM-MCTS) / QuantaAlpha / FactorMAD / Alpha-GPT |
| Benchmark | AlphaBench (ICLR 2026) / AlphaEval / Crypto-Alpha-Bench (your proposal) |
Positioning of your work
- Verifier side: delivered (M8.6 walk-forward + microstructure gate + adaptive state controller)
- Generator side: not yet built. This is the gap that RQ3 (Cognition Base) and the Researcher Agent are meant to fill
- Benchmark contribution: Crypto-Alpha-Bench is proposed as the "ImageNet moment" for alpha auto search
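The walk-forward validation named in the verifier bullet can be sketched as follows. This is a minimal illustration only, assuming an expanding training window and fixed-size test folds; the function name `walk_forward_splits` and its parameters are hypothetical and do not describe the actual M8.6 implementation.

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n_samples: int, initial_train: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges with an expanding training window.

    Each test fold starts right after its training window, so no future
    observation leaks into the data used for fitting.
    """
    if initial_train <= 0 or test_size <= 0:
        raise ValueError("initial_train and test_size must be positive")
    train_end = initial_train
    # Stop as soon as a full test fold no longer fits in the sample.
    while train_end + test_size <= n_samples:
        yield range(0, train_end), range(train_end, train_end + test_size)
        train_end += test_size


# Hypothetical toy sizes, chosen only to make the folds easy to read.
splits = list(walk_forward_splits(n_samples=10, initial_train=4, test_size=2))
for train, test in splits:
    print(list(train), "->", list(test))
```

A rolling (fixed-length) window is an equally common choice; the expanding window is used here only because it is the shorter sketch.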
Presentation citations
- Slide 5 / 7-10: all L0 content
- Existing artifacts: alpha_search_baselines.md / alpha_search_survey_taxonomy_and_bibliography.md / financial_sota_agent_survey.md
L1 · Autonomous Research Agents (the most direct academic parent)
Scope: LLM-powered agents that autonomously carry out the research workflow (hypothesis → experiment → analysis → writeup). Alpha auto search is the finance instantiation of this layer.
Key Surveys
| Title | Authors | Year | URL |
|---|---|---|---|
| From Copilots to Colleagues: A Survey of Autonomous Research Agents | Deli Chen | early 2026 | victorchen96.github.io |
| Deep Research: A Survey of Autonomous Research Agents | Zhang et al. | 2025-08 | 2508.12752 |
| Deep Research Agents: A Systematic Examination And Roadmap | — | 2025-06 | 2506.18096 |
| Deep Research: A Systematic Survey | — | 2025-12 | 2512.02038 |
| Reinforcement Learning Foundations for Deep Research Systems | — | 2025-09 | 2509.06733 |
Key Frameworks
- L1-L5 autonomy taxonomy (Chen 2026): analogous to the SAE driving-automation levels; the current frontier sits at L4, with L5 aspirational. These autonomy levels are distinct from the L0-L5 layer labels used in this document
- 4-stage Deep Research pipeline (Zhang 2025): planning / question-developing / web exploration / report generation
- 4 architectures: single-agent loop / multi-agent / hierarchical / tool-augmented
Representative Systems
| Domain | Systems |
|---|---|
| ML research | AI Scientist v1/v2 (Sakana) / MLR-Copilot / RD-Agent(Q) / AgentRxiv |
| Architecture discovery | ASI-ARCH (GAIR 2025) |
| Math/algorithms | FunSearch / AlphaProof / AlphaEvolve / AlphaGeometry |
| Chemistry | Coscientist / ChemCrow |
| Biology | BioPlanner / MedAgents |
| General research | GPT Deep Research / Perplexity Pro / STORM / Tongyi DeepResearch |
6 Open Problems (Chen 2026)
- Cognitive loop trap (repeatedly falling back into failed strategies)
- Context window limits
- Novelty evaluation (the survey calls it "fundamentally unsolved... philosophical")
- Reproducibility / determinism (SWE-bench std 5-15%)
- Safety / dual-use
- Cost ($100-1000/research campaign)
Positioning of your work
- Alpha auto search = the instantiation of an autonomous research agent in the finance domain
- Chen 2026 singles out FunSearch as "nearest L5" because of its verifiable novelty. Alpha factors naturally inherit this property (IC / Sharpe / PnL act as a mechanically computable verifier)
- The Chen 2026 survey does not cover the finance domain at all. This is the academic whitespace for Crypto-Alpha-Bench
- Nearly all 6 open problems map onto alpha search (see the follow-up mapping in agent_research_landscape.md §L1)
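To make the "mechanically computable verifier" point concrete, here is a minimal sketch of two of the metrics named above: a rank information coefficient (Spearman correlation between factor values and forward returns) and a per-period Sharpe ratio. The helper names and the toy numbers are assumptions for illustration; they are not taken from your codebase.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation; raises if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def rank_ic(factor: Sequence[float], forward_returns: Sequence[float]) -> float:
    """Rank IC: Pearson correlation computed on the ranks of both series."""
    return pearson(average_ranks(factor), average_ranks(forward_returns))


def sharpe_ratio(returns: Sequence[float]) -> float:
    """Per-period Sharpe ratio (mean over sample std), not annualized."""
    return mean(returns) / stdev(returns)


# Hypothetical cross-section: four assets, one factor value and one
# forward return each.
factor = [0.1, 0.4, 0.2, 0.9]
forward_returns = [-0.01, 0.02, 0.00, 0.05]
ic = rank_ic(factor, forward_returns)
sr = sharpe_ratio([0.01, -0.02, 0.03, 0.02])
print(f"rank IC = {ic:.2f}, per-period Sharpe = {sr:.2f}")
```

Both numbers are deterministic functions of the data, which is what makes them usable as a verifier signal; whether a given value is statistically meaningful is a separate question.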
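Presentation citations
- Slide 7: use the L1-L5 autonomy ladder in place of "4 common patterns"
- Slide 9 (Crypto-Alpha-Bench gap): explicitly claim "first finance-domain research-agent benchmark"
- Q&A: cite the 6 open problems from Chen 2026 and map each one onto alpha search
L2 · AI for Science / AI for Math (the parent layer of the intellectual lineage)
Scope: applications of AI to scientific discovery, mathematical theorem proving, and algorithm invention. Autonomous research agents are a recent branch of this line.
Key Surveys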
| Title | Authors | Year | URL |
|---|---|---|---|
| Agentic AI for Scientific Discovery: A Survey of Progress, Challenges, and Future Directions | — | 2025-03 | 2503.08979 |
Landmark Systems (chronological)
| System | Year | Domain | Key idea |
|---|---|---|---|
| AlphaFold (2) | 2020 / 2021 | Protein structure | Evoformer + equivariant structure module; unlocked structural biology |
| AlphaTensor | 2022 | Matrix multiplication | RL + game-tree search to find new algorithms |
| FunSearch | 2023 (Nature) | Math / bin packing | LLM + evolution + program-level verification |
| GNoME | 2023 (Nature) | Materials science | GNN + active learning; 2.2M new materials |
| AlphaGeometry / AlphaGeometry 2 | 2024 / 2025 | IMO geometry | Neuro-symbolic + formalized proofs |
| AlphaProof | 2024-2025 | IMO algebra/number theory | Gemini + AlphaZero MCTS + Lean kernel |
| AlphaEvolve | 2025 | Algorithm discovery | FunSearch + Pareto + long programs |
| ASI-ARCH | 2025-07 | Linear attention archs | Multi-agent + Scaling Law for Discovery |
Core Pattern (4 shared patterns)
- Generator-verifier separation
- Cognition base / knowledge grounding
- Multi-agent decomposition
- Compute-scaled discovery
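The first pattern, generator-verifier separation, can be sketched as a toy loop. Everything here is hypothetical: the candidate is a pair of weights on two made-up features, the generator is a random proposer standing in for an LLM or evolutionary search, and the verifier is a simple hit-rate check on held-out data. The point is only the structure: the generator never sees the verification data, and nothing is accepted without passing the verifier.

```python
import random
from typing import List, Sequence, Tuple

Candidate = Tuple[float, float]  # hypothetical search unit: two feature weights


def generate(rng: random.Random) -> Candidate:
    """Generator: proposes a candidate without access to verification data."""
    return (rng.uniform(-1, 1), rng.uniform(-1, 1))


def verify(
    candidate: Candidate,
    features: Sequence[Tuple[float, float]],
    targets: Sequence[float],
    threshold: float,
) -> bool:
    """Verifier: accept only if the held-out sign hit rate clears the bar."""
    w1, w2 = candidate
    hits = sum(
        1
        for (f1, f2), y in zip(features, targets)
        if (w1 * f1 + w2 * f2 > 0) == (y > 0)
    )
    return hits / len(targets) >= threshold


def search(
    n_candidates: int,
    features: Sequence[Tuple[float, float]],
    targets: Sequence[float],
    threshold: float,
    seed: int = 0,
) -> List[Candidate]:
    """Run the loop: generate, verify, keep only what the verifier accepts."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_candidates):
        candidate = generate(rng)
        if verify(candidate, features, targets, threshold):
            accepted.append(candidate)
    return accepted


# Synthetic held-out set in which the target sign follows the first feature.
features = [(1.0, 0.2), (-0.5, 0.1), (0.8, -0.3), (-1.2, 0.4), (0.3, 0.9), (-0.7, -0.6)]
targets = [f1 for f1, _ in features]
accepted = search(200, features, targets, threshold=1.0)
print(f"{len(accepted)} of 200 candidates passed the verifier")
```

Compute-scaled discovery then amounts to raising `n_candidates` while the verifier stays fixed, which is why the verifier's statistical soundness matters more as the search grows.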
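Positioning of your work
- Alpha auto search shares all 4 AI for Science patterns
- Key differentiator: the financial verifier is statistical rather than mechanical in the way math/code verifiers are. The metrics are computed mechanically, but the accept/reject verdict needs PBO / DSR / multiple-testing control in place of a Lean kernel
- Your self-evolution research reference is in effect a finance-specific safety adaptation of the AI for Science paradigm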
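As one illustration of what a statistical verifier involves, the sketch below computes a Deflated Sharpe Ratio (DSR) in the form published by Bailey and López de Prado: the observed Sharpe ratio is compared against the maximum Sharpe ratio expected from `n_trials` unskilled strategies, then adjusted for sample length, skewness, and kurtosis. The function names and all input values are assumptions for illustration; Sharpe ratios are per-period (not annualized) and `kurt` is the raw kurtosis (3 for a normal distribution).

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
_STD_NORMAL = NormalDist()


def expected_max_sharpe(n_trials: int, sr_variance: float) -> float:
    """Approximate expected maximum Sharpe ratio across n_trials with no skill.

    sr_variance is the variance of the Sharpe ratios across the trials.
    """
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    return math.sqrt(sr_variance) * (
        (1 - EULER_GAMMA) * _STD_NORMAL.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * _STD_NORMAL.inv_cdf(1 - 1 / (n_trials * math.e))
    )


def deflated_sharpe_ratio(
    sr_hat: float,
    n_obs: int,
    skew: float,
    kurt: float,
    n_trials: int,
    sr_variance: float,
) -> float:
    """Probability that the true Sharpe ratio exceeds the selection-bias
    benchmark implied by the number of trials."""
    sr0 = expected_max_sharpe(n_trials, sr_variance)
    # Non-normality adjustment to the standard error of the Sharpe estimate.
    denom = math.sqrt(1 - skew * sr_hat + (kurt - 1) / 4 * sr_hat**2)
    z = (sr_hat - sr0) * math.sqrt(n_obs - 1) / denom
    return _STD_NORMAL.cdf(z)


# Hypothetical inputs: the same observed Sharpe ratio looks much weaker once
# it is known to be the best of 1000 candidates rather than the best of 10.
few = deflated_sharpe_ratio(0.10, 1000, 0.0, 3.0, n_trials=10, sr_variance=0.0025)
many = deflated_sharpe_ratio(0.10, 1000, 0.0, 3.0, n_trials=1000, sr_variance=0.0025)
print(f"DSR after 10 trials: {few:.3f}; after 1000 trials: {many:.3f}")
```

This is the sense in which the verdict is statistical: the same backtest result is accepted or rejected depending on how many candidates were searched to find it.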
Presentation citations
- Slide 2 thesis & Slide 7 frontier evolution
- Existing artifact: alpha_search_baselines.md (the main battleground)
L3 · LLM-Based Agents (general) (the broader agent literature)
Scope: any agent system that uses an LLM as its reasoning engine. Not limited to research; includes coding / web browsing / robotics / business automation.
Key Surveys (must-read)
| Title | Authors | Year | URL |
|---|---|---|---|
| A Survey on LLM-Based Autonomous Agents | Wang et al. (Renmin U) | 2023 → 2025 | 2308.11432 |
| The Rise and Potential of LLM-Based Agents: A Survey | Xi et al. (Fudan) | 2023 | 2309.07864 |
| Large Language Model Agent: A Survey on Methodology, Applications and Challenges | — | 2025-03 | 2503.21460 |
| Evaluation and Benchmarking of LLM Agents: A Survey | Mohammadi et al. | 2025-07 | 2507.21504 |
| LLM-based Agentic Reasoning Frameworks: A Survey | — | 2025-08 | 2508.17692 |
| LLM-Based Human-Agent Collaboration | — | 2025-05 | 2505.00753 |
Foundational Techniques
| Technique | Origin | Key paper |
|---|---|---|
| Tool use / function calling | OpenAI 2023 / Toolformer | Schick 2023 |
| ReAct (interleaved Reason + Act) | Princeton 2022 | Yao 2210.03629 |
| Reflexion (verbal RL) | Northeastern 2023 | Shinn 2303.11366 |
| Chain of Thought / Tree of Thoughts | Google 2022 / Princeton 2023 | Wei / Yao |
| LATS (Language Agent Tree Search) | 2023 | Zhou 2310.04406 |
| Self-Refine / Self-Critique | CMU 2023 | Madaan 2303.17651 |
| Plan-and-Execute / Plan-and-Solve | 2023 | Wang |
| Voyager (open-world skill learning) | NVIDIA 2023 | Wang 2305.16291 |
| MemGPT / Memory hierarchies | Berkeley 2023 | Packer 2310.08560 |
| MCP (Model Context Protocol) | Anthropic 2024 | open standard for tool integration |
| A2A (Agent-to-Agent Protocol) | Google 2025 | cross-vendor agent communication |
Multi-Agent Frameworks
| Framework | Origin | Key paper |
|---|---|---|
| CAMEL (role-play conversation) | KAUST 2023 | Li 2303.17760 |
| AutoGen | Microsoft 2023 | Wu 2308.08155 |
| MetaGPT (SOP-based software co.) | DeepWisdom 2023 | Hong 2308.00352 |
| ChatDev | THU 2023 | Qian 2307.07924 |
| AgentVerse | THU 2023 | Chen 2308.10848 |
| LangGraph (graph-based orchestration) | LangChain 2024 | open-source framework |
Production Coding Agents (frontier: autonomy level L4)
- Devin (Cognition Labs 2024): the first production "AI software engineer"
- Claude Code (Anthropic 2025): 72% SWE-bench Verified
- Cursor / Aider: IDE-integrated coding agents
- SWE-Agent (Princeton 2024): open-source SWE benchmark champion
- OpenHands (formerly OpenDevin)
- Agentless / AgentCoder / AutoCodeRover: research baselines
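Positioning of your work
- The L1-L6 LLM agent stack in your production system (its own layer numbering, unrelated to the L0-L5 layers here) is a finance-specific instantiation of layer L3
- You have already used the core L3 techniques (ReAct / Reflexion / tool calling / memory) in production. One sentence in the presentation is enough to cover this
- Do not anchor the presentation at L3. It is a layer shared with vision/coding agents; your differentiator is at L1 (research agent in finance)
Presentation citations
- Not on a main slide, but you need to be familiar with it for Q&A
- Professor Han may ask about ReAct / MCP / agent reliability; these are all L3 concepts
L4 · Agentic AI / RL Foundations (the methodological roots of the agent paradigm)
Scope: the mathematical and RL foundations of the agent paradigm, which predate LLMs by decades. The formalization of decision-making, planning, and learning.
Key Surveys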
| Title | Authors | Year | URL |
|---|---|---|---|
| The Landscape of Agentic Reinforcement Learning for LLMs | — | 2025-09 | 2509.02547 |
| A Survey of Frontiers in LLM Reasoning | — | 2025-04 | 2504.09037 |
| Logical Reasoning in LLMs: A Survey | — | 2025-02 | 2502.09100 |
Classic RL Foundations
- Sutton & Barto, Reinforcement Learning: An Introduction (2018, 2nd ed): the "bible" of the field
- Q-learning (Watkins 1989) / Policy gradient (Williams 1992)
- DQN (Mnih 2013 / Nature 2015): first deep RL breakthrough
- AlphaGo (Silver 2016, Nature) → AlphaZero (2017) → MuZero (2019)
- PPO (Schulman 2017): the workhorse algorithm for RLHF
- Decision Transformer (Chen 2021): sequence modeling for RL
- World models / Dreamer V1-V3 (Hafner 2019-2023): model-based RL
LLM × RL: the modern lineage
| Technique | Year | Key |
|---|---|---|
| RLHF | 2022 (InstructGPT) | Ouyang et al. |
| DPO (Direct Preference Optimization) | 2023 | Rafailov 2305.18290 |
| RLAIF | 2023 | Anthropic |
| Constitutional AI | 2022 | Anthropic 2212.08073 |
| DeepSeek-R1 (GRPO + RL on reasoning) | 2025 | DeepSeek |
| OpenAI o1 / o3 | 2024 / 2025 | inference-time RL / test-time compute |
| GFlowNet | 2021-ongoing | Bengio; structured policy learning |
Positioning of your work
- Your walk-forward + adaptive state controller is conceptually close to offline RL with conservative policy improvement
- The exploration vs. exploitation trade-off in alpha search is a classic RL problem
- You do not use RL directly (production decisions are deterministic), but in principle RQ2 (open-world LLM safety) could be grounded in offline RL safety guarantees
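The exploration vs. exploitation point can be illustrated with the textbook UCB1 bandit rule. In this sketch each arm is a hypothetical alpha family (for example momentum, mean reversion, order flow) and the reward is 1 when a sampled candidate passes the verifier; the success probabilities are invented for the simulation. This is an illustration of the classic RL framing, not a description of your production system, which the text above says is deterministic.

```python
import math
import random
from typing import List, Sequence


def ucb1_select(counts: Sequence[int], totals: Sequence[float]) -> int:
    """Pick the arm with the highest UCB1 index; untried arms go first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    t = sum(counts)
    scores = [
        totals[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_bandit(success_probs: Sequence[float], n_rounds: int, seed: int = 0) -> List[int]:
    """Simulate UCB1 over Bernoulli arms and return the pull count per arm."""
    rng = random.Random(seed)
    counts = [0] * len(success_probs)
    totals = [0.0] * len(success_probs)
    for _ in range(n_rounds):
        arm = ucb1_select(counts, totals)
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts


# Hypothetical pass rates for three alpha families; the third is the best.
pass_rates = [0.2, 0.3, 0.8]
counts = run_bandit(pass_rates, n_rounds=2000)
print("pulls per family:", counts)
```

The bonus term shrinks as an arm is pulled more often, so weak families are still sampled occasionally (exploration) while most of the budget flows to the family with the best observed pass rate (exploitation).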
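Presentation citations
- Backup slide: for when you are asked "why not use RL for alpha search?"
- The intersection of conformal prediction × offline RL is where RQ2 actually sits methodologically
L5 · Cognitive Architectures / AGI Frameworks (outermost layer / intellectual ancestors)
Scope: the philosophical and cognitive-science roots of the agent paradigm. Usually not cited directly, but understanding it gives your framing more depth.
Classic Cognitive Architectures
- SOAR (Newell 1990): symbolic problem-solving, production rules
- ACT-R (Anderson 1976-): cognitive psychology + computational model
- BDI (Bratman 1987): the Belief / Desire / Intention framework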
- Society of Mind (Minsky 1986)
- Global Workspace Theory (Baars 1988)
Modern AGI / Foundation Model Debates
| Topic | Key paper |
|---|---|
| Sparks of AGI | Bubeck 2023 2303.12712: an early touchstone for GPT-4 |
| Foundation Models | Bommasani 2021 2108.07258: Stanford CRFM |
| Generalist agents (Gato) | DeepMind 2022 |
| Embodied AGI debates | ongoing |
Positioning of your work
- Do not cite this layer in the presentation. It is too far removed
- But keep in mind: the production trading agent you built already implements a simplified BDI architecture (Belief = market state / Desire = user trader intent / Intention = schema-validated order). This is a concrete realization of the L5 layer.
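The BDI mapping above can be sketched with three small data classes. All field names, the `deliberate` function, and the sizing rule are hypothetical and chosen only to show the shape of the idea: the belief and the desire are inputs, and an intention exists only if it passes validation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Belief:
    """Belief = market state (hypothetical minimal fields)."""
    symbol: str
    mid_price: float


@dataclass(frozen=True)
class Desire:
    """Desire = the trader's intent."""
    symbol: str
    side: str  # "buy" or "sell"
    max_notional: float


@dataclass(frozen=True)
class Intention:
    """Intention = an order that has passed schema validation."""
    symbol: str
    side: str
    quantity: float
    limit_price: float

    def __post_init__(self) -> None:
        # Validation runs on construction, so an invalid order cannot exist.
        if self.side not in ("buy", "sell"):
            raise ValueError(f"invalid side: {self.side!r}")
        if self.quantity <= 0 or self.limit_price <= 0:
            raise ValueError("quantity and limit_price must be positive")


def deliberate(belief: Belief, desire: Desire) -> Intention:
    """Turn a desire into a concrete order given the current belief."""
    if belief.symbol != desire.symbol:
        raise ValueError("belief and desire refer to different symbols")
    quantity = desire.max_notional / belief.mid_price
    return Intention(
        symbol=desire.symbol,
        side=desire.side,
        quantity=quantity,
        limit_price=belief.mid_price,
    )


# Hypothetical example values.
order = deliberate(Belief("BTC-USD", 50000.0), Desire("BTC-USD", "buy", 1000.0))
print(order)
```

Putting the validation in the constructor mirrors the "schema-validated order" idea: downstream code can assume that any `Intention` it receives is well-formed.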
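Presentation citations
- None. If a Q&A question goes extremely deep ("where is the boundary of AGI?"), you can mention it in one sentence
Summary: your strategic positioning through the concentric layers
Layer anchoring of your work
L0 alpha auto search ← the concrete thing you do
↑ anchor here for technical depth
L1 autonomous research agents ← your academic argument lives at this layer
↑ anchor here for thesis framing
L2 AI for Science / AI for Math ← your intellectual lineage
↑ cite here for inspiration / pattern
L3 LLM-Based Agents ← your tooling layer
↑ Q&A preparation
L4 Agentic AI / RL ← methodological roots
↑ backup slide
L5 Cognitive Architectures ← philosophical background, usually not cited explicitly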
HKU presentation: slide-by-layer suggestions
| Slide | Suitable layer |
|---|---|
| Slide 1 Title | L1 framing ("autonomous research agents in finance") |
| Slide 2 Thesis | L1 + L0 |
| Slide 4 What I Built | L0 + L3 (the production stack uses L3 techniques) |
| Slide 6 Negative Result | L0 |
| Slide 7 Frontier Pattern | L1-L5 autonomy ladder + 4 shared patterns (replacing the original "4 common patterns") |
| Slide 8 Gap | L1 (Chen 2026 6 open problems) + L0 (alpha-specific) |
| Slide 9-10 Crypto-Alpha-Bench | L1 (autonomous research bench) + L2 (AI for Science verifier philosophy) |
| Slide 11 RQs | L1 (RQ2/RQ3) + L2 (RQ1 architectural prior from time series) |
| Slide 12 Ask | L1 (which use case for first paper) |
Elevator pitch (nested-framing version)
"I built a production-grade verification testbed for alpha auto search (L0), which is an instantiation of autonomous research agents (L1) in the finance domain. My work inherits the verifiable-novelty success pattern from FunSearch / AI for Science (L2), and the engineering stack from LLM-based agents (L3). The contribution is Crypto-Alpha-Bench, the field's first research-agent benchmark in finance, addressing 6 open problems identified by Chen 2026."
This pitch covers 4 layers, can be delivered in 30 seconds, and lets you exit gracefully if any professor interrupts.
Recommended further reading (by priority)
High priority (essential for the presentation):
- Chen 2026, "From Copilots to Colleagues": L1 anchor
- Zhang 2025, "Deep Research Survey", arXiv 2508.12752: L1 complement
Medium priority (deepening the research agenda):
- Wang 2023, "LLM-Based Autonomous Agents", arXiv 2308.11432: classic L3 survey
- "Agentic RL for LLMs", arXiv 2509.02547: the modern L4 counterpart
Low priority (background):
- Sutton & Barto, RL bible: L4 classic
- Bubeck 2023, "Sparks of AGI": L5 cultural background
End of landscape document.
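This document is not meant for use during the presentation itself. It is meant to give you a clear nested framing in your own head, so that whichever layer from L0 to L5 a professor asks about, you know which level to answer at, which anchor paper to cite, and which adjacent layer to transition to.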