Financial Agent → General Agent · Technical Landscape¶
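A technical landscape that zooms out layer by layer from Financial Agents (most specific) to General Agents (most general). Each layer emphasizes technical principles, algorithms, and architectural patterns; papers serve only as anchors.
Six concentric layers · L0 → L5 · Maintained by Paul Weng · 2026-05-19
Concentric Structure¶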
L5 · Algorithmic Foundations (RL · Search · Optimization · Probabilistic)
└─ L4 · Foundation Model Tech Stack (Pretrain · Post-train · Test-time compute)
   └─ L3 · LLM Agent Patterns (Reasoning · Planning · Tool · Memory · Multi-agent)
      └─ L2 · Production Agent Systems (Coding · Computer use · Deep research)
         └─ L1 · Domain Research Agents (Science · Math · Chemistry · Biology)
            └─ L0 · Financial Agents (Trading · Alpha mining · Quant R&D · Risk)
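Suggested reading order:
- Bottom-up (L0→L5): how your concrete work inherits upstream technology
- Top-down (L5→L0): how foundational techniques propagate into finance applications
- Cross-cutting a single layer: understand the trade-offs among sibling techniques
L0 · Financial Agents (innermost layer)¶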
Scope¶
Any system that applies the LLM/agent paradigm to tasks in the financial domain. The difference from L1: domain-specific cost models, execution constraints, and regulatory boundaries are first-class concerns.
Subdivisions within L0 (5 sub-classes)¶
L0.1 · Customer-Facing Financial Agents¶
Tech primitive: retrieval + compliance checks + multi-turn dialogue + KYC integration
| System | Core technique |
|---|---|
| BloombergGPT (2023) | 50B params, finance-corpus pretraining |
| FinGPT (open-source) | LoRA fine-tuned financial chatbot |
| Klarna AI agent | Customer service automation |
| Bank robo-advisors | Risk questionnaire + portfolio construction |
Core challenges: regulatory disclosure, no hallucinated investment advice, and the need for an audit trail.
L0.2 · Trading-Decision Agents¶
Tech primitive: multi-modal signal fusion (news + price + tape) + reasoning + action selection
| System | Core technique | arXiv |
|---|---|---|
| TradingAgents | Multi-agent debate (Fundamental / Sentiment / Technical analyst + Bull/Bear + Manager + Trader + Risk) | 2412.20138 |
| QuantAgent | Short-horizon HFT-style structured-signal decision | n/a |
| FinMem | Memory-augmented LLM trading (episodic + semantic + working memory) | 2311.13743 |
| FinAgent | Multimodal foundation trading agent | 2402.18485 |
Core challenges: a large action space, high reward latency, and a market that is both partially observable and adversarial.
L0.3 · Alpha Mining / Factor Discovery Agents (the main arena for your work)¶
Tech primitive: symbolic search / LLM generation + statistical verification + multiple-testing correction
| System | Search method | Generator | Verifier |
|---|---|---|---|
| gplearn / gpquant | Genetic Programming | symbolic mutation | IC-based fitness |
| AutoAlpha | Hierarchical EA | n/a | walk-forward |
| AlphaAgent | LLM-driven | LLM with regularized exploration | multi-regime backtest |
| Alpha Jungle | LLM + MCTS | LLM prior on tree | backtest reward |
| QuantaAlpha | LLM + evolution | self-evolving trajectory | IC + return metrics |
| AlphaSAGE | GFlowNet | structured policy | multi-faceted reward |
| AlphaPROBE | GNN + on-graph evolution | retrieval-augmented LLM | walk-forward |
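To make the "IC-based fitness" and "multiple-testing correction" primitives concrete, here is a minimal sketch. It assumes IC means the Spearman rank correlation between a factor's cross-sectional values and forward returns, and it uses Bonferroni as the simplest possible correction. Neither choice is claimed to be what any system in the table actually does, and all function names are hypothetical.

```python
from statistics import mean

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def rank_ic(factor, fwd_returns):
    """Spearman rank IC for a single cross-section."""
    return pearson(ranks(factor), ranks(fwd_returns))

def bonferroni_keep(p_values, alpha=0.05):
    """Indices of candidate factors that survive a Bonferroni correction."""
    threshold = alpha / len(p_values)
    return [i for i, p in enumerate(p_values) if p <= threshold]
```

With three candidate factors and p-values `[0.001, 0.02, 0.04]`, only the first survives at `alpha=0.05`, because the per-test threshold shrinks to 0.05 / 3. The more candidates a generator proposes, the stricter the bar each one has to clear.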
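Tech depth note: AlphaSAGE's use of GFlowNet is the key innovation. Whereas a reward-maximizing RL policy tends to concentrate on the argmax, a GFlowNet learns to sample candidates with probability proportional to their reward, so it explores multiple modes. That suits a long-tailed, many-solutions problem such as alpha mining.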
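The contrast between argmax and reward-proportional sampling can be shown on a toy example. This sketch is not a GFlowNet (there is no flow network or training loop); it only samples directly from the target distribution p(x) ∝ R(x) that a trained GFlowNet is meant to approximate. The candidate names and rewards are made up.

```python
import random
from collections import Counter

# Hypothetical rewards for three candidate alphas: two strong modes, one weak.
rewards = {"alpha_a": 5.0, "alpha_b": 4.5, "alpha_c": 0.5}

def argmax_policy(rewards):
    """Always returns the single best candidate."""
    return max(rewards, key=rewards.get)

def reward_proportional_sample(rewards, rng):
    """Draws one candidate with probability R(x) / sum of all rewards."""
    names = list(rewards)
    return rng.choices(names, weights=[rewards[n] for n in names], k=1)[0]

rng = random.Random(0)
counts = Counter(reward_proportional_sample(rewards, rng) for _ in range(10_000))
```

The argmax policy only ever returns `alpha_a`. The proportional sampler also visits `alpha_b` in roughly 45% of draws, which is the behavior you want when several near-equally good alphas exist.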
L0.4 · Quant R&D Agents (full workflow)¶
Tech primitive: hypothesis generation + code synthesis + experiment execution + feedback iteration
| System | Workflow | Innovation |
|---|---|---|
| RD-Agent(Q) (Microsoft) | Research → Development (Co-STEER code gen) → Feedback → MAB scheduler | Multi-armed bandit selects the research direction |
| Beyond Prompting | Autonomous systematic factor investing | OOS validation + economic rationale |
| MLR-Copilot | Idea / experiment / analysis / writing | Splits ML research into 4 stages |
Tech depth: the MAB scheduler handles the explore-exploit trade-off: it decides whether the next round should deepen a promising direction or try a new hypothesis.
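A minimal sketch of such a scheduler, assuming UCB1 as the bandit rule. The source only says "multi-armed bandit", so UCB1 is a swapped-in choice and not a description of RD-Agent(Q)'s actual scheduler. The arm names and payoffs are hypothetical.

```python
import math

class UCB1Scheduler:
    """Chooses the next research direction with the UCB1 rule."""

    def __init__(self, arms, c=math.sqrt(2)):
        self.c = c
        self.counts = {a: 0 for a in arms}
        self.totals = {a: 0.0 for a in arms}

    def select(self):
        # Try every arm once before using confidence bounds.
        for arm, n in self.counts.items():
            if n == 0:
                return arm
        t = sum(self.counts.values())

        def score(arm):
            n = self.counts[arm]
            exploit = self.totals[arm] / n                    # mean payoff so far
            explore = self.c * math.sqrt(math.log(t) / n)     # uncertainty bonus
            return exploit + explore

        return max(self.counts, key=score)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.totals[arm] += reward

# Hypothetical payoffs, e.g. the out-of-sample IC improvement of each round.
true_payoff = {"deepen_direction": 0.8, "new_hypothesis": 0.2}
scheduler = UCB1Scheduler(true_payoff)
for _ in range(200):
    arm = scheduler.select()
    scheduler.update(arm, true_payoff[arm])
```

After 200 rounds most pulls go to the better arm, but the weaker arm is still revisited occasionally because its uncertainty bonus grows while it sits idle.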
L0.5 · Risk / Compliance / Safety Agents¶
Tech primitive: rule engine + LLM explanation + audit trail
- Hubble: safe, reproducible LLM alpha discovery with an AST sandbox and cross-sectional metrics
- FactorMiner: self-evolving, with redundancy/correlation control
- CogAlpha: LLM-driven code evolution with robustness checks
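One way to picture an "AST sandbox" is a whitelist validator that rejects any LLM-generated factor expression containing syntax outside a small approved grammar. This is a sketch under stated assumptions, not Hubble's implementation: the operator names (`rank`, `ts_mean`, `ts_std`, `delay`) and field names are hypothetical.

```python
import ast

ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Call, ast.Name, ast.Load,
    ast.Constant, ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub,
)
ALLOWED_FUNCS = {"rank", "ts_mean", "ts_std", "delay"}        # hypothetical operators
ALLOWED_FIELDS = {"open", "high", "low", "close", "volume"}   # hypothetical data fields

def validate_factor(expr: str) -> bool:
    """True only if expr uses whitelisted syntax, functions, and fields."""
    try:
        tree = ast.parse(expr, mode="eval")
    except (SyntaxError, ValueError):
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return False          # attribute access, subscripts, lambdas, ...
        if isinstance(node, ast.Call):
            if not isinstance(node.func, ast.Name) or node.func.id not in ALLOWED_FUNCS:
                return False
            if node.keywords:
                return False
        elif isinstance(node, ast.Name):
            if node.id not in ALLOWED_FUNCS | ALLOWED_FIELDS:
                return False
        elif isinstance(node, ast.Constant):
            # Numeric literals only; bool is excluded explicitly.
            if isinstance(node.value, bool) or not isinstance(node.value, (int, float)):
                return False
    return True
```

`validate_factor("rank(ts_mean(close, 5)) - rank(volume)")` passes, while `__import__('os').system('ls')` and `close.__class__` are rejected. A validated expression should still be run by a dedicated interpreter over the whitelisted operators, not by `eval`.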
Core design principle: the LLM is never on an irreversible action path. Your production system embodies exactly this rule.
L0 → L1 transition¶
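Financial agents are a special case of domain research agents, but they carry three unique constraints:
1. Statistical rather than mechanical verifier: there is no Lean kernel, only PBO (probability of backtest overfitting), DSR (deflated Sharpe ratio), and walk-forward validation.
2. Adversarial environment: unlike the physical constants of chemistry or biology, the market games your strategy in return.
3. Cost/fill realism: academic RL agents often assume free execution; financial agents cannot.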
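As an illustration of the walk-forward part of the statistical verifier, the sketch below generates rolling train/test index ranges in which the test window always lies strictly after the training window. The function name and the optional `embargo` gap are assumptions for illustration; PBO and DSR are not implemented here.

```python
def walk_forward_splits(n_samples, train_size, test_size, embargo=0):
    """Yield (train_range, test_range) pairs that roll forward in time.

    embargo: number of samples skipped between train and test so that
    overlapping labels cannot leak into the test window.
    """
    start = 0
    while start + train_size + embargo + test_size <= n_samples:
        train = range(start, start + train_size)
        test_start = start + train_size + embargo
        test = range(test_start, test_start + test_size)
        yield train, test
        start += test_size   # advance by one test window

splits = list(walk_forward_splits(n_samples=10, train_size=4, test_size=2))
```

For 10 samples this produces three folds, the first being train `0..3` and test `4..5`. Reporting the spread of out-of-sample metrics across folds, and not only their mean, is what makes the verifier statistical.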
L1 · Domain Research Agents¶
Scope¶
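LLM-powered agents that autonomously carry out domain-specific research: hypothesis → experiment → analysis. According to the Chen 2026 survey, the current frontier sits at L4 autonomy (task-bounded autonomy), while L5 (self-directed agenda) remains aspirational. (These are the survey's autonomy levels, not the layers of this document.)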
Sub-domain technical patterns¶
| Domain | Key systems | Verifier type | Distinctive technique |
|---|---|---|---|
| ML research | AI Scientist v1/v2 (Sakana), MLR-Copilot, ADAS | training loss / accuracy | Agentic Tree Search (v2) |
| Architecture discovery | ASI-ARCH (GAIR 2025) | benchmark loss | Cognition Base + Researcher/Engineer/Analyst |
| Math / algorithms | FunSearch (DeepMind), AlphaProof, AlphaEvolve, AlphaGeometry 2 | Lean kernel / program-level evaluation | Island-based evolution, neuro-symbolic |
| Chemistry | Coscientist (CMU/Emerald), ChemCrow (EPFL) | wet-lab execution / molecular sim | Tool integration (synthesis robots, RDKit) |
| Biology | BioPlanner, MedAgents | clinical evidence / structural sim | Domain-knowledge grounding |
| Materials | GNoME (DeepMind 2023, Nature) | DFT calculations | GNN + active learning, 2.2M new materials |
4 Common Architectural Patterns (Chen 2026 survey)¶
- Single-agent loop (e.g., AI Scientist v1): a single LLM repeatedly plans, acts, and reflects
- Multi-agent (e.g., MetaGPT, AutoGen): role specialization
- Hierarchical (e.g., ChatDev, RD-Agent): a supervisor coordinates lower-level agents
- Tool-augmented (e.g., ChemCrow): the agent's main value lies in tool orchestration
6 Open Problems (mapping directly onto L0 alpha-search pain points)¶
| Open problem | Counterpart in alpha research |
|---|---|
| Cognitive loop trap | Repeatedly overfitting the same regime |
| Context window limits | Many symbols × long history does not fit |
| Novelty evaluation | Alpha decay + lookahead bias |
| Reproducibility | Walk-forward standard deviation is too large |
| Safety / dual-use | Trading agent wired directly to the OMS |
| Cost | $100-1000 per research campaign |
L1 → L2 transition¶
Domain research agents are a subset of production agents, but their output is a paper, report, or artifact rather than a customer-facing service.
L2 · Production Agent Systems¶
Scope¶
Agents actually deployed in users' production environments. The difference from L1: reliability, latency, and cost are first-class metrics, and correctness is judged by user perception rather than in the in-domain academic sense.
Major Sub-classes¶
L2.1 · Coding Agents (the most mature production category)¶
| System | Frontier metric | Technical highlight |
|---|---|---|
| Devin (Cognition Labs 2024) | First production "AI software engineer" | Long-horizon planning + browser + IDE control |
| Claude Code (Anthropic 2025) | 72% SWE-bench Verified | Codebase-aware + tool use |
| Cursor / Windsurf / Aider | IDE-integrated | Diff-based edits, multi-file edits |
| SWE-Agent (Princeton) | Open-source bench champion | Agent-Computer Interface (ACI) |
| OpenHands (ex-OpenDevin) | Open framework | Multi-runtime backends |
| Agentless | Anti-thesis to agent loops | No iteration, single-shot |
Tech depth: SWE-Agent's ACI concept is key. Giving the agent a purpose-built text-based interface, rather than a raw terminal, significantly reduces hallucination.
L2.2 · Computer-Use / Browser Agents¶
| System | Year | Modality |
|---|---|---|
| Anthropic Computer Use | 2024 | Screenshot + mouse/keyboard |
| OpenAI Operator | 2025 | Browser-based VL agent |
| Google Project Mariner | 2024 | Chrome integration |
| WebArena / VisualWebArena | Benchmarks | Real-world web tasks |
Core challenges: visual grounding (where on screen), action precision, multi-step state tracking.
L2.3 · Deep Research Agents (see the Zhang 2025 survey)¶
4-stage pipeline: planning → question developing → web exploration → report generation
| System | Source |
|---|---|
| GPT Deep Research | OpenAI 2025 |
| Perplexity Pro Deep Research | 2025 |
| Google Deep Research (Gemini) | 2024 |
| Tongyi DeepResearch | Alibaba 2025 |
| OpenSeeker / OpenSeeker-v2 | GAIR open-source |
| STORM | Stanford |
| AgentRxiv | Collaborative autonomous research |
Tech depth: OpenSeeker v2 uses 11.7K synthetic trajectories plus plain SFT to beat industrial-grade models trained with CPT + SFT + RL, evidence that trajectory quality matters more than training-pipeline complexity.
L2.4 · Business Automation Agents¶
- Salesforce Agentforce / Microsoft Copilot Studio / Google Agentspace
- Key tasks: customer service / ERP workflow / supply chain
- Technical highlights: SOP-based orchestration + permission graphs + human-in-the-loop
L2 → L3 transition¶
Production agents are all built on L3's reusable patterns. L3 is the "agent programming language"; L2 is the "applications written in that language".
L3 · LLM Agent Patterns & Frameworks (the general-purpose core of agent technology)¶
Scope¶
General-purpose agent building blocks that do not depend on a specific domain or application. Every L0-L2 system is a composition of these patterns.
3.1 Reasoning Patterns (how the LLM "thinks")¶
| Pattern | One-liner | Origin |
|---|---|---|
| CoT (Chain of Thought) | Elicit intermediate reasoning steps; the zero-shot trigger is "Let's think step by step" | Wei 2022; zero-shot variant: Kojima 2022 |
| Few-shot CoT | Worked examples guide the reasoning | Wei 2022 |
| Self-Consistency | Sample multiple times + majority vote | Wang 2022 |
| ToT (Tree of Thoughts) | Tree search + state value | Yao 2023 |
| GoT (Graph of Thoughts) | Thought topology as an arbitrary DAG | Besta 2023 |
| LATS (Language Agent Tree Search) | ToT + MCTS-style action search | Zhou 2023 |
| Reflexion | Verbal RL: reflections on failures are fed into the next attempt's prompt | Shinn 2023 |
| Self-Refine | Iterative self-evaluation + self-improvement | Madaan 2023 |
| CoT with Code (PAL) | Code as the reasoning trace | Gao 2022 |
| ReWOO | Plan produced in one shot, avoiding multiple rounds of LLM calls | Xu 2023 |
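Of the patterns above, Self-Consistency is the easiest to sketch: sample several answers and keep the most frequent one. The `sample_answer` callable stands in for a temperature-sampled LLM call; here it is a deterministic stub so the example runs offline, and all names are hypothetical.

```python
from collections import Counter
from itertools import cycle

def self_consistency(sample_answer, question, n_samples=15):
    """Return the majority answer and the fraction of samples that agree."""
    answers = [sample_answer(question) for _ in range(n_samples)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_samples

# Stub "LLM": cycles through canned final answers, some of them wrong.
_canned = cycle(["42", "41", "42", "42", "43"])

def stub_solver(question):
    return next(_canned)

answer, agreement = self_consistency(stub_solver, "What is 6 * 7?")
```

The agreement ratio doubles as a cheap confidence signal: a low value suggests the question deserves a stronger verifier or a human look.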
3.2 Planning Patterns¶
| Pattern | Best suited for |
|---|---|
| ReAct (interleaved Reasoning + Acting) | General tool-using agents |
| Plan-and-Execute | Long-horizon tasks with predictable subgoals |
| Plan-and-Solve | Math / code |
| HTN + LLM | When a task hierarchy already exists |
| PDDL + LLM | The LLM translates natural language into PDDL; a classical planner solves it |
| MCTS + LLM | The LLM provides the prior; MCTS handles exploration (the Alpha Jungle template) |
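The "LLM provides the prior, MCTS handles exploration" row can be illustrated with the selection step alone. The sketch uses an AlphaZero-style PUCT score, with `total + 1` under the square root so that the prior already matters before any visit. Whether Alpha Jungle uses exactly this formula is not stated here, so treat it as an assumption. The `Child` class and the priors are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Child:
    prior: float            # probability the LLM assigns to this expansion
    visits: int = 0
    value_sum: float = 0.0  # sum of backtest rewards backed up through this node

    @property
    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def puct_select(children, c_puct=1.5):
    """Pick the child maximizing Q + c * prior * sqrt(N_parent + 1) / (1 + n)."""
    total = sum(ch.visits for ch in children.values())

    def score(ch):
        return ch.q + c_puct * ch.prior * math.sqrt(total + 1) / (1 + ch.visits)

    return max(children, key=lambda name: score(children[name]))

children = {"add_momentum_term": Child(prior=0.7), "add_volume_term": Child(prior=0.3)}
first_pick = puct_select(children)

# Suppose the favored branch was tried 10 times with poor average reward.
children["add_momentum_term"].visits = 10
children["add_momentum_term"].value_sum = 1.0
later_pick = puct_select(children)
```

Initially the LLM's preferred branch wins. Once it has been visited repeatedly without paying off, the exploration term shifts the search to the lower-prior branch.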
3.3 Tool Use & Integration¶
| Tech | Meaning |
|---|---|
| Function calling | Introduced by OpenAI in 2023; structured JSON output |
| Toolformer | Self-supervised learning of how to use tools |
| MCP (Model Context Protocol) | Anthropic 2024; agent-tool communication standard |
| A2A (Agent-to-Agent) | Google 2025; cross-vendor agent communication |
| ACI (Agent-Computer Interface) | SWE-Agent concept; an abstraction layer designed specifically for agents |
| agents.txt / ARDP | Agent discovery protocols (early stage) |
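The common core of function calling is that the model emits a structured JSON request and the host program decides whether and how to run it. The sketch below shows that host-side dispatch. The `{"name", "arguments"}` message shape and the `get_price` tool are illustrative assumptions, not the wire format of any vendor API or of MCP.

```python
import json

TOOLS = {}

def tool(fn):
    """Register a Python function as a callable tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_price(symbol: str) -> float:
    # Hypothetical stub; a real tool would query a market-data service.
    return {"BTC": 100.0}.get(symbol, 0.0)

def dispatch(call_json: str) -> str:
    """Execute one model-emitted tool call and return a JSON result string."""
    call = json.loads(call_json)
    name, args = call.get("name"), call.get("arguments", {})
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        return json.dumps({"result": TOOLS[name](**args)})
    except TypeError as exc:   # wrong or missing arguments
        return json.dumps({"error": str(exc)})

ok = dispatch('{"name": "get_price", "arguments": {"symbol": "BTC"}}')
bad = dispatch('{"name": "delete_everything", "arguments": {}}')
```

Because only registered names can run, the registry is also the natural place to enforce permissions and write the audit trail.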
3.4 Memory Architectures¶
| Memory type | Implementation |
|---|---|
| Working | Current context window |
| Episodic | Records of past interactions |
| Semantic | Abstract facts / knowledge |
| Procedural | Learned skills / tools |
Representative systems:
- MemGPT (Berkeley 2023): OS-like memory hierarchy with paging
- Voyager (NVIDIA 2023): skill library that grows
- A-Mem / MemoryBank: long-term memory variants
- Generative Agents (Stanford 2023): Park et al., Smallville simulation
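A toy version of the working/episodic split with paging can be written in a few lines. It is a sketch in the spirit of an OS-like memory hierarchy, not MemGPT's actual design: the class name, the FIFO eviction rule, and the keyword recall are all simplifying assumptions.

```python
from collections import deque

class PagedMemory:
    """Bounded working memory that pages old messages out to an archive."""

    def __init__(self, working_capacity=3):
        self.capacity = working_capacity
        self.working = deque()   # what would be placed in the context window
        self.archive = []        # episodic store outside the context window

    def add(self, message):
        self.working.append(message)
        while len(self.working) > self.capacity:
            self.archive.append(self.working.popleft())   # page out the oldest

    def recall(self, keyword, limit=2):
        """Page in archived messages containing keyword, most recent first."""
        hits = [m for m in reversed(self.archive) if keyword.lower() in m.lower()]
        return hits[:limit]

mem = PagedMemory(working_capacity=2)
for note in [
    "BTC funding rate spiked",
    "ETH basis widened",
    "Momentum factor backtest failed",
    "BTC open interest fell",
]:
    mem.add(note)
```

After the four notes, working memory holds the last two and `mem.recall("btc")` retrieves the archived funding-rate note. A production system would typically replace keyword matching with embedding retrieval.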
3.5 Multi-Agent Frameworks¶
| Framework | Pattern | Origin |
|---|---|---|
| CAMEL | Role-play conversation | KAUST 2023 |
| AutoGen | Group chat + custom conversation patterns | Microsoft 2023 |
| MetaGPT | SOP (standard operating procedure)-based | DeepWisdom 2023 |
| ChatDev | Software-company metaphor | THU 2023 |
| AgentVerse | Multi-agent simulation env | THU 2023 |
| LangGraph | Graph-based orchestration | LangChain 2024 |
| CrewAI | Crew metaphor: role + task | Open source |
| Swarm | Lightweight handoff pattern | OpenAI 2024 |
3.6 Orchestration Topologies¶
Supervisor-Worker   ┌─Supervisor─┐
                    │            │
                    ▼            ▼
                 Worker A     Worker B
Hierarchical        Manager → Lead A → Worker A1, A2
                            → Lead B → Worker B1, B2
Graph               Arbitrary DAG; nodes are agents, edges are messages
Pipeline            A → B → C → D (no cycles)
Peer / Debate       N agents deliberate as equals
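The Graph and Pipeline topologies above reduce to running agents in dependency order. A minimal sketch using the standard library's `graphlib` follows. The agent functions are stubs standing in for LLM calls, and the Researcher/Engineer/Analyst naming is used only as an example.

```python
from graphlib import TopologicalSorter

def run_graph(dependencies, agents, task):
    """Run each agent after its upstream agents, passing their outputs along.

    dependencies maps a node to the set of nodes it depends on.
    """
    outputs = {}
    for node in TopologicalSorter(dependencies).static_order():
        upstream = {dep: outputs[dep] for dep in dependencies.get(node, ())}
        outputs[node] = agents[node](task, upstream)
    return outputs

# Stub agents; each receives the task and the outputs of its dependencies.
agents = {
    "researcher": lambda task, up: f"hypothesis for {task}",
    "engineer": lambda task, up: f"code({up['researcher']})",
    "analyst": lambda task, up: f"report({up['engineer']})",
}
dependencies = {"engineer": {"researcher"}, "analyst": {"engineer"}}
outputs = run_graph(dependencies, agents, "momentum")
```

A pipeline is the special case where every node has exactly one upstream node. `TopologicalSorter` raises `graphlib.CycleError` if the graph contains a cycle, so debate-style loops need a different driver.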
L3 → L4 transition¶
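All L3 patterns depend on the capabilities of the underlying LLM. L4 is the precondition for these patterns to work: without long context, reliable instruction following, and strong reasoning, every L3 pattern breaks down.
L4 · Foundation Model Tech Stack (the LLM's own technology stack)¶
Scope¶
Techniques for training, inference, and scaling of foundation models. Progress at this layer turns over every 3-6 months.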
4.1 Pretraining¶
| Tech | Key papers |
|---|---|
| Transformer | Vaswani 2017 |
| Scaling laws | Kaplan 2020, Chinchilla (Hoffmann 2022) |
| MoE (Mixture of Experts) | GShard / Switch / DeepSeek-V3 / Mixtral |
| Long context | RoPE / ALiBi / YaRN / NTK-aware scaling |
| Hybrid attention | DeepSeek-V4 (CSA + HCA), Mamba + Transformer |
| Linear attention / sub-quadratic sequence models | Linformer / Performer / Mamba / RWKV / ASI-ARCH family |
| Multi-token prediction | Better data efficiency (DeepSeek-V3) |
4.2 Post-Training (Alignment & Capability)¶
| Tech | One-liner |
|---|---|
| SFT (Supervised Fine-Tuning) | Fine-tune on demonstration data |
| RLHF (RL from Human Feedback) | InstructGPT 2022 |
| DPO (Direct Preference Optimization) | Rafailov 2023; offline, no separate reward model |
| GRPO (Group Relative Policy Optimization) | DeepSeek 2024; a PPO variant that drops the value model and uses a group-relative baseline |
| ORPO / KTO / SimPO | DPO variants |
| RLAIF (RL from AI Feedback) | Anthropic 2022 (Constitutional AI paper) |
| Constitutional AI | A written set of principles replaces human labelers |
| Self-play / Self-rewarding | The model generates and scores its own outputs |
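The "group-relative" part of GRPO can be shown in isolation: for several sampled completions of the same prompt, each reward is standardized against the group's mean and standard deviation, which replaces the learned value baseline used in PPO. This sketch covers only that advantage computation. The clipped policy-ratio objective and KL penalty are omitted, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each completion's reward within its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions for one prompt, scored 1.0 if the final answer is correct.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct completions receive an advantage near +1 and incorrect ones near -1. If every completion in a group gets the same reward, all advantages are zero and that prompt contributes no gradient signal.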
4.3 Test-Time Compute (scaling compute at inference time)¶
| Tech | Origin / note |
|---|---|
| o1 / o3 style reasoning | OpenAI 2024-2025; hidden CoT at inference time |
| DeepSeek-R1 | 2025; GRPO + reasoning traces |
| Gemini Thinking | Google 2024 |
| Best-of-N sampling | Sample multiple times + a verifier selects |
| Self-Consistency | Majority vote |
| Process Reward Models | Score intermediate steps, not only the final answer |
| Speculative decoding | Inference acceleration |
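Best-of-N sampling is the simplest test-time-compute technique in the table: generate N candidates and let a verifier pick one. In the sketch below both `generate` and `verify` are stubs standing in for an LLM sampler and a reward model or checker, and the prompt is a toy.

```python
def best_of_n(generate, verify, prompt, n=8):
    """Sample n candidates and return the one the verifier scores highest."""
    candidates = [generate(prompt, i) for i in range(n)]
    scored = [(verify(prompt, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score

def stub_generate(prompt, i):
    # Stand-in for the i-th sampled completion.
    return str(i)

def stub_verify(prompt, candidate):
    # Stand-in for a verifier: 1.0 if the candidate solves x * x = 49.
    return 1.0 if int(candidate) ** 2 == 49 else 0.0

best, score = best_of_n(stub_generate, stub_verify, "Find x > 0 with x * x = 49")
```

The pattern is only as good as its verifier, which is why it pairs naturally with a generator-verifier separation.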
4.4 Architecture Innovations (last 18 months)¶
| Innovation | System / note |
|---|---|
| Mamba / SSM | Selective state space models, sub-quadratic |
| DeepSeek MLA (Multi-head Latent Attention) | KV cache compression |
| DeepSeek mHC (Manifold-Constrained Hyper-Connections) | V4; training stability for very large models |
| Muon optimizer | DeepSeek V4 |
| CSA + HCA | DeepSeek V4; 73% fewer FLOPs at 1M context |
| Engram memory | DeepSeek V4 exploration |
4.5 Open vs Closed Frontier (status as of 2026-05)¶
| Tier | Closed | Open |
|---|---|---|
| Reasoning frontier | OpenAI o3, Gemini 2.5 Pro Thinking, Claude Opus 4.6 | DeepSeek-R1 / R2, Qwen3-Reasoner |
| Coding frontier | Claude Sonnet 4.6, Opus 4.6 | DeepSeek-V4-Pro (80.6% SWE-bench), Qwen3-Coder |
| Multimodal frontier | GPT-5 / Gemini 3 / Claude 4.6 | Qwen-VL / DeepSeek-VL |
| Long context | Gemini (1M+), Claude (200K+) | DeepSeek-V4-Pro (1M), Qwen3 (1M) |
L4 → L5 transition¶
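All post-training techniques for foundation models are, in essence, applications of L5 algorithms (RLHF = policy gradient + reward model; test-time compute = MCTS / search).
L5 · Algorithmic Foundations (outermost layer / theoretical bedrock)¶
Scope¶
Classical AI / ML algorithms that do not depend on LLMs: the true bedrock of the agent paradigm.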
5.1 Reinforcement Learning¶
| Family | Algorithms |
|---|---|
| Value-based | Q-learning, DQN, Rainbow DQN |
| Policy gradient | REINFORCE, TRPO, PPO |
| Actor-critic | A2C, A3C, SAC, IMPALA |
| Model-based | Dyna-Q, World Models, Dreamer V1-V3 |
| Hierarchical RL | Options framework, Feudal RL |
| Multi-agent RL | MADDPG, QMIX, COMA |
| Offline RL | CQL, IQL, AWAC, Decision Transformer |
| Inverse RL / Imitation | GAIL, BC, AIRL |
LLM-relevant: PPO (RLHF 主力), GRPO (DeepSeek), DPO (offline preference)
5.2 Search & Planning¶
| Algorithm | Use |
|---|---|
| MCTS (Monte Carlo Tree Search) | AlphaGo, AlphaZero, MuZero, LATS |
| A* / Best-first | Pathfinding, classical planning |
| Beam search | NLP decoding, LLM generation |
| Evolutionary search | GA, ES, CMA-ES, NES |
| Bayesian Optimization | Hyperparameter, expensive black-box |
| Genetic Programming | gplearn, FunSearch family |
| GFlowNet | Bengio 2021+; AlphaSAGE uses this |
5.3 Probabilistic Inference¶
| Tech | Use |
|---|---|
| Variational inference | VAE, BNN |
| MCMC (Markov Chain MC) | Bayesian posterior |
| Conformal Prediction | Distribution-free uncertainty; the core methodology of your RQ2 |
| Importance sampling | Off-policy correction |
| Particle filters | Sequential Monte Carlo |
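Since conformal prediction is singled out in the table, here is a minimal split-conformal sketch for a regression-style forecast. It assumes absolute residuals as the nonconformity score and a held-out calibration set. The function names are hypothetical, and the coverage guarantee relies on exchangeability, which financial time series may violate.

```python
import math

def conformal_quantile(calib_residuals, alpha=0.1):
    """Finite-sample quantile of |y - y_hat| for (1 - alpha) coverage."""
    n = len(calib_residuals)
    k = math.ceil((n + 1) * (1 - alpha))   # rank of the calibration score to use
    if k > n:
        return math.inf                    # too few calibration points
    return sorted(calib_residuals)[k - 1]

def predict_interval(point_prediction, q):
    """Symmetric interval around a point forecast."""
    return point_prediction - q, point_prediction + q

# Hypothetical absolute residuals from a calibration window.
residuals = [1, 2, 3, 4, 5, 6, 7, 8, 9]
q = conformal_quantile(residuals, alpha=0.2)
interval = predict_interval(100.0, q)
```

With nine calibration residuals and `alpha=0.2` the rule picks the 8th smallest residual, giving the interval `(92.0, 108.0)`. No distributional assumption on the residuals is needed beyond exchangeability.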
5.4 Optimization¶
| Family | Algorithm |
|---|---|
| Gradient-based | SGD, Adam, AdamW, Muon (new), Lion |
| Second-order | Newton, LBFGS, K-FAC |
| Zeroth-order | ES, CMA-ES, FD |
| Combinatorial | DP, ILP, LP relaxation |
| Convex | SDP, QP, projected gradient |
5.5 Game Theory (multi-agent foundation)¶
- Nash equilibrium, Stackelberg, mechanism design
- CFR (Counterfactual Regret Minimization) — Poker AI
- Self-play, fictitious play
- Markov games / stochastic games
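L5 → Boundary¶
Further out lie cognitive science, neuroscience, mathematical logic, and complexity theory: rarely cited in ML papers, but the intellectual ancestors nonetheless.
Cross-layer map: how your work traces to each layer¶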
Your production system + Crypto-Alpha-Bench
│
L0 · Financial Agent
├─ L0.3 alpha mining ← main arena of your RQ1
├─ L0.4 quant R&D ← your RQ3 (Cognition Base)
└─ L0.5 risk/safety ← your production verifier + RQ2
│
L1 · Domain Research Agent
├─ Pattern: generator-verifier separation ← core of your thesis
└─ FunSearch verifiable-novelty model ← inherited directly by Crypto-Alpha-Bench
│
L2 · Production Agent System
├─ Coding agent stack ← Claude Code and similar are your development tools
└─ Deep Research pipeline ← maps to the 4 stages of your RQ3 Researcher Agent
│
L3 · LLM Agent Patterns
├─ ReAct / Reflexion ← already used in your L1-L6 LLM stack
├─ Tool use / MCP ← essential for your production system
└─ Multi-agent debate ← future Researcher/Engineer/Analyst division of labor
│
L4 · Foundation Model
├─ DeepSeek-V4-Pro / R1 ← candidate Tier 2 LLM substrate baseline for Crypto-Alpha-Bench
└─ Long context (1M) ← makes Cognition Base + Researcher in a single prompt feasible
│
L5 · Algorithmic Foundations
├─ Walk-forward / PBO / DSR ← mathematical basis of your verifier (probabilistic + statistical)
├─ Conformal prediction ← core methodology of your RQ2
└─ MCTS / GFlowNet ← needed if a search layer is added later
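Technical decision tree: when designing an alpha agent, you must choose at every layer¶
| Layer | Decision | Your choice |
|---|---|---|
| L5 algorithms | Which search? RL/MCTS/GFlowNet/GP? | GP + walk-forward already in use; LLM + MCTS to be added later |
| L4 FM | Which LLM as substrate? Open vs closed? | Claude / DeepSeek-V4 as complements; cost-quality trade-off |
| L3 pattern | Single-agent or multi-agent? | Currently single (L1-L6 stack); multi in the future (Researcher/Engineer/Analyst) |
| L2 framework | LangGraph / AutoGen / in-house? | In-house (already in production) |
| L1 paradigm | Generator-verifier separation? | ✅ Already the core design |
| L0 application | Trading vs alpha mining vs R&D? | Alpha mining + quant R&D + risk/safety |
Every decision is coupled to the layers above and below it, which is why the nested framing matters. When asked "why not use X?" in a presentation, you can answer: "because at layer L_n I chose Y, and X is incompatible with it at layer L_n-1".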
The ultimate framing for the HKU presentation (6 nested layers)¶
*"At L5, our work rests on the statistical verifier tradition (walk-forward, PBO, DSR), which gives us a finance-grade analog to AlphaProof's Lean kernel.
At L4, we are LLM-substrate agnostic: open-source DeepSeek-V4 or closed Claude both fit our generator-verifier interface.
At L3, we use a standard ReAct + multi-agent pattern (Researcher / Engineer / Analyst), familiar from ASI-ARCH and AI Scientist.
At L2, our production system is comparable in engineering rigor to Claude Code or Devin, but specialized for trading execution.
At L1, we identify as an autonomous research agent in finance, inheriting FunSearch's verifiable-novelty pattern.
At L0, we are alpha-mining + quant R&D + risk-gating, joined under a single verifier substrate: Crypto-Alpha-Bench."*
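This framing lets you respond gracefully to questions at any layer.
End-of-document note¶
The writing style of this doc deliberately differs from Chen 2026: Chen 2026 leans toward a paper-level survey, whereas this doc leans toward a technique-stack landscape. The two are complementary:
- To learn who did what, read Chen 2026
- To learn how the techniques combine, read this document
- To find the specific anchors of your own work, read agent_research_landscape.md (concentric L0-L5, neutral perspective)
Author: Paul Weng (co-generated with Claude as a research collaborator)
Status: living document; will be refined as L4/L5 technologies evolve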