Financial Agent → General Agent · Technical Landscape¶

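A technical landscape that zooms out layer by layer from Financial Agents (the most specific) to General Agents (the most general). Each layer emphasizes technical principles, algorithms, and architectural patterns; papers serve only as anchors.

6 concentric layers · L0 → L5 · Maintained by Paul Weng · 2026-05-19

Concentric structure¶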

L5 · Algorithmic Foundations              (RL · Search · Optimization · Probabilistic)
└─ L4 · Foundation Model Tech Stack       (Pretrain · Post-train · Test-time compute)
   └─ L3 · LLM Agent Patterns             (Reasoning · Planning · Tool · Memory · Multi-agent)
      └─ L2 · Production Agent Systems    (Coding · Computer use · Deep research)
         └─ L1 · Domain Research Agents   (Science · Math · Chemistry · Biology)
            └─ L0 · Financial Agents      (Trading · Alpha mining · Quant R&D · Risk)

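Suggested reading order:

Bottom-up (L0→L5): how your concrete work inherits upstream techniques
Top-down (L5→L0): how foundation techniques propagate to finance applications
Cross-section of a single layer: understand the trade-offs among sibling techniques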

L0 · Financial Agents (innermost layer)¶

Scope. Any system that applies the LLM/agent paradigm to tasks in the financial domain. Difference from L1: the domain-specific cost model, execution constraints, and regulatory boundaries are first-class concerns.

L0 subdivisions (5 sub-classes)¶

L0.1 · Customer-Facing Financial Agents¶

Tech primitive: retrieval + compliance checks + multi-turn dialogue + KYC integration

System	Core technique
BloombergGPT (2023)	50B params, finance-corpus pretraining
FinGPT (open-source)	LoRA fine-tuned financial chatbot
Klarna AI agent	Customer service automation
Bank robo-advisors	Risk questionnaire + portfolio construction

Core challenges: regulatory disclosure, no hallucinated investment advice allowed, and a required audit trail.

L0.2 · Trading-Decision Agents¶

Tech primitive: multi-modal signal fusion (news + price + tape) + reasoning + action selection

System	Core technique	arXiv
TradingAgents	Multi-agent debate (Fundamental / Sentiment / Technical analyst + Bull/Bear + Manager + Trader + Risk)	2412.20138
QuantAgent	Short-horizon HFT-style structured-signal decision	—
FinMem	Memory-augmented LLM trading (episodic + semantic + working memory)	2311.13743
FinAgent	Multimodal foundation trading agent	2402.18485

Core challenges: a large action space, high reward latency, and a market that is both partially observable and adversarial.

L0.3 · Alpha Mining / Factor Discovery Agents (the main arena of your work)¶

Tech primitive: symbolic search / LLM generation + statistical verification + multiple-testing correction

System	Search method	Generator	Verifier
gplearn / gpquant	Genetic Programming	symbolic mutation	IC-based fitness
AutoAlpha	Hierarchical EA	—	walk-forward
AlphaAgent	LLM-driven	LLM with regularized exploration	multi-regime backtest
Alpha Jungle	LLM + MCTS	LLM prior on tree	backtest reward
QuantaAlpha	LLM + evolution	self-evolving trajectory	IC + return metrics
AlphaSAGE	GFlowNet	structured policy	multi-faceted reward
AlphaPROBE	GNN + on-graph evolution	retrieval-augmented LLM	walk-forward

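Tech depth note: AlphaSAGE's use of GFlowNet is its key innovation. Whereas reward-maximizing RL converges toward the argmax, a GFlowNet learns to sample candidates with probability proportional to their reward, so it explores multiple modes. That suits alpha mining, a problem with a long tail of many acceptable solutions.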
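The contrast in the note above can be made concrete with a toy example. This is only a sketch of the target distribution a trained GFlowNet aims for, p(x) ∝ R(x), not of GFlowNet training and not of AlphaSAGE's implementation; the factor names and reward values are invented for illustration.

```python
import random
from collections import Counter

# Hypothetical candidate factors with made-up rewards (e.g. a rescaled IC score).
rewards = {"mom_5d": 4.0, "rev_1d": 3.0, "vol_skew": 2.0, "noise": 1.0}

def argmax_policy(rewards):
    """What a fully converged reward maximizer returns: one mode only."""
    return max(rewards, key=rewards.get)

def reward_proportional_probs(rewards):
    """Target distribution of a GFlowNet: p(x) = R(x) / sum of all rewards."""
    total = sum(rewards.values())
    return {x: r / total for x, r in rewards.items()}

def sample(probs, n, seed=0):
    """Draw n candidates from the reward-proportional distribution."""
    rng = random.Random(seed)
    names = list(probs)
    weights = [probs[x] for x in names]
    return Counter(rng.choices(names, weights=weights, k=n))

probs = reward_proportional_probs(rewards)
print(argmax_policy(rewards))  # always the single best candidate
print(probs)                   # every positive-reward candidate keeps some mass
print(sample(probs, 1000))     # lower-reward modes still get visited
```

Under the reward-proportional target, the second- and third-best factors are still sampled regularly, which is the behavior the note describes as exploring multiple modes.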

L0.4 · Quant R&D Agents (full workflow)¶

Tech primitive: hypothesis generation + code synthesis + experiment execution + feedback iteration

System	Workflow	Innovation
RD-Agent(Q) (Microsoft)	Research → Development (Co-STEER code gen) → Feedback → MAB scheduler	Multi-armed bandit selects the research direction
Beyond Prompting	Autonomous systematic factor investing	OOS validation + economic rationale
MLR-Copilot	Idea / experiment / analysis / writing	Splits ML research into 4 stages

Tech depth: the MAB scheduler handles the explore-exploit trade-off: it decides whether the next round should "deepen a promising direction" or "try a new hypothesis".
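A minimal sketch of such a scheduler, assuming UCB1 as the bandit rule. The source does not say which bandit algorithm RD-Agent(Q) uses, so UCB1 is a stand-in chosen for brevity; the arm names and payoffs are hypothetical.

```python
import math

class UCB1Scheduler:
    """UCB1 bandit over research directions (arms). Illustrative only."""

    def __init__(self, arms, c=2.0):
        self.c = c
        self.counts = {a: 0 for a in arms}
        self.means = {a: 0.0 for a in arms}

    def select(self):
        # Try every arm once before using the confidence bound.
        for arm, n in self.counts.items():
            if n == 0:
                return arm
        total = sum(self.counts.values())

        def score(arm):
            bonus = math.sqrt(self.c * math.log(total) / self.counts[arm])
            return self.means[arm] + bonus

        return max(self.counts, key=score)

    def update(self, arm, reward):
        # Incremental running mean of observed rewards for this arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

# Made-up deterministic payoffs standing in for "quality of the round's result".
true_payoff = {"deepen_factor_family": 0.6, "new_hypothesis": 0.2}
sched = UCB1Scheduler(true_payoff)
for _ in range(200):
    arm = sched.select()
    sched.update(arm, true_payoff[arm])
print(sched.counts)
```

The exploration bonus shrinks as an arm is tried more often, so the weaker direction is still revisited occasionally rather than abandoned after one bad round.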

L0.5 · Risk / Compliance / Safety Agents¶

Tech primitive: rule engine + LLM explanation + audit trail

Hubble: safe/reproducible LLM alpha discovery, AST sandbox + cross-sectional metrics
FactorMiner: self-evolving with redundancy/correlation control
CogAlpha: LLM-driven code evolution with robustness checks

Core design principle: the LLM is never on the irreversible action path. Your production system embodies exactly this.
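One way to read this principle in code: the LLM may only produce a proposal object, and a deterministic rule gate is the sole component that can approve it. The sketch below is an assumption about how such a gate could look, not a description of any production system; `OrderProposal`, `MAX_QTY`, and the limits are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OrderProposal:
    symbol: str
    side: str        # "buy" or "sell"
    quantity: float
    rationale: str   # free text from the LLM, logged but never executed

# Hypothetical hard limits; real values would come from a risk policy.
MAX_QTY = {"BTC-USD": 0.5, "ETH-USD": 5.0}

def rule_gate(p: OrderProposal) -> tuple[bool, str]:
    """Deterministic checks; the only component allowed to approve an order."""
    if p.side not in ("buy", "sell"):
        return False, "unknown side"
    if p.symbol not in MAX_QTY:
        return False, "symbol not whitelisted"
    if not (0 < p.quantity <= MAX_QTY[p.symbol]):
        return False, "quantity outside limit"
    return True, "ok"

def submit(p: OrderProposal, audit_log: list) -> bool:
    """Record every decision; a real system would call the OMS only if approved."""
    approved, reason = rule_gate(p)
    audit_log.append({"proposal": p, "approved": approved, "reason": reason})
    return approved

log = []
print(submit(OrderProposal("BTC-USD", "buy", 0.1, "momentum signal"), log))
print(submit(OrderProposal("BTC-USD", "buy", 5.0, "very confident"), log))
```

The rationale text travels with the proposal into the audit log, but no branch of `rule_gate` reads it, so persuasive model output cannot widen a limit.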

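L0 → L1 transition. Financial agents are a special case of domain research agents, but with 3 unique constraints:
1. Statistical verifier instead of mechanical: there is no Lean kernel, only PBO (probability of backtest overfitting), DSR (deflated Sharpe ratio), and walk-forward validation.
2. Adversarial environment: unlike the physical constants of chemistry or biology, the market games your strategy in return.
3. Cost/fill realism: academic RL agents assume free execution; financial agents cannot.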
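Of the statistical verifiers named in the first constraint, walk-forward validation is the simplest to sketch: every test window lies strictly after its training window, so no future data leaks into the fit. The function below is a minimal rolling-window version; the window sizes are arbitrary and the helper name is an assumption.

```python
def walk_forward_splits(n_samples, train_size, test_size, step=None):
    """Yield (train_range, test_range) index pairs; train always precedes test."""
    step = step or test_size
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step

splits = list(walk_forward_splits(n_samples=10, train_size=4, test_size=2))
for train, test in splits:
    print(list(train), list(test))
```

A factor would then be scored only on the concatenated test windows; PBO and DSR are separate corrections applied on top and are not shown here.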

L1 · Domain Research Agents¶

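Scope. LLM-powered agents that autonomously carry out domain-specific research: hypothesis → experiment → analysis. According to the Chen 2026 survey, the current frontier sits at autonomy level L4 (task-bounded autonomy), while L5 (self-directed agenda) remains aspirational. These autonomy levels belong to the survey's own scale and are distinct from the layer numbers used in this document.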

Sub-domain technical patterns¶

Domain	Key systems	Verifier type	Distinctive technique
ML research	AI Scientist v1/v2 (Sakana), MLR-Copilot, ADAS	training loss / accuracy	Agentic Tree Search (v2)
Architecture discovery	ASI-ARCH (GAIR 2025)	benchmark loss	Cognition Base + Researcher/Engineer/Analyst
Math / algorithms	FunSearch (DeepMind), AlphaProof, AlphaEvolve, AlphaGeometry 2	Lean kernel / program-level evaluation	Island-based evolution, neuro-symbolic
Chemistry	Coscientist (CMU/Emerald), ChemCrow (EPFL)	wet-lab execution / molecular sim	Tool integration (synthesis robots, RDKit)
Biology	BioPlanner, MedAgents	clinical evidence / structural sim	Domain-knowledge grounding
Materials	GNoME (DeepMind 2023, Nature)	DFT calculations	GNN + active learning, 2.2M new materials

4 Common Architectural Patterns (Chen 2026 survey)¶

Single-agent loop (e.g., AI Scientist v1): one LLM repeatedly runs plan-act-reflect
Multi-agent (e.g., MetaGPT, AutoGen): role specialization
Hierarchical (e.g., ChatDev, RD-Agent): a supervisor coordinates the layers below
Tool-augmented (e.g., ChemCrow): the agent's main value lies in tool orchestration

6 Open Problems (mapped directly onto L0 alpha-search pain points)¶

Open problem	Counterpart in alpha search
Cognitive loop trap	Repeatedly overfitting the same regime
Context window limits	Multi-symbol × long history does not fit
Novelty evaluation	Alpha decay + lookahead bias
Reproducibility	Walk-forward std too large
Safety / dual-use	Trading agent wired directly to the OMS
Cost	$100-1000/research campaign

L1 → L2 transition. Domain research agents are a subset of production agents, but their output is a paper, report, or artifact rather than a customer-facing service.

L2 · Production Agent Systems¶

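Scope. Agents actually deployed in users' production environments. Difference from L1: reliability, latency, and cost are first-class metrics, and correctness is judged as the user perceives it rather than in the in-domain academic sense.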

Major Sub-classes¶

L2.1 · Coding Agents (the most mature production category)¶

System	Frontier metric	Technical highlight
Devin (Cognition Labs 2024)	First production "AI software engineer"	Long-horizon planning + browser + IDE control
Claude Code (Anthropic 2025)	72% SWE-bench Verified	Codebase-aware + tool use
Cursor / Windsurf / Aider	IDE-integrated	Diff-based edits, multi-file edits
SWE-Agent (Princeton)	Open-source bench champion	Agent-Computer Interface (ACI)
OpenHands (ex-OpenDevin)	Open framework	Multi-runtime backends
Agentless	Anti-thesis to agent loops	No iteration, single-shot

Tech depth: SWE-Agent's ACI concept is the key idea: giving the agent a purpose-built "text-based UI" instead of a raw terminal can significantly reduce hallucination.

L2.2 · Computer-Use / Browser Agents¶

System	Year	Modality
Anthropic Computer Use	2024	Screenshot + mouse/keyboard
OpenAI Operator	2025	Browser-based VL agent
Google Project Mariner	2024	Chrome integration
WebArena / VisualWebArena	Benchmarks	Real-world web tasks

Core challenges: visual grounding (where on screen), action precision, multi-step state tracking.

L2.3 · Deep Research Agents (see the Zhang 2025 survey)¶

4-stage pipeline: planning → question developing → web exploration → report generation

System	Source
GPT Deep Research	OpenAI 2025
Perplexity Pro Deep Research	2025
Google Deep Research (Gemini)	2024
Tongyi DeepResearch	Alibaba 2025
OpenSeeker / OpenSeeker-v2	GAIR open-source
STORM	Stanford
AgentRxiv	Collaborative autonomous research

Tech depth: OpenSeeker v2 uses 11.7K synthetic trajectories plus plain SFT to beat industrial-grade models trained with CPT+SFT+RL, which is evidence that trajectory quality matters more than training-pipeline complexity.

L2.4 · Business Automation Agents¶

Salesforce Agentforce / Microsoft Copilot Studio / Google Agentspace
Key tasks: customer service / ERP workflow / supply chain
Technical highlights: SOP-based orchestration + permission graphs + human-in-loop

L2 → L3 transition. Production agents are all built on L3's reusable patterns. L3 is the "agent programming language"; L2 is the set of applications written in it.

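L3 · LLM Agent Patterns & Frameworks (the general-purpose technical core of agents)¶

Scope. General-purpose agent building blocks that do not depend on a specific domain or application. Every L0-L2 system is a composition of these patterns.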

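3.1 Reasoning Patterns (how the LLM "thinks")¶

Pattern	One-liner	Origin
CoT (Chain of Thought)	Elicit intermediate reasoning steps before the final answer	Wei 2022; the zero-shot trigger "Let's think step by step" is from Kojima 2022
Few-shot CoT	Guide the model with worked examples	Wei 2022
Self-Consistency	Sample multiple times + majority vote	Wang 2022
ToT (Tree of Thoughts)	Tree search + state value	Yao 2023
GoT (Graph of Thoughts)	Thought topology as an arbitrary DAG	Besta 2023
LATS (Language Agent Tree Search)	ToT + MCTS-style action search	Zhou 2023
Reflexion	Verbal RL: reflections on failures feed into the next prompt	Shinn 2023
Self-Refine	Iterative self-evaluation + self-improvement	Madaan 2023
CoT with Code (PAL)	Code as the reasoning trace	Gao 2022
ReWOO	Emit the whole plan up front, avoiding multi-round LLM calls	Xu 2023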
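As one concrete instance from the table, Self-Consistency reduces to sampling several reasoning chains and taking a majority vote over their final answers. The sketch below replaces the LLM with canned answers, so `sample_answer` and the data are stand-ins rather than a real model call.

```python
from collections import Counter

def self_consistency(sample_answer, n_samples=5):
    """Sample several reasoning paths and return the majority final answer."""
    answers = [sample_answer(i) for i in range(n_samples)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_samples

# Stand-in for an LLM call: canned final answers from five sampled chains.
canned = ["42", "42", "41", "42", "40"]
answer, agreement = self_consistency(lambda i: canned[i], n_samples=5)
print(answer, agreement)
```

The agreement ratio is a cheap confidence signal: a low value suggests the question deserves a stronger verifier than voting.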

3.2 Planning Patterns¶

Pattern	Best suited for
ReAct (interleaved Reasoning + Acting)	General tool-using agents
Plan-and-Execute	Long-horizon, predictable subgoal
Plan-and-Solve	Math / code
HTN + LLM	When a task hierarchy already exists
PDDL + LLM	The LLM translates natural language into PDDL; a classical planner solves it
MCTS + LLM	The LLM provides the prior; MCTS handles exploration (the Alpha Jungle template)
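The ReAct row can be sketched as a loop that alternates a model step (thought plus action) with a tool observation until the model chooses to finish. The policy below is scripted so the example runs without a model; `scripted_policy`, `get_close`, and the price are hypothetical placeholders.

```python
def react_loop(policy, tools, question, max_steps=5):
    """Alternate model thoughts/actions with tool observations until 'finish'."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, arg = policy(transcript)
        transcript.append(f"Thought: {thought}")
        if action == "finish":
            transcript.append(f"Answer: {arg}")
            return arg, transcript
        observation = tools[action](arg)
        transcript.append(f"Action: {action}[{arg}]")
        transcript.append(f"Observation: {observation}")
    return None, transcript  # step budget exhausted without an answer

# Scripted stand-in for the LLM; a real policy would prompt a model with the transcript.
def scripted_policy(transcript):
    observations = [line for line in transcript if line.startswith("Observation:")]
    if not observations:
        return "I need the last close.", "get_close", "BTC-USD"
    return "I have the price.", "finish", observations[-1].split(": ", 1)[1]

tools = {"get_close": lambda symbol: {"BTC-USD": "100.0"}[symbol]}  # made-up data
answer, transcript = react_loop(scripted_policy, tools, "Last close of BTC-USD?")
print("\n".join(transcript))
```

Plan-and-Execute differs mainly in moving all the planning ahead of the loop, which trades adaptivity for fewer model calls.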

3.3 Tool Use & Integration¶

Tech	Meaning
Function calling	Introduced by OpenAI in 2023; structured JSON output
Toolformer	Self-supervised learning of how to use tools
MCP (Model Context Protocol)	Anthropic 2024; agent-tool communication standard
A2A (Agent-to-Agent)	Google 2025; cross-vendor agent communication
ACI (Agent-Computer Interface)	SWE-Agent concept; an abstraction layer designed specifically for agents
agents.txt / ARDP	Agent discovery protocols (early stage)
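The common core of function calling is that the model emits a structured JSON call and the host validates and executes it. The sketch below shows only that host-side dispatch with a generic `{"name": ..., "arguments": ...}` shape; it is not the wire format of any particular vendor API or of MCP, and `get_position` is a made-up tool.

```python
import json

TOOLS = {}

def tool(fn):
    """Register a plain Python function under its own name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_position(symbol: str) -> dict:
    # Made-up data standing in for a portfolio service.
    return {"symbol": symbol, "quantity": 1.5}

def dispatch(raw_call: str) -> str:
    """Parse a model-emitted JSON call, validate the name, run it, return JSON."""
    call = json.loads(raw_call)
    name, args = call.get("name"), call.get("arguments", {})
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        return json.dumps({"result": TOOLS[name](**args)})
    except TypeError as exc:  # wrong or missing arguments
        return json.dumps({"error": str(exc)})

print(dispatch('{"name": "get_position", "arguments": {"symbol": "ETH-USD"}}'))
print(dispatch('{"name": "delete_everything", "arguments": {}}'))
```

Returning errors as data instead of raising lets the model see the failure and retry with corrected arguments.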

3.4 Memory Architectures¶

Memory type	Implementation
Working	Current context window
Episodic	Records of past interactions
Semantic	Abstract facts / knowledge
Procedural	Learned skills / tools

Representative systems:
MemGPT (Berkeley 2023): OS-like memory hierarchy with paging
Voyager (NVIDIA 2023): skill library that grows
A-Mem / MemoryBank: long-term memory variants
Generative Agents (Stanford 2023): Park et al., Smallville simulation
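The paging idea behind an OS-like memory hierarchy can be illustrated with a bounded working set that evicts its oldest items into an archive and retrieves them on demand. This is a toy sketch in that spirit, not MemGPT's actual design; the class name, keyword search, and messages are assumptions.

```python
from collections import deque

class PagedMemory:
    """Bounded working memory with an unbounded episodic archive."""

    def __init__(self, working_capacity=3):
        self.working = deque()   # what would sit in the context window
        self.archive = []        # episodic store outside the context
        self.capacity = working_capacity

    def add(self, message):
        self.working.append(message)
        while len(self.working) > self.capacity:
            self.archive.append(self.working.popleft())  # page out the oldest

    def recall(self, keyword, k=2):
        """Return up to k archived messages containing the keyword, newest first."""
        hits = [m for m in reversed(self.archive) if keyword.lower() in m.lower()]
        return hits[:k]

mem = PagedMemory(working_capacity=3)
for msg in ["BTC funding rate spiked", "ETH basis narrowed", "Ran backtest #1",
            "Ran backtest #2", "BTC factor rejected"]:
    mem.add(msg)
print(list(mem.working))
print(mem.recall("btc"))
```

A production system would typically replace the keyword match with embedding retrieval, but the eviction and recall boundary stays the same.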

3.5 Multi-Agent Frameworks¶

Framework	Pattern	Origin
CAMEL	Role-play conversation	KAUST 2023
AutoGen	Group chat + custom conversation patterns	Microsoft 2023
MetaGPT	SOP (standard operating procedure)-based	DeepWisdom 2023
ChatDev	Software-company metaphor	THU 2023
AgentVerse	Multi-agent simulation env	THU 2023
LangGraph	Graph-based orchestration	LangChain 2024
CrewAI	Crew metaphor, role + task	Open source
Swarm	Lightweight handoff pattern	OpenAI 2024

3.6 Orchestration Topologies¶

Supervisor-Worker        ┌─Supervisor─┐
                         │            │
                         ▼            ▼
                       Worker A    Worker B

Hierarchical            Manager → Lead A → Worker A1, A2
                              → Lead B → Worker B1, B2

Graph                   Any DAG; nodes are agents, edges are messages

Pipeline                A → B → C → D (no cycles)

Peer / Debate           N agents debate as equals
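The Graph topology above can be executed with a topological order: each agent runs once all of its upstream agents have produced output. The sketch uses the standard-library `graphlib`; the agent names and string-building handlers are hypothetical placeholders for real agent calls.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: each key lists the agents whose output it needs.
deps = {
    "researcher": set(),
    "engineer": {"researcher"},
    "analyst": {"engineer"},
    "risk_gate": {"engineer"},
    "report": {"analyst", "risk_gate"},
}

def run_graph(deps, handlers):
    """Run each agent after all of its upstream agents, passing their outputs."""
    outputs = {}
    for node in TopologicalSorter(deps).static_order():
        inbox = {up: outputs[up] for up in deps[node]}
        outputs[node] = handlers[node](inbox)
    return outputs

# Placeholder handlers that just record which upstream messages they received.
handlers = {
    name: (lambda inbox, name=name: f"{name}({','.join(sorted(inbox))})")
    for name in deps
}
outputs = run_graph(deps, handlers)
print(outputs["report"])
```

A Pipeline is the special case where every node has exactly one upstream node, and Supervisor-Worker is a graph whose edges all start at the supervisor.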

L3 → L4 transition. All L3 patterns depend on the capabilities of the underlying LLM. L4 is the precondition for these patterns to work: without long context, reliable instruction following, and strong reasoning, every L3 pattern breaks down.

L4 · Foundation Model Tech Stack (the LLM's own technology stack)¶

Scope. Techniques for training, serving, and scaling foundation models. Progress at this layer turns over every 3-6 months.

4.1 Pretraining¶

Tech	Key papers
Transformer	Vaswani 2017
Scaling laws	Kaplan 2020, Chinchilla (Hoffmann 2022)
MoE (Mixture of Experts)	GShard / Switch / DeepSeek-V3 / Mixtral
Long context	RoPE / ALiBi / YaRN / NTK-aware scaling
Hybrid attention	DeepSeek-V4 (CSA + HCA), Mamba + Transformer
Linear attention	Linformer / Performer / Mamba / RWKV / ASI-ARCH family
Multi-token prediction	Better data efficiency (DeepSeek-V3)

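4.2 Post-Training (Alignment & Capability)¶

Tech	One-liner
SFT (Supervised Fine-Tuning)	Fine-tune on demonstration data
RLHF (RL from Human Feedback)	InstructGPT 2022
DPO (Direct Preference Optimization)	Rafailov 2023; offline, no reward model needed
GRPO (Group Relative Policy Optimization)	DeepSeek 2024; PPO variant that drops the critic and uses group-relative advantages
ORPO / KTO / SimPO	DPO variants
RLAIF (RL from AI Feedback)	Introduced in Anthropic's Constitutional AI paper (Bai 2022); studied at scale by Lee 2023
Constitutional AI	Uses written principles in place of human labelers
Self-play / Self-rewarding	The model generates and scores its own outputs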
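Two rows of this table reduce to short formulas, sketched here on scalars. The GRPO part shows only the group-relative advantage (rewards standardized within one sampled group), using the population standard deviation as an assumption; the DPO part is the per-pair loss from sequence log-probabilities. Neither is a training loop.

```python
import math
from statistics import mean, pstdev

def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantage: standardize each reward within its sampled group."""
    mu, sigma = mean(group_rewards), pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin)."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Four completions sampled for one prompt, scored 1.0 if correct else 0.0 (made up).
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
print(adv)

# Policy prefers the chosen answer more than the reference does: loss drops below log 2.
print(dpo_loss(-1.0, -3.0, -2.0, -2.0), math.log(2))
```

Because the baseline is the group mean, GRPO needs no learned value function, which is the sense in which it simplifies PPO.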

4.3 Test-Time Compute (scaling compute at inference time)¶

Tech	Origin
o1 / o3 style reasoning	OpenAI 2024-2025; hidden CoT at inference time
DeepSeek-R1	2025; GRPO + reasoning traces
Gemini Thinking	Google 2024
Best-of-N sampling	Sample multiple times + select with a verifier
Self-Consistency	Voting
Process Reward Models	Score intermediate steps, not just the final answer
Speculative decoding	Inference acceleration
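Best-of-N from the table is the simplest way to spend extra inference compute: draw N candidates and keep the one a verifier scores highest. In this sketch the generator is a canned list and the verifier is an exact arithmetic check, both stand-ins for a model and a learned or programmatic verifier.

```python
def best_of_n(generate, verifier, n=4):
    """Draw n candidates and keep the one the verifier scores highest."""
    candidates = [generate(i) for i in range(n)]
    scored = [(verifier(c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score

# Toy task: propose an integer x with x * x == 49. Proposals are canned, not model output.
proposals = [5, 8, 7, 6]
best, score = best_of_n(lambda i: proposals[i], lambda x: -abs(x * x - 49), n=4)
print(best, score)
```

The result is only as good as the verifier, which is why this pattern maps naturally onto domains with a strong checker.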

4.4 Architecture Innovations (last 18 months)¶

Innovation	System / description
Mamba / SSM	Selective state space models, sub-quadratic
DeepSeek MLA (Multi-head Latent Attention)	KV cache compression
DeepSeek mHC (Manifold-Constrained Hyper-Connections)	V4; training stability for very large models
Muon optimizer	DeepSeek V4
CSA + HCA	DeepSeek V4; 73% fewer FLOPs at 1M context
Engram memory	Explored in DeepSeek V4

4.5 Open vs Closed Frontier (status as of 2026-05)¶

Tier	Closed	Open
Reasoning frontier	OpenAI o3, Gemini 2.5 Pro Thinking, Claude Opus 4.6	DeepSeek-R1 / R2, Qwen3-Reasoner
Coding frontier	Claude Sonnet 4.6, Opus 4.6	DeepSeek-V4-Pro (80.6% SWE-bench), Qwen3-Coder
Multimodal frontier	GPT-5 / Gemini 3 / Claude 4.6	Qwen-VL / DeepSeek-VL
Long context	Gemini (1M+), Claude (200K+)	DeepSeek-V4-Pro (1M), Qwen3 (1M)

L4 → L5 transition. All post-training techniques for foundation models are, at their core, applications of L5 algorithms (RLHF = policy gradient + reward model; test-time compute = MCTS / search).

L5 · Algorithmic Foundations (outermost layer / theoretical bedrock)¶

Scope. Classical AI / ML algorithms that do not depend on LLMs. The real "foundation" of the agent paradigm.

5.1 Reinforcement Learning¶

Family	Algorithms
Value-based	Q-learning, DQN, Rainbow DQN
Policy gradient	REINFORCE, TRPO, PPO
Actor-critic	A2C, A3C, SAC, IMPALA
Model-based	Dyna-Q, World Models, Dreamer V1-V3
Hierarchical RL	Options framework, Feudal RL
Multi-agent RL	MADDPG, QMIX, COMA
Offline RL	CQL, IQL, AWAC, Decision Transformer
Inverse RL / Imitation	GAIL, BC, AIRL

LLM-relevant: PPO (the RLHF workhorse), GRPO (DeepSeek), DPO (offline preference)

5.2 Search & Planning¶

Algorithm	Use
MCTS (Monte Carlo Tree Search)	AlphaGo, AlphaZero, MuZero, LATS
A* / Best-first	Pathfinding, classical planning
Beam search	NLP decoding, LLM generation
Evolutionary search	GA, ES, CMA-ES, NES
Bayesian Optimization	Hyperparameter, expensive black-box
Genetic Programming	gplearn, FunSearch family
GFlowNet	Bengio 2021+; AlphaSAGE uses this

5.3 Probabilistic Inference¶

Tech	Use
Variational inference	VAE, BNN
MCMC (Markov Chain MC)	Bayesian posterior
Conformal Prediction	Distribution-free uncertainty; the core methodology of your RQ2
Importance sampling	Off-policy correction
Particle filters	Sequential Monte Carlo
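Of the methods in this table, split conformal prediction is compact enough to sketch in full: take absolute residuals on a held-out calibration set, pick a finite-sample-corrected quantile, and widen any point forecast by that amount. The residuals below are invented, and the coverage guarantee assumes exchangeable data, which market time series generally violate without further adjustment.

```python
import math

def conformal_quantile(cal_residuals, alpha=0.1):
    """Finite-sample-corrected (1 - alpha) quantile of absolute calibration residuals."""
    n = len(cal_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(abs(r) for r in cal_residuals)[rank - 1]

def predict_interval(point_forecast, q):
    """Symmetric interval around a point forecast."""
    return point_forecast - q, point_forecast + q

# Made-up calibration residuals (actual minus predicted) from a held-out window.
residuals = [0.1, -0.3, 0.2, -0.1, 0.4, -0.2, 0.05, 0.3, -0.15, 0.25,
             -0.05, 0.35, 0.12, -0.22, 0.18, -0.28, 0.08, 0.33, -0.38]
q = conformal_quantile(residuals, alpha=0.1)
lo, hi = predict_interval(1.0, q)
print(q, lo, hi)
```

The method makes no assumption about the residual distribution itself, which is what "distribution-free" refers to in the table.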

5.4 Optimization¶

Family	Algorithm
Gradient-based	SGD, Adam, AdamW, Muon (new), Lion
Second-order	Newton, L-BFGS, K-FAC
Zeroth-order	ES, CMA-ES, FD
Combinatorial	DP, ILP, LP relaxation
Convex	SDP, QP, projected gradient

5.5 Game Theory (multi-agent foundation)¶

Nash equilibrium, Stackelberg, mechanism design
CFR (Counterfactual Regret Minimization) — Poker AI
Self-play, fictitious play
Markov games / stochastic games

L5 → boundary. Further out lie cognitive science, neuroscience, mathematical logic, and complexity theory: rarely cited in ML papers, but the intellectual ancestors of the field.

Cross-layer map: how your work traces to each layer¶

Your production system + Crypto-Alpha-Bench
  │
L0 · Financial Agent
  ├─ L0.3 alpha mining ← main arena of your RQ1
  ├─ L0.4 quant R&D    ← your RQ3 (Cognition Base)
  └─ L0.5 risk/safety  ← your production verifier + RQ2
  │
L1 · Domain Research Agent
  ├─ Pattern: generator-verifier separation ← core of your thesis
  └─ FunSearch verifiable-novelty model     ← inherited directly by Crypto-Alpha-Bench
  │
L2 · Production Agent System
  ├─ Coding agent stack ← Claude Code etc. are your development tools
  └─ Deep Research pipeline ← maps to the 4 stages of your RQ3 Researcher Agent
  │
L3 · LLM Agent Patterns
  ├─ ReAct / Reflexion ← used in your L1-L6 LLM stack
  ├─ Tool use / MCP    ← essential in your production system
  └─ Multi-agent debate ← future Researcher/Engineer/Analyst division of labor
  │
L4 · Foundation Model
  ├─ DeepSeek-V4-Pro / R1 ← can serve as the Tier 2 LLM substrate baseline for Crypto-Alpha-Bench
  └─ Long context (1M)    ← makes a single prompt holding Cognition Base + Researcher feasible
  │
L5 · Algorithmic Foundations
  ├─ Walk-forward / PBO / DSR ← mathematical basis of your verifier (probabilistic + statistical)
  ├─ Conformal prediction      ← core methodology of your RQ2
  └─ MCTS / GFlowNet            ← needed if a search layer is added later

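Technical decision tree: when you design an alpha agent, every layer requires a choice¶

Layer	Decision	Your choice
L5 algorithm	Which search? RL/MCTS/GFlowNet/GP?	GP + walk-forward already in use; LLM + MCTS planned
L4 FM	Which LLM as substrate? Open vs closed source?	Claude / DeepSeek-V4 are complementary; cost-quality trade-off
L3 pattern	Single-agent or multi-agent?	Currently single (L1-L6 stack); multi in future (Researcher/Engineer/Analyst)
L2 framework	LangGraph / AutoGen / in-house?	In-house (already in production)
L1 paradigm	Generator-verifier separation?	✅ Already the core design
L0 application	Trading vs alpha mining vs R&D?	Alpha mining + quant R&D + risk/safety

Every decision is coupled to the layers above and below it, which is why the nested framing matters. When asked "why not use X" in a presentation, you can answer "because at layer L_n I chose Y, and X is incompatible at layer L_{n-1}".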

Final framing for the HKU presentation (6 nested layers)¶

*"At L5, our work rests on the statistical verifier tradition (walk-forward, PBO, DSR), which gives us a finance-grade analog to AlphaProof's Lean kernel.

At L4, we are LLM-substrate agnostic: open-source DeepSeek-V4 or closed Claude both fit our generator-verifier interface.

At L3, we use a standard ReAct + multi-agent pattern (Researcher / Engineer / Analyst), familiar from ASI-ARCH and AI Scientist.

At L2, our production system is comparable in engineering rigor to Claude Code or Devin, but specialized for trading execution.

At L1, we identify as an autonomous research agent in finance, inheriting FunSearch's verifiable-novelty pattern.

At L0, we are alpha-mining + quant R&D + risk-gating, joined under a single verifier substrate: Crypto-Alpha-Bench."*

This framing lets you respond gracefully to a question at any layer.

End-of-document notes¶

The writing style of this doc deliberately differs from Chen 2026: Chen 2026 leans toward a paper-level survey, while this one is a technique-stack landscape. The two are complementary:

To learn who did what: read Chen 2026
To learn how techniques compose: read this document
To find the concrete anchors of your own work: read agent_research_landscape.md (concentric L0-L5, neutral perspective)

Author: Paul Weng (co-generated with Claude as research collaborator)
Status: living document; will be refined as L4/L5 technologies evolve