Financial Agent → General Agent · Technical Landscape

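A technical landscape that zooms out layer by layer from the Financial Agent (most specific) to the General Agent (most general). Each layer emphasizes technical principles, algorithms, and architecture patterns; papers serve only as anchors.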

6 concentric layers · L0 → L5 · Maintained by Paul Weng · 2026-05-19


Concentric structure

L5 · Algorithmic Foundations              (RL · Search · Optimization · Probabilistic)
└─ L4 · Foundation Model Tech Stack       (Pretrain · Post-train · Test-time compute)
   └─ L3 · LLM Agent Patterns             (Reasoning · Planning · Tool · Memory · Multi-agent)
      └─ L2 · Production Agent Systems    (Coding · Computer use · Deep research)
         └─ L1 · Domain Research Agents   (Science · Math · Chemistry · Biology)
            └─ L0 · Financial Agents      (Trading · Alpha mining · Quant R&D · Risk)

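Suggested reading order

  • Bottom-up (L0→L5): how your concrete work inherits upstream technology
  • Top-down (L5→L0): how foundation techniques propagate to finance applications
  • Single-layer cross-section: understand the trade-offs among sibling techniques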

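L0 · Financial Agents (innermost layer)

Scope

Any system that applies the LLM/agent paradigm to tasks in the financial domain. The difference from L1: the domain-specific cost model, execution constraints, and regulatory boundaries are first-class concerns.

L0 internal breakdown (5 sub-classes)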

L0.1 · Customer-Facing Financial Agents

Tech primitive: retrieval + compliance checks + multi-turn dialogue + KYC integration

System | Core technology
BloombergGPT (2023) | 50B params, finance-corpus pretraining
FinGPT (open-source) | LoRA fine-tuned financial chatbot
Klarna AI agent | Customer service automation
Bank robo-advisors | Risk questionnaire + portfolio construction

Key challenges: regulatory disclosure, no hallucinated investment advice, and the need for an audit trail.

L0.2 · Trading-Decision Agents

Tech primitive: multi-modal signal fusion (news + price + tape) + reasoning + action selection

System | Core technology | arXiv
TradingAgents | Multi-agent debate (Fundamental / Sentiment / Technical analysts + Bull/Bear + Manager + Trader + Risk) | 2412.20138
QuantAgent | Short-horizon HFT-style structured-signal decision |
FinMem | Memory-augmented LLM trading (episodic + semantic + working memory) | 2311.13743
FinAgent | Multimodal foundation trading agent | 2402.18485

Key challenges: a large action space, high reward latency, and a market that is both partially observable and adversarial.

L0.3 · Alpha Mining / Factor Discovery Agents (the main arena of your work)

Tech primitive: symbolic search / LLM generation + statistical verification + multiple-testing correction

System | Search method | Generator | Verifier
gplearn / gpquant | Genetic Programming | symbolic mutation | IC-based fitness
AutoAlpha | Hierarchical EA | | walk-forward
AlphaAgent | LLM-driven | LLM with regularized exploration | multi-regime backtest
Alpha Jungle | LLM + MCTS | LLM prior on tree | backtest reward
QuantaAlpha | LLM + evolution | self-evolving trajectory | IC + return metrics
AlphaSAGE | GFlowNet | structured policy | multi-faceted reward
AlphaPROBE | GNN + on-graph evolution | retrieval-augmented LLM | walk-forward
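As a rough illustration of the "statistical verification + multiple-testing correction" primitive, the sketch below computes a rank IC (Spearman correlation between factor values and forward returns) and then applies a Benjamini-Hochberg correction across candidate factors. The choice of Benjamini-Hochberg is an assumption made for illustration; none of the systems in the table is claimed to use it, and the function names are hypothetical.

```python
from typing import List, Sequence


def _ranks(values: Sequence[float]) -> List[float]:
    """Average ranks (1-based); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_ic(factor: Sequence[float], fwd_ret: Sequence[float]) -> float:
    """Spearman rank correlation between factor values and forward returns."""
    rx, ry = _ranks(factor), _ranks(fwd_ret)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Reject flag per hypothesis under Benjamini-Hochberg FDR control."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0  # largest rank whose p-value passes its threshold
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        reject[idx] = rank <= cutoff
    return reject
```

In a factor search, each candidate's IC would be turned into a p-value first; the correction then limits how many spurious factors survive when thousands of candidates are tested.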

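Tech depth note: AlphaSAGE's use of GFlowNet is its key innovation. Compared with RL, a GFlowNet learns a distribution proportional to the reward, so it can explore multiple modes rather than only the argmax. That suits a long-tailed, multi-solution problem like alpha mining better.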
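The difference between "argmax" and "proportional to reward" can be seen on a toy example. The sketch below does not train a GFlowNet; it only samples from the target distribution P(x) = R(x) / Σ R that a trained GFlowNet is meant to approximate, and contrasts it with a purely reward-maximizing choice. The reward table and factor names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict

# Hypothetical rewards: three candidate factors with similar, non-identical reward.
rewards: Dict[str, float] = {"momentum": 4.0, "reversal": 3.0, "volume": 3.0}


def argmax_choice(r: Dict[str, float]) -> str:
    """A purely reward-maximizing policy collapses onto a single mode."""
    return max(r, key=r.get)


def proportional_sample(r: Dict[str, float], rng: random.Random) -> str:
    """Sample x with probability R(x) / sum(R): the GFlowNet target distribution."""
    names = list(r)
    return rng.choices(names, weights=[r[n] for n in names], k=1)[0]


rng = random.Random(0)
counts = Counter(proportional_sample(rewards, rng) for _ in range(10_000))
# argmax_choice(rewards) always returns "momentum";
# counts contains all three factors, roughly in a 4:3:3 ratio.
```

The near-optimal "reversal" and "volume" candidates are never proposed by the argmax policy, but they make up about 60% of the proportional samples.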

L0.4 · Quant R&D Agents (full workflow)

Tech primitive: hypothesis generation + code synthesis + experiment execution + feedback iteration

System | Workflow | Innovation
RD-Agent(Q) (Microsoft) | Research → Development (Co-STEER code gen) → Feedback → MAB scheduler | Multi-armed bandit selects the research direction
Beyond Prompting | Autonomous systematic factor investing | OOS validation + economic rationale
MLR-Copilot | Idea / experiment / analysis / writing | Splits ML research into 4 stages

Tech depth: the MAB scheduler handles the explore-exploit trade-off. It decides whether the next round should "deepen a promising direction" or "try a new hypothesis".
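A minimal sketch of such a scheduler, assuming the classic UCB1 rule (mean reward plus an exploration bonus that shrinks with the number of pulls). This is not claimed to be RD-Agent's actual scheduler; the class name and arm labels are hypothetical.

```python
import math
from typing import Dict, List


class UCB1Scheduler:
    """Pick the next research direction with the UCB1 bandit rule."""

    def __init__(self, arms: List[str], c: float = 2.0) -> None:
        self.c = c
        self.pulls: Dict[str, int] = {a: 0 for a in arms}
        self.total_reward: Dict[str, float] = {a: 0.0 for a in arms}

    def select(self) -> str:
        # Try every arm once before trusting any estimate.
        for arm, n in self.pulls.items():
            if n == 0:
                return arm
        t = sum(self.pulls.values())

        def score(arm: str) -> float:
            n = self.pulls[arm]
            mean = self.total_reward[arm] / n
            bonus = math.sqrt(self.c * math.log(t) / n)  # exploration term
            return mean + bonus

        return max(self.pulls, key=score)

    def update(self, arm: str, reward: float) -> None:
        self.pulls[arm] += 1
        self.total_reward[arm] += reward


scheduler = UCB1Scheduler(["deepen_direction", "new_hypothesis"])
first = scheduler.select()
scheduler.update(first, 1.0)       # e.g. the round improved out-of-sample IC
second = scheduler.select()
scheduler.update(second, 0.0)      # the alternative did not help
third = scheduler.select()         # the better-performing arm is chosen again
```

The reward fed to `update` would in practice be some verifier score of the round; what to use there is a design decision this sketch leaves open.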

L0.5 · Risk / Compliance / Safety Agents

Tech primitive: rule engine + LLM explanation + audit trail

  • Hubble: safe/reproducible LLM alpha discovery, AST sandbox + cross-sectional metrics
  • FactorMiner: self-evolving with redundancy/correlation control
  • CogAlpha: LLM-driven code evolution with robustness checks
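To make the "AST sandbox" idea concrete, here is a minimal sketch of an allowlist check on an LLM-generated factor expression using Python's standard `ast` module. It is not Hubble's implementation; the allowed operator names (`rank`, `ts_mean`, `delta`) and data fields are hypothetical, and a real sandbox would also need resource limits at evaluation time.

```python
import ast

# Node types a factor expression may contain (arithmetic, calls, names, numbers).
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Call, ast.Name, ast.Load,
    ast.Constant, ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub,
)
# Hypothetical allowlist of data fields and factor operators.
ALLOWED_NAMES = {"close", "volume", "rank", "ts_mean", "delta"}


def is_safe_factor(expr: str) -> bool:
    """Return True only if every AST node and identifier is on the allowlist."""
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return False  # e.g. attribute access, subscripts, lambdas
        if isinstance(node, ast.Name) and node.id not in ALLOWED_NAMES:
            return False  # unknown identifier such as open or __import__
        if isinstance(node, ast.Call) and not isinstance(node.func, ast.Name):
            return False  # only direct calls to allowlisted names
        if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
            return False  # no string or bytes literals
    return True
```

The check runs before any evaluation, so a rejected expression is never executed.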

Core design principle: the LLM is never on an irreversible action path. Your production system embodies exactly this principle.

L0 → L1 transition

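Financial agents are a special case of domain research agents, but with 3 constraints of their own:

  1. Statistical verifier instead of mechanical: there is no Lean kernel, only PBO (probability of backtest overfitting), DSR (deflated Sharpe ratio), and walk-forward validation
  2. Adversarial environment: unlike the physical constants of chemistry/biology, the market games your strategy in return
  3. Cost/fill realism: academic RL agents assume free execution; financial agents cannot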
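As one concrete piece of the statistical verifier, the sketch below generates rolling walk-forward splits: each fold trains on a window and tests on the period immediately after it. The `embargo` gap between train and test is an added assumption (a common guard against label leakage), and the function name is hypothetical.

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n_samples: int, train_size: int, test_size: int, embargo: int = 0
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges that roll forward in time.

    The test window always starts after the train window ends (plus an
    optional embargo), so no fold is evaluated on data it was fitted on.
    """
    start = 0
    while start + train_size + embargo + test_size <= n_samples:
        train = range(start, start + train_size)
        test_start = start + train_size + embargo
        test = range(test_start, test_start + test_size)
        yield train, test
        start += test_size  # slide forward by one test window


splits = list(walk_forward_splits(n_samples=10, train_size=4, test_size=2))
# Three folds: train 0-3 / test 4-5, train 2-5 / test 6-7, train 4-7 / test 8-9
```

The spread of the per-fold metric across these splits is what the "walk-forward std" in later sections refers to.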


L1 · Domain Research Agents

Scope

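LLM-powered agents that autonomously carry out domain-specific research: hypothesis → experiment → analysis. From the Chen 2026 survey: the current frontier sits at autonomy level L4 (task-bounded autonomy), while L5 (self-directed agenda) remains aspirational. These autonomy levels belong to the survey's scale and are distinct from the layers of this document.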

Sub-domain technical patterns

Domain | Key systems | Verifier type | Distinctive technique
ML research | AI Scientist v1/v2 (Sakana), MLR-Copilot, ADAS | training loss / accuracy | Agentic Tree Search (v2)
Architecture discovery | ASI-ARCH (GAIR 2025) | benchmark loss | Cognition Base + Researcher/Engineer/Analyst
Math / algorithms | FunSearch (DeepMind), AlphaProof, AlphaEvolve, AlphaGeometry 2 | Lean kernel / program-level | Island-based evolution, neuro-symbolic
Chemistry | Coscientist (CMU/Emerald), ChemCrow (EPFL) | wet-lab execution / molecular sim | Tool integration (synthesis robots, RDKit)
Biology | BioPlanner, MedAgents | clinical evidence / structural sim | Domain-knowledge grounding
Materials | GNoME (DeepMind 2023, Nature) | DFT calculations | GNN + active learning, 2.2M new materials

4 Common Architectural Patterns (Chen 2026 survey)

  1. Single-agent loop (e.g., AI Scientist v1): a single LLM repeatedly runs plan-act-reflect
  2. Multi-agent (e.g., MetaGPT, AutoGen): role specialization
  3. Hierarchical (e.g., ChatDev, RD-Agent): a supervisor coordinates lower-level agents
  4. Tool-augmented (e.g., ChemCrow): the agent's main value lies in tool orchestration

Open problem | Counterpart in alpha mining
Cognitive loop trap | Repeatedly overfitting the same regime
Context window limits | Many symbols × long history does not fit
Novelty evaluation | Alpha decay + lookahead bias
Reproducibility | Walk-forward std is too large
Safety / dual-use | Trading agent wired directly to the OMS
Cost | $100-1000 per research campaign

L1 → L2 transition

Domain research agents are a subset of production agents, but their output is a paper / report / artifact rather than a customer-facing service.


L2 · Production Agent Systems

Scope

Agents actually deployed in users' production environments. Difference from L1: reliability, latency, and cost are first-class metrics; correctness is judged by user perception rather than in the in-domain academic sense.

Major Sub-classes

L2.1 · Coding Agents (the most mature production category)

System | Frontier metric | Technical highlight
Devin (Cognition Labs 2024) | First production "AI software engineer" | Long-horizon planning + browser + IDE control
Claude Code (Anthropic 2025) | 72% SWE-bench Verified | Codebase-aware + tool use
Cursor / Windsurf / Aider | IDE-integrated | Diff-based edits, multi-file edits
SWE-Agent (Princeton) | Open-source bench champion | Agent-Computer Interface (ACI)
OpenHands (ex-OpenDevin) | Open framework | Multi-runtime backends
Agentless | Antithesis to agent loops | No iteration, single-shot

Tech depth: SWE-Agent's ACI concept is key. Designing a dedicated "text-based UI" for the agent, rather than exposing a raw terminal, significantly reduces hallucination.

L2.2 · Computer-Use / Browser Agents

System | Year | Modality
Anthropic Computer Use | 2024 | Screenshot + mouse/keyboard
OpenAI Operator | 2025 | Browser-based VL agent
Google Project Mariner | 2024 | Chrome integration
WebArena / VisualWebArena | Benchmarks | Real-world web tasks

Key challenges: visual grounding (where on screen), action precision, multi-step state tracking.

L2.3 · Deep Research Agents (see the Zhang 2025 survey)

4-stage pipeline: planning → question developing → web exploration → report generation

System | Source
GPT Deep Research | OpenAI 2025
Perplexity Pro | Deep Research 2025
Google Deep Research (Gemini) | 2024
Tongyi DeepResearch | Alibaba 2025
OpenSeeker / OpenSeeker-v2 | GAIR open-source
STORM | Stanford
AgentRxiv | Collaborative autonomous research

Tech depth: OpenSeeker v2 used 11.7K synthetic trajectories plus plain SFT to beat industrial-grade models trained with CPT+SFT+RL, which is evidence that trajectory quality matters more than training-pipeline complexity.

L2.4 · Business Automation Agents

  • Salesforce Agentforce / Microsoft Copilot Studio / Google Agentspace
  • Key tasks: customer service / ERP workflow / supply chain
  • Technical highlights: SOP-based orchestration + permission graphs + human-in-loop

L2 → L3 transition

Production agents are all built on the reusable patterns of L3. L3 is the "agent programming language"; L2 is the "application written in that language".


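L3 · LLM Agent Patterns & Frameworks (the general-purpose technical core of agents)

Scope

General-purpose agent building blocks that do not depend on a specific domain or application. Every L0-L2 system is a composition of these patterns.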

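3.1 Reasoning Patterns (how the LLM "thinks")

Pattern | In one sentence | Origin
CoT (Chain of Thought) | "Let's think step by step" | Wei 2022 (the zero-shot trigger phrase itself is from Kojima 2022)
Few-shot CoT | Guide the model with worked examples | Wei 2022
Self-Consistency | Sample multiple times + vote | Wang 2022
ToT (Tree of Thoughts) | Tree search + state value | Yao 2023
GoT (Graph of Thoughts) | Thought topology as an arbitrary DAG | Besta 2023
LATS (Language Agent Tree Search) | ToT + MCTS-style action search | Zhou 2023
Reflexion | Verbal RL: the reflection on a failure becomes part of the next prompt | Shinn 2023
Self-Refine | Iterative self-evaluation + self-improvement | Madaan 2023
CoT with Code (PAL) | Code as the reasoning trace | Gao 2022
ReWOO | Produce the whole plan in one shot, avoiding multi-round LLM calls | Xu 2023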
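Of the patterns above, Self-Consistency is the simplest to sketch: draw several independent answers and keep the most frequent one. The sampler below is a stand-in for an LLM call with non-zero temperature; `sample_answer` and the canned answers are hypothetical.

```python
from collections import Counter
from typing import Callable


def self_consistency(sample_answer: Callable[[], str], n: int = 5) -> str:
    """Sample n final answers and return the majority vote."""
    answers = [sample_answer() for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]


# Stand-in for five sampled reasoning chains that end in these final answers.
canned = iter(["42", "41", "42", "42", "40"])
final = self_consistency(lambda: next(canned), n=5)
```

Only the final answers are compared; the reasoning chains themselves may differ freely between samples.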

3.2 Planning Patterns

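Pattern | Use case
ReAct (interleaved Reasoning + Acting) | General-purpose tool-using agent
Plan-and-Execute | Long-horizon, predictable subgoals
Plan-and-Solve | Math / code
HTN + LLM | When a task hierarchy already exists
PDDL + LLM | The LLM translates natural language into PDDL; a classical planner solves it
MCTS + LLM | The LLM provides the prior; MCTS handles exploration (the Alpha Jungle template)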
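The ReAct row can be sketched as a loop that alternates a thought, an action, and an observation until the policy decides to finish. In the sketch below the LLM is replaced by a scripted policy so the example runs on its own; `react_loop`, `scripted_policy`, and the `lookup_price` tool are all hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str, str]  # (thought, action, action_input)


def react_loop(
    policy: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> str:
    """Interleave reasoning and acting until the policy emits 'finish'."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, arg = policy(transcript)
        transcript.append(f"Thought: {thought}")
        if action == "finish":
            return arg
        observation = tools[action](arg)  # act, then feed the result back
        transcript.append(f"Action: {action}[{arg}]")
        transcript.append(f"Observation: {observation}")
    return "no answer within step budget"


def scripted_policy(transcript: List[str]) -> Step:
    """Stand-in for an LLM: look something up once, then answer."""
    observations = [t for t in transcript if t.startswith("Observation: ")]
    if not observations:
        return ("I need the latest price.", "lookup_price", "BTC")
    return ("I have what I need.", "finish", observations[-1][len("Observation: "):])


tools = {"lookup_price": lambda symbol: f"price of {symbol} is 100"}
answer = react_loop(scripted_policy, tools, "What is the BTC price?")
```

The `max_steps` budget is the simplest guard against a policy that never decides to finish.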

3.3 Tool Use & Integration

Tech | Meaning
Function calling | Introduced by OpenAI in 2023; structured JSON output
Toolformer | Self-supervised learning of how to use tools
MCP (Model Context Protocol) | Anthropic 2024; agent-tool communication standard
A2A (Agent-to-Agent) | Google 2025; cross-vendor agent communication
ACI (Agent-Computer Interface) | SWE-Agent concept; an abstraction layer designed specifically for agents
agents.txt / ARDP | Agent discovery protocols (early stage)
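The function-calling row boils down to: the model emits a structured JSON object naming a tool and its arguments, and the host validates and dispatches it. The sketch below shows that host side with a generic JSON shape; it is not the wire format of any particular vendor or of MCP, and the `get_price` tool is hypothetical.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical tool registry: implementation plus its required argument names.
TOOLS: Dict[str, Dict[str, Any]] = {
    "get_price": {
        "fn": lambda symbol: {"symbol": symbol, "price": 100.0},
        "required": ["symbol"],
    },
}


def dispatch(tool_call_json: str) -> str:
    """Validate a model-emitted tool call and return the tool result as JSON."""
    call = json.loads(tool_call_json)
    spec = TOOLS.get(call.get("name"))
    if spec is None:
        return json.dumps({"error": "unknown tool"})
    args = call.get("arguments", {})
    missing = [k for k in spec["required"] if k not in args]
    if missing:
        return json.dumps({"error": f"missing arguments: {missing}"})
    fn: Callable[..., Any] = spec["fn"]
    return json.dumps(fn(**args))


result = dispatch('{"name": "get_price", "arguments": {"symbol": "BTC"}}')
```

Errors are returned as JSON rather than raised, so the model can read them as an observation and retry.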

3.4 Memory Architectures

Memory type | Implementation
Working | Current context window
Episodic | Records of past interactions
Semantic | Abstract facts / knowledge
Procedural | Learned skills / tools

Representative systems:

  • MemGPT (Berkeley 2023): OS-like memory hierarchy with paging
  • Voyager (NVIDIA 2023): skill library that grows
  • A-Mem / MemoryBank: long-term memory variants
  • Generative Agents (Stanford 2023): Park et al., Smallville simulation
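The working/episodic split and the paging idea can be illustrated with a very small data structure: a bounded working buffer that evicts its oldest entries into an archive, which can later be searched. This is only a sketch in the spirit of an OS-like memory hierarchy, not MemGPT's design; the class name and keyword-based recall are assumptions.

```python
from collections import deque
from typing import Deque, List


class PagedMemory:
    """Bounded working memory that pages old messages out to an archive."""

    def __init__(self, working_capacity: int = 3) -> None:
        self.capacity = working_capacity
        self.working: Deque[str] = deque()   # what fits in the context window
        self.archive: List[str] = []         # episodic store outside the window

    def add(self, message: str) -> None:
        self.working.append(message)
        while len(self.working) > self.capacity:
            self.archive.append(self.working.popleft())  # page out the oldest

    def recall(self, keyword: str) -> List[str]:
        """Page-in candidates: archived messages that mention the keyword."""
        return [m for m in self.archive if keyword.lower() in m.lower()]


mem = PagedMemory(working_capacity=2)
for msg in ["BTC funding rate spiked", "ETH gas fell", "SOL outage"]:
    mem.add(msg)
```

A production system would typically replace the keyword match with embedding retrieval.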

3.5 Multi-Agent Frameworks

Framework | Pattern | Origin
CAMEL | Role-play conversation | KAUST 2023
AutoGen | Group chat + custom conversation patterns | Microsoft 2023
MetaGPT | SOP (standard operating procedure)-based | DeepWisdom 2023
ChatDev | Software-company metaphor | THU 2023
AgentVerse | Multi-agent simulation env | THU 2023
LangGraph | Graph-based orchestration | LangChain 2024
CrewAI | Crew metaphor, role + task | Open source
Swarm | Lightweight handoff pattern | OpenAI 2024

3.6 Orchestration Topologies

Supervisor-Worker        ┌─Supervisor─┐
                         │            │
                         ▼            ▼
                       Worker A    Worker B

Hierarchical            Manager → Lead A → Worker A1, A2
                              → Lead B → Worker B1, B2

Graph                   Arbitrary DAG; nodes are agents, edges are messages

Pipeline                A → B → C → D (no cycles)

Peer / Debate           N agents debate as equals

L3 → L4 transition

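All L3 patterns depend on the capabilities of the underlying LLM. L4 is the precondition for these patterns to work: without long context, reliable instruction following, and strong reasoning, every L3 pattern breaks down.


L4 · Foundation Model Tech Stack (the LLM's own technology stack)

Scope

Training, inference, and scaling techniques for foundation models. Progress at this layer turns over every 3-6 months.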

4.1 Pretraining

Tech | Key papers
Transformer | Vaswani 2017
Scaling laws | Kaplan 2020, Chinchilla (Hoffmann 2022)
MoE (Mixture of Experts) | GShard / Switch / DeepSeek-V3 / Mixtral
Long context | RoPE / ALiBi / YaRN / NTK-aware scaling
Hybrid attention | DeepSeek-V4 (CSA + HCA), Mamba + Transformer
Linear attention | Linformer / Performer / Mamba / RWKV / ASI-ARCH family
Multi-token prediction | Better data efficiency (DeepSeek-V3)

4.2 Post-Training (Alignment & Capability)

Tech | In one sentence
SFT (Supervised Fine-Tuning) | Fine-tune on demonstration data
RLHF (RL from Human Feedback) | InstructGPT 2022
DPO (Direct Preference Optimization) | Rafailov 2023; offline, no reward model needed
GRPO (Group Relative Policy Optimization) | DeepSeek 2024; a simplified PPO variant
ORPO / KTO / SimPO | DPO variants
RLAIF (RL from AI Feedback) | Anthropic 2023
Constitutional AI | Principles replace human labelers
Self-play / Self-rewarding | The model generates and scores its own outputs
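Two rows of this table reduce to short formulas. GRPO replaces PPO's learned critic with a group baseline: each sampled response's advantage is its reward normalized by the mean and standard deviation of its group. DPO minimizes -log σ(β[(log π(y_w) - log π_ref(y_w)) - (log π(y_l) - log π_ref(y_l))]) over preference pairs. The sketch below computes both on scalars; using the population standard deviation and the `eps` term are assumptions, and the full GRPO objective (clipping, KL penalty) is omitted.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def grpo_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Group-relative advantage: (r_i - mean(group)) / (std(group) + eps)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one preference pair, given sequence log-probabilities."""
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))


adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])           # four responses to one prompt
neutral = dpo_loss(-5.0, -5.0, -5.0, -5.0)            # policy equals reference
improved = dpo_loss(-4.0, -6.0, -5.0, -5.0)           # policy prefers the chosen answer
```

No reward model appears in `dpo_loss`, which is the "no reward model needed" property in the table; no value network appears in `grpo_advantages`, which is the sense in which GRPO simplifies PPO.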

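4.3 Test-Time Compute (scaling compute at inference time)

Tech | Origin
o1 / o3 style reasoning | OpenAI 2024-2025; hidden CoT at inference time
DeepSeek-R1 | 2025; GRPO + reasoning traces
Gemini Thinking | Google 2024
Best-of-N sampling | Multiple samples + verifier selection
Self-Consistency | Voting
Process Reward Models | Score intermediate steps, not only the final answer
Speculative decoding | Inference acceleration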
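Best-of-N differs from voting in that a separate verifier picks the winner. A minimal sketch, with the generator and verifier as stand-ins (both hypothetical) for an LLM sampler and a reward model or unit test:

```python
from typing import Callable


def best_of_n(
    generate: Callable[[], str], score: Callable[[str], float], n: int = 4
) -> str:
    """Draw n candidates and return the one the verifier scores highest."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)


# Stand-ins: three sampled completions and a verifier that checks the result.
canned = iter(["2+2=5", "2+2=4", "2+2=3"])
best = best_of_n(
    generate=lambda: next(canned),
    score=lambda s: 1.0 if s.endswith("=4") else 0.0,
    n=3,
)
```

A majority vote over these three distinct answers would be a coin flip; the verifier is what makes the extra samples useful here.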

4.4 Architecture Innovations (last 18 months)

Innovation | System
Mamba / SSM | Selective state space models, sub-quadratic
DeepSeek MLA (Multi-head Latent Attention) | KV cache compression
DeepSeek mHC (Manifold-Constrained Hyper-Connections) | V4; training stability for very large models
Muon optimizer | DeepSeek V4
CSA + HCA | DeepSeek V4; 73% fewer FLOPs at 1M context
Engram memory | DeepSeek V4 exploration

4.5 Open vs Closed Frontier (status as of 2026-05)

Tier | Closed | Open
Reasoning frontier | OpenAI o3, Gemini 2.5 Pro Thinking, Claude Opus 4.6 | DeepSeek-R1 / R2, Qwen3-Reasoner
Coding frontier | Claude Sonnet 4.6, Opus 4.6 | DeepSeek-V4-Pro (80.6% SWE-bench), Qwen3-Coder
Multimodal frontier | GPT-5 / Gemini 3 / Claude 4.6 | Qwen-VL / DeepSeek-VL
Long context | Gemini (1M+), Claude (200K+) | DeepSeek-V4-Pro (1M), Qwen3 (1M)

L4 → L5 transition

All post-training techniques for foundation models are, in essence, applications of L5 algorithms (RLHF = policy gradient + reward model; test-time compute = MCTS / search).


L5 · Algorithmic Foundations (outermost layer / theoretical foundations)

Scope

Classical AI / ML algorithms that do not depend on LLMs. The true "bedrock" of the agent paradigm.

5.1 Reinforcement Learning

Family | Algorithms
Value-based | Q-learning, DQN, Rainbow DQN
Policy gradient | REINFORCE, A2C, A3C
Actor-critic | PPO, TRPO, SAC, IMPALA
Model-based | Dyna-Q, World Models, Dreamer V1-V3
Hierarchical RL | Options framework, Feudal RL
Multi-agent RL | MADDPG, QMIX, COMA
Offline RL | CQL, IQL, AWAC, Decision Transformer
Inverse RL / Imitation | GAIL, BC, AIRL

LLM-relevant: PPO (RLHF 主力), GRPO (DeepSeek), DPO (offline preference)
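For readers coming from the LLM side, the value-based family is easiest to grasp from the tabular Q-learning update, Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') - Q(s,a)]. The sketch below applies one such update; the state and action labels are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(
    Q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    actions: Sequence[Hashable],
    alpha: float = 0.1,
    gamma: float = 0.99,
) -> None:
    """One tabular Q-learning step toward the bootstrapped TD target."""
    best_next = max(Q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])


Q: QTable = defaultdict(float)
q_update(Q, "s0", "buy", reward=1.0, next_state="s1", actions=["buy", "hold"])
# Q[("s0", "buy")] moves from 0.0 to alpha * reward = 0.1
```

DQN keeps the same target but replaces the table with a neural network.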

5.2 Search & Planning

Algorithm | Use
MCTS (Monte Carlo Tree Search) | AlphaGo, AlphaZero, MuZero, LATS
A* / Best-first | Pathfinding, classical planning
Beam search | NLP decoding, LLM generation
Evolutionary search | GA, ES, CMA-ES, NES
Bayesian Optimization | Hyperparameter, expensive black-box
Genetic Programming | gplearn, FunSearch family
GFlowNet | Bengio 2021+; AlphaSAGE uses this
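When MCTS is combined with a learned prior (a policy network in AlphaZero, or an LLM in the "MCTS + LLM" pattern), child selection typically uses a PUCT-style score: Q(a) + c · P(a) · sqrt(Σ_b N(b)) / (1 + N(a)). The sketch below implements that AlphaZero-style rule; it is not claimed to be the exact rule of any system in this document, and the prior-based fallback for an unvisited node is an assumption.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class ChildStats:
    prior: float             # P(a): probability from the policy / LLM prior
    visits: int = 0          # N(a)
    value_sum: float = 0.0   # sum of backed-up values


def puct_select(children: Dict[str, ChildStats], c_puct: float = 1.5) -> str:
    """Pick the child maximizing Q + c * P * sqrt(total visits) / (1 + N)."""
    total = sum(ch.visits for ch in children.values())
    if total == 0:
        # Assumption: with no visits yet, follow the prior.
        return max(children, key=lambda a: children[a].prior)

    def score(name: str) -> float:
        ch = children[name]
        q = ch.value_sum / ch.visits if ch.visits else 0.0
        u = c_puct * ch.prior * math.sqrt(total) / (1 + ch.visits)
        return q + u

    return max(children, key=score)


fresh = {"a": ChildStats(prior=0.7), "b": ChildStats(prior=0.3)}
explored = {
    "a": ChildStats(prior=0.7, visits=10, value_sum=1.0),  # visited often, low value
    "b": ChildStats(prior=0.3),                            # never visited
}
```

`puct_select(fresh)` follows the prior and picks "a"; `puct_select(explored)` picks "b", because the exploration term outweighs the weak average value of "a".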

5.3 Probabilistic Inference

Tech | Use
Variational inference | VAE, BNN
MCMC (Markov Chain MC) | Bayesian posterior
Conformal Prediction | Distribution-free uncertainty; the core methodology of your RQ2
Importance sampling | Off-policy correction
Particle filters | Sequential Monte Carlo
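"Distribution-free uncertainty" can be made concrete with split conformal prediction: take absolute residuals on a held-out calibration set, use their ⌈(n+1)(1-α)⌉-th smallest value as the interval half-width, and the resulting interval covers a new exchangeable point with probability at least 1-α. The sketch below shows only that calibration step; it is not a description of the RQ2 method itself, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def conformal_quantile(residuals: Sequence[float], alpha: float = 0.1) -> float:
    """Half-width q such that [pred - q, pred + q] has >= 1 - alpha coverage."""
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(residuals)[k - 1]


def predict_interval(point_pred: float, q: float) -> Tuple[float, float]:
    """Symmetric prediction interval around a point forecast."""
    return point_pred - q, point_pred + q


# Hypothetical calibration residuals |y - y_hat| from 19 held-out points.
calibration = [float(i) for i in range(1, 20)]
q = conformal_quantile(calibration, alpha=0.1)   # 18th smallest residual
interval = predict_interval(100.0, q)
```

The guarantee relies on exchangeability between calibration and test data, which is the assumption most at risk in non-stationary markets.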

5.4 Optimization

Family | Algorithm
Gradient-based | SGD, Adam, AdamW, Muon (new), Lion
Second-order | Newton, LBFGS, K-FAC
Zeroth-order | ES, CMA-ES, FD
Combinatorial | DP, ILP, LP relaxation
Convex | SDP, QP, projected gradient

5.5 Game Theory (multi-agent foundation)

  • Nash equilibrium, Stackelberg, mechanism design
  • CFR (Counterfactual Regret Minimization) — Poker AI
  • Self-play, fictitious play
  • Markov games / stochastic games
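The core of CFR is regret matching: play each action with probability proportional to its accumulated positive regret. The sketch below shows that single building block on rock-paper-scissors; full CFR applies it at every information set of an extensive-form game and averages strategies over iterations, which is not shown here.

```python
from typing import List

ACTIONS = ["rock", "paper", "scissors"]
# PAYOFF[a][b]: payoff to the player choosing a against an opponent choosing b.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def strategy_from_regrets(regrets: List[float]) -> List[float]:
    """Regret matching: probabilities proportional to positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1.0 / len(regrets)] * len(regrets)  # no regret yet: uniform
    return [p / total for p in positive]


def regret_update(regrets: List[float], my_action: int, opp_action: int) -> List[float]:
    """Add, for each action, how much better it would have done than the one played."""
    realized = PAYOFF[my_action][opp_action]
    return [r + PAYOFF[a][opp_action] - realized for a, r in enumerate(regrets)]


regrets = [0.0, 0.0, 0.0]
start = strategy_from_regrets(regrets)                    # uniform
regrets = regret_update(regrets, my_action=0, opp_action=1)  # played rock, lost to paper
after = strategy_from_regrets(regrets)                    # shifts toward scissors
```

After losing with rock against paper, the regrets become [0, 1, 2], so the next strategy plays scissors with probability 2/3 and paper with probability 1/3.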

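L5 → boundary

Further out lie cognitive science, neuroscience, mathematical logic, and complexity theory: rarely cited in ML papers, but the intellectual ancestors of this stack.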


Cross-layer map: how your work traces to each layer

Your production system + Crypto-Alpha-Bench
L0 · Financial Agent
  ├─ L0.3 alpha mining ← main arena of your RQ1
  ├─ L0.4 quant R&D    ← your RQ3 (Cognition Base)
  └─ L0.5 risk/safety  ← your production verifier + RQ2
L1 · Domain Research Agent
  ├─ Pattern: generator-verifier separation ← core of your thesis
  └─ FunSearch verifiable-novelty model     ← inherited directly by Crypto-Alpha-Bench
L2 · Production Agent System
  ├─ Coding agent stack ← Claude Code and similar are your development tools
  └─ Deep Research pipeline ← maps to the 4 stages of your RQ3 Researcher Agent
L3 · LLM Agent Patterns
  ├─ ReAct / Reflexion ← already used in your L1-L6 LLM stack
  ├─ Tool use / MCP    ← required in your production system
  └─ Multi-agent debate ← future Researcher/Engineer/Analyst division of labor
L4 · Foundation Model
  ├─ DeepSeek-V4-Pro / R1 ← candidate Tier 2 LLM substrate baseline for Crypto-Alpha-Bench
  └─ Long context (1M)    ← makes a single prompt holding Cognition Base + Researcher feasible
L5 · Algorithmic Foundations
  ├─ Walk-forward / PBO / DSR ← mathematical basis of your verifier (probabilistic + statistical)
  ├─ Conformal prediction      ← core methodology of your RQ2
  └─ MCTS / GFlowNet            ← needed if a search layer is added later

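Technical decision tree: when you design an alpha agent, every layer requires a choice

Layer | Decision | Your choice
L5 algorithm | Which search? RL / MCTS / GFlowNet / GP? | GP + walk-forward already in use; LLM + MCTS planned
L4 FM | Which LLM as substrate? Open vs closed source? | Claude / DeepSeek-V4 are complementary; cost-quality trade-off
L3 pattern | Single-agent or multi-agent? | Single today (L1-L6 stack), multi in future (Researcher/Engineer/Analyst)
L2 framework | LangGraph / AutoGen / in-house? | In-house (already in production)
L1 paradigm | Generator-verifier separation? | ✅ Already the core design
L0 application | Trading vs alpha mining vs R&D? | Alpha mining + quant R&D + risk/safety

Every decision is coupled to the layers above and below, which is why the nested framing matters. When asked "why not X?" in a presentation, you can answer: "because I chose Y at layer L_n, and X is incompatible with it at layer L_(n-1)".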


The definitive framing for the HKU presentation (6 nested layers)

*"At L5, our work rests on the statistical verifier tradition (walk-forward, PBO, DSR), which gives us a finance-grade analog to AlphaProof's Lean kernel.

At L4, we are LLM-substrate agnostic: open-source DeepSeek-V4 or closed Claude both fit our generator-verifier interface.

At L3, we use a standard ReAct + multi-agent pattern (Researcher / Engineer / Analyst), familiar from ASI-ARCH and AI Scientist.

At L2, our production system is comparable in engineering rigor to Claude Code or Devin, but specialized for trading execution.

At L1, we identify as an autonomous research agent in finance, inheriting FunSearch's verifiable-novelty pattern.

At L0, we are alpha-mining + quant R&D + risk-gating, joined under a single verifier substrate: Crypto-Alpha-Bench."*

This framing allows a graceful answer to a question at any layer.


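End-of-document note

This doc's writing style deliberately differs from Chen 2026: Chen 2026 leans toward a paper-level survey, while this one leans toward a technique-stack landscape. The two are complementary:

  • To learn who did what, read Chen 2026
  • To learn how the techniques combine, read this document
  • To find the specific anchors of your own work, read agent_research_landscape.md (a neutral view of the concentric L0-L5 layers)

Author: Paul Weng (co-generated with Claude as research collaborator)
Status: living document; will be refined as L4/L5 technologies evolve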