Alpha Auto Search · Literature Survey

Stage 1: Taxonomy · Stage 2: Bibliography

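Compiled: 2026-05. Scope: a complete academic map of automated alpha factor search. Uses: (a) strengthening the paper grounding for this week's HKU presentation; (b) a long-term foundation for a research survey.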

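Stage 1 · 6-Dimension Taxonomy

Every paper is tagged along the following 6 dimensions. Once tagging is complete, the full table surfaces which combinations nobody has tried yet, i.e. the research gaps.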

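Dimension D1 · Search Unit

Value	Meaning	Representative work
`formula`	Formulaic factor expression (AST or linear combination)	gplearn / AutoAlpha / AlphaSAGE / Alpha Jungle
`program`	Complete executable program (Python function)	FunSearch / AlphaEvolve
`nn-weights`	Direct search over neural network weights / embeddings	FactorVAE / HIST / HireVAE
`portfolio`	Direct search over portfolio weights	AlphaPortfolio / FinRL
`research-flow`	Complete research workflow (hypothesis → experiment)	AI Scientist v2 / ASI-ARCH
`architecture`	The model architecture itself	ASI-ARCH (the original sense of architecture discovery)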

Dimension D2 · Generator

Value	Meaning
`GP`	Genetic Programming or Genetic Algorithm
`EA`	Other evolutionary algorithms (CMA-ES, NES, etc.)
`RL`	Reinforcement Learning (PPO / REINFORCE / GFlowNet)
`LLM`	LLM-driven generation
`LLM+EA`	LLM combined with evolution (FunSearch / AlphaEvolve)
`LLM+MCTS`	LLM combined with Monte Carlo Tree Search
`MAS`	Multi-Agent System (multiple LLM agents with divided roles)
`hybrid`	Neuro-symbolic hybrids and similar

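Dimension D3 · Verifier

Value	Meaning
`single-scalar`	A single score (IC / Sharpe)
`multi-eval`	Multi-objective Pareto evaluation
`walk-forward`	Rolling-window validation
`formal`	Formal verification with absolute trust (e.g. the Lean kernel)
`paper-shadow`	Paper trading / shadow trading validation
`LLM-judge`	The LLM itself acts as the scorer (risky)
`composite`	A multi-layer combination of verifiers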

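Dimension D4 · Knowledge Grounding

Value	Meaning
`none`	No external knowledge; purely data-driven
`operators`	Hand-crafted operator priors only (the GP operator set)
`literature-light`	A few paper references placed in the LLM prompt
`literature-structured`	A structured literature knowledge base
`cognition-base`	A full ASI-ARCH-level Cognition Base
`factor-zoo`	Explicitly built on a factor zoo / anomaly library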

Dimension D5 · Evaluation Rigor

Value	Meaning
`single-oos`	A single OOS holdout
`walk-forward`	Rolling window
`block-bootstrap`	Block-bootstrap confidence interval
`deflated-sharpe`	Deflated Sharpe Ratio multiple-testing correction
`pbo`	Probability of Backtest Overfitting
`purged-kfold`	López de Prado's purged k-fold + embargo
`multiple-testing`	The Harvey-Liu family of multiple-testing frameworks
`composite-rigor`	A combination of several of the above
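
As an illustration of the `purged-kfold` entry, the sketch below generates train/test index splits with purging and an embargo. It assumes samples are ordered in time and that the label of sample `i` depends on data in `[i, i + horizon]`; `purged_kfold`, `horizon` and `embargo` are names chosen for this example rather than an API from any library.

```python
def purged_kfold(n_samples, n_splits, horizon, embargo):
    """Yield (train, test) index lists for contiguous test folds.

    Assumption: the label of sample i uses data from [i, i + horizon], so any
    training sample whose label window overlaps the test fold is purged, and a
    further `embargo` samples after the fold are dropped as well.
    """
    bounds = [k * n_samples // n_splits for k in range(n_splits + 1)]
    for k in range(n_splits):
        start, stop = bounds[k], bounds[k + 1]
        test = list(range(start, stop))
        first_purged = start - horizon           # labels from here on reach into the fold
        first_usable = stop + horizon + embargo  # test labels plus embargo end before here
        train = [i for i in range(n_samples)
                 if i < first_purged or i >= first_usable]
        yield train, test
```

With `n_samples=100, n_splits=5, horizon=2, embargo=3`, the second fold tests on indices 20 to 39 and trains on indices below 18 or from 45 upward.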

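Dimension D6 · Meta-level (Self-Evolution Level)

From our earlier discussion:

Value	Meaning
`L0`	Fixed framework; search over objects
`L1`	Search over the search strategy (agent collaboration topology)
`L2`	Search over the evaluator itself
`L3`	Search over the DSL (operator set)
`L4`	Search over the framework itself (self-improving)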
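
The tagging scheme can be sketched as a small data structure plus a query for unoccupied cells. This is only an illustration: `PaperTag` and `empty_cells` are hypothetical names, the three example tags are copied from the Stage 2 tables, and AlphaSAGE's generator is mapped to `RL` here because D2 lists GFlowNet under RL.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class PaperTag:
    name: str
    d1_unit: str
    d2_generator: str
    d3_verifier: str
    d4_grounding: str
    d5_rigor: str
    d6_meta: str

def empty_cells(papers, dim_a, dim_b, values_a, values_b):
    """Return the (dim_a, dim_b) value pairs that no tagged paper occupies."""
    occupied = {(getattr(p, dim_a), getattr(p, dim_b)) for p in papers}
    return [cell for cell in product(values_a, values_b) if cell not in occupied]

papers = [
    PaperTag("gplearn", "formula", "GP", "single-scalar", "operators", "single-oos", "L0"),
    PaperTag("AlphaSAGE", "formula", "RL", "multi-eval", "operators", "walk-forward", "L0"),
    PaperTag("AlphaAgent", "formula", "LLM", "multi-eval", "literature-light", "walk-forward", "L0"),
]

gaps = empty_cells(
    papers, "d2_generator", "d4_grounding",
    ["GP", "RL", "LLM"],
    ["operators", "literature-light", "cognition-base"],
)
```

On these three papers, `gaps` contains `("RL", "cognition-base")`, i.e. a learned generator paired with a full Cognition Base is an empty cell. An empty cell is only a candidate gap; it still has to be checked against the literature by hand.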

Stage 2 · Building the Bibliography Systematically across 8 Traditions

Tradition 1 · Classical Symbolic Regression / GP for Finance

Why it is required reading: your baseline notes do not cover this at all. Any talk on LLM-driven alpha work that cannot explain clearly "why LLMs beat GP" will be taken apart in Q&A.

Paper	Year	D1	D2	D3	D4	D5	D6	Key Insight	URL
gplearn (open-source package)	2015+	formula	GP	single-scalar	operators	single-oos	L0	The standard Python GP package; the baseline for nearly all later formula-alpha work	github
AutoAlpha: Hierarchical Evolutionary Algorithm for Mining Alpha	2020	formula	EA(hierarchical)	single-scalar	operators	single-oos	L0	First-generation EA work that explicitly optimizes alpha-mining efficiency	arXiv 2002.08245
gpquant (adapted from gplearn)	2022+	formula	GP+ts-ops	single-scalar	operators	single-oos	L0	Adds time-series operators; a common baseline in the Chinese quant community	github
Alpha Mining and Enhancing via Warm Start Genetic Programming	2024	formula	GP+warm-start	walk-forward	operators+literature-light	walk-forward	L0	Uses an LLM to supply warm-start seeds for GP; a hybrid approach	arXiv 2412.00896
Symbolic Regression for Financial Machine Learning	2023	formula	GP	multi-eval	operators	walk-forward	L0	A fairly recent survey-style view of symbolic regression in finance	ResearchGate
AlphaForge	2024	formula	DL-generator	multi-eval	none	walk-forward	L0	Not GP: DL-based generation plus dynamic combination	arXiv 2406.18394
AlphaPROBE	2026	formula	GNN-encoder + on-graph evolution	multi-eval	literature-light	walk-forward	L0/L1	"Principled retrieval" plus on-graph biased evolution; a GP makeover for the LLM era	arXiv 2602.11917
AlphaSAGE	2025	formula	GFlowNet+GNN	multi-eval	operators	walk-forward	L0	GFlowNet for alpha mining is a new direction: reported to converge faster and more stably than PPO while exploring multiple high-reward modes	arXiv 2509.25055

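Tradition 1 key insights (for the presentation):

The fundamental bottleneck of the GP paradigm is blind search: with no financial intuition, it random-walks through a huge search space. The essential improvement of LLM-driven work is to use the LLM as an "informed prior". The GFlowNet route (AlphaSAGE), however, shows that an LLM is not strictly required: a generator that learns the reward landscape can achieve a similar effect at a more controllable cost. This is a cross-domain generative AI case worth discussing with Prof. Han.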

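Tradition 2 · Deep Learning Factor Models

Why it is required reading: this line "generates alpha directly with a NN" instead of "searching expressions", which makes it a competitor to the formulaic route. In Q&A you may be asked "why not just predict end-to-end with a NN?"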

Paper	Year	D1	D2	D3	D4	D5	D6	Key Insight	URL
Deep Factor Model	2018	nn-weights	DL	single-scalar	none	single-oos	L0	Early work that makes the factor model fully end-to-end	arXiv 1810.01278
Deep Recurrent Factor Model	2019	nn-weights	DL+recurrence	single-scalar	none	single-oos	L0	Introduces an RNN; relatively good interpretability	arXiv 1901.11493
FactorVAE	2022 (AAAI)	nn-weights	VAE	multi-eval	none	walk-forward	L0	Probabilistic dynamic factor model, robust to unseen stocks; can anchor the discussion of time-series generalization	AAAI
HIST	2022	nn-weights	DL+graph	single-scalar	none	walk-forward	L0	Extracts hidden relations from market data; representative of the graph-neural factor direction	(commonly cited baseline)
HireVAE	2023	nn-weights	hierarchical VAE + regime	multi-eval	none	walk-forward	L0	Adds regime switching; online adaptive	arXiv 2306.02848
RVRAE	2024	nn-weights	variational recurrent AE	multi-eval	none	walk-forward	L0	A recent result on the RNN-VAE route	arXiv 2403.02500
FactorGCL	2025	nn-weights	hypergraph + contrastive	multi-eval	none	walk-forward	L0	Applies contrastive learning to factor representation; a method RQ1 can cite	arXiv 2502.05218

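Tradition 2 key insights:

The strength of DL factor models is end-to-end learning of nonlinear representations; the weaknesses are poor interpretability, difficulty aligning with economic priors, and an uncertain capacity ceiling. In your presentation, one sentence can position this: "Formulaic alpha trades some capacity for interpretability and the ability to ground in economic mechanisms; DL factor models do the reverse. Under the RQ3 (Cognition Base) framework the two routes can be merged by treating DL representations as one kind of element in the Cognition Base."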

Tradition 3 · LLM-Driven Alpha Mining (partly covered already)

Already expanded in the baseline notes; only the uncovered parts plus one key survey are added here.

Paper	Year	D1	D2	D3	D4	D5	D6	Key Insight	URL
AlphaAgent	2025	formula	LLM	multi-eval+regularization	literature-light	walk-forward	L0	Three-agent closed loop plus three kinds of regularization to counter alpha decay	arXiv 2502.16789
Navigating the Alpha Jungle (LLM-MCTS)	2025	formula	LLM+MCTS	multi-eval	literature-light	walk-forward	L0/L1	AlphaProof-style approach ported to alpha mining: LLM prior + MCTS exploration	arXiv 2505.11122
QuantaAlpha	2026	formula	LLM+EA	multi-eval	literature-light	walk-forward	L0	A fairly complete open-source framework; a workable starting point	github
FactorMAD	2025	formula	MAS (debate)	LLM-judge	literature-light	walk-forward	L0/L1	Multi-agent debate improves interpretability, but the LLM-judge carries high risk	ACM
Alpha-GPT	2025 (EMNLP demo)	formula	LLM	single-scalar	literature-light	single-oos	L0	Leans toward interactive human-AI collaboration; a reference for product form	aclanthology
QuantFactor REINFORCE	2024	formula	RL (REINFORCE)	multi-eval+variance-bound	none	walk-forward	L0	Explicitly argues that PPO is problematic for alpha mining and recommends REINFORCE instead	arXiv 2409.05144
Synergistic Formulaic Alpha (RL)	2024	formula	RL	multi-eval	none	walk-forward	L0/L1	Emphasizes finding synergistic combinations rather than a single best factor	arXiv 2401.02710
A Survey on LLM-based Alpha Mining (key!)	2025 (FITEE)	meta	meta	meta	meta	meta	meta	A 2025 survey of the field and required reading; an "existing survey" you can cite directly in the talk, with your work positioned outside its framework	Springer

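Tradition 3 key insights:

The LLM-driven line took off in 2025 and splits into two branches: (1) LLM as generator (AlphaAgent / Alpha-GPT / QuantaAlpha), and (2) LLM as prior plus a search algorithm (Alpha Jungle's LLM-MCTS / AlphaPROBE's retrieval). The second branch has higher search efficiency and can inherit the statistical properties of non-LLM search algorithms (MCTS / GFlowNet). Be wary of purely LLM-judge work such as FactorMAD: it is a textbook instance of the "LLM-as-Judge closed loop" failure mode described in our baseline notes.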

Tradition 4 · AI for Science Transfer (the main battleground of our baseline notes)

Fully covered in alpha_search_baselines.md:

FunSearch (Nature 2023)
AlphaProof / AlphaGeometry 2 (Nature 2025)
AlphaEvolve (arXiv 2506.13131, 2025)
AI Scientist v2 (arXiv 2504.08066, 2025)
ASI-ARCH (arXiv 2507.18074, 2025) — "AlphaGo Moment for Model Architecture Discovery"
OpenSeeker / OpenSeeker-v2 (2025-2026)
AutoMR (Meta Reasoning Skeleton, OpenReview 2025)
Meta-Rewarding Language Models (arXiv 2407.19594, 2024)

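Tradition 5 · Tightening Backtest Methodology (Prof. Li will certainly ask)

This is the part of the survey that most affects how well the presentation holds up under pressure.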

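Paper	Year	Core contribution	Impact on your work	URL
The Probability of Backtest Overfitting (Bailey, Borwein, López de Prado, Zhu)	2013	PBO via combinatorially symmetric cross-validation (CSCV); estimates the probability that a backtest is overfit	Your LightGBM val→test reversal should report PBO; once walk-forward is extended to 100 folds, a rigorous CSCV becomes feasible	SSRN / PDF
Deflating the Sharpe Ratio (López de Prado)	2014	Proposes the Deflated Sharpe Ratio (DSR), correcting Sharpe selection bias under multiple testing	DSR must be added to the walk-forward report; the current 12 folds are not enough, and 100+ folds are needed to make the claim seriously	SSRN
The Deflated Sharpe Ratio: Correcting for Selection Bias... (Bailey, López de Prado)	2014	The full DSR paper, including the non-Normality correction	Same as above	SSRN
"...and the Cross-Section of Expected Returns" (Harvey, Liu, Zhu)	2016 (RFS)	Foundational multiple-testing framework: argues that after decades of data mining in finance, t-stat > 2.0 is no longer enough and the hurdle should be > 3.0	Direct quote material, showing that you understand academic finance's strict stance on false discovery	SSRN / NBER
Lucky Factors (Harvey, Liu)	2021 (JFE)	Uses a bootstrap framework for multiple-testing correction to identify which factors are "lucky"	Your walk-forward is a different framework; cite this paper as a point of comparison	JFE / PDF
Replicating Anomalies (Hou, Xue, Zhang)	2020 (RFS)	Retests 452 published anomalies under a uniform standard: 65% fail to clear t=1.96 and 82% fail to clear t=2.78	The RQ3 Cognition Base must be built on this kind of replication-aware foundation; published anomalies cannot be taken at face value	SSRN / NBER
Backtest overfitting in the ML era	2024	Re-benchmarks OOS testing methods such as PBO/DSR for the ML era	The most recent methodological update and the most convenient to cite	ScienceDirect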

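Tradition 5 key insights:

A financial ML paper that does not cite this line of work is not rigorous. If your walk-forward + LightGBM/Optuna negative results are repackaged with PBO + DSR, they move from "empirical observation" to "a conclusion under standard methodology". Slide 12 in Act 1 of the presentation (Time-Slice Stability) can add one sentence: "My current sample of 12 folds is not yet enough for a rigorous Deflated Sharpe; the next step is to extend to 100 folds plus CSCV-based PBO." That single sentence lifts the framing of the whole negative result by a level.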
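
For reference, a minimal sketch of the Deflated Sharpe Ratio as given by Bailey and López de Prado. The inputs are assumptions of this example: `sr` is the selected strategy's per-period (non-annualized) Sharpe ratio, `kurt` is raw kurtosis (3 for a normal distribution), `n_trials` is the number of strategies tried (at least 2), and `sharpe_variance` is the variance of Sharpe ratios across those trials.

```python
from math import e, sqrt
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_STD_NORMAL = NormalDist()

def expected_max_sharpe(n_trials, sharpe_variance):
    """Approximate expected maximum Sharpe over n_trials when the true Sharpe is zero."""
    a = _STD_NORMAL.inv_cdf(1 - 1 / n_trials)
    b = _STD_NORMAL.inv_cdf(1 - 1 / (n_trials * e))
    return sqrt(sharpe_variance) * ((1 - EULER_GAMMA) * a + EULER_GAMMA * b)

def deflated_sharpe(sr, n_obs, skew, kurt, n_trials, sharpe_variance):
    """Probability that the observed Sharpe exceeds the selection-bias benchmark."""
    sr0 = expected_max_sharpe(n_trials, sharpe_variance)
    denom = sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return _STD_NORMAL.cdf((sr - sr0) * sqrt(n_obs - 1) / denom)

# Illustrative numbers only: the same Sharpe looks worse after more trials.
few = deflated_sharpe(0.1, 1000, 0.0, 3.0, n_trials=10, sharpe_variance=0.01)
many = deflated_sharpe(0.1, 1000, 0.0, 3.0, n_trials=1000, sharpe_variance=0.01)
```

The key input is an honest `n_trials`: every Optuna trial and every discarded factor variant counts, so the trial log needs to be kept alongside the walk-forward report.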

Tradition 6 · Time Series Foundation Models (directly relevant to Prof. Li's RQ1)

Paper	Year	Core idea	Impact on RQ1	URL
PatchTST: A Time Series is Worth 64 Words	2022 (ICLR 2023)	Univariate patching + channel-independence; a simple Transformer directly beats more complex time-series Transformers	RQ1 baseline candidate: architectural prior experiments must be compared against PatchTST	arXiv 2211.14730
TimesNet	2023	Series segmentation by dominant frequencies	Frequency-domain inductive bias as one form of architectural prior	(commonly cited)
TimeMixer	2024	MLP-based with series decomposition mixing	Shows that attention is not necessarily required; an MLP with a good prior can also be strong	datasciencewithmarco blog
iTransformer	2024	Inverted Transformer; attention runs over the variable dimension rather than the time dimension	Another architectural prior proposal	(commonly cited)
Lag-Llama	2024	The first probabilistic time series foundation model	Zero-shot + probabilistic forecasting; applicability to crypto needs testing	arXiv 2310.08278
Chronos	2024 (Amazon)	Treats time series as language modeling: continuous values are quantized into tokens, with a T5 encoder-decoder	A strong baseline that RQ1 can benchmark right away; whether its quantized-token paradigm suits high-frequency crypto is an open question	Medium summary
TimesFM	2024 (Google)	Decoder-only Transformer pretrained on Google's internal time-series data	Representative of the large-model + zero-shot route; a strong contrast with Encoding Recurrence	(Google paper)
Moirai	2024 (Salesforce)	Any-variate attention + the LOTSA dataset (27B observations)	The foundation model paradigm for multivariate time series	(commonly cited)
Encoding Recurrence into Transformers (Huang, Lu, Cai, Qin, Fang, Tian, Li Guodong)	2023 (ICLR Oral, Top 5%)	REM + RSA modules: decompose an RNN into a lightweight positional encoding and inject it into the Transformer	Prof. Li's own work and the direct anchor for RQ1; argues architectural prior > model capacity	OpenReview / github
Position: There are no Champions in Long-Term TSF	2025	SOTA time-series Transformers trade wins across benchmarks; there is no dominant model	Indirect support for RQ1's research value: precisely because there is no champion, inductive-bias research still has room	arXiv 2502.14045

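Tradition 6 key insights:

The time-series foundation model line took off in 2024, but the 2025 position paper states plainly that there is "no champion": no single model dominates across all benchmarks. This means architectural priors still have plenty of research room, which is exactly the thesis Li Guodong emphasized in the ICLR 2023 work. Your RQ1 extends this thesis from general time series to the specific domain of crypto microstructure.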
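
As a concrete example of an architectural prior, the patching step behind PatchTST's "64 words" can be sketched in a few lines. The defaults (`patch_len=16`, `stride=8`) and the end-padding with the last value follow the PatchTST paper's description; `make_patches` itself is a hypothetical helper and not the authors' code.

```python
def make_patches(series, patch_len=16, stride=8):
    """Split one univariate series into overlapping patches.

    The series is padded at the end with `stride` copies of its last value,
    then cut into windows of `patch_len` taken every `stride` steps.
    """
    padded = list(series) + [series[-1]] * stride
    n_patches = (len(padded) - patch_len) // stride + 1
    return [padded[i * stride : i * stride + patch_len] for i in range(n_patches)]

# A look-back window of 512 steps becomes 64 patch tokens of length 16.
patches = make_patches(list(range(512)))
```

Each channel is patched independently (channel-independence), so the Transformer attends over 64 tokens per series instead of 512 time steps.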

Tradition 7 · Conformal Prediction for Time Series & LLM Agents (directly relevant to RQ2)

Paper	Year	Core idea	Impact on RQ2	URL
Conformal Time-Series Forecasting (Stankevičiūtė et al.)	2021 (NeurIPS)	The first paper to apply CP systematically to time-series forecasting	Entry point	NeurIPS
Sequential Predictive Conformal Inference	2022	Addresses non-exchangeability in time series	Directly tackles the core difficulty that financial data violates CP assumptions	arXiv 2212.03463
CP Algorithms for TSF: Methods and Benchmarking	2026	Systematic comparison of CP methods for time series	RQ2 selection guide	arXiv 2601.18509
Temporal Conformal Prediction (TCP)	2025	Distribution-free + ML quantile forecaster + Robbins-Monro calibration	The most direct RQ2 candidate: combines a quantile forecaster with online calibration	arXiv 2507.05470
Online CP for Multi-step TSF	2024	CP adaptation for multi-step forecasting	Applicable to your 30m / 60m forward windows	arXiv 2410.13115
Prune 'n Predict: Optimizing LLM Decision-making with CP	2025 (ICML)	CP applied to LLM decisions	Directly relevant to RQ2's "LLM agent decision-time safety"	ICML poster
Analyzing Uncertainty of LLM-as-a-Judge: Interval Evaluations with CP	2025 (EMNLP)	Applies CP to LLM-as-judge scoring	Uses CP to estimate the reliability of an LLM judge	arXiv 2509.18658
TECP: Token-Entropy Conformal Prediction for LLMs	2025	Token-level CP	A fine-grained CP method for LLMs	MDPI Mathematics

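Tradition 7 key insights:

The core challenge of CP for time series is non-exchangeability (financial data has temporal dependence and distribution drift); targeted methods (TCP / Sequential CP) have appeared only in the last two years. CP for LLM agents is an entirely new direction as of 2025. RQ2 (Open-World LLM Agent Safety) sits at the intersection of the two lines: it is a new direction, yet it has methodology blocks to borrow. This is RQ2's research niche.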
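
To show what online calibration under non-exchangeability looks like, here is a generic quantile-tracking update in the Robbins-Monro style: the interval radius grows after a miss and shrinks after a cover. This is a sketch of the general idea only and not the TCP algorithm from the paper; `online_conformal_radius`, `lr` and the synthetic residuals are assumptions of this example.

```python
def online_conformal_radius(abs_residuals, alpha=0.1, lr=0.05, q0=0.0):
    """Track an interval radius q online so the long-run miss rate approaches alpha.

    abs_residuals: |y_t - prediction_t| revealed one step at a time.
    Returns the radius used at each step and the realized miss rate.
    """
    q = q0
    radii, misses = [], 0
    for r in abs_residuals:
        radii.append(q)               # radius committed before seeing r
        err = 1 if r > q else 0       # 1 when the interval missed
        misses += err
        q += lr * (err - alpha)       # widen after a miss, shrink after a cover
    return radii, misses / len(abs_residuals)

# Deterministic synthetic residuals in [0, 1), for illustration only.
residuals = [((i * 37) % 100) / 100 for i in range(5000)]
radii, miss_rate = online_conformal_radius(residuals, alpha=0.1, lr=0.05)
```

For bounded residuals the realized miss rate stays close to `alpha` over a long horizon regardless of the data distribution, which is why this family of updates suits drifting financial series. The radius is not clipped at zero, so it can dip slightly negative, which simply means an empty interval for that step.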

Tradition 8 · Factor Zoo & Anomaly Literature (content source for the RQ3 Cognition Base)

Paper	Year	Core contribution	URL
Replicating Anomalies (Hou, Xue, Zhang)	2020 (RFS)	Retest of 452 anomalies: 65% fail the single-test hurdle and 82% fail the multiple-test hurdle	PDF
Taming the Factor Zoo: A Test of New Factors (Feng, Giglio, Xiu)	2020 (JF)	Double-selection LASSO framework for evaluating the marginal contribution of a new factor	PDF
Is There a Replication Crisis in Finance? (Jensen, Kelly, Pedersen — JKP)	2023 (JF)	Retests 153 anomalies with a consistent methodology and provides the JKP factor library	(public dataset)
The Cross-Section of Expected Returns (Harvey-Liu-Zhu)	2016 (RFS)	See Tradition 5	NBER
Big Data Asset Pricing 4: Factor Zoo and Replication	2022	A survey-style view of the factor zoo	ResearchGate

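Tradition 8 key insights:

The first step in building a Cognition Base is not collecting papers but retesting them. Hou-Xue-Zhang have already shown that 80%+ of published anomalies do not robustly survive strict multiple-testing hurdles. The Cognition Base must distinguish a "published claim" from a "replication-verified result". The JKP factor library is the most trustworthy starting point available.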

Combined Output of Stage 1 + Stage 2: Research Gap Heatmap

After tagging all papers in Traditions 1-8 along the 6-dimension taxonomy, several clear research gaps surface:

Gap 1 · GFlowNet × Cognition Base

AlphaSAGE uses GFlowNet to explore multiple high-reward modes, but its knowledge grounding stops at operators. Nobody has tried combining GFlowNet with a structured Cognition Base; this could be a concrete proposal for RQ3.

Gap 2 · Time Series Foundation Model × Architectural Prior

Chronos / TimesFM / Moirai all follow the generic foundation model route; Li Guodong's Encoding Recurrence follows the architectural prior route. There is no significant work crossing the two. RQ1 fills exactly this gap: bringing the architectural prior idea into the foundation model training stage (not the finetuning stage).

Gap 3 · Conformal Prediction × Multi-Agent LLM

CP for LLMs is mostly studied in single-turn / single-judge settings. How should cumulative uncertainty be estimated in a multi-agent LLM workflow (e.g. ASI-ARCH / AI Scientist)? This is RQ2's niche.

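Gap 4 · Deflated Sharpe × LLM-driven Alpha Mining

The LLM-driven alpha mining literature (Tradition 3) almost never cites DSR / PBO / Lucky Factors. This is a very visible research gap: bringing a strict multiple-testing framework into LLM-driven evaluation immediately separates the work from existing papers on methodological rigor. Your project is already ahead here (your walk-forward already has a fold design), and extending it to DSR/PBO is achievable within 18 months.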
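
A minimal sketch of PBO via CSCV shows what the extension would involve. It makes simplifying assumptions: `returns` is a list of T rows with one return per candidate strategy, performance is measured by mean return rather than the Sharpe ratio used in the original paper, and `pbo_cscv` is a hypothetical name.

```python
from itertools import combinations
from math import log
from statistics import mean

def pbo_cscv(returns, n_blocks=8):
    """Estimate the Probability of Backtest Overfitting.

    returns: T rows, each holding the same-period return of N strategies.
    Rows are cut into n_blocks contiguous blocks (leftover rows are dropped);
    every choice of half the blocks serves once as the in-sample set.
    """
    n_strats = len(returns[0])
    size = len(returns) // n_blocks
    blocks = [range(b * size, (b + 1) * size) for b in range(n_blocks)]
    logits = []
    for in_sample in combinations(range(n_blocks), n_blocks // 2):
        is_rows = [t for b in in_sample for t in blocks[b]]
        oos_rows = [t for b in range(n_blocks) if b not in in_sample
                    for t in blocks[b]]
        is_perf = [mean(returns[t][s] for t in is_rows) for s in range(n_strats)]
        oos_perf = [mean(returns[t][s] for t in oos_rows) for s in range(n_strats)]
        best = max(range(n_strats), key=is_perf.__getitem__)
        rank = sum(p <= oos_perf[best] for p in oos_perf)  # 1 = worst, N = best
        omega = rank / (n_strats + 1)                      # relative OOS rank
        logits.append(log(omega / (1 - omega)))
    # PBO: share of splits where the in-sample winner lands at or below the OOS median.
    return sum(x <= 0 for x in logits) / len(logits)
```

In an LLM-driven pipeline the columns would be every candidate factor or configuration the generator proposed, not only the ones that were kept, so the candidate log has to be complete for the estimate to mean anything.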

Gap 5 · Factor Replication × Crypto

Academic factor replication work (Hou-Xue-Zhang / JKP) is almost entirely on equity markets. Nobody has done a systematic study of factor replication on crypto perpetuals; your 526 symbol × 15s × 12 fold infrastructure is a natural testbed for it.

Suggestions for Advancing Stages 3-5

Stage 3 · Tiered reading:

Tier 1 (10-15 papers, close reading):

Harvey-Liu-Zhu 2016 "...and the Cross-Section of Expected Returns"
Bailey-López de Prado 2014 "Deflated Sharpe Ratio"
Hou-Xue-Zhang 2020 "Replicating Anomalies"
ASI-ARCH (arXiv 2507.18074)
AlphaSAGE (arXiv 2509.25055)
Alpha Jungle LLM-MCTS (arXiv 2505.11122)
Encoding Recurrence into Transformers (ICLR 2023)
Chronos paper
AlphaEval (arXiv 2508.13174)
A Survey on LLM-based Alpha Mining (FITEE 2025)
Temporal Conformal Prediction (arXiv 2507.05470)
FunSearch (Nature 2023)
AlphaEvolve (arXiv 2506.13131)
AI Scientist v2 (arXiv 2504.08066)
QuantFactor REINFORCE (arXiv 2409.05144)

Tier 2 (30-50 papers, skim): the remaining papers listed under Traditions 1-8.

Tier 3 (the rest): the 200-300 papers found by citation chasing from Tier 1.

Stage 4 · Extraction:

After each close reading, fill in a standardized template:

Title | Year | Venue | Authors
D1-D6 taxonomy:
Search unit: 
Generator:
Verifier:
Knowledge grounding:
Eval rigor:
Meta-level:
---
Key claim:
Empirical setting:
Failure modes acknowledged in paper:
Failure modes I notice (not in paper):
Connection to my work:
Open questions raised:
What would I do differently:

Stage 5 · Synthesis:

The final output is a 30-50 page systematic review draft, organized as:

Introduction: why alpha auto search is a field worth a systematic survey
6-dimension taxonomy
One chapter per Tradition 1-8
Research Gap Heatmap (Gap 1-5)
Discussion: three research directions worth pursuing in depth (academic versions of RQ1-RQ3)
Conclusion + Future Work

Reference Link Index

(in tradition order)

Tradition 1 (Classical GP):
- https://github.com/trevorstephens/gplearn
- https://ideas.repec.org/p/arx/papers/2002.08245.html (AutoAlpha)
- https://github.com/UePG-21/gpquant
- https://arxiv.org/html/2412.00896v1 (Warm Start GP)
- https://arxiv.org/html/2406.18394v1 (AlphaForge)
- https://arxiv.org/html/2602.11917v1 (AlphaPROBE)
- https://arxiv.org/abs/2509.25055 (AlphaSAGE)

Tradition 2 (DL Factor Models):
- https://arxiv.org/pdf/1810.01278 (Deep Factor)
- https://cdn.aaai.org/ojs/20369/20369-13-24382-1-2-20220628.pdf (FactorVAE)
- https://arxiv.org/pdf/2306.02848 (HireVAE)
- https://arxiv.org/html/2403.02500v1 (RVRAE)
- https://arxiv.org/html/2502.05218v1 (FactorGCL)

Tradition 3 (LLM-driven Alpha):
- https://arxiv.org/abs/2502.16789 (AlphaAgent)
- https://arxiv.org/abs/2505.11122 (Alpha Jungle)
- https://github.com/QuantaAlpha/QuantaAlpha
- https://link.springer.com/article/10.1631/FITEE.2500386 (Survey)
- https://arxiv.org/html/2409.05144v1 (QuantFactor REINFORCE)
- https://arxiv.org/html/2401.02710 (Synergistic Formulaic)

Tradition 4 (AI for Science):
- see alpha_search_baselines.md

Tradition 5 (Backtest Rigor):
- https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2326253 (PBO)
- https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2465675 (Deflating Sharpe)
- https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2460551 (DSR full)
- https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2249314 (Cross-Section)
- https://www.sciencedirect.com/science/article/abs/pii/S0304405X21001410 (Lucky Factors)
- https://www.nber.org/system/files/working_papers/w23394/w23394.pdf (Replicating Anomalies)

Tradition 6 (Time Series FM):
- https://arxiv.org/abs/2211.14730 (PatchTST)
- https://openreview.net/forum?id=7YfHla7IxBJ (Encoding Recurrence)
- https://arxiv.org/pdf/2310.08278 (Lag-Llama)
- https://arxiv.org/html/2502.14045v1 (No Champions position)

Tradition 7 (Conformal Prediction):
- https://proceedings.neurips.cc/paper/2021/file/312f1ba2a72318edaaa995a67835fad5-Paper.pdf (Stankevičiūtė)
- https://arxiv.org/pdf/2212.03463 (Sequential CP)
- https://arxiv.org/html/2507.05470v4 (TCP)
- https://icml.cc/virtual/2025/poster/46415 (Prune n Predict)

Tradition 8 (Factor Zoo):
- https://theinvestmentcapm.com/uploads/1/2/2/6/122679606/houxuezhang2019rfs.pdf (Replicating Anomalies)
- https://dachxiu.chicagobooth.edu/download/ZOO.pdf (Taming the Factor Zoo)

Evaluation Frameworks:
- https://arxiv.org/abs/2508.13174 (AlphaEval)
- https://github.com/LeoDingggg/AlphaEval

End of Stage 1 + Stage 2 deliverable.