Alpha Auto Search · Research Plan
A consolidated research plan on AI-driven alpha factor search — what's been done, what's missing, and what's worth doing next.
Last updated: 2026-05-18 · Maintained by Paul Weng (paulweng)
0. Executive Summary
In 2025, AI for Science / AI for Math entered its "AlphaGo Moment" era. The ASI-ARCH paper from GAIR ("AlphaGo Moment for Model Architecture Discovery", arXiv:2507.18074) reported an empirical scaling law for scientific discovery itself: the cumulative number of SOTA findings grows roughly linearly with compute, which suggests that research output can be scaled computationally.
This repo systematically maps what these advances mean for automated alpha factor search in quantitative finance, identifies a clear research-shaped gap, and proposes a concrete agenda.
Core thesis (one sentence):
Alpha auto search has the same generator-verifier structure as AlphaProof / AlphaEvolve / ASI-ARCH, but the field has no unified baseline. Building one is both a research contribution in its own right and the prerequisite for everything else.
Concrete next move: Crypto-Alpha-Bench — a unified benchmark for alpha auto search, modeled on ImageNet / SWE-Bench style infrastructure papers. Section 5 below.
1. Background · Why This Matters Now
Three things happened in 2024-2026 that make this timing right:
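1.1 AI for Science has moved from a theoretical paradigm to a reproducible engineering paradigm
- FunSearch (Nature 2023) showed that LLM + evolutionary search can make progress on open mathematical problems
- AlphaProof / AlphaGeometry 2 (Nature 2025) reached silver-medal level at the IMO and established the "generator + formally trusted verifier" paradigm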
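- AlphaEvolve (DeepMind 2025) engineered this recipe at scale, finding a 4x4 (complex-valued) matrix multiplication algorithm that uses 48 scalar multiplications, the first improvement over Strassen's 1969 algorithm in 56 years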
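- AI Scientist v2 (Sakana 2025) produced the first AI-written paper to pass peer review (ICLR 2025 workshop)
- ASI-ARCH (GAIR 2025) discovered 106 new SOTA linear-attention architectures in 20,000 GPU hours and reported the first Scaling Law for Scientific Discovery
1.2 A concentrated burst of LLM-driven alpha mining work
Within 2025 alone, AlphaAgent (arXiv 2502.16789), Navigating the Alpha Jungle (arXiv 2505.11122), QuantaAlpha, FactorMAD, Alpha-GPT, and others appeared. However, each paper uses its own evaluation setup, so results are not comparable across papers. FITEE published a 2025 "Survey on LLM-based Alpha Mining" (link.springer.com/article/10.1631/FITEE.2500386), which confirms that this is an emerging subfield.
1.3 Rigorous statistical methodology is mature but has not yet reached LLM-driven work
The López de Prado line of work (PBO 2013, DSR 2014), together with Harvey-Liu-Zhu 2016 RFS, Lucky Factors 2021 JFE, and Hou-Xue-Zhang 2020 RFS, already provides a complete toolbox for rigorous multiple testing in finance. Yet across the 2025 LLM-driven alpha literature, almost none of this line is cited. This is a very visible research gap: bringing a strict multiple-testing framework into LLM-driven evaluation would immediately set the work apart from existing papers on methodological rigor.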
2. Completed Work
This repo currently contains 3 research artifacts:
2.1 alpha_search_baselines.md · Frontier Baseline Notes
Close readings of 5-7 frontier papers, each organized from two angles (a methodology breakdown and a transfer path to alpha search):
- FunSearch (Nature 2023): LLM + evolution + program search
- AlphaProof / AlphaGeometry 2 (Nature 2025): neuro-symbolic + RL + formal verification
- AlphaEvolve (arXiv 2506.13131): an engineered upgrade of FunSearch
- AI Scientist v2 (arXiv 2504.08066): full research loop + Agentic Tree Search
- ASI-ARCH (arXiv 2507.18074): multi-agent + scaling law
- OpenSeeker (arXiv 2603.15594): open-source small models + a synthetic-training-data paradigm
Also included: an analysis of the meta levels (L0-L4) and a comparison against existing transfers to finance (AlphaAgent / Alpha Jungle / QuantaAlpha).
2.2 alpha_search_survey_taxonomy_and_bibliography.md · 8-Tradition Systematic Survey
Builds a 6-dimension taxonomy for the whole field (search unit / generator / verifier / knowledge grounding / evaluation rigor / self-evolution level), then assembles a systematic bibliography across 8 traditions:
- Classical GP / Symbolic Regression for Finance (gplearn → AlphaSAGE/GFlowNet)
- Deep Learning Factor Models (FactorVAE / HIST / FactorGCL)
- LLM-Driven Alpha Mining (already covered, supplemented by the FITEE 2025 survey)
- AI for Science Transfer (the main focus of the baseline notes)
- Backtest methodology rigor (PBO / DSR / Harvey-Liu / Hou-Xue-Zhang), previously a complete blind spot
- Time-Series Foundation Models (PatchTST / Chronos / TimesFM / Moirai / Lag-Llama / Encoding Recurrence)
- Conformal Prediction for TS & LLM Agents (TCP / Sequential CP / Prune n Predict)
- Factor Zoo & Anomaly Replication (Hou-Xue-Zhang / JKP / Taming the Factor Zoo)
The survey closes by surfacing 5 explicit research gaps (Gap 1-5), two of which map directly to the RQs below.
2.3 alpha_search_deep_reads.md · Three Deep-Read Clusters
Deep-read clusters prepared for the key questions:
- Cluster 1: the López de Prado line + Harvey-Liu "Lucky Factors" + Hou-Xue-Zhang Replicating Anomalies. This line supplies the language for turning empirical observations into methodological conclusions
- Cluster 2: time-series foundation models + Encoding Recurrence into Transformers (Li Guodong, ICLR 2023 Oral). This provides the contrast needed to position RQ1
- Cluster 3: AlphaEval (arXiv 2508.13174), the most recent (2025) "alpha mining unified evaluation framework"
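3. Core Thesis
The judgment formed after reading the frontier papers, the systematic survey, and the deep reads:
3.1 Pattern Recognition
The breakthrough works reviewed above, all from the past 3 years, share 4 patterns:
- Generator-Verifier Separation: creativity is decoupled from guarantees
- Cognition Base / Knowledge Grounding: structured domain priors are the hidden causal variable behind the scaling law
- Multi-Agent Decomposition: separation of responsibilities matters more than raw model capability
- Scaling Law for Discovery: compute converts directly into discovery output (provided the first 3 hold)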
3.2 Mapping to Alpha Search
| Frontier Pattern | Current state of alpha auto search | What I have delivered (production system) |
|---|---|---|
| Generator-Verifier Separation | Partially implemented in LLM-driven work, but the verifier is too weak | ✅ Production-grade (see whatsapp_AI_Trading_Agent) |
| Hard Verifier | Mostly a single IC/Sharpe metric, with no multiple-testing correction | ✅ Walk-forward + microstructure gate + adaptive state controller |
| Cognition Base | Almost nobody does this | ❌ Not done at all |
| Researcher Agent | Partially implemented (AlphaAgent / Alpha Jungle) | ❌ Not done at all |
| Compute-scaled discovery | No controlled experiment exists | ❌ Single-person, local scale |
I have the Verification half; what is missing is the Generation/Discovery half, and the field as a whole lacks a unified baseline.
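3.3 Key Insight: The Field Has No ImageNet Moment
The strongest conclusion after reading all of Traditions 1-8:
Alpha auto search currently has no accepted baseline. Every paper defines its own data, cost model, metrics, and fold structure, so results are not comparable. Every "SOTA" is a claim made under a hand-picked evaluation. This is itself the root obstacle to establishing a scaling law in finance: without a fixed evaluation, the claimed "compute → discovery" causal relationship cannot be verified.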
4. Three Core Research Questions
The three RQs support one another and form a research triangle:
RQ1 · Architectural Priors for Crypto Microstructure
Encoding Microstructure Recurrence: Architectural Inductive Biases for High-Frequency Financial Time Series
Hypothesis:
The decision boundary in the joint (OHLCV × microstructure × time) space contains a recurrent structure: price action feeds back into microstructure, and microstructure feeds back into price. Tabular / tree-based models learn this structure as noise. This is hypothesized to be the root cause of the severe overfitting of LightGBM / Optuna under a chronological split (val MTM +44 → test MTM -86).
Approach:
Extend the core idea of Encoding Recurrence into Transformers (Li Guodong, ICLR 2023 Oral), namely designing an explicit architectural prior for recurrent structure, to financial microstructure data. Concretely:
- the bookTicker → OHLCV → bookTicker feedback loop as cross-modality recurrence
- hierarchical recurrence across time scales (15s tick / 1m bar / 1h trend)
- benchmark (i) tree models (ii) standard time-series Transformers (iii) an inductive-bias-aware architecture on the 526-symbol × 12-fold walk-forward pipeline
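The 12-fold walk-forward evaluation named in the last bullet can be sketched as a chronological fold generator. This is a minimal sketch, not the pipeline's actual splitter: the function name `walk_forward_folds`, the expanding training window, and the `embargo` gap between train and test are assumptions introduced here for illustration.

```python
from typing import Iterator, Tuple


def walk_forward_folds(
    n_samples: int, n_folds: int, min_train: int, embargo: int = 0
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges in strict chronological order.

    Assumption: an expanding training window. Every test block lies
    after its training data, separated by `embargo` skipped samples.
    """
    test_size = (n_samples - min_train) // n_folds
    if test_size <= 0 or embargo >= min_train:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        test_start = min_train + k * test_size
        train = range(0, test_start - embargo)
        test = range(test_start, test_start + test_size)
        yield train, test


# Toy sizes only: 130 samples, 12 folds, 10 warm-up samples, 2-sample embargo.
for train, test in walk_forward_folds(130, 12, min_train=10, embargo=2):
    print(f"train [0, {train.stop})  test [{test.start}, {test.stop})")
```

Because no test index ever precedes a training index, the same generator can be reused unchanged for tree models, standard Transformers, and the inductive-bias-aware architecture, which keeps the three-way comparison on identical splits.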
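Why it matters:
- Academic significance: extends architectural-prior research to a domain with hard verification (real PnL)
- Positioning: a trade-off between the 2024-2026 scaling route of time-series foundation models and the architectural-prior route; the 2025 position paper "No Champions in Long-Term TSF" indicates that the question is far from settled
- Connection to Prof. Guodong Li: a direct extension of the ICLR 2023 Oral work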
RQ2 · Open-World LLM Agent Safety
Beyond Schema Validation: Statistical Safety Guarantees for LLM Agents in High-Stakes Decision Settings
Hypothesis:
In an open-world setting, the failure modes you can enumerate are not the failure modes that will actually happen.
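Current safety mechanisms for LLM agents are all deterministic rules (schema validation / cap stack / reduce-only / KMS abstraction / append-only audit). These are necessary in production, but they assume that every rule can be written down in advance.
Approach:
Apply conformal prediction to confidence estimation for LLM agent output, giving each LLM-generated intent a distribution-free coverage guarantee (valid under exchangeability). Design a dynamic safety margin mechanism: when the conformal interval widens (a distribution-drift signal), the system automatically tightens the thresholds of the deterministic gates. Use the existing production system as the testbed, fitting selected layers among L1-L6 with this mechanism for a controlled comparison.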
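The mechanism described above can be sketched with split conformal prediction. This is a minimal sketch under stated assumptions, not the production design: the nonconformity score (for example, the absolute gap between an intent's predicted and realized outcome), the function names, and the rule that scales a gate cap by the ratio of baseline to current interval width are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """Split-conformal threshold with the finite-sample (n + 1) correction."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


def tightened_cap(base_cap: float, baseline_width: float, current_width: float) -> float:
    """Shrink a deterministic gate threshold when the interval widens.

    Hypothetical rule: scale the cap by baseline/current width and never
    raise it above base_cap. An infinite width drives the cap to zero.
    """
    if current_width <= 0:
        return base_cap
    return base_cap * min(1.0, baseline_width / current_width)


# Toy calibration scores (not measurements): a calm window and a drifted one.
calm = [0.1 * i for i in range(1, 20)]
drifted = [0.2 * i for i in range(1, 20)]
q_calm = conformal_quantile(calm, alpha=0.25)
q_drift = conformal_quantile(drifted, alpha=0.25)
print(tightened_cap(100.0, baseline_width=2 * q_calm, current_width=2 * q_drift))
```

The cap only ever moves down, so the deterministic gates stay at least as strict as they are today; the conformal layer adds a drift-sensitive margin rather than replacing any rule.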
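Why it matters:
- Academic significance: current LLM safety research sits mainly at the alignment / RLHF level and lacks production-time safety mechanisms with statistical guarantees
- Connection to Prof. Kai Han: open-world reliability is the stated mission of the Visual AI Lab; crypto markets are an extreme instance of an open world
- Cross-modal: in principle the methodology extends to vision agents, robotic agents, and other open-world settings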
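RQ3 · Cognition Base for Financial Domains
Building a Compute-Scalable Cognition Base for Discovery in Open Financial Domains
Hypothesis:
The quality of the Cognition Base is the hidden causal variable that determines whether the scaling law holds.
The fundamental premise behind ASI-ARCH's scaling law is that its Cognition Base injects decades of human domain priors. When replicating this architecture in finance, the biggest engineering challenge is not the multi-agent framework but the construction of the Cognition Base, and that Cognition Base must be replication-aware.
Approach:
Structure knowledge across data sources:
- Hou-Xue-Zhang factor zoo + JKP factor library (verified anomalies first)
- market microstructure literature
- regulatory filings + listed-company announcements
- post-mortems of historical anomalies + records of their decay
For each entry, extract {mechanism / test method / applicable universe / failure regime / replication strength / time-stamped verification}.
Key operational judgment (driven directly by Hou-Xue-Zhang 2020):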
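Hou-Xue-Zhang 2020 retested 452 published anomalies and found that 65% fail the single-test hurdle and 82% fail the multiple-test hurdle. The Cognition Base therefore cannot be a mere collection of published anomalies; it must be replication-aware structured knowledge in which every entry carries a replication-strength weight. When the Researcher Agent queries it, results are ranked by that weight.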
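The entry fields and the weight-ranked query described above could be laid out as follows. This is a sketch of one possible schema, not the repo's data model: the class name `CognitionEntry`, the field names, the 0.0 to 1.0 scale for `replication_strength`, and the `query` helper are assumptions, and the sample entries are placeholders rather than real findings.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CognitionEntry:
    name: str
    mechanism: str
    test_method: str
    universe: str
    failure_regimes: Tuple[str, ...]
    replication_strength: float  # assumed scale: 0.0 (failed) to 1.0 (robust)
    verified_at: str             # ISO date of the last verification


def query(
    entries: List[CognitionEntry], universe: str, min_strength: float = 0.0
) -> List[CognitionEntry]:
    """Return entries for one universe, strongest replication first."""
    hits = [
        e for e in entries
        if e.universe == universe and e.replication_strength >= min_strength
    ]
    return sorted(hits, key=lambda e: e.replication_strength, reverse=True)


# Placeholder entries for illustration only.
base = [
    CognitionEntry("entry_a", "mechanism text", "portfolio sort", "crypto_perp",
                   ("low liquidity",), 0.2, "2026-01-01"),
    CognitionEntry("entry_b", "mechanism text", "panel regression", "crypto_perp",
                   (), 0.9, "2026-02-01"),
    CognitionEntry("entry_c", "mechanism text", "portfolio sort", "us_equity",
                   (), 0.7, "2026-03-01"),
]
print([e.name for e in query(base, "crypto_perp")])
```

Keeping `replication_strength` as an explicit field, rather than filtering weak entries out at ingestion time, lets the ablation in this section vary the quality threshold without rebuilding the base.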
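Ablation study:
Same multi-agent framework + different Cognition Bases (low / medium / high quality) → compare the scaling curves. If quality significantly changes the scaling behavior, that supports the hypothesis that the Cognition Base causally drives the scaling law.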
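One simple way to compare the scaling curves in this ablation is to fit a least-squares slope of cumulative discoveries against compute for each Cognition Base tier. The sketch below assumes a linear relationship, in the spirit of the linear curve summarized in Section 0; the function name and the toy inputs are illustrative and are not experimental results. A real comparison would also need confidence intervals on the slopes.

```python
from typing import Sequence


def scaling_slope(compute: Sequence[float], discoveries: Sequence[float]) -> float:
    """Ordinary least-squares slope of cumulative discoveries vs. compute."""
    n = len(compute)
    if n < 2 or n != len(discoveries):
        raise ValueError("need two or more paired observations")
    mean_x = sum(compute) / n
    mean_y = sum(discoveries) / n
    sxx = sum((x - mean_x) ** 2 for x in compute)
    if sxx == 0:
        raise ValueError("compute values must vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(compute, discoveries))
    return sxy / sxx


# Toy numbers that only exercise the function; they are not measurements.
toy_compute = [1.0, 2.0, 3.0, 4.0]
toy_counts = {"low": [1.0, 2.0, 3.0, 4.0], "high": [2.0, 4.0, 6.0, 8.0]}
for tier, counts in toy_counts.items():
    print(tier, scaling_slope(toy_compute, counts))
```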
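5. The Core Proposal · Crypto-Alpha-Bench
This is the prerequisite infrastructure for the three RQs above, and it can also stand alone as a publishable research contribution.
5.1 Motivation: Why a Benchmark Now
The strongest impression after reading the full 8-tradition survey: this field lacks an ImageNet moment.
| Current problem | Consequence |
|---|---|
| Every paper defines its own evaluation | Methods cannot be compared; SOTA claims cannot be verified |
| No fixed cost model | Results for the same method differ several-fold between optimistic and pessimistic costs |
| No compute-controlled comparison | The SOTA of LLM methods may come only from a compute advantage |
| Deflated Sharpe / PBO almost never reported | The false discovery rate is unknown |
| No negative-control baseline | "I beat random" is not a legitimate contribution claim |
| No distinction between industrial alpha and academic alpha | Factors with high IR that cannot be executed are treated as winners |
Core claim: until a unified baseline exists, the "compute → discovery" causal relationship in alpha auto search cannot be verified, and a scaling law cannot be established in finance. Build the benchmark first, then talk about scaling.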
5.2 Six Requirements of a Good Baseline
R1 · Fixed public dataset
A defined universe (Binance USD-M top-200 perps), time window (2022-01 → 2025-12), and frequency (1m / 15s / tick). Released in a public format (parquet + manifest) on HuggingFace Datasets.
R2 · Three-tier cost model
| Tier | Slippage | Spread | Queue Priority Penalty |
|---|---|---|---|
| Optimistic | 0 | mid-price | None |
| Realistic | Empirical impact | half-spread | partial fill probability |
| Pessimistic | Upper-bound impact | full spread | queue priority hard rejection |
Reporting metrics under all three tiers is mandatory.
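The three tiers in the table can be expressed as a small parameter set applied to every simulated fill. The sketch below is illustrative only: `CostTier`, its field names, and every numeric value (impact in basis points, passive fill probability) are placeholders, since the plan leaves calibration open (see risk R-3). Measuring the spread cost from mid as a fraction of the quoted spread is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CostTier:
    spread_fraction: float    # share of the quoted spread paid, measured from mid
    impact_bps: float         # extra slippage in basis points of mid (placeholder)
    passive_fill_prob: float  # chance that a resting order fills (placeholder)


# Placeholder parameters; real values would come from a published calibration.
TIERS: Dict[str, CostTier] = {
    "optimistic": CostTier(0.0, 0.0, 1.0),   # fill at mid, no penalty
    "realistic": CostTier(0.5, 1.0, 0.6),    # half-spread, partial fills
    "pessimistic": CostTier(1.0, 5.0, 0.0),  # full spread, passive orders rejected
}


def taker_price(side: str, mid: float, spread: float, tier: CostTier) -> float:
    """Execution price of an aggressive order under one cost tier."""
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    sign = 1.0 if side == "buy" else -1.0
    cost = tier.spread_fraction * spread + mid * tier.impact_bps / 1e4
    return mid + sign * cost


def expected_passive_fill(qty: float, tier: CostTier) -> float:
    """Expected filled quantity of a resting order under one cost tier."""
    return qty * tier.passive_fill_prob


def report_all_tiers(side: str, mid: float, spread: float) -> Dict[str, float]:
    """Price the same order under every tier, so no tier can be omitted."""
    return {name: taker_price(side, mid, spread, t) for name, t in TIERS.items()}


print(report_all_tiers("buy", mid=100.0, spread=0.2))
```

Returning all tiers from a single call mirrors the mandatory-reporting rule: a submission cannot compute its metrics under the optimistic tier alone without deliberately bypassing the helper.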
R3 · Compute-controlled budget
Fix the token budget / GPU hours / wall-clock time. LLM-driven methods and GP differ by 1-2 orders of magnitude in compute, so a comparison without compute control is not fair.
R4 · Mandatory reporting on 5+ evaluation dimensions
| Dimension | Source |
|---|---|
| Predictive Power | AlphaEval 2025 |
| Stability | AlphaEval 2025 |
| Robustness to Market Perturbations | AlphaEval 2025 |
| Financial Logic | AlphaEval 2025 |
| Diversity | AlphaEval 2025 |
| Capacity | My addition (industry-relevant) |
| Deflated Sharpe Ratio | Bailey & López de Prado 2014 |
| PBO | Bailey et al. 2013 |
No cherry-picking.
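The last two rows of the table can be computed with the standard library alone. The sketch below follows the published definitions as I understand them: the Deflated Sharpe Ratio deflates the observed Sharpe by the expected maximum over `n_trials` attempts, and PBO is estimated by combinatorially symmetric cross-validation (CSCV). Function names, argument names, and the choice of the Sharpe ratio as the CSCV performance measure are assumptions of this sketch, not the benchmark's final protocol.

```python
import math
from itertools import combinations
from statistics import NormalDist, mean, stdev
from typing import List, Sequence

_NORMAL = NormalDist()
_EULER_GAMMA = 0.5772156649015329


def deflated_sharpe(sr: float, n_obs: int, skew: float, kurt: float,
                    trials_sr_var: float, n_trials: int) -> float:
    """Deflated Sharpe Ratio. `sr` is per-period (not annualized);
    `kurt` is raw kurtosis (3.0 for Gaussian returns)."""
    sr0 = 0.0
    if n_trials > 1:
        expected_max = ((1 - _EULER_GAMMA) * _NORMAL.inv_cdf(1 - 1 / n_trials)
                        + _EULER_GAMMA * _NORMAL.inv_cdf(1 - 1 / (n_trials * math.e)))
        sr0 = math.sqrt(trials_sr_var) * expected_max
    denom = math.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return _NORMAL.cdf((sr - sr0) * math.sqrt(n_obs - 1) / denom)


def _sharpe(xs: Sequence[float]) -> float:
    sd = stdev(xs)
    return mean(xs) / sd if sd > 0 else 0.0


def pbo_cscv(returns: List[Sequence[float]], n_blocks: int) -> float:
    """PBO via CSCV. `returns` holds one equal-length series per strategy."""
    n_strats, size = len(returns), len(returns[0]) // n_blocks
    if n_blocks % 2 or size * (n_blocks // 2) < 2:
        raise ValueError("need an even block count and 2+ rows per half")
    blocks = [range(b * size, (b + 1) * size) for b in range(n_blocks)]
    overfit = total = 0
    for is_ids in combinations(range(n_blocks), n_blocks // 2):
        is_rows = [i for b in is_ids for i in blocks[b]]
        oos_rows = [i for b in range(n_blocks) if b not in is_ids for i in blocks[b]]
        is_perf = [_sharpe([r[i] for i in is_rows]) for r in returns]
        oos_perf = [_sharpe([r[i] for i in oos_rows]) for r in returns]
        best = max(range(n_strats), key=is_perf.__getitem__)
        rank = sum(p <= oos_perf[best] for p in oos_perf)  # 1 = worst out of sample
        omega = rank / (n_strats + 1)
        overfit += math.log(omega / (1 - omega)) <= 0
        total += 1
    return overfit / total
```

A submission would call `deflated_sharpe` with the number of factor candidates it actually evaluated as `n_trials`, which is exactly the quantity that compute-heavy LLM search inflates and that a single reported Sharpe hides.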
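R5 · Synthetic ground-truth sub-task
Alongside the main benchmark, add a synthetic-data sub-task: generate data from a known true alpha and test whether a method recovers it. This isolates "method capability" from "data luck". Toy tasks with known structure have long been standard in vision, but almost nobody does this in alpha mining.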
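A minimal version of this sub-task plants one predictive feature among pure-noise features and checks whether a method ranks it first. The sketch below is an assumption-laden toy: Gaussian features, a linear planted signal with coefficient `beta`, and an absolute-correlation (IC) ranker standing in for the method under test. None of these choices come from the plan itself.

```python
import random
from typing import List, Sequence, Tuple


def make_synthetic(n_obs: int, n_features: int, planted: int, beta: float,
                   seed: int = 0) -> Tuple[List[List[float]], List[float]]:
    """Return (features, returns) where only feature `planted` carries signal."""
    rng = random.Random(seed)
    features = [[rng.gauss(0.0, 1.0) for _ in range(n_obs)] for _ in range(n_features)]
    returns = [beta * features[planted][t] + rng.gauss(0.0, 1.0) for t in range(n_obs)]
    return features, returns


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def recovered_feature(features: List[List[float]], returns: List[float]) -> int:
    """Stand-in 'method': pick the feature with the largest absolute IC."""
    ics = [abs(pearson(f, returns)) for f in features]
    return max(range(len(ics)), key=ics.__getitem__)


features, returns = make_synthetic(n_obs=2000, n_features=20, planted=7, beta=0.3)
print("recovered:", recovered_feature(features, returns), "planted: 7")
```

Lowering `beta` or `n_obs` turns the same generator into a difficulty dial, so recovery rate as a function of signal strength can be reported per method instead of a single pass/fail.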
R6 · Replication-aware "must-beat" baseline
Anomalies verified on the JKP factor library serve as the minimum baseline. Any new method must beat the JKP-verified baseline before a contribution can be discussed. "I beat random" is not accepted as a legitimate claim.
5.3 Reference Baselines to Implement
At least 6-7 reference baselines, covering each tradition:
| Baseline | Tradition | Purpose |
|---|---|---|
| Random Search | Control | Negative control floor |
| JKP-Verified Anomaly Pool | Factor Zoo | Must-beat baseline |
| gplearn (default config) | Classical GP | Tradition 1 representative |
| FactorVAE | DL Factor | Tradition 2 representative |
| AlphaAgent | LLM-driven | Tradition 3 representative |
| Frozen-LLM-prompting (GPT-4 direct) | LLM | Naïve LLM baseline |
| M8.6 Walk-forward + Adaptive State | Tradability gate | My existing production system |
The last item is the key one: it packages my existing walk-forward + microstructure gate + adaptive state controller as a standard tradability-gate baseline, which distinguishes "academic alpha" from "executable alpha".
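5.4 Strategic Positioning
Why a benchmark paper is more worthwhile than a method paper at this stage:
- Reviewer-friendly: a method paper must beat SOTA (subject to reviewer discretion), while a benchmark paper only needs a rigorous protocol. The NeurIPS Datasets & Benchmarks track and ICLR Benchmarks both offer dedicated venues
- High leverage: every later alpha auto search paper would need to cite or use the protocol, an impact potentially 10x+ that of a single method paper
- Aligned with existing engineering strengths: I already have about 70% of the infrastructure, which would take others 6-12 months to replicate
- Publication vehicle for the three RQs: once the benchmark exists, RQ1 / RQ2 / RQ3 naturally become "establish SOTA or a negative result on the benchmark"
- Dual alignment for the HKU application: Prof. Li would own the statistical-rigor part, and Prof. Han would own the open-world / adversarial-robustness part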
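5.5 Risks & Open Questions
R-1 · Data releasability: Binance USD-M historical data is itself public, but whether my cleaned, aligned, de-gapped version can be released still needs to be confirmed. HuggingFace Datasets hosts similar crypto OHLCV data, but nobody has published a 15s high-frequency, full-market, cleaned set.
R-2 · Data drift: crypto evolves fast, and 2022 data is not representative of 2026. The benchmark needs to be versioned (v1 / v2 / v3, refreshed every 6 months). This is a feature rather than a bug: robustness to distribution drift is itself a capability the benchmark wants to evaluate.
R-3 · Subjectivity of the cost model: how should the parameters of the three cost tiers be set? Proposal: back them out from public data, using Binance fills data to estimate realistic slippage / queue priority, and publish the calibration procedure.
R-4 · Who maintains the leaderboard: I maintain it myself in year one. From year two it needs a community or institutional sponsor (an HKU lab? arXiv funding?).
R-5 · Whether the compute budget deters participation: a 100 GPU-hour budget is affordable for an academic lab but tight for individuals. Budgets can be split into small / medium / large tiers, each with its own leaderboard.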
6. Connection to HKU
6.1 Prof. Kai Han (CDS / Visual AI Lab)
- Open-world reliability: crypto perpetuals are an extreme instance of open-world learning (non-stationary, adversarial, unknown failure modes, costly mistakes)
- Foundation model perspective: the LLM L1-L6 agent stack is a case of deploying foundation models in a high-stakes decision environment
- Directly related to RQ2: Open-world LLM Agent Safety = his lab's open-world theme + my production testbed
- He could own the robustness/adversarial sub-module of the benchmark
6.2 Prof. Guodong Li (SAAS)
- Time series + financial econometrics are a direct match
- The ICLR 2023 Oral Encoding Recurrence is the direct anchor for RQ1: my RQ1 extends this thesis from standard TSF to crypto microstructure
- High-dimensional ML + Quantile Regression: both contribute methodology for cross-symbol joint modeling and conditional-distribution modeling
- He could own the statistical-rigor part of the benchmark (PBO / DSR / synthetic ground-truth)
- Small-sample time series (his 2018 collaboration direction with Huawei) overlaps methodologically with the crypto regime-stability problem
6.3 Combined Value Proposition
I bring production-grade infrastructure for the verification half of alpha auto search. HKU's Visual AI Lab brings open-world ML methodology; SAAS brings time series statistical rigor. Together we build the field's first unified benchmark.
7. 12-Week Roadmap
| Phase | Week | Deliverable |
|---|---|---|
| 0. Briefing | 0 (this week) | HKU talk to Prof Han + Prof Li; share this RESEARCH_PLAN |
| 1. Proposal sharpening | 1-2 | Based on talk feedback, write 8-page benchmark proposal |
| 2. Dataset prep | 3-5 | Clean + release Crypto-Alpha-Bench v0 dataset on HuggingFace |
| 3. Protocol & metrics | 4-6 | Write protocol spec; implement evaluation infrastructure (5+ metrics, 3 cost tiers, DSR + PBO) |
| 4. Reference baselines | 5-8 | Implement 6-7 reference baselines on benchmark |
| 5. Synthetic ground-truth | 6-9 | Build synthetic data generator with known alpha; run reference baselines on it |
| 6. Public leaderboard | 9-10 | Launch v0 leaderboard; document protocol |
| 7. Benchmark paper draft | 10-12 | NeurIPS Datasets & Benchmarks submission draft |
Parallel track (RQ1 first preliminary):
| Week | Sub-deliverable |
|---|---|
| 4-7 | RQ1: implement REM-inspired architectural prior; baseline against PatchTST / Chronos zero-shot |
| 7-10 | RQ1: walk-forward results + statistical significance (DSR + PBO) |
| 10-12 | RQ1: combined with benchmark paper as "use case" section |
RQ2 + RQ3 are deferred to Phase 7+ (after the benchmark is established).
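8. Risks · Open Questions · Reality Checks
8.1 What if the benchmark idea doesn't fly?
If HKU's feedback is "a benchmark is not a research contribution", plan B is:
- do RQ1 directly as a methodology paper (architectural prior for crypto microstructure)
- use the existing M8.6 infrastructure as the evaluation testbed (not publicly released)
- advance the benchmark idea implicitly within the RQ1 paper
8.2 What if I'm wrong about "no champion"?
If time-series foundation models (Chronos / TimesFM) are in fact good enough on high-frequency crypto, the research niche for RQ1 shrinks. A preliminary experiment must first verify that foundation models really do perform poorly on 15s crypto data. This can be run in 2-3 weeks.
8.3 What if the Cognition Base has too poor a return on investment?
Building a replication-aware Cognition Base is data engineering, and it may show no method-level breakthrough for 6-12 months. Plan B is to narrow the Cognition Base to the crypto microstructure literature (on the order of a hundred papers) and prove a minimum viable case first.
8.4 Transition risk: from solo developer to team
I am moving from single-developer work into a research environment, and the biggest risk is collaboration friction. Preemptive mitigation:
- hold all code / data / protocols to strict documentation + reproducibility standards
- use git + CI + standardized workflows from day one
- use "design philosophy" documents of the self-evolution research reference type as onboarding material
8.5 The tension between academic pressure and production discipline
Academia rewards novelty; production rewards reliability. I am explicitly aware of this tension and see "benchmark + strict verification" as the natural intersection of the two: learn academic novelty while keeping production discipline.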
9. Repository Contents Index
alpha-search-frontier-notes/
├── README (this file is RESEARCH_PLAN.md – also serves as entry point)
├── RESEARCH_PLAN.md  ← you are here
├── alpha_search_baselines.md  ← close readings of 5-7 frontier papers
├── alpha_search_survey_taxonomy_and_bibliography.md  ← 8-tradition systematic bibliography
├── alpha_search_deep_reads.md  ← three deep-read clusters (briefing summary)
├── alpha_search_deep_reads_expanded.md  ← expanded deep reads: per-paper method analysis + experiment plans
├── financial_sota_agent_survey.md  ← detailed notes on financial SOTA agents / benchmark gaps
├── crypto_alpha_bench_risk_analysis.md  ← devil's-advocate risk analysis of the benchmark
├── human_expert_in_loop_research_direction.md  ← revised direction on human expert baseline / tacit knowledge
├── HKU_MEETING_PREP_2026-05-20.md  ← full HKU pre-meeting preparation pack
├── HKU_ONE_PAGE_HANDOUT_2026-05-20.md  ← one-page summary that can be sent to the professors
├── HKU_12_SLIDE_DECK_2026-05-20.md  ← 12-slide condensed slide draft
├── HKU_REPORT_OUTLINE_2026-05-20.md  ← outline for tomorrow's talk + Q&A anchors
├── HKU_REPORT_OUTLINE_COMPACT_2026-05-20.md  ← focused version: recent work → review → baseline suite
├── HKU_30_SLIDE_DECK_2026-05-20.md  ← 30-slide extended slide outline
├── HKU_Crypto_Alpha_Bench_Report_2026-05-20.pptx  ← PPTX for tomorrow's talk (full benchmark narrative)
├── HKU_Baseline_Suite_Compact_Report_2026-05-20.pptx  ← PPTX for tomorrow's talk (focused outline version)
└── HKU_30_Slide_Baseline_Suite_Report_2026-05-20.pptx  ← PPTX for tomorrow's talk (30-slide extended version)
Suggested reading order:
- First-time visitor → this RESEARCH_PLAN.md only (30 min)
- Specific frontier paper → alpha_search_baselines.md (Section 1-5)
- Wide-coverage survey → alpha_search_survey_taxonomy_and_bibliography.md
- Q&A preparation / methodological rigor → alpha_search_deep_reads.md
- Deep technical preparation → alpha_search_deep_reads_expanded.md
- SOTA agent / benchmark gap → financial_sota_agent_survey.md
- Tomorrow's talk → HKU_30_SLIDE_DECK_2026-05-20.md first; then HKU_30_Slide_Baseline_Suite_Report_2026-05-20.pptx
10. Contact & Collaboration
Maintained by: Paul Weng (paulweng)
Related project: whatsapp_AI_Trading_Agent, a production crypto CTA system that provides the verification testbed referenced throughout this plan
Current status (2026-05-18):
- Phase 0 (HKU talk this week)
- Solo research, single-developer
- Looking for academic collaboration to scale from "production system + research thinking" to "production system + research community + scaled compute"
This document is a living research plan. Section 7 roadmap will be updated weekly during Phase 1-2; quarterly thereafter.
"My production system already implements the philosophy of generator-verifier separation that AlphaProof articulated. The next step — turning a deterministic execution platform into a self-evolving research platform, and building the field's first unified benchmark — is exactly where I want to do my research."