HKU Research Discussion Outline · 2026-05-20

Topic: From Production AI Trading Agent to Crypto-Alpha-Bench
Audience: Prof. Kai Han, Prof. Guodong Li
Target length: 18-22 min presentation + discussion
Core goal: get directional feedback on whether to pursue benchmark-first or method-first.


0. One-Sentence Thesis

I have already built the verification half of AI-driven alpha discovery. The most valuable research contribution to pursue now is not "yet another trading agent" but an executable, reproducible, and comparable Crypto-Alpha-Bench, so that compute-scaled discovery in finance can be rigorously tested.

Short version:

I have built the verification half of an AI-driven alpha discovery platform. The missing research substrate is an executable crypto alpha-search benchmark.


Act 1 · I Already Built the Verifier

Purpose: show that you are starting from a production-grade system, not from a paper idea.

Must say:

  • LLM never reaches OMS directly.
  • Natural language only becomes schema intent.
  • Real-money actions pass deterministic risk gate, button confirmation, OMS, audit, reconciliation, kill switch.
  • M8.6 walk-forward testbed covers 526 Binance USD-M perpetual symbols, 15s bars, 12 rolling folds, microstructure gates, adaptive state controller.
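
The control boundary in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the production code: `OrderIntent`, `RiskLimits`, `risk_gate`, and `submit` are hypothetical names, and the specific limit checks are assumptions chosen only to show that the language layer emits a typed intent while a deterministic function decides whether anything reaches the OMS.

```python
from dataclasses import dataclass
from enum import Enum


class Side(Enum):
    BUY = "buy"
    SELL = "sell"


@dataclass(frozen=True)
class OrderIntent:
    """Schema intent: the only object the language layer is allowed to produce."""
    symbol: str
    side: Side
    notional_usd: float


@dataclass(frozen=True)
class RiskLimits:
    """Hypothetical limits; the real gate presumably checks far more."""
    allowed_symbols: frozenset
    max_notional_usd: float
    kill_switch_engaged: bool = False


def risk_gate(intent: OrderIntent, limits: RiskLimits) -> tuple:
    """Deterministic check with no model call: returns (accepted, reason)."""
    if limits.kill_switch_engaged:
        return False, "kill switch engaged"
    if intent.symbol not in limits.allowed_symbols:
        return False, "symbol not allowed"
    # The chained comparison also rejects NaN and non-positive sizes.
    if not (0 < intent.notional_usd <= limits.max_notional_usd):
        return False, "notional outside limits"
    return True, "ok"


def submit(intent: OrderIntent, limits: RiskLimits, human_confirmed: bool) -> str:
    """Risk gate first, then explicit confirmation, and only then the OMS."""
    accepted, reason = risk_gate(intent, limits)
    if not accepted:
        return f"rejected: {reason}"
    if not human_confirmed:
        return "pending: awaiting confirmation"
    return "forwarded to OMS"
```

The point of the sketch is structural: nothing in `submit` takes free text, so a model output can only influence execution by way of a validated `OrderIntent`.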

Key transition:

This is not a trading pitch. It is a research infrastructure pitch: I already have the hard verifier.

Act 2 · The Negative Result Is the Research Signal

Purpose: frame the recent project's failed result as a research question, not an engineering failure.

Use the strongest numbers:

  • LightGBM chronological validation MTM: +44.65
  • LightGBM chronological test MTM: -86.87
  • Optuna search objective: +44.46
  • Optuna chronological test objective: -105.78

Diagnosis:

The bottleneck is time-slice stability, not model capacity.
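
One way to make the time-slice diagnosis concrete is to lay out rolling chronological folds and record the validation-to-test gap on each. The sketch below is an assumption about how such a check could look, not the M8.6 implementation: `Fold`, `rolling_folds`, `val_test_gap`, and `is_reversal` are hypothetical names, and the window lengths are left to the caller.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Fold:
    train: range
    val: range
    test: range


def rolling_folds(n_bars: int, n_folds: int, train_len: int,
                  val_len: int, test_len: int) -> Iterator[Fold]:
    """Yield chronological train/val/test index ranges that roll forward in time."""
    window = train_len + val_len + test_len
    if n_folds < 1 or window > n_bars:
        raise ValueError("not enough bars for the requested folds")
    # Spread fold starts evenly so the last fold ends at or before n_bars.
    step = (n_bars - window) // (n_folds - 1) if n_folds > 1 else 0
    for k in range(n_folds):
        start = k * step
        train_end = start + train_len
        val_end = train_end + val_len
        yield Fold(range(start, train_end),
                   range(train_end, val_end),
                   range(val_end, val_end + test_len))


def val_test_gap(val_score: float, test_score: float) -> float:
    """How much of the validation score is lost on the later test slice."""
    return round(val_score - test_score, 2)


def is_reversal(val_score: float, test_score: float) -> bool:
    """True when a positive validation score turns negative on test."""
    return val_score > 0 and test_score < 0
```

Applied to the figures above, `val_test_gap(44.65, -86.87)` is 131.52 for LightGBM and `val_test_gap(44.46, -105.78)` is 150.24 for Optuna, and both pairs count as reversals under `is_reversal`.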

Prof. Li hook:

This motivates architectural priors for crypto microstructure recurrence, rather than simply scaling generic models.

Act 3 · Existing SOTA Is Close, But Not Enough

Purpose: proactively address the question "Hasn't someone already done this?"

Must say:

  • AlphaBench already benchmarks formulaic alpha mining.
  • RD-Agent(Q) already covers multi-agent quant R&D.
  • Hubble / FactorMiner / CogAlpha push safe generation, memory, and code evolution.
  • TradingAgents / QuantAgent are trading-decision agents, not alpha-search benchmarks.

Clean positioning:

The claim is not that nothing exists. The claim is that existing work stops before executable crypto alpha under cost, fill, statistical, and compute constraints.

Act 4 · Proposal: Crypto-Alpha-Bench

Purpose: present a research artifact that can be published, open-sourced, and collaborated on.

Minimum requirements:

  1. Fixed public crypto perpetual dataset.
  2. Three transaction-cost tiers.
  3. Fill / tradability gate.
  4. DSR + PBO + null-search baseline.
  5. Compute-controlled search budget.
  6. Synthetic ground-truth tasks.
  7. Reference baselines, including AlphaBench-style search, RD-Agent-style workflow, GFlowNet/AlphaSAGE, M8.6 tradability gate, and optional human expert baseline.
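
To illustrate the null-search baseline in requirement 4, the sketch below runs a "search" over pure-noise return series and records the best per-period Sharpe ratio it finds. Everything here is an assumption for illustration: the function names are hypothetical, Gaussian noise stands in for whatever null the benchmark would actually define, and the search is simply best-of-N selection.

```python
import random
import statistics


def sharpe(returns):
    """Per-period Sharpe ratio (mean over sample standard deviation)."""
    sd = statistics.stdev(returns)
    return statistics.mean(returns) / sd if sd > 0 else 0.0


def best_of_null(n_candidates, n_periods, rng):
    """Best Sharpe found when every candidate is pure noise (no real signal)."""
    best = float("-inf")
    for _ in range(n_candidates):
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        best = max(best, sharpe(noise))
    return best


def null_best_distribution(n_trials, n_candidates, n_periods, seed=0):
    """Repeat the null search to get a distribution of best-of-N scores."""
    rng = random.Random(seed)
    return [best_of_null(n_candidates, n_periods, rng) for _ in range(n_trials)]


def null_p_value(observed_best, null_bests):
    """Share of null searches that did at least as well as the observed best.

    The +1 terms keep the estimate away from an exact zero.
    """
    hits = sum(1 for b in null_bests if b >= observed_best)
    return (hits + 1) / (len(null_bests) + 1)
```

A candidate alpha whose selected Sharpe does not clearly exceed what the same search budget finds in noise carries little evidence, which is why the null search has to use the same number of candidates and periods as the real one.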

Act 5 · HKU Fit + Ask

Prof. Han hook:

Crypto markets are an extreme open-world reliability setting for LLM agents: non-stationary regimes, adversarial counterparties, unknown failure modes, and costly mistakes.

Prof. Li hook:

The statistical core is time-series stability, financial econometrics, and recurrence-aware modeling under strict walk-forward verification.

Final ask:

Should I pursue Crypto-Alpha-Bench first, or narrow the first paper to RQ1: architectural priors for crypto microstructure recurrence?


2. Slide-by-Slide Outline

# | Slide | Claim | Proof Object
1 | Title | This is a research infrastructure pitch, not a trading pitch. | Title + thesis rail
2 | Meeting Thesis | I have the verifier; the field lacks the benchmark. | Three-part argument map
3 | What I Built | The production system already enforces generator-verifier separation. | Architecture flow
4 | The Hard Boundary | LLMs assist language and research, never irreversible actions. | Control boundary diagram
5 | M8.6 Testbed | The strongest asset is a crypto executable-alpha verifier. | Metric rail + walk-forward pipeline
6 | Negative Result | Time-slice stability, not model capacity, is the bottleneck. | Val/test MTM bar proof
7 | Frontier Pattern | AI discovery systems scale only when generation is separated from verification. | Pattern mapping
8 | SOTA Reality Check | Existing work is close; the gap is narrower and sharper. | Agent landscape matrix
9 | Benchmark Gap | Current systems optimize formula quality or trade decisions, not executable crypto alpha. | Gap funnel
10 | Crypto-Alpha-Bench | The contribution is a hard benchmark substrate. | 7 requirement pillars
11 | Baselines | The benchmark should beat or absorb the current SOTA, not ignore it. | Baseline ladder
12 | Research Triangle | Benchmark-first makes RQ1/RQ2/RQ3 measurable. | HKU alignment map
13 | 8-12 Week Plan | Start with a small v0 that tests the riskiest assumptions. | Roadmap + risk ledger
14 | Ask | The meeting should decide benchmark-first vs method-first. | Decision slide

3. Compressed 5-Minute Version

  1. I built a production crypto trading agent where LLMs are structurally separated from real-money execution.
  2. The research asset is the verifier: 526 Binance perps, 15s bars, rolling walk-forward, microstructure gates, adaptive state.
  3. LightGBM/Optuna showed validation wins but chronological test failures, suggesting time-slice instability.
  4. The latest SOTA already covers formula alpha mining and quant R&D agents, so the benchmark claim must be narrower.
  5. My proposal is Crypto-Alpha-Bench: executable crypto alpha search under cost, fill, DSR/PBO, and compute control.
  6. I want advice on whether to publish benchmark-first or start with RQ1 microstructure recurrence.

4. Q&A Anchors

If asked "Isn't AlphaBench already this?"

AlphaBench is the closest sibling. It benchmarks LLM formula alpha mining. Crypto-Alpha-Bench extends the target to executable crypto mid-frequency alpha under cost, fill, statistical correction, and compute constraints.

If asked "Why crypto?"

Crypto is narrow, but it is a clean open-world testbed: 24/7, public, non-stationary, adversarial, microstructure-rich, and execution-sensitive.

If asked "Why are you the right person?"

I may not have the largest academic platform, but I have production tradability infrastructure, real LLM-agent safety constraints, and a working verifier that academic teams would take months to reproduce.

If asked "What is the smallest next experiment?"

ETH validation experiment: rerun the LightGBM/Optuna time-slice stability test on ETH/USDT. If the validation-to-test reversal reproduces, the motivation for the benchmark strengthens.


5. Closing Line

I am not proposing to let LLMs trade. I am proposing to benchmark whether AI systems can discover alphas that survive reality.