Crypto-Alpha-Bench · One-Page Handout
Presenter: Paul Weng
Meeting: HKU research discussion · 2026-05-20
Core theme: From production AI trading infrastructure to reproducible research on alpha auto search.
1. One-Sentence Thesis
My production trading system already implements the verification half of AI-driven alpha discovery: LLMs are kept away from the order path, and every irreversible action must pass deterministic risk gates. The next research step is to build the field's first unified benchmark for alpha auto search, so that compute-scaled discovery can be tested rigorously in finance.
2. What I Have Built
- Production-grade crypto trading agent, from domain model / OMS / live adapter / audit / risk gate to LLM L1-L6 assistant layers.
- Strict generator-verifier separation:
  - LLM only parses, explains, and assists.
  - Deterministic risk gate, kill switch, KMS abstraction, append-only audit, and OMS state machine control real-money actions.
- M8.6 walk-forward verification system:
  - 526 Binance USD-M perpetual symbols.
  - 15s bar infrastructure.
  - Microstructure gate.
  - Adaptive state controller.
  - Chronological validation discipline.
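The generator-verifier separation listed above can be illustrated with a minimal sketch. Everything in it is hypothetical: `OrderIntent`, `RiskLimits`, `risk_gate`, `submit`, and the limit values are illustrative assumptions, not the production system's actual API. The point is only that the LLM produces a structured suggestion, while a pure deterministic function decides whether it may proceed and every verdict is logged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderIntent:
    """Hypothetical structured suggestion parsed from LLM output."""
    symbol: str
    side: str            # "buy" or "sell"
    notional_usd: float


@dataclass(frozen=True)
class RiskLimits:
    """Hypothetical deterministic limits; values are illustrative only."""
    max_notional_usd: float
    allowed_symbols: frozenset
    kill_switch_on: bool = False


def risk_gate(intent: OrderIntent, limits: RiskLimits) -> tuple[bool, str]:
    """Pure function: the same inputs always produce the same verdict."""
    if limits.kill_switch_on:
        return False, "kill switch engaged"
    if intent.symbol not in limits.allowed_symbols:
        return False, "symbol not allowed"
    if intent.side not in ("buy", "sell"):
        return False, "invalid side"
    if not (0 < intent.notional_usd <= limits.max_notional_usd):
        return False, "notional outside limits"
    return True, "ok"


# Assumed stand-in for an append-only audit trail (append-only by convention).
audit_log: list[tuple[OrderIntent, bool, str]] = []


def submit(intent: OrderIntent, limits: RiskLimits) -> bool:
    """Record every verdict, approved or not, before anything reaches the OMS."""
    approved, reason = risk_gate(intent, limits)
    audit_log.append((intent, approved, reason))
    return approved
```

In this sketch the LLM never calls the order path directly; it can only produce an `OrderIntent`, and a rejected intent still leaves an audit record.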
3. Key Empirical Observation
LightGBM / Optuna experiments showed strong validation performance but poor chronological test performance.
Interpretation:
The bottleneck is time-slice stability, not model capacity.
This motivates research into architectural priors for crypto microstructure, rather than simply scaling generic models.
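A gap between validation and chronological test performance is what a strictly ordered walk-forward split is designed to expose. The sketch below is a generic rolling-window scheme, not the M8.6 implementation; the function name, the `embargo` parameter, and the window sizes are assumptions chosen for illustration.

```python
def walk_forward_splits(n_samples, train_size, test_size, embargo=0):
    """Yield (train, test) index ranges in strict chronological order.

    `embargo` skips bars between train and test to limit label leakage
    when targets overlap in time.
    """
    start = 0
    while start + train_size + embargo + test_size <= n_samples:
        train = range(start, start + train_size)
        test_start = start + train_size + embargo
        test = range(test_start, test_start + test_size)
        yield train, test
        start += test_size  # roll the window forward by one test block


# Toy usage: 100 bars, 50-bar training window, 10-bar test window, 5-bar embargo.
splits = list(walk_forward_splits(n_samples=100, train_size=50, test_size=10, embargo=5))
for train, test in splits:
    # Every training index precedes every test index.
    assert train[-1] < test[0]
```

Scoring a model on each test block separately, rather than on one pooled validation set, gives a per-slice view of stability.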
4. Field-Level Gap
Recent AI-for-science systems share a pattern:
- Generator-verifier separation.
- Knowledge grounding / cognition base.
- Multi-agent decomposition.
- Compute-scaled discovery.
But alpha auto search lacks a unified benchmark:
- Different papers use different datasets.
- Cost models are not standardized.
- Compute budgets are not controlled.
- Deflated Sharpe Ratio (DSR), Probability of Backtest Overfitting (PBO), and multiple-testing corrections are rarely reported.
- Academic alpha and executable alpha are often mixed together.
Without a fixed benchmark, "compute → discovery" cannot be tested in finance.
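To make the multiple-testing point concrete, here is a minimal sketch of the Deflated Sharpe Ratio following the definition published by Bailey and López de Prado. It assumes `sr` is a per-period (non-annualized) Sharpe ratio, `kurt` is non-excess kurtosis (3 for a normal distribution), and `sr_variance` is the variance of Sharpe ratios across the trials; the function names are illustrative.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
_STD_NORMAL = NormalDist()


def expected_max_sharpe(sr_variance, n_trials):
    """Approximate expected maximum Sharpe ratio over n_trials
    when the true Sharpe ratio of every trial is zero."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    a = _STD_NORMAL.inv_cdf(1.0 - 1.0 / n_trials)
    b = _STD_NORMAL.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    return math.sqrt(sr_variance) * ((1.0 - EULER_GAMMA) * a + EULER_GAMMA * b)


def deflated_sharpe_ratio(sr, n_obs, skew, kurt, sr_variance, n_trials):
    """Probability that the observed Sharpe ratio exceeds the level
    expected from the best of n_trials unskilled strategies."""
    sr_star = expected_max_sharpe(sr_variance, n_trials)
    # Standard error term adjusts for non-normal returns.
    denom = math.sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr * sr)
    z = (sr - sr_star) * math.sqrt(n_obs - 1) / denom
    return _STD_NORMAL.cdf(z)


# Same observed Sharpe ratio, different search effort (illustrative numbers).
dsr_few = deflated_sharpe_ratio(0.1, 1000, 0.0, 3.0, 0.01, n_trials=10)
dsr_many = deflated_sharpe_ratio(0.1, 1000, 0.0, 3.0, 0.01, n_trials=1000)
```

Because the threshold `sr_star` grows with `n_trials`, the same backtest looks less convincing the more candidates were searched, which is why an uncontrolled compute budget makes reported results hard to compare.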
5. Proposal: Crypto-Alpha-Bench
A unified benchmark for alpha auto search on crypto perpetual futures.
Minimum requirements:
- Fixed public dataset.
- Three transaction-cost tiers: optimistic / realistic / pessimistic.
- Compute-controlled budgets.
- Multi-metric evaluation: predictive power, stability, robustness, financial logic, diversity, capacity, DSR, PBO.
- Synthetic ground-truth tasks with known alpha.
- Replication-aware must-beat baselines.
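The three transaction-cost tiers in the requirements above could be expressed as a small fixed configuration applied to every submitted strategy. The sketch below is only an illustration: the basis-point values, the `COST_TIERS_BPS` name, and the linear cost-per-turnover model are assumptions, not proposed benchmark settings.

```python
# Hypothetical cost per unit of turnover, in basis points; illustrative only.
COST_TIERS_BPS = {"optimistic": 2.0, "realistic": 6.0, "pessimistic": 15.0}


def net_returns(gross_returns, turnover, tier):
    """Subtract a linear cost (turnover * tier cost) from each period's gross return."""
    cost = COST_TIERS_BPS[tier] / 10_000.0
    return [g - t * cost for g, t in zip(gross_returns, turnover)]


# Toy per-period gross returns and turnover (fraction of capital traded).
gross = [0.0010, -0.0005, 0.0008]
turnover = [1.0, 0.5, 2.0]

# Evaluate the same strategy under every tier.
report = {tier: sum(net_returns(gross, turnover, tier)) for tier in COST_TIERS_BPS}
```

In this toy example the cumulative return is positive under the optimistic tier and negative under the realistic one, which is the kind of sensitivity a fixed set of cost tiers is meant to surface.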
Reference baselines:
- Random search.
- JKP-verified anomaly pool.
- gplearn.
- FactorVAE.
- AlphaAgent / LLM-driven baseline.
- Frozen LLM prompting.
- My M8.6 walk-forward + adaptive state tradability baseline.
- Optional extension: human expert discretionary baseline.
6. Research Use Cases
| RQ | Question | Natural HKU Connection |
|---|---|---|
| RQ1 | Can architectural recurrence priors improve crypto microstructure modeling? | Prof. Guodong Li: time series, financial econometrics, "Encoding Recurrence into Transformers" |
| RQ2 | Can LLM-agent safety in open-world financial settings get statistical guarantees? | Prof. Kai Han: open-world reliability, foundation / agentic AI |
| RQ3 | Does cognition-base quality causally affect discovery scaling? | Cross-disciplinary: knowledge grounding + rigorous verifier |
7. Ask
I would like feedback on one decision:
Should I pursue Crypto-Alpha-Bench as the first research contribution, or narrow the first paper to RQ1: architectural priors for crypto microstructure recurrence?
Either path uses the same production verification infrastructure as the research testbed.