HKU 12-Slide Deck Draft · 2026-05-20
Compact version for a 15-20 min research discussion. Use
`hku_talk_script_and_ppt.md` as the long-form script and this file as the slide build source.
Slide 1 · From Production AI Trading Agent to Self-Evolving Research
Subtitle: A path toward Crypto-Alpha-Bench
On slide
- Paul Weng
- HKU research discussion
- 2026-05-20
Speaker note
Thank both professors. Say upfront: this is not a trading pitch; it is a research infrastructure pitch.
Slide 2 · Core Thesis
On slide
I have built the verification half of an AI-driven alpha discovery platform.
The missing research substrate is a unified benchmark.
Speaker note
My current system already embodies generator-verifier separation. The research question is how to turn that into a reproducible platform for automated alpha search.
Slide 3 · What I Built
On slide
- Production crypto trading agent.
- OMS + live adapter + audit + risk engine.
- Telegram / WhatsApp-style human interface.
- LLM L1-L6 assistant layers.
Speaker note
Emphasize solo build and production-grade discipline, but keep this short. The meeting is not about listing features.
Slide 4 · The Hard Boundary
On slide
LLM never reaches the OMS directly.
Pipeline:
Natural language → schema intent → deterministic risk gate → button confirmation → OMS
Speaker note
Compare to AlphaProof's Lean-kernel idea only at the architectural-philosophy level. The verifier is not mathematically perfect, but it is independent of LLM generation.
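Backup material for this slide: the pipeline can be illustrated with a minimal Python sketch. It is not the production code. The names `OrderIntent`, `RiskLimits`, `parse_intent`, `risk_gate`, and `submit` are hypothetical, as are the specific fields and limits. The point it shows is that the LLM output is treated as untrusted data, and only a schema-valid, risk-checked, human-confirmed intent reaches the OMS callback.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class OrderIntent:
    # Hypothetical schema; the real intent schema is not shown in this deck.
    symbol: str
    side: str
    notional_usd: float


@dataclass(frozen=True)
class RiskLimits:
    # Hypothetical limits used only for illustration.
    allowed_symbols: frozenset
    max_notional_usd: float


def parse_intent(raw: dict) -> OrderIntent:
    """Validate untrusted LLM output against the schema; raise on any deviation."""
    if set(raw) != {"symbol", "side", "notional_usd"}:
        raise ValueError("unexpected or missing fields")
    if raw["side"] not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    notional = float(raw["notional_usd"])
    if not math.isfinite(notional) or notional <= 0:
        raise ValueError("notional must be a positive finite number")
    return OrderIntent(str(raw["symbol"]).upper(), raw["side"], notional)


def risk_gate(intent: OrderIntent, limits: RiskLimits) -> List[str]:
    """Deterministic checks with no LLM in the loop; returns a list of violations."""
    violations = []
    if intent.symbol not in limits.allowed_symbols:
        violations.append("symbol not allowed")
    if intent.notional_usd > limits.max_notional_usd:
        violations.append("notional above limit")
    return violations


def submit(raw: dict, limits: RiskLimits, confirmed: bool,
           send_to_oms: Callable[[OrderIntent], None]) -> str:
    """Schema intent -> risk gate -> button confirmation -> OMS."""
    intent = parse_intent(raw)
    violations = risk_gate(intent, limits)
    if violations:
        return "rejected: " + "; ".join(violations)
    if not confirmed:
        return "awaiting confirmation"
    send_to_oms(intent)
    return "submitted"
```

In this sketch the only path to `send_to_oms` runs through `submit`, so the generator can propose anything but cannot bypass the gate or the confirmation step.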
Slide 5 · Walk-Forward Verification Testbed
On slide
- 526 Binance USD-M symbols.
- 15s bar infrastructure.
- 12-fold rolling walk-forward.
- Microstructure gate.
- Adaptive state controller.
Speaker note
This is the strongest differentiator: a real verification environment for executable alpha, not just paper IC.
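Backup material for this slide: one way to generate a 12-fold rolling walk-forward split is sketched below. The function name `rolling_walk_forward`, the window sizes, and the optional `gap_bars` embargo are assumptions for illustration; the actual fold geometry of the testbed is not specified in this deck.

```python
from typing import Iterator, Tuple


def rolling_walk_forward(n_bars: int, train_bars: int, test_bars: int,
                         n_folds: int = 12,
                         gap_bars: int = 0) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges for a rolling walk-forward scheme.

    Each fold trains on a fixed-length window and tests on the bars that
    follow it; the window then slides forward by one test block. An optional
    gap between train and test reduces leakage from overlapping labels.
    """
    if min(train_bars, test_bars, n_folds) <= 0 or gap_bars < 0:
        raise ValueError("window sizes and fold count must be positive")
    needed = train_bars + gap_bars + n_folds * test_bars
    if needed > n_bars:
        raise ValueError(f"need {needed} bars, have {n_bars}")
    for k in range(n_folds):
        start = k * test_bars
        train = range(start, start + train_bars)
        test_start = train.stop + gap_bars
        yield train, range(test_start, test_start + test_bars)


if __name__ == "__main__":
    for i, (train, test) in enumerate(rolling_walk_forward(1000, 400, 50)):
        print(i, (train.start, train.stop), (test.start, test.stop))
```

With 15-second bars there are 5,760 bars per day, so window sizes can be expressed as multiples of that figure. Test ranges never overlap, and every test range starts at or after the end of its training range.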
Slide 6 · Negative Result: Time-Slice Stability
On slide
Validation looked good; chronological test failed.
- LightGBM: validation MTM positive, test MTM negative.
- Optuna: searched rules overfit the chronological window.
Diagnosis
Bottleneck = time-slice stability, not model capacity.
Speaker note
This is the bridge to Prof. Li. Say the current evidence is a production observation, not final statistical proof.
Slide 7 · Frontier Pattern
On slide
FunSearch → AlphaProof → AlphaEvolve → AI Scientist → ASI-ARCH
Common pattern:
- Generator-verifier separation.
- Cognition base.
- Multi-agent decomposition.
- Compute-scaled discovery.
Speaker note
I have verification. I do not yet have discovery. But more importantly, the alpha-search field cannot compare discovery methods cleanly.
Slide 8 · Mapping My System to the Frontier
On slide
| Component | Status |
|---|---|
| Generator-verifier separation | Done |
| Hard risk verifier | Done |
| Walk-forward testbed | Done |
| Cognition base | Missing |
| Researcher agent | Missing |
| Compute-scaled discovery | Missing |
| Unified field benchmark | Missing |
Speaker note
The missing benchmark is not just my gap; it is a field gap.
Slide 9 · The Field Has No ImageNet Moment
On slide
Automated alpha search today:
- Different datasets.
- Different costs.
- Different folds.
- Different compute budgets.
- Weak multiple-testing correction.
- No standard negative control.
Claim
Without fixed evaluation, "compute → discovery" cannot be tested in finance.
Speaker note
This is the strongest claim. Say it calmly and invite pushback.
Slide 10 · Proposal: Crypto-Alpha-Bench
On slide
Six requirements:
- Fixed public crypto perp dataset.
- Three cost tiers.
- Compute-controlled budgets.
- Multi-metric evaluation + DSR + PBO.
- Synthetic ground-truth tasks.
- Replication-aware must-beat baselines.
Speaker note
This is the concrete research artifact. It can be a benchmark paper and the platform for later method papers.
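Backup material for this slide: the DSR requirement can be made concrete with a short sketch of the Deflated Sharpe Ratio as defined by Bailey and López de Prado. The function names and argument conventions here are assumptions: Sharpe ratios are per-period (not annualized), `kurt` is raw kurtosis (3 for a normal distribution), and `sr_variance` is the variance of Sharpe estimates across the trials that were searched. PBO is not covered by this sketch.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
_STD_NORMAL = NormalDist()


def expected_max_sharpe(n_trials: int, sr_variance: float) -> float:
    """Expected maximum Sharpe across n_trials when the true Sharpe is zero."""
    if n_trials < 2:
        return 0.0  # a single trial needs no multiple-testing deflation
    a = _STD_NORMAL.inv_cdf(1.0 - 1.0 / n_trials)
    b = _STD_NORMAL.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    return math.sqrt(sr_variance) * ((1.0 - EULER_GAMMA) * a + EULER_GAMMA * b)


def deflated_sharpe(sr_hat: float, n_obs: int, skew: float, kurt: float,
                    n_trials: int, sr_variance: float) -> float:
    """Probability that the true Sharpe exceeds the selection-bias benchmark."""
    sr_benchmark = expected_max_sharpe(n_trials, sr_variance)
    radicand = 1.0 - skew * sr_hat + (kurt - 1.0) / 4.0 * sr_hat ** 2
    if n_obs < 2 or radicand <= 0:
        raise ValueError("inputs give an undefined Sharpe standard error")
    z = (sr_hat - sr_benchmark) * math.sqrt(n_obs - 1) / math.sqrt(radicand)
    return _STD_NORMAL.cdf(z)


if __name__ == "__main__":
    # Same observed Sharpe, increasing number of strategies tried.
    for n in (1, 10, 100):
        print(n, round(deflated_sharpe(0.1, 1000, 0.0, 3.0, n, 0.01), 4))
```

The design point for the benchmark is that the same observed Sharpe ratio earns less credit as the number of searched configurations grows, which is why the trial count has to be reported alongside every submission.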
Slide 11 · Three Research Use Cases
On slide
| RQ | Claim | HKU connection |
|---|---|---|
| RQ1 | Microstructure recurrence priors | Prof. Li |
| RQ2 | Open-world LLM-agent safety | Prof. Han |
| RQ3 | Cognition base causal effect | Both |
Speaker note
The RQs are not three disconnected proposals. They become benchmark use cases.
Slide 12 · Ask
On slide
I want feedback on one decision:
Should I pursue Crypto-Alpha-Bench first, or narrow the first paper to RQ1: architectural priors for crypto microstructure recurrence?
Specific asks
- Is benchmark-first academically credible?
- Is crypto-only too narrow?
- What is the smallest 8-12 week experiment that would convince you?
Speaker note
End with openness. Do not ask for endorsement; ask for sharpening.