
HKU 12-Slide Deck Draft · 2026-05-20

Compact version for a 15-20 minute research discussion. Use hku_talk_script_and_ppt.md as the long-form script and this file as the source for building the slides.


Slide 1 · From Production AI Trading Agent to Self-Evolving Research

Subtitle: A path toward Crypto-Alpha-Bench

On slide

  • Paul Weng
  • HKU research discussion
  • 2026-05-20

Speaker note

Thank both professors. Say up front: this is not a trading pitch; it is a research infrastructure pitch.


Slide 2 · Core Thesis

On slide

I have built the verification half of an AI-driven alpha discovery platform.
The missing research substrate is a unified benchmark.

Speaker note

My current system already embodies generator-verifier separation. The research question is how to turn that into a reproducible platform for automated alpha search.


Slide 3 · What I Built

On slide

  • Production crypto trading agent.
  • OMS + live adapter + audit + risk engine.
  • Telegram / WhatsApp-style human interface.
  • LLM L1-L6 assistant layers.

Speaker note

Emphasize that this is a solo build held to production-grade discipline, but keep it short. The meeting is not about listing features.


Slide 4 · The Hard Boundary

On slide

The LLM never reaches the OMS directly.

Pipeline:

Natural language → schema intent → deterministic risk gate → button confirmation → OMS
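The pipeline above can be illustrated with a minimal sketch. Every name here (OrderIntent, RiskLimits, parse_intent, risk_gate, route, submit_to_oms) and every limit is hypothetical, not the production system's API. It shows only the structural idea: the LLM's text is parsed into a typed schema, a deterministic gate checks it, and nothing reaches the OMS stub without explicit human confirmation.

```python
import json
from dataclasses import dataclass

# Hypothetical schema: the only structure the LLM is allowed to emit.
@dataclass(frozen=True)
class OrderIntent:
    symbol: str
    side: str       # "BUY" or "SELL"
    qty: float
    leverage: float

# Hypothetical deterministic limits, configured outside the LLM.
@dataclass(frozen=True)
class RiskLimits:
    allowed_symbols: frozenset
    max_qty: float
    max_leverage: float

def parse_intent(llm_output: str) -> OrderIntent:
    """Parse LLM text into a typed intent; reject anything off-schema."""
    raw = json.loads(llm_output)
    if set(raw) != {"symbol", "side", "qty", "leverage"}:
        raise ValueError(f"unexpected fields: {sorted(raw)}")
    if raw["side"] not in ("BUY", "SELL"):
        raise ValueError("side must be BUY or SELL")
    return OrderIntent(str(raw["symbol"]), raw["side"],
                       float(raw["qty"]), float(raw["leverage"]))

def risk_gate(intent: OrderIntent, limits: RiskLimits) -> list[str]:
    """Deterministic checks; an empty list means the intent passes."""
    violations = []
    if intent.symbol not in limits.allowed_symbols:
        violations.append(f"symbol {intent.symbol} not allowed")
    if not 0 < intent.qty <= limits.max_qty:
        violations.append(f"qty {intent.qty} outside (0, {limits.max_qty}]")
    if not 1 <= intent.leverage <= limits.max_leverage:
        violations.append(f"leverage {intent.leverage} outside [1, {limits.max_leverage}]")
    return violations

def submit_to_oms(intent: OrderIntent) -> str:
    # Stand-in for the real OMS call; this sketch only returns a string.
    return f"SUBMITTED {intent.side} {intent.qty} {intent.symbol} x{intent.leverage}"

def route(llm_output: str, limits: RiskLimits, user_confirmed: bool) -> str:
    """Only a parsed, gated, human-confirmed intent reaches the OMS stub."""
    try:
        intent = parse_intent(llm_output)
    except (ValueError, TypeError) as exc:  # JSONDecodeError is a ValueError
        return f"REJECTED (schema): {exc}"
    violations = risk_gate(intent, limits)
    if violations:
        return "REJECTED (risk): " + "; ".join(violations)
    if not user_confirmed:
        return "PENDING (awaiting button confirmation)"
    return submit_to_oms(intent)
```

For example, with limits = RiskLimits(frozenset({"BTCUSDT"}), max_qty=1.0, max_leverage=5.0), a well-formed intent for 0.5 BTCUSDT at 3x leverage returns "PENDING" until user_confirmed is True, while free text or an oversized order is rejected before any OMS call. The design point is that the gate's verdict depends only on the parsed fields and fixed limits, never on the LLM.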

Speaker note

Compare to AlphaProof's Lean-kernel idea only at the architectural-philosophy level. The verifier is not mathematically perfect, but it is independent of LLM generation.


Slide 5 · Walk-Forward Verification Testbed

On slide

  • 526 Binance USD-M symbols.
  • 15s bar infrastructure.
  • 12-fold rolling walk-forward.
  • Microstructure gate.
  • Adaptive state controller.
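A 12-fold rolling walk-forward split can be sketched as follows. This is an illustration under assumptions, not the testbed's actual splitter: the function name rolling_walk_forward is hypothetical, the training window is assumed to have fixed length and roll forward by one test window per fold, test windows are assumed to be consecutive and non-overlapping, and an optional embargo gap separates training from test data.

```python
def rolling_walk_forward(n_bars, n_folds, train_size, test_size, embargo=0):
    """Return (train_range, test_range) index pairs, oldest fold first.

    Test windows are consecutive and end at the last bar; each training
    window has fixed length and ends `embargo` bars before its test window.
    """
    needed = train_size + embargo + n_folds * test_size
    if needed > n_bars:
        raise ValueError(f"need {needed} bars, have {n_bars}")
    first_test_start = n_bars - n_folds * test_size
    folds = []
    for k in range(n_folds):
        test_start = first_test_start + k * test_size
        train_end = test_start - embargo       # leave a gap to limit leakage
        train_start = train_end - train_size   # fixed-length rolling window
        folds.append((range(train_start, train_end),
                      range(test_start, test_start + test_size)))
    return folds
```

For example, rolling_walk_forward(1000, 12, 200, 50, embargo=10) yields 12 folds whose test windows tile bars 400 to 999 in order, with every training window strictly earlier than its test window. With 15-second bars one day is 5,760 bars, so real window sizes would be chosen in those units.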

Speaker note

This is the strongest differentiator: a real verification environment for executable alpha, not just paper IC.


Slide 6 · Negative Result: Time-Slice Stability

On slide

Validation looked good; chronological test failed.

  • LightGBM: validation MTM positive, test MTM negative.
  • Optuna: searched rules overfit the chronological validation window.

Diagnosis

Bottleneck = time-slice stability, not model capacity.
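One simple way to make "time-slice stability" measurable is sketched below, assuming per-fold MTM PnL is recorded separately for the validation and chronological test windows. The function name and the summary statistics are illustrative choices, not the metrics the production system uses; the sign-flip fraction counts folds with exactly the failure pattern on this slide (validation positive, test not positive).

```python
from statistics import mean

def time_slice_report(val_mtm, test_mtm):
    """Summarize per-fold validation vs. chronological-test MTM PnL."""
    if len(val_mtm) != len(test_mtm) or not val_mtm:
        raise ValueError("need one (validation, test) pair per fold")
    n = len(val_mtm)
    # Folds that looked good in validation but did not hold up in test.
    flips = sum(1 for v, t in zip(val_mtm, test_mtm) if v > 0 and t <= 0)
    return {
        "val_positive_frac": sum(v > 0 for v in val_mtm) / n,
        "test_positive_frac": sum(t > 0 for t in test_mtm) / n,
        "sign_flip_frac": flips / n,
        "worst_test_fold": min(test_mtm),
        "mean_test": mean(test_mtm),
    }
```

A high sign-flip fraction alongside a high validation-positive fraction is the signature of a time-slice problem rather than a capacity problem: a larger model would still be selected on the same unstable validation window.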

Speaker note

This is the bridge to Prof. Li. Say that the current evidence is a production observation, not a final statistical proof.


Slide 7 · Frontier Pattern

On slide

FunSearch → AlphaProof → AlphaEvolve → AI Scientist → ASI-ARCH

Common pattern:

  1. Generator-verifier separation.
  2. Cognition base.
  3. Multi-agent decomposition.
  4. Compute-scaled discovery.

Speaker note

I have verification. I do not yet have discovery. But more importantly, the alpha-search field cannot compare discovery methods cleanly.


Slide 8 · Mapping My System to the Frontier

On slide

| Component | Status |
|---|---|
| Generator-verifier separation | Done |
| Hard risk verifier | Done |
| Walk-forward testbed | Done |
| Cognition base | Missing |
| Researcher agent | Missing |
| Compute-scaled discovery | Missing |
| Unified field benchmark | Missing |

Speaker note

The missing benchmark is not just my gap; it is a field gap.


Slide 9 · The Field Has No ImageNet Moment

On slide

Automated alpha search today:

  • Different datasets.
  • Different costs.
  • Different folds.
  • Different compute budgets.
  • Weak multiple-testing correction.
  • No standard negative control.

Claim

Without a fixed evaluation protocol, the claim that more compute yields more discovery ("compute → discovery") cannot be tested in finance.

Speaker note

This is the strongest claim. Say it calmly and invite pushback.


Slide 10 · Proposal: Crypto-Alpha-Bench

On slide

Six requirements:

  1. Fixed public crypto perp dataset.
  2. Three cost tiers.
  3. Compute-controlled budgets.
  4. Multi-metric evaluation, including the Deflated Sharpe Ratio (DSR) and the Probability of Backtest Overfitting (PBO).
  5. Synthetic ground-truth tasks.
  6. Replication-aware must-beat baselines.
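To make requirement 4 concrete, here is a sketch of the Deflated Sharpe Ratio following the formula of Bailey and López de Prado (2014). It assumes the benchmark records the number of independent trials N and the variance of the Sharpe ratios across those trials; the function names are illustrative, and PBO (typically estimated with combinatorially symmetric cross-validation) is not shown.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def expected_max_sharpe(sr_variance, n_trials):
    """SR_0: expected maximum Sharpe among n_trials skill-less strategies."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    z = NormalDist()
    return math.sqrt(sr_variance) * (
        (1 - EULER_GAMMA) * z.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * z.inv_cdf(1 - 1 / (n_trials * math.e))
    )

def deflated_sharpe_ratio(sr, n_obs, skew, kurt, sr_variance, n_trials):
    """Probability that the true Sharpe exceeds SR_0, adjusting for non-normality.

    sr is the non-annualized Sharpe of the selected strategy, n_obs the number
    of return observations, and kurt the raw (not excess) kurtosis, i.e. 3 for
    normally distributed returns.
    """
    sr0 = expected_max_sharpe(sr_variance, n_trials)
    denom = math.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return NormalDist().cdf((sr - sr0) * math.sqrt(n_obs - 1) / denom)
```

The same observed Sharpe earns a lower DSR when it was selected from more trials, because SR_0 grows with N. This is the multiple-testing correction the benchmark would apply uniformly across methods with different search budgets.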

Speaker note

This is the concrete research artifact. It can be a benchmark paper and the platform for later method papers.


Slide 11 · Three Research Use Cases

On slide

| RQ | Claim | HKU connection |
|---|---|---|
| RQ1 | Microstructure recurrence priors | Prof. Li |
| RQ2 | Open-world LLM-agent safety | Prof. Han |
| RQ3 | Cognition base causal effect | Both |

Speaker note

The RQs are not three disconnected proposals. They are three use cases of the same benchmark.


Slide 12 · Ask

On slide

I want feedback on one decision:

Should I pursue Crypto-Alpha-Bench first, or narrow the first paper to RQ1: architectural priors for crypto microstructure recurrence?

Specific asks

  • Is benchmark-first academically credible?
  • Is crypto-only too narrow?
  • What is the smallest 8-12 week experiment that would convince you?

Speaker note

End with openness. Do not ask for endorsement; ask for sharpening.