HKU 12-Slide Deck Draft · 2026-05-20

Compact version for a 15-20 min research discussion. Use hku_talk_script_and_ppt.md as the long-form script and this file as the slide build source.

Slide 1 · From Production AI Trading Agent to Self-Evolving Research

Subtitle: A path toward Crypto-Alpha-Bench

On slide

Paul Weng
HKU research discussion
2026-05-20

Speaker note

Thank both professors. Say upfront: this is not a trading pitch; it is a research infrastructure pitch.

Slide 2 · Core Thesis

On slide

I have built the verification half of an AI-driven alpha discovery platform.
The missing research substrate is a unified benchmark.

Speaker note

My current system already embodies generator-verifier separation. The research question is how to turn it into a reproducible platform for automated alpha search.

Slide 3 · What I Built

On slide

Production crypto trading agent.
OMS + live adapter + audit + risk engine.
Telegram / WhatsApp-style human interface.
LLM L1-L6 assistant layers.

Speaker note

Emphasize solo build and production-grade discipline, but keep this short. The meeting is not about listing features.

Slide 4 · The Hard Boundary

On slide

The LLM never reaches the OMS directly.

Pipeline:

Natural language → schema intent → deterministic risk gate → button confirmation → OMS

Speaker note

Compare to AlphaProof's Lean-kernel idea only at the architectural-philosophy level. The verifier is not mathematically perfect, but it is independent of LLM generation.
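Backup, not on slide: a minimal Python sketch of the boundary, in case someone asks what "never reaches the OMS" means mechanically. All names (OrderIntent, RiskLimits, parse_intent, risk_gate, submit) and the two limits checked are hypothetical illustrations, not the production system's code; confirm and oms_send are injected callables standing in for the button confirmation and the real OMS adapter. What it shows is structural: LLM output is treated as untrusted data that must parse into a fixed schema and pass plain deterministic checks before a human confirms it.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional

# Hypothetical schema: the only fields an LLM-produced intent may carry.
INTENT_FIELDS = {"symbol", "side", "qty"}
ALLOWED_SIDES = {"buy", "sell"}


@dataclass(frozen=True)
class OrderIntent:
    symbol: str
    side: str
    qty: float


@dataclass(frozen=True)
class RiskLimits:
    allowed_symbols: FrozenSet[str]
    max_qty: float


def parse_intent(raw: dict) -> OrderIntent:
    """Schema step: reject any LLM output that is not exactly the intent schema."""
    if set(raw) != INTENT_FIELDS:
        raise ValueError(f"unexpected fields: {sorted(raw)}")
    if raw["side"] not in ALLOWED_SIDES:
        raise ValueError(f"bad side: {raw['side']!r}")
    return OrderIntent(symbol=str(raw["symbol"]), side=raw["side"], qty=float(raw["qty"]))


def risk_gate(intent: OrderIntent, limits: RiskLimits) -> Optional[str]:
    """Deterministic gate: plain comparisons, no model call. Returns a rejection reason or None."""
    if intent.symbol not in limits.allowed_symbols:
        return "symbol not allowed"
    # NaN and infinity both fail this check, so they are rejected too.
    if not (0 < intent.qty <= limits.max_qty):
        return "qty out of bounds"
    return None


def submit(
    raw: dict,
    limits: RiskLimits,
    confirm: Callable[[OrderIntent], bool],
    oms_send: Callable[[OrderIntent], str],
) -> str:
    """Only a schema-valid, gate-approved, human-confirmed intent reaches oms_send."""
    intent = parse_intent(raw)
    reason = risk_gate(intent, limits)
    if reason is not None:
        return f"rejected: {reason}"
    if not confirm(intent):  # stands in for the button confirmation
        return "cancelled by user"
    return oms_send(intent)
```

Because the gate is an ordinary function with no model call, it can be unit-tested on its own, which is the sense in which the verifier is independent of LLM generation.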

Slide 5 · Walk-Forward Verification Testbed

On slide

526 Binance USD-M symbols.
15s bar infrastructure.
12-fold rolling walk-forward.
Microstructure gate.
Adaptive state controller.

Speaker note

This is the strongest differentiator: a real verification environment for executable alpha, not just paper IC.
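Backup, not on slide: one way to generate a 12-fold rolling walk-forward split, sketched in Python under assumptions this deck does not state: a fixed-length training window, consecutive non-overlapping test windows, and an optional embargo gap between them. The function and parameter names are hypothetical, and the testbed's actual window lengths are not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fold:
    train: range  # bar indices used for fitting or rule search
    test: range   # later bar indices, never seen during fitting


def rolling_walk_forward(
    n_bars: int, train_bars: int, test_bars: int, n_folds: int = 12, gap_bars: int = 0
) -> List[Fold]:
    """Fixed-length training window that rolls forward by one test window per fold.

    gap_bars is an optional embargo between train and test. Folds are anchored so the
    last test window ends at the final bar; any older bars are left unused.
    """
    if min(train_bars, test_bars, n_folds) <= 0 or gap_bars < 0:
        raise ValueError("window sizes must be positive")
    needed = train_bars + gap_bars + n_folds * test_bars
    if needed > n_bars:
        raise ValueError(f"need {needed} bars, have {n_bars}")
    start = n_bars - needed
    folds = []
    for k in range(n_folds):
        train_start = start + k * test_bars
        train_stop = train_start + train_bars
        test_start = train_stop + gap_bars
        folds.append(Fold(range(train_start, train_stop), range(test_start, test_start + test_bars)))
    return folds
```

Window sizes are counted in bars, so the same split logic applies to 15s bars or any coarser resolution; the test windows tile the final stretch of history, so every test bar is scored exactly once.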

Slide 6 · Negative Result: Time-Slice Stability

On slide

Validation looked good; chronological test failed.

LightGBM: validation MTM positive, test MTM negative.
Optuna: searched rules overfit the chronological window.

Diagnosis

Bottleneck = time-slice stability, not model capacity.

Speaker note

This is the bridge to Prof. Li. Say that the current evidence is a production observation, not final statistical proof.
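Backup, not on slide: a hypothetical per-fold summary of the failure mode on this slide, assuming one (validation MTM, test MTM) pair is recorded for each walk-forward fold. The metric name and definition are illustrative, not an established statistic and not the production system's diagnostic.

```python
import math
from typing import Sequence, Tuple


def sign_flip_rate(fold_mtm: Sequence[Tuple[float, float]]) -> float:
    """Share of walk-forward folds where validation MTM was positive but test MTM was not.

    fold_mtm holds one (validation_mtm, test_mtm) pair per fold, in chronological order.
    Returns NaN when no fold had a positive validation MTM, since the rate is undefined.
    """
    promising = [test for val, test in fold_mtm if val > 0]
    if not promising:
        return math.nan
    return sum(test <= 0 for test in promising) / len(promising)
```

Reporting this per model and per search method, rather than a single pooled validation/test comparison, is one way to move the observation toward the statistical evidence the speaker note says is still missing.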

Slide 7 · Frontier Pattern

On slide

FunSearch → AlphaProof → AlphaEvolve → AI Scientist → ASI-ARCH

Common pattern:

Generator-verifier separation.
Cognition base.
Multi-agent decomposition.
Compute-scaled discovery.

Speaker note

I have verification. I do not yet have discovery. But more importantly, the alpha-search field cannot compare discovery methods cleanly.

Slide 8 · Mapping My System to the Frontier

On slide

Component	Status
Generator-verifier separation	Done
Hard risk verifier	Done
Walk-forward testbed	Done
Cognition base	Missing
Researcher agent	Missing
Compute-scaled discovery	Missing
Unified field benchmark	Missing

Speaker note

The missing benchmark is not just my gap; it is a field gap.

Slide 9 · The Field Has No ImageNet Moment

On slide

Automated alpha search today:

Different datasets.
Different costs.
Different folds.
Different compute budgets.
Weak multiple-testing correction.
No standard negative control.

Claim

Without fixed evaluation, "compute → discovery" cannot be tested in finance.

Speaker note

This is the strongest claim. Say it calmly and invite pushback.
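Backup, not on slide: the Deflated Sharpe Ratio (Bailey and López de Prado) is one standard way to correct a reported Sharpe ratio for the number of configurations tried, which is the multiple-testing gap listed here. This Python sketch assumes per-period (non-annualized) Sharpe ratios and non-excess kurtosis (3 for a normal distribution), and uses the usual approximation for the expected maximum Sharpe ratio among n_trials zero-skill trials; sr_variance is the variance of the Sharpe ratios across all trials. Function names are illustrative, and only the standard library's statistics.NormalDist is used.

```python
import math
from statistics import NormalDist, fmean, pstdev
from typing import Sequence, Tuple

EULER_GAMMA = 0.5772156649015329
STD_NORMAL = NormalDist()


def sample_moments(returns: Sequence[float]) -> Tuple[float, float, float]:
    """Per-period Sharpe ratio, skewness, and non-excess kurtosis of a return series."""
    mu = fmean(returns)
    sd = pstdev(returns, mu)
    if sd == 0:
        raise ValueError("constant return series")
    z = [(r - mu) / sd for r in returns]
    return mu / sd, fmean(x ** 3 for x in z), fmean(x ** 4 for x in z)


def expected_max_sharpe(sr_variance: float, n_trials: int) -> float:
    """Approximate expected maximum Sharpe ratio across n_trials trials with zero true Sharpe."""
    if n_trials < 2:
        return 0.0
    q1 = STD_NORMAL.inv_cdf(1 - 1 / n_trials)
    q2 = STD_NORMAL.inv_cdf(1 - 1 / (n_trials * math.e))
    return math.sqrt(sr_variance) * ((1 - EULER_GAMMA) * q1 + EULER_GAMMA * q2)


def probabilistic_sharpe(sr: float, sr_benchmark: float, n_obs: int, skew: float, kurt: float) -> float:
    """Probability that the true Sharpe exceeds sr_benchmark, given observed sr over n_obs periods."""
    denom = math.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return STD_NORMAL.cdf((sr - sr_benchmark) * math.sqrt(n_obs - 1) / denom)


def deflated_sharpe(sr: float, n_obs: int, skew: float, kurt: float, sr_variance: float, n_trials: int) -> float:
    """PSR with the benchmark raised to the Sharpe ratio expected from luck alone."""
    return probabilistic_sharpe(sr, expected_max_sharpe(sr_variance, n_trials), n_obs, skew, kurt)
```

The same observed Sharpe ratio deflates further as n_trials grows, so reporting it together with the trial count is one way to compare search methods that ran under different compute budgets.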

Slide 10 · Proposal: Crypto-Alpha-Bench

On slide

Six requirements:

Fixed public crypto perp dataset.
Three cost tiers.
Compute-controlled budgets.
Multi-metric evaluation + DSR (Deflated Sharpe Ratio) + PBO (Probability of Backtest Overfitting).
Synthetic ground-truth tasks.
Replication-aware must-beat baselines.

Speaker note

This is the concrete research artifact. It can be a benchmark paper and the platform for later method papers.
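Backup, not on slide: PBO can be estimated with combinatorially symmetric cross-validation (CSCV, Bailey, Borwein, López de Prado and Zhu). The Python sketch below assumes a T × N table of per-period returns for N candidate configurations, uses per-period Sharpe ratio as the selection metric, and drops any trailing periods that do not fill a block; all names are illustrative.

```python
import math
from itertools import combinations
from statistics import fmean, pstdev
from typing import List, Sequence


def _sharpe(xs: Sequence[float]) -> float:
    sd = pstdev(xs)
    return fmean(xs) / sd if sd > 0 else 0.0


def pbo_cscv(returns: Sequence[Sequence[float]], n_splits: int = 8) -> float:
    """returns[t][n]: return of candidate configuration n in period t.

    Splits the periods into n_splits contiguous blocks. For every choice of half the
    blocks as in-sample (IS), picks the best IS configuration and records the logit of
    its out-of-sample (OOS) relative rank. PBO is the share of splits where that logit
    is <= 0, i.e. the IS winner lands in the bottom half OOS.
    """
    if n_splits < 2 or n_splits % 2:
        raise ValueError("n_splits must be an even number >= 2")
    n_periods, n_configs = len(returns), len(returns[0])
    block = n_periods // n_splits
    if block == 0:
        raise ValueError("fewer periods than splits")
    blocks = [range(b * block, (b + 1) * block) for b in range(n_splits)]
    logits: List[float] = []
    for is_ids in combinations(range(n_splits), n_splits // 2):
        is_rows = [t for b in is_ids for t in blocks[b]]
        oos_rows = [t for b in range(n_splits) if b not in is_ids for t in blocks[b]]
        is_perf = [_sharpe([returns[t][n] for t in is_rows]) for n in range(n_configs)]
        oos_perf = [_sharpe([returns[t][n] for t in oos_rows]) for n in range(n_configs)]
        best = max(range(n_configs), key=is_perf.__getitem__)
        # Relative OOS rank in (0, 1): 1/(N+1) is worst, N/(N+1) is best.
        rank = 1 + sum(p < oos_perf[best] for p in oos_perf)
        omega = rank / (n_configs + 1)
        logits.append(math.log(omega / (1 - omega)))
    return sum(lam <= 0 for lam in logits) / len(logits)
```

Because the block splits are fixed by the benchmark rather than chosen by each method, two discovery methods scored this way are evaluated on the same set of IS/OOS partitions.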

Slide 11 · Three Research Use Cases

On slide

RQ	Topic	HKU connection
RQ1	Microstructure recurrence priors	Prof. Li
RQ2	Open-world LLM-agent safety	Prof. Han
RQ3	Cognition base causal effect	Both

Speaker note

The RQs are not three disconnected proposals. They become benchmark use cases.

Slide 12 · Ask

On slide

I want feedback on one decision:

Should I pursue Crypto-Alpha-Bench first, or narrow the first paper to RQ1: architectural priors for crypto microstructure recurrence?

Specific asks

Is benchmark-first academically credible?
Is crypto-only too narrow?
What is the smallest 8-12 week experiment that would convince you?

Speaker note

End with openness. Do not ask for endorsement; ask for sharpening.