Human Expert in the Loop · A Research Direction Revision¶

Tape-reading phenomenon, why ML hasn't captured it, and a revised diagnostic on what the project's hidden differentiator actually is.

2026-05-18 · Paul Weng

0. Context¶

This document records a critical pivot in how I think about my own project and the Crypto-Alpha-Bench proposal. The pivot was triggered by one observation:

Top-tier human traders can read order-book signals efficiently, time their entries well, and capture trading opportunities.

This is real. It's documented in finance literature. And it has profound implications for what "breakthrough" means for this project.

1. The Tape-Reading Phenomenon — Real, Documented, Still Open¶

Top-tier discretionary traders demonstrably do things ML hasn't replicated:

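Identifying spoofing vs real intent: judging whether resting orders are genuine from cancellation patterns and timing rhythm
Detecting absorption / icebergs: price repeatedly tests a level without breaking through, which signals hidden liquidity
Reading aggressor identity: inferring retail / MM / institution from order size distribution, frequency, and rhythm
Spotting liquidity vacuums: the few hundred milliseconds in which the order book suddenly thins foreshadow reverse momentum
Reading micro-momentum from the tape: micro-regimes of consecutive same-direction prints
Multi-venue synchronization: latent arbitrage in the microscopic time lags across exchanges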

Relevant literature¶

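Mark Fisher: Tape Reading (classical framework)
Robert Almgren: order flow analysis
Rama Cont: order flow imbalance, LOB modeling
Justin Sirignano: deep learning for LOBs, universal price-formation models (with Rama Cont); DeepLOB itself is by Zhang, Zohren, and Roberts
Steidlmayer / Dalton: Market Profile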

The honest gap in academic ML¶

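Academic LOB modeling reports short-horizon prediction accuracy of 55-65%. After transaction costs, however, the capturable alpha is zero or even negative most of the time.

That top-tier tape readers can nonetheless sustain an edge along this line is itself an open scientific question.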
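To make the accuracy-versus-cost gap concrete, here is a minimal sketch of the arithmetic. It assumes a symmetric payoff (a correct call earns the same number of basis points that a wrong call loses) and a fixed round-trip cost. The function names and the 5 bps / 2 bps figures are hypothetical illustrations, not measurements from this project.

```python
def expected_net_return_bps(accuracy: float, move_bps: float, cost_bps: float) -> float:
    """Expected net return per round trip, in basis points.

    Assumption: a correct call earns +move_bps, a wrong call loses move_bps,
    and every trade pays cost_bps in fees plus slippage.
    """
    gross = (2.0 * accuracy - 1.0) * move_bps
    return gross - cost_bps


def breakeven_accuracy(move_bps: float, cost_bps: float) -> float:
    """Accuracy at which the expected net return is exactly zero."""
    return 0.5 + cost_bps / (2.0 * move_bps)


if __name__ == "__main__":
    # Hypothetical numbers: 5 bps typical short-horizon move, 2 bps round-trip cost.
    for acc in (0.55, 0.60, 0.65):
        net = expected_net_return_bps(acc, move_bps=5.0, cost_bps=2.0)
        print(f"accuracy={acc:.2f} net={net:+.2f} bps")
    print("break-even accuracy:", breakeven_accuracy(move_bps=5.0, cost_bps=2.0))
```

Under these assumed numbers the break-even accuracy is 70%, so a 55-65% classifier can be statistically real and still lose money after costs.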

2. Why ML Hasn't Captured This — Four Real Reasons¶

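Reason 1 · Snapshot features discard the dynamics¶

The microstructure gate I used previously looks at a snapshot of spread / depth5 / bookTicker at a single instant. A tape reader instead watches the rate of change and the pattern of these values:

depth5 stair-steps 100k→80k→60k→20k = liquidity vanishing
depth5 jumps straight from 100k→20k = spoof being pulled
The final static snapshots are identical, but the meanings are opposite

Can ML learn these dynamics? It needs a time-series representation (LSTM/Transformer over order book deltas), not snapshot features. A 15s OHLCV bar has already aggregated this layer of information away: the order book delta sequence at 200ms granularity inside each 15s bar is invisible.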
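The stair-step versus jump distinction above can be sketched with a few lines of Python. This is only an illustration of why the path matters: the function names and the `jump_share` threshold of 0.8 are assumptions of mine, not part of the existing gate.

```python
from typing import Sequence


def depth_deltas(depth: Sequence[float]) -> list[float]:
    """First differences of a depth series (one value per sampling tick)."""
    return [b - a for a, b in zip(depth, depth[1:])]


def classify_depth_drop(depth: Sequence[float], jump_share: float = 0.8) -> str:
    """Label how depth was lost between the first and the last sample.

    'jump'   : a single tick accounts for at least `jump_share` of the total drop
    'stairs' : the drop is spread over several ticks
    'none'   : depth did not fall
    """
    total_drop = depth[0] - depth[-1]
    if total_drop <= 0:
        return "none"
    largest_step = max(-d for d in depth_deltas(depth))
    return "jump" if largest_step / total_drop >= jump_share else "stairs"


if __name__ == "__main__":
    stairs = [100_000, 80_000, 60_000, 20_000]
    jump = [100_000, 100_000, 100_000, 20_000]
    # Both paths end at the same snapshot value, yet they classify differently.
    print(stairs[-1] == jump[-1], classify_depth_drop(stairs), classify_depth_drop(jump))
```

A snapshot feature only ever sees the final 20k; the classification needs the whole delta sequence, which is the information a 15s bar throws away.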

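Reason 2 · Labels are strategy-specific, not microstructure-specific¶

The label I trained LightGBM on is "can the position hit TP cleanly within the next 30-60 min". A tape reader watches the micro-momentum direction over the next 5-30 seconds. The two timescales differ by roughly 100x.

When the signal's decay timescale and the prediction horizon are mismatched like this, overfitting is inevitable.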
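As a quick check on the size of the mismatch, the ratio between the two horizons can be computed directly. The helper name is hypothetical; the inputs are just the 30-60 min and 5-30 second ranges quoted above.

```python
def horizon_ratio_range(label_s: tuple[float, float],
                        signal_s: tuple[float, float]) -> tuple[float, float]:
    """Smallest and largest ratio between a label horizon and a signal horizon.

    Both arguments are (min, max) ranges in seconds.
    """
    lowest = label_s[0] / signal_s[1]
    highest = label_s[1] / signal_s[0]
    return lowest, highest


label_horizon_s = (30 * 60, 60 * 60)  # 30-60 minutes
signal_horizon_s = (5, 30)            # 5-30 seconds

print(horizon_ratio_range(label_horizon_s, signal_horizon_s))  # (60.0, 720.0)
```

The ratio spans 60x to 720x, so "roughly 100x" is an order-of-magnitude statement rather than an exact figure.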

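Reason 3 · Counterfactual reasoning is missing¶

A tape reader is doing off-policy reasoning:

"If I buy now, how will the order book change? Will my order be absorbed, or will it push the price?"

ML cannot learn this from historical fills, because historical fills are data recorded after the execution decision was made. This is the core difficulty of offline RL, and there is currently no deployable solution.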

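Reason 4 · Multi-modal, long-context input is stripped away¶

A real expert watches all of these at once: the current symbol's order book + tape + the BTC main chart + funding/OI + news + session transitions + historical highs/lows.

His implicit model is multi-modal and long-context. LightGBM can only process a single-symbol snapshot, which has already stripped away 99% of the context he uses.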

3. Diagnostic Revision¶

My previous claim (now partially wrong)¶

"You spend 99% of your time on the execution filter and 0% on alpha generation."

Corrected claim¶

You spend 99% of your time on "systematic capture of execution intuition" and 0% on "how to systematize the trader's order-book intuition".

The hidden assumption I missed¶

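The project README states it clearly: "alpha comes from his market feel and market judgment." The system is an attention-protection layer.

The assumption I missed:

"刘总捞鱼" (passive maker + TP/SL) is a reasonable vehicle for the trader's market feel.

It may not be. If the trader's real edge is directional tape reading (recognizing micro-momentum at the 5-30 second scale), then making him trade through a passive limit order TP/SL framework is like asking a football quarterback to play billiards. The skills do not match.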

Revised hypothesis¶

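The project has stalled not because it skipped alpha generation, but because the "vehicle form" and the "trader's real ability" do not match, and because the "human expert + system" setup, which was the differentiator all along, has not been fully leveraged.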

4. Three Breakthrough Directions¶

Direction A · Vehicle Pivot to Directional Tape-Aware Execution¶

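Drop the passive maker + TP/SL vehicle. Instead:

The trader marks a directional intent ("BTC up over the next 5 minutes")
The system handles micro-timing for him (finding the best entry point within the 5-minute window)
The execution algo is an aggressive limit ladder or a sniper order, not a passive maker
TP/SL is decided dynamically by the trader; the system does not enforce it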
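One possible reading of "aggressive limit ladder" is sketched below: split the order evenly across a few limit prices, with the most aggressive rung crossing the touch and each further rung one tick more passive. The `LadderOrder` type, the `build_limit_ladder` function, and all parameters are hypothetical; the real algo, its sizing, and its OMS integration are still undecided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LadderOrder:
    side: str    # "buy" or "sell"
    price: float
    qty: float


def build_limit_ladder(side: str, touch_price: float, tick: float,
                       levels: int, total_qty: float,
                       cross_ticks: int = 1) -> list[LadderOrder]:
    """Split total_qty evenly over `levels` limit prices around the touch.

    touch_price is the best ask for a buy and the best bid for a sell.
    The first rung crosses the touch by `cross_ticks` ticks; every further
    rung sits one tick more passive than the previous one.
    """
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    if levels < 1 or total_qty <= 0 or tick <= 0:
        raise ValueError("levels, total_qty and tick must be positive")
    sign = 1 if side == "buy" else -1
    qty = total_qty / levels
    orders = []
    for i in range(levels):
        offset_ticks = cross_ticks - i
        price = round(touch_price + sign * offset_ticks * tick, 10)
        orders.append(LadderOrder(side, price, qty))
    return orders


if __name__ == "__main__":
    for order in build_limit_ladder("buy", touch_price=100.0, tick=0.5,
                                    levels=3, total_qty=0.3):
        print(order)
```

The micro-timing piece (when inside the 5-minute window to release the ladder) is deliberately left out; that is where the microstructure gate would act as a timing optimizer.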

Infrastructure reuse: OMS / KMS / Telegram UX / KillSwitch are used as-is. Only the vehicle changes.

New components:
Directional intent input UI
Micro-timing execution algo (5-60 seconds)
The microstructure gate becomes a timing optimizer, not a tradability filter

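Direction B · LLM Tacit Knowledge Extraction (highest academic value)¶

Each time the trader makes a decision:

The system records the full multi-modal market state (order book + tape + cross-symbol + funding + news context)
An LLM agent prompts the trader to explain the decision in natural language ("why buy now")
Log a (state, decision, rationale, outcome) tuple
Once enough samples accumulate, fine-tune the LLM agent to learn the mapping tacit knowledge → explanation chain → decision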
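The (state, decision, rationale, outcome) tuple from the steps above could be stored as one JSON line per decision. The sketch below is a minimal, assumed schema: the `DecisionRecord` fields, the helper names, and the example keys inside `state` are placeholders, not the project's actual logging format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class DecisionRecord:
    ts_ms: int                                 # decision timestamp, epoch milliseconds
    symbol: str
    state: dict[str, Any]                      # market state captured at decision time
    decision: str                              # e.g. "buy", "sell", "skip"
    rationale: str                             # the trader's own explanation, free text
    outcome: Optional[dict[str, Any]] = None   # filled in later, once the trade resolves


def to_jsonl_line(record: DecisionRecord) -> str:
    """Serialize one record as a single JSON line (non-ASCII text is kept readable)."""
    return json.dumps(asdict(record), ensure_ascii=False, sort_keys=True)


def from_jsonl_line(line: str) -> DecisionRecord:
    """Inverse of to_jsonl_line."""
    return DecisionRecord(**json.loads(line))


def append_record(path: str, record: DecisionRecord) -> None:
    """Append one record to a JSONL log file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(to_jsonl_line(record) + "\n")


if __name__ == "__main__":
    rec = DecisionRecord(
        ts_ms=1_700_000_000_000,
        symbol="BTCUSDT",
        state={"spread_bps": 1.2, "depth5": 100_000},  # placeholder keys
        decision="buy",
        rationale="bid keeps absorbing sells at this level",
    )
    print(to_jsonl_line(rec))
```

Keeping `outcome` optional lets the record be written at decision time and completed afterwards, so the rationale is captured before hindsight can color it.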

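Research significance:

Addresses Polanyi's classic tacit knowledge problem
A finance counterpart of vision-language models for expert demonstration (connects directly to Prof. Han's lab)
Time series decision making with discretionary supervision (connects directly to Prof. Li's research direction)
The data infrastructure already exists; the only missing piece is a systematic interview / labeling protocol

Why HKU students usually cannot pursue this research direction: they lack access to a real expert who trades live plus a production-grade recording system. I have both.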

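Direction C · Microstructure Granularity Upgrade + Timescale Alignment¶

If I stick with the existing vehicle, the minimum that has to be done:

Data: from 15s OHLCV → tick-by-tick / 100ms order book deltas
ML target: from 60min TP → 5-30 second direction prediction
Features: from snapshots → rolling window dynamics (cancellation rate / refresh rate / depth velocity)
Model: from LightGBM → LSTM/Transformer over the delta sequence (DeepLOB family)

There is an academic path (the Sirignano / Cont line of work), but doing it alone would take 6-12 months, and the deployable edge may not beat HFT firms. The ROI is mediocre.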
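The rolling-window dynamics in the feature item above could be prototyped as follows. This is a sketch under assumptions: the `BookDelta` record, its fields, the 2-second window, and the exact feature definitions (for instance, cancellation measured as the cancelled share of all removed liquidity) are my own placeholders, not an existing pipeline.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, Iterator


@dataclass(frozen=True)
class BookDelta:
    ts_ms: int        # timestamp of this update
    depth: float      # resting size in the top levels after the update
    added: float      # size added since the previous update
    cancelled: float  # size cancelled since the previous update
    traded: float     # size removed by trades since the previous update


def rolling_dynamics(deltas: Iterable[BookDelta],
                     window_ms: int = 2000) -> Iterator[dict]:
    """Yield rolling-window features for each update (time-ordered input assumed)."""
    window: Deque[BookDelta] = deque()
    for d in deltas:
        window.append(d)
        # Drop updates that fell out of the look-back window.
        while window[0].ts_ms < d.ts_ms - window_ms:
            window.popleft()
        elapsed_s = (d.ts_ms - window[0].ts_ms) / 1000.0
        if elapsed_s <= 0:
            continue  # a rate needs at least two distinct timestamps
        removed = sum(x.cancelled + x.traded for x in window)
        cancelled = sum(x.cancelled for x in window)
        yield {
            "ts_ms": d.ts_ms,
            "depth_velocity": (d.depth - window[0].depth) / elapsed_s,
            "cancel_share": cancelled / removed if removed > 0 else 0.0,
            "refresh_rate": sum(x.added for x in window) / elapsed_s,
        }
```

A useful property of this feature set: a stair-step drop caused by trades and a single pulled order can show the same `depth_velocity` over the window while `cancel_share` separates them. The output rows are what a sequence model would consume in place of snapshot features.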

5. Implication for Crypto-Alpha-Bench¶

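Key add-on¶

Add a "Human Expert Discretionary Baseline" to the Crypto-Alpha-Bench reference baselines.

Specifically:

The trader makes discretionary decisions on the benchmark dataset (over a sample period)
Log the decisions + outcomes
Treat human discretionary trading as a baseline, listed alongside gplearn / FactorVAE / AlphaAgent and the others

Why this is transformative¶

Most alpha mining papers use synthetic data or automated methods and have no human expert baseline. Adding this tier to Crypto-Alpha-Bench:

Immediately gives the benchmark a differentiator that academia lacks
Answers "why are you building this benchmark rather than GAIR/Stanford": they do not have this access
Embeds Direction B (LLM tacit knowledge extraction) and the benchmark in the same paper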
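To put the human baseline on the same footing as the automated ones, every baseline's decision log can be reduced to the same summary statistics. The sketch below assumes each log has already been converted to per-decision gross returns in basis points and that one flat cost applies to all; the function names, the metrics chosen, and the toy numbers are illustrative only and are not benchmark results.

```python
from statistics import mean


def score_decisions(gross_returns_bps: list[float], cost_bps: float) -> dict:
    """Summarize one baseline's decisions after subtracting a flat per-trade cost."""
    if not gross_returns_bps:
        raise ValueError("need at least one decision")
    net = [r - cost_bps for r in gross_returns_bps]
    return {
        "n": len(net),
        "hit_rate": sum(1 for r in net if r > 0) / len(net),
        "mean_net_bps": mean(net),
    }


def baseline_table(decision_logs: dict[str, list[float]],
                   cost_bps: float) -> list[tuple[str, dict]]:
    """Score every baseline with the same cost and sort by mean net return."""
    rows = [(name, score_decisions(r, cost_bps)) for name, r in decision_logs.items()]
    return sorted(rows, key=lambda row: row[1]["mean_net_bps"], reverse=True)


if __name__ == "__main__":
    # Toy placeholder numbers, not measurements.
    toy_logs = {
        "toy_human": [10.0, -5.0, 4.0],
        "toy_model": [1.0, 1.0, 1.0],
    }
    for name, stats in baseline_table(toy_logs, cost_bps=2.0):
        print(name, stats)
```

Using one scoring function for all rows matters more than the particular metrics: the human baseline is only comparable if it is charged the same costs and evaluated over the same sample period as the automated baselines.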

Updated benchmark baseline table¶

Baseline	Tradition	New / Old
Random Search	Control	—
JKP-Verified Anomaly Pool	Factor Zoo	—
gplearn	Classical GP	—
FactorVAE	DL Factor	—
AlphaAgent	LLM-driven	—
Frozen-LLM-prompting	LLM	—
M8.6 Walk-forward + Adaptive State	Tradability gate	—
Human Expert Discretionary	Discretionary	NEW
LLM agent fine-tuned on expert	Expert imitation	NEW (Direction B)

6. Summary of What Changed in My Thinking¶

Topic	Before this conversation	After this conversation
Why no breakthrough?	Wrong question (filter not alpha)	Right question (vehicle mismatch + human-in-loop not leveraged)
Project's biggest asset	Production verification infrastructure	Production verification infrastructure + real-money discretionary expert in the loop
Breakthrough direction	Pivot to alpha generation (stat arb / funding arb)	Pivot to human-expert-augmented system, with alpha generation as secondary
Benchmark differentiator	Cleaned 526-symbol × 15s data	Data + Human Expert Baseline + LLM-from-expert baseline
RQ priority	RQ1 (architectural prior) first	New potential RQ-A: Learning tape-reading from production trading decisions, which may have even more academic originality than RQ1

7. Pending Decisions (For Me to Think About)¶

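What I still need to think through:

Is the Vehicle Pivot (Direction A) really worth doing? Or should Direction B skip the vehicle rework entirely and focus on tacit knowledge extraction?
How do I persuade my trader to cooperate with systematic logging + rationale extraction? This is the enabling prerequisite for Direction B, and the interpersonal effort involved is not small
Between Direction B's paper potential and Crypto-Alpha-Bench, which should take priority? The two do not conflict, but my bandwidth is limited
Should I add Direction B to the HKU presentation? Adding it means revising the talk script again; leaving it out means missing the sharpest angle
Is the ETH validation experiment still worth doing? If Direction B is the real direction, the marginal value of ETH validation drops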

8. Honest Caveat¶

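The entire reframing above rests on one strong assumption: my trader's real edge is mainly tape reading + directional timing, rather than some other ability I have not recognized.

If this assumption is wrong (for example, if the trader's edge is actually a macro view + swing trading with a 1-7 day holding period), then Directions A and B do not apply at all. This has to be confirmed by the trader himself; it is not something I can infer.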

End of document. Time to think.