DexHand Sensing System
System Concept (DexHand × Sensing × Learning)
What problem should the sensing system solve?
Dexterous manipulation fails less often because the robot “can’t see” and more often because contact is nonlinear: friction is unknown, occlusion is common, hard collisions with the tabletop do happen, and small geometric errors are amplified. A DexHand sensing system should therefore be designed to make contact measurable, controllable, and replayable.
Sensor stack (from minimum viable to “robot skin”)
- Level-0 (default): joint encoders (position q, velocity dq), motor current/voltage, temperature/power.
- Level-1 (stable grasp): fingertip normal force, optional 6-axis wrist force/torque (F/T) sensor, tactile-array center of pressure (CoP) and contact area.
- Level-2 (micro-manipulation): shear/slip cues (measured directly or inferred), pre-slip detection.
- Level-3 (high information density): visuotactile or dynamic tactile arrays (learn contact geometry).
- Level-4 (future): palm-scale skin, nail/edge sensing for prying/insertion, near-field proximity.
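The Level-1 tactile quantities (CoP and contact area) can be derived directly from a taxel pressure map. The following is a minimal sketch, assuming a hypothetical rectangular grid of non-negative taxel readings with uniform pitch; the function name, the `pitch_mm` spacing, and the `threshold` value are illustrative assumptions, not properties of any particular sensor.

```python
from typing import List, Optional, Tuple


def contact_features(
    taxels: List[List[float]],
    pitch_mm: float = 2.0,    # assumed taxel spacing; sensor-specific
    threshold: float = 0.05,  # assumed contact threshold in raw units
) -> Tuple[Optional[Tuple[float, float]], float, float]:
    """Return (CoP in mm, contact area in mm^2, summed pressure) for one frame."""
    total = 0.0
    sum_x = 0.0
    sum_y = 0.0
    active = 0
    for r, row in enumerate(taxels):
        for c, p in enumerate(row):
            if p < threshold:
                continue  # sub-threshold taxels are treated as not in contact
            total += p
            sum_x += p * c * pitch_mm  # pressure-weighted column position
            sum_y += p * r * pitch_mm  # pressure-weighted row position
            active += 1
    if active == 0:
        return None, 0.0, 0.0  # no contact: CoP is undefined
    cop = (sum_x / total, sum_y / total)
    area = active * pitch_mm * pitch_mm
    return cop, area, total
```

The CoP here is expressed in the taxel grid's own frame; mapping it into the finger frame is a separate calibration step.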
Three-loop architecture (fast reflex / tactile coordination / slow semantics)
DexHand Sensing/Control: three-layer closed loop
Slow (5-15 Hz): semantics & planning
┌───────────────────────────────────────────────────────────┐
│ VLM/VLA: stage machine, goal selection, recovery policy   │
└──────────────────────────────┬────────────────────────────┘
                               │ goals + constraints
                               ▼
Mid (60-200 Hz): tactile coordination
┌───────────────────────────────────────────────────────────┐
│ Slip/Force controller: friction margin, Δgrip, micro-move │
└──────────────────────────────┬────────────────────────────┘
                               │ impedance/targets
                               ▼
Fast (500-1000 Hz): on-hand reflex & safety
┌───────────────────────────────────────────────────────────┐
│ RT loop: encoder/current control, over-current, thermal   │
│ derating, collision detection, emergency stop             │
└───────────────────────────────────────────────────────────┘
This separation keeps grasp stability (“hold steady”) independent of the 30 Hz vision rate, and keeps safety independent of thread-scheduling jitter on the host PC.
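To make the mid-loop terms "friction margin" and "Δgrip" concrete, here is a minimal sketch of one controller tick. It assumes a simple Coulomb friction model (slip when the shear force exceeds μ times the normal force), which is an assumption of this example rather than something the design prescribes. The function name, the safety factor, the step limit, and the force limit are all hypothetical values chosen for illustration.

```python
from typing import Tuple


def grip_update(
    f_normal: float,        # measured fingertip normal force [N]
    f_shear: float,         # measured or inferred tangential force [N]
    mu_est: float,          # estimated friction coefficient (uncertain in practice)
    safety: float = 1.3,    # assumed safety factor on the required normal force
    max_step: float = 0.2,  # assumed per-tick limit on grip change [N]
    f_max: float = 10.0,    # assumed fingertip force limit [N]
) -> Tuple[float, float]:
    """One mid-loop tick: return (friction_margin, new_normal_force_target)."""
    if mu_est <= 0.0:
        raise ValueError("mu_est must be positive")
    # Coulomb margin: positive means the contact is inside the friction cone.
    margin = mu_est * f_normal - abs(f_shear)
    # Normal force needed to hold the current shear load with some headroom.
    f_required = safety * abs(f_shear) / mu_est
    # Rate-limit the grip change so the hand neither squeezes nor releases abruptly.
    delta = max(-max_step, min(max_step, f_required - f_normal))
    target = max(0.0, min(f_max, f_normal + delta))
    return margin, target
```

For example, with `f_normal=2.0`, `f_shear=1.5`, and `mu_est=0.5`, the margin is negative (the contact is about to slip), so the target rises by the full `max_step`. The resulting target would be handed to the fast loop as an impedance or force setpoint.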
Time sync & calibration (the system’s foundation)
- Timestamp-at-source: tag at the device when data is produced (not when received).
- Unified clock: PTP (IEEE 1588) preferred; hardware trigger as a fallback.
- Alignment: ring buffer + zero-order hold (ZOH) or linear interpolation to fuse 30 Hz vision with 1 kHz proprioception.
- Tactile calibration: taxel/visuotactile pixels → finger frame; raw signal → force proxy; cross-modal alignment with vision point clouds to build supervision signals.
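The alignment step above can be sketched as a source-stamped ring buffer that is queried at each vision timestamp. This is a minimal illustration, assuming scalar samples that arrive in increasing timestamp order on a shared clock; the class name `StampedBuffer` and its methods are hypothetical, and a real implementation would avoid rebuilding the timestamp list on every query.

```python
from bisect import bisect_right
from collections import deque
from typing import Deque, Optional, Tuple


class StampedBuffer:
    """Fixed-size ring buffer of (source_timestamp_s, value) samples."""

    def __init__(self, maxlen: int = 2048) -> None:
        self._buf: Deque[Tuple[float, float]] = deque(maxlen=maxlen)

    def push(self, t: float, value: float) -> None:
        # Assumes samples arrive in increasing source-timestamp order.
        self._buf.append((t, value))

    def sample(self, t_query: float, mode: str = "linear") -> Optional[float]:
        """Resample the stream at t_query using ZOH or linear interpolation."""
        times = [t for t, _ in self._buf]
        i = bisect_right(times, t_query)  # index of first sample after t_query
        if i == 0:
            return None  # query is older than everything still buffered
        t0, v0 = self._buf[i - 1]
        if mode == "zoh" or i == len(times):
            return v0  # hold the last known value; never extrapolate
        t1, v1 = self._buf[i]
        w = (t_query - t0) / (t1 - t0)
        return v0 + w * (v1 - v0)
```

A typical use is to push every 1 kHz proprioception sample as it arrives, then call `sample(t_frame)` with the source timestamp of each 30 Hz vision frame. ZOH suits discrete signals such as a slip flag, while linear interpolation suits continuous ones such as joint position.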
Learning-ready data schema (what to log)
- Vision: wrist/palm RGB(D) (with source timestamps)
- Proprioception: q/dq/current/temp (with source timestamps)
- Tactile: taxel map / CoP / contact area / slip flag (or visuotactile image)
- Action: Δq/Δx or impedance parameters (what you commanded)
- QC labels: replay_ok, recovery_event
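One way to pin this schema down is a typed record serialized as one JSON line per step. The sketch below is an assumption about layout, not the project's actual format: the class name `LogFrame`, the `image_key` reference to an externally stored image, and the choice of Δq as the action are all hypothetical. Only `replay_ok` and `recovery_event` are taken from the list above.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class LogFrame:
    # Vision: source timestamp plus a key to the stored RGB(D) image.
    t_vision: float
    image_key: str
    # Proprioception, stamped at the device.
    t_proprio: float
    q: List[float]
    dq: List[float]
    current: List[float]
    temp: List[float]
    # Tactile summary features (a raw taxel map could be stored alongside).
    cop: Optional[List[float]]
    contact_area: float
    slip_flag: bool
    # Action: what was actually commanded this step (here, a joint delta).
    action_dq: List[float]
    # QC labels.
    replay_ok: bool = False
    recovery_event: bool = False


def to_json_line(frame: LogFrame) -> str:
    """Serialize one frame as a single JSON line for append-only logging."""
    return json.dumps(asdict(frame), sort_keys=True)
```

Keeping separate timestamps per modality preserves the timestamp-at-source principle; alignment to a common time base can then be done offline without losing the original timing.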
Future direction: tactile expectation & recovery as “first-class skills”
The next step is to treat tactile data not only as an input but as a predictable signal: learn a tactile expectation model and, when the measured tactile signal deviates from the prediction, trigger recovery. This aligns with the view that diverse, messy, fail-and-retry data is crucial for robust policies (see the Spirit-v1.5 Blog).
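The deviation check itself can be very simple once a prediction is available. The sketch below assumes the predicted tactile vector comes from a learned expectation model that is not shown here; the class name `ExpectationMonitor`, the RMS residual, the threshold, and the `patience` debounce are illustrative choices rather than a prescribed method.

```python
from typing import Sequence


class ExpectationMonitor:
    """Flags recovery when measured tactile stays far from predicted tactile."""

    def __init__(self, threshold: float, patience: int = 3) -> None:
        self.threshold = threshold  # assumed residual limit, tuned per sensor
        self.patience = patience    # consecutive violations before triggering
        self._count = 0

    def update(self, predicted: Sequence[float], measured: Sequence[float]) -> bool:
        """Return True when recovery should be triggered on this step."""
        if len(predicted) == 0 or len(predicted) != len(measured):
            raise ValueError("predicted and measured must be non-empty and equal length")
        # Root-mean-square residual between expectation and observation.
        sq = sum((p - m) ** 2 for p, m in zip(predicted, measured))
        rms = (sq / len(predicted)) ** 0.5
        # Count consecutive violations; any in-tolerance step resets the count.
        self._count = self._count + 1 if rms > self.threshold else 0
        return self._count >= self.patience
```

Requiring several consecutive violations keeps a single noisy tactile frame from triggering recovery; a step that fires can also be logged with `recovery_event` set so that fail-and-retry episodes are easy to find later.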