DexHand Sensing System

System Concept (DexHand × Sensing × Learning)

What problem should the sensing system solve?

Dexterous manipulation fails less often because the robot “can’t see” and more often because contact is nonlinear: friction is unknown, occlusion is the norm, hard collisions with the tabletop do happen, and small geometric errors are amplified at contact. A DexHand sensing system should therefore be designed to make contact measurable, controllable, and replayable.

Contact wrench/friction · Slip/pre-slip · Replayability

Sensor stack (from minimum viable to “robot skin”)

Level-0 (default): joint encoders (position q, velocity dq), motor current/voltage, temperature/power.
Level-1 (stable grasp): fingertip normal force, optional wrist 6-axis force/torque (F/T) sensor, tactile-array center of pressure (CoP) and contact area.
Level-2 (micro-manipulation): shear/slip cues (direct or inferred), pre-slip detection.
Level-3 (high information density): visuotactile or dynamic tactile arrays (learn contact geometry).
Level-4 (future): palm-scale skin, nail/edge sensing for prying/insertion, near-field proximity.
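
To make the Level-1 quantities concrete, here is a minimal sketch of how center of pressure and contact area could be derived from a single tactile-array frame. The function name `contact_features`, the 2 mm taxel pitch, and the noise threshold are illustrative assumptions, not values from any specific DexHand sensor.

```python
from typing import List, Optional, Tuple


def contact_features(
    pressure: List[List[float]],
    taxel_pitch_mm: float = 2.0,  # assumed taxel spacing
    threshold: float = 0.05,      # assumed noise floor, same units as pressure
) -> Tuple[Optional[Tuple[float, float]], float, float]:
    """Return (CoP as (x_mm, y_mm) or None, contact area in mm^2, summed pressure)."""
    total = 0.0
    sum_x = 0.0
    sum_y = 0.0
    active = 0
    for row_idx, row in enumerate(pressure):
        for col_idx, p in enumerate(row):
            if p <= threshold:
                continue  # taxel below the noise floor: not in contact
            active += 1
            total += p
            sum_x += p * col_idx * taxel_pitch_mm
            sum_y += p * row_idx * taxel_pitch_mm
    if active == 0:
        return None, 0.0, 0.0
    # CoP is the pressure-weighted centroid of the active taxels.
    area = active * taxel_pitch_mm ** 2
    return (sum_x / total, sum_y / total), area, total


# A 4x4 frame with a 2x2 patch of uniform pressure in the middle.
frame = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
cop, area, total = contact_features(frame)
print(cop, area, total)
```

The coordinates here are in the sensor's own grid; mapping them into the finger frame is a separate calibration step.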

Three-loop architecture (fast reflex / tactile coordination / slow semantics)

DexHand Sensing/Control: three-layer closed loop

  Slow (5-15 Hz): semantics & planning
  ┌───────────────────────────────────────────────────────────┐
  │ VLM/VLA: stage machine, goal selection, recovery policy   │
  └──────────────────────────────┬────────────────────────────┘
                                 │ goals + constraints
                                 ▼
  Mid (60-200 Hz): tactile coordination
  ┌───────────────────────────────────────────────────────────┐
  │ Slip/Force controller: friction margin, Δgrip, micro-move │
  └──────────────────────────────┬────────────────────────────┘
                                 │ impedance/targets
                                 ▼
  Fast (500-1000 Hz): on-hand reflex & safety
  ┌───────────────────────────────────────────────────────────┐
  │ RT loop: encoder/current control, over-current, thermal   │
  │ derating, collision detection, emergency stop             │
  └───────────────────────────────────────────────────────────┘

This separation keeps “hold steady” independent of the 30 Hz vision rate, and keeps safety independent of scheduling jitter on the host PC.
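
As an illustration of what the mid loop's "friction margin" and "Δgrip" could mean in practice, the sketch below uses a plain Coulomb friction model: the contact holds while the shear force stays below mu times the normal force. The function names, the friction coefficient, the target margin, and the step limit are all assumptions chosen for the example; a real controller would estimate mu online and tune the rest per hand.

```python
import math


def friction_margin(normal_force: float, shear_x: float, shear_y: float, mu: float) -> float:
    """Coulomb margin in newtons; positive means the contact is inside the friction cone."""
    return mu * normal_force - math.hypot(shear_x, shear_y)


def grip_delta(
    normal_force: float,
    shear_x: float,
    shear_y: float,
    mu: float = 0.5,             # assumed friction coefficient
    target_margin: float = 2.0,  # assumed safety margin in newtons
    gain: float = 1.0,
    max_step: float = 0.5,       # assumed per-cycle limit on the force change
) -> float:
    """Return the change in commanded normal force for one mid-loop cycle."""
    if mu <= 0.0:
        raise ValueError("mu must be positive")
    deficit = target_margin - friction_margin(normal_force, shear_x, shear_y, mu)
    # Each extra newton of normal force buys mu newtons of margin.
    delta = gain * deficit / mu
    # Clamp so a noisy shear reading cannot produce a large jump in grip force.
    return max(-max_step, min(max_step, delta))


tighten = grip_delta(12.0, 3.0, 4.0)  # margin below target: squeeze harder
relax = grip_delta(20.0, 3.0, 4.0)    # margin above target: back off
print(tighten, relax)
```

The output of such a loop would be passed down as an impedance or force target; the fast loop remains responsible for current limits and emergency stops regardless of what this layer requests.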

Time sync & calibration (the system’s foundation)

Timestamp-at-source: stamp each sample on the device, against the global clock, at the moment it is acquired (not when the host receives it).
Unified clock: PTP (IEEE 1588) preferred; hardware trigger as a fallback.
Alignment: ring buffer plus zero-order hold (ZOH) or linear interpolation to fuse 30 Hz vision with 1 kHz proprioception.
Tactile calibration: taxels/visuotactile pixels → finger frame; raw signal → force proxy; cross-modal alignment with vision (for example, against the visual point cloud, which can then serve as supervision).
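
The alignment step can be sketched as a fixed-size ring buffer of source-stamped samples that is queried at each vision frame's timestamp. This is a minimal sketch under stated assumptions: the class name `SampleBuffer` is hypothetical, samples are scalar, and timestamps are assumed to arrive in increasing order from an already-synchronized clock.

```python
from bisect import bisect_right
from collections import deque
from typing import Deque, Tuple


class SampleBuffer:
    """Ring buffer of (source timestamp in seconds, value) pairs."""

    def __init__(self, capacity: int = 2048) -> None:
        # deque with maxlen drops the oldest sample once capacity is reached.
        self._samples: Deque[Tuple[float, float]] = deque(maxlen=capacity)

    def push(self, t: float, value: float) -> None:
        self._samples.append((t, value))

    def _index_after(self, t: float) -> int:
        # Rebuilding the time list is O(n); acceptable for a sketch.
        times = [ts for ts, _ in self._samples]
        i = bisect_right(times, t)
        if i == 0:
            raise ValueError("query time is older than the buffered data")
        return i

    def sample_zoh(self, t: float) -> float:
        """Zero-order hold: latest value at or before t."""
        return self._samples[self._index_after(t) - 1][1]

    def sample_linear(self, t: float) -> float:
        """Linear interpolation between the samples bracketing t."""
        i = self._index_after(t)
        t0, v0 = self._samples[i - 1]
        if i == len(self._samples):
            return v0  # no newer sample yet: fall back to hold
        t1, v1 = self._samples[i]
        alpha = (t - t0) / (t1 - t0)
        return v0 + alpha * (v1 - v0)


# 1 kHz joint signal, queried at a vision frame stamped between two samples.
buf = SampleBuffer()
for k in range(100):
    t = k / 1000.0
    buf.push(t, 10.0 * t)
zoh = buf.sample_zoh(0.0335)
lin = buf.sample_linear(0.0335)
print(zoh, lin)
```

ZOH suits discrete signals such as slip flags, while linear interpolation is the more natural choice for smooth signals such as joint positions.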

Learning-ready data schema (what to log)

Vision: wrist/palm RGB(D) (+ timestamps)
Proprioception: q/dq/current/temp (+ timestamps)
Tactile: taxel map / CoP / contact area / slip flag (or visuotactile image)
Action: Δq/Δx or impedance parameters (what you commanded)
QC labels: replay_ok, recovery_event
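
One possible way to express this schema in code is a set of small dataclasses, one per modality, each carrying its own source timestamp. The field names below (`stamp_ns`, `frame_ref`, `mode`, and so on) are illustrative assumptions rather than a fixed format; giving tactile and action records their own timestamps is likewise an assumption that follows the timestamp-at-source rule.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class Vision:
    stamp_ns: int
    camera: str     # "wrist" or "palm"
    frame_ref: str  # hypothetical key/path to the RGB(D) frame stored elsewhere


@dataclass
class Proprioception:
    stamp_ns: int
    q: List[float]
    dq: List[float]
    current: List[float]
    temp: List[float]


@dataclass
class Tactile:
    stamp_ns: int
    cop: Optional[List[float]]  # None when there is no contact
    contact_area: float
    slip: bool
    taxel_map: Optional[List[List[float]]] = None


@dataclass
class Action:
    stamp_ns: int
    mode: str  # "delta_q", "delta_x", or "impedance"
    values: List[float]


@dataclass
class Step:
    vision: Vision
    proprio: Proprioception
    tactile: Tactile
    action: Action
    replay_ok: bool = True        # QC label
    recovery_event: bool = False  # QC label


step = Step(
    vision=Vision(stamp_ns=1_000_000, camera="wrist", frame_ref="frames/000001"),
    proprio=Proprioception(stamp_ns=1_000_500, q=[0.1], dq=[0.0], current=[0.3], temp=[41.0]),
    tactile=Tactile(stamp_ns=1_000_400, cop=[3.0, 3.0], contact_area=16.0, slip=False),
    action=Action(stamp_ns=1_001_000, mode="delta_q", values=[0.01]),
)
record = json.dumps(asdict(step))  # one JSON line per step
print(record)
```

Keeping a per-modality timestamp, rather than one timestamp per step, preserves the information needed to re-align the streams offline.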

Future direction: tactile expectation & recovery as “first-class skills”

The next step is to treat tactile sensing not only as an input but as a predictable signal: learn a tactile expectation model, and when the measured tactile signal deviates from the prediction, trigger recovery. This aligns with the idea that diverse, messy, fail-and-retry data is crucial for robust policies: Spirit-v1.5 Blog.
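
The trigger logic could look roughly like the sketch below, which compares predicted and measured tactile vectors and fires only after several consecutive deviations. The learned expectation model itself is out of scope here; `predicted` is a stand-in for its output, and the class name, RMS error metric, threshold, and patience are assumptions made for illustration.

```python
import math
from typing import Sequence


class ExpectationMonitor:
    """Flags recovery when measured tactile deviates from the prediction for several steps."""

    def __init__(self, threshold: float = 0.2, patience: int = 3) -> None:
        self.threshold = threshold  # assumed RMS error limit
        self.patience = patience    # consecutive violations required to trigger
        self._count = 0

    def update(self, predicted: Sequence[float], measured: Sequence[float]) -> bool:
        if len(predicted) != len(measured) or len(predicted) == 0:
            raise ValueError("predicted and measured must be non-empty and equal length")
        squared = sum((p - m) ** 2 for p, m in zip(predicted, measured))
        rms = math.sqrt(squared / len(predicted))
        # Debounce: a single noisy frame resets nothing but also triggers nothing.
        self._count = self._count + 1 if rms > self.threshold else 0
        return self._count >= self.patience


monitor = ExpectationMonitor(threshold=0.2, patience=3)
predicted = [1.0, 1.0, 1.0, 1.0]  # stand-in for the expectation model's output
nominal = [1.0, 1.1, 0.9, 1.0]
slipping = [0.2, 0.3, 1.0, 1.0]
flags = [monitor.update(predicted, m) for m in (nominal, slipping, slipping, slipping)]
print(flags)
```

Each time the monitor fires is also a natural moment to set the `recovery_event` label in the log, so that fail-and-retry episodes are easy to find later.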
