VLA Deep Tracking

METHOD FAMILY TRENDS

data thru 2026-04-24 · 681 papers · 44d window · 15 families

high

▼ 2 declining · 15 families · 203 papers covered

FAMILY	7d	14d	30d	Δ7d	Δ14d	Δ30d	ST
Lang. Grounding	40	78	176	1.05x	0.86x	1.26x	●
World Model	22	51	105	0.76x	1.04x	1.29x	●
Flow Matching	20	37	80	0.99x	0.75x	0.99x	●
Long Horizon	20	30	61	0.99x	0.74x	0.75x	●
Multi-Task	20	38	57	0.99x	0.94x	0.70x	●
RL Fine-tuning	20	42	109	0.91x	0.67x	0.91x	●
Tactile	14	21	37	0.69x	0.52x	0.46x	●
Human-Robot	13	26	63	0.64x	0.64x	0.78x	●
Diffusion Policy	10	13	33	0.49x	0.32x	0.41x	●
Cross-Embodiment	7	10	17	0.34x	0.25x	0.21x	●
Dexterous Hand	7	14	30	0.34x	0.34x	0.37x	●
Sim-to-Real	4	7	17	0.20x	0.17x	0.21x	●
Mobile Manip.	3	10	18	0.15x	0.25x	0.22x	●
3D Repr.	2	4	18	0.10x	0.10x	0.22x	●
Instr. Tuning	1	1	7	0.05x	0.02x	0.09x	●

COMPETITION PAIRS · 6 matchups

VLA vs WAM

Paradigm war: end-to-end action prediction vs world model planning

Lang. Ground. vs World Model

65%

35%

5.9% · x1.05 | ratio 1.82 | 3.2% · x0.76

ACTION HEAD ROUTE

Continuous action generation: denoising vs optimal transport

Diffusion Pol. vs Flow Matching

33%

67%

1.5% · x0.49 | ratio 0.50 | 2.9% · x0.99

POST-TRAINING ROUTE

Model adaptation: supervised tuning vs reward optimization

Instr. Tuning vs RL Fine-tune

5%

95%

0.1% · x0.05 | ratio 0.05 | 2.9% · x0.91

LEARNING SIGNAL

Training paradigm: imagination-based vs reward-based

World Model vs RL Fine-tune

52%

48%

3.2% · x0.76 | ratio 1.10 | 2.9% · x0.91

MANIPULATION SENSING

Manipulation approach: tactile feedback vs dexterous control

Tactile vs Dext. Hand

67%

33%

2.1% · x0.69 | ratio 2.00 | 1.0% · x0.34

TRANSFER APPROACH

Domain bridging: simulation transfer vs cross-embodiment

Sim2Real vs Cross-Embod.

36%

64%

0.6% · x0.20 | ratio 0.57 | 1.0% · x0.34
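The matchup figures above appear to be derivable from the family table. The sketch below is a reconstruction, not the dashboard's documented method. It assumes that the first numeric column of each family row is its 7-day paper count, that each side's percentage bar is its share of the pair's combined count, that "ratio" is left count divided by right count, and that the small percentages are each count divided by the 681 tracked papers. The names `Matchup`, `matchup`, and `PAIRS` are hypothetical.

```python
from dataclasses import dataclass

TOTAL_PAPERS = 681  # corpus size quoted in the dashboard header


@dataclass(frozen=True)
class Matchup:
    left_share: int          # left family's percent of the pair's combined count
    right_share: int         # right family's percent of the pair's combined count
    ratio: float             # left count divided by right count
    left_corpus_pct: float   # left count as a percent of all tracked papers
    right_corpus_pct: float  # right count as a percent of all tracked papers


def matchup(left: int, right: int, total: int = TOTAL_PAPERS) -> Matchup:
    """Derive head-to-head figures from two paper counts (assumed formulas)."""
    left_share = round(100 * left / (left + right))
    return Matchup(
        left_share=left_share,
        right_share=100 - left_share,
        ratio=round(left / right, 2),
        left_corpus_pct=round(100 * left / total, 1),
        right_corpus_pct=round(100 * right / total, 1),
    )


# First numeric column of the family table for each pair (assumed 7d counts).
PAIRS = {
    "Lang. Ground. vs World Model": (40, 22),
    "Diffusion Pol. vs Flow Matching": (10, 20),
    "Instr. Tuning vs RL Fine-tune": (1, 20),
    "World Model vs RL Fine-tune": (22, 20),
    "Tactile vs Dext. Hand": (14, 7),
    "Sim2Real vs Cross-Embod.": (4, 7),
}

for name, (left, right) in PAIRS.items():
    print(name, matchup(left, right))
```

Under these assumptions the function gives the shares, ratios, and corpus percentages shown for all six pairs. The x-multipliers next to each percentage match the Δ7d column of the family table and are not recomputed here, since the page does not say how they are derived.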

EMERGING SIGNALS

2026-04-24 · 36/171 unmatched · 7d window

3 signals

TERM	COUNT	AGE	VELOCITY	STATUS	SAMPLE
embodied ai	10	23d	++ x1.0	CANDIDATE	GaLa: Hypergraph-Guided Visual Language Models for...
robot manipulation	6	8d	+ x0.0	CANDIDATE	Bimanual Robot Manipulation via Multi-Agent In-Con...
high fidelity	5	8d	~ x-0.5	CANDIDATE	FLASH: Fast Learning via GPU-Accelerated Simulatio...

TOP INSTITUTIONS

30d window · 20 labs tracked · VLA domain

20 active / 30d

RANK	INSTITUTION	TOTAL	BEST	LAST SEEN
1	CMU	10	⚡	04-17
2	Berkeley	6	🔧	04-17
3	NVIDIA	4	🔧	03-27
	Physical Intelligence	4	⚡	04-10
	Tongji	2	🔧	03-26
	HKUST	2	📖	03-26
	Ryoo	2	🔧	04-05
	Tsinghua	15	⚡	04-01
	Stanford	4	🔧	04-17
	UCSD	3	📖	03-26
	Princeton	3	🔧	04-22
	USTC	2	🔧	04-01
	Peking University	2	🔧	04-01
	Wisconsin	1	📖	03-26
	CUHK	1	📖	03-26
	Colorado	1	📖	03-26
	Roma	1	📖	03-26
	NJU	1	📖	03-26
	Buffalo	1	📖	03-26
	Wayne	1	📖	03-26

📐 Theory Article Library

204 articles · View the full library on GitHub →

Last 2 weeks · 50 articles

2026.04.23 yesterday vla core

PhysMem: Scaling Test-time Physical Memory for Robot Manipulation

2026.04.23 yesterday vla core

Drift-Based Policy Optimization: Native One-Step Policy Learning for Online Robot Control

2026.04.23 yesterday world model

World-Model-Assisted VLA Post-Training: Research Progress and Problem Breakdown (2026)

2026.04.22 2 days ago vla core

World-Value-Action Model: Implicit Planning for Vision-Language-Action Systems

2026.04.22 2 days ago vla core

DeepThinkVLA: Enhancing Reasoning Capability of Vision-Language-Action Models

2026.04.22 2 days ago foundation

Human Data Is Robot Data in Disguise: An In-Depth Interview with Danfei Xu (2026)

2026.04.21 3 days ago vla core

Long-Term Memory for VLA-based Agents in Open-World Task Execution

2026.04.21 3 days ago vla core

From Seeing to Simulating: Generative High-Fidelity Simulation with Digital Cousins for Generalizable Robot Learning and Evaluation

2026.04.20 4 days ago vla core

A Hierarchical Spatiotemporal Action Tokenizer for In-Context Imitation Learning in Robotics

2026.04.20 4 days ago vla core

Flow with the Force Field: Learning 3D Compliant Flow Matching Policies from Force and Demonstration-Guided Simulation Data

2026.04.20 4 days ago vla core

Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion

2026.04.20 4 days ago foundation

Latent Space Survey: The "Native Thinking Space" of Language Models and a Unified Interface for Embodied AI

2026.04.20 4 days ago foundation

Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion

2026.04.20 4 days ago foundation

VLA Data Engineering Guide: The Full Pipeline from Collection to Training

2026.04.20 4 days ago planning

HiST-AT: A Hierarchical Spatiotemporal Action Tokenizer for In-Context Imitation Learning

2026.04.20 4 days ago vla core

GR00T-N1.7: NVIDIA's Open-Source Generalist Robot Foundation Model, from Humanoids to Any Embodiment

2026.04.20 4 days ago vla core

LingBot-VLA: A Pragmatic VLA Pretrained on 20,000 Hours of Real-World Data

2026.04.20 4 days ago vla core

A Selection Guide to Fully Open-Source VLAs: Who Is Truly Open and Who Is "Open-Washing"

2026.04.19 5 days ago tactile

Multi-Modal Manipulation via Multi-Modal Policy Consensus

2026.04.19 5 days ago vla core

cuRoboV2: Dynamics-Aware Motion Generation with Depth-Fused Distance Fields for High-DoF Robots

2026.04.18 6 days ago vla core

DockAnywhere: Data-Efficient Visuomotor Policy Learning for Mobile Manipulation via Novel Demonstration Generation

2026.04.17 7 days ago vla core

HAMLET: Switch your Vision-Language-Action Model into a History-Aware Policy

2026.04.17 7 days ago vla core

X-Diffusion: Training Diffusion Policies on Cross-Embodiment Human Demonstrations

2026.04.17 7 days ago vla core

Artificial Tripartite Intelligence: A Bio-Inspired, Sensor-First Architecture for Physical AI

2026.04.17 7 days ago vla core

IGen: Scalable Data Generation for Robot Learning from Open-World Images

2026.04.17 7 days ago perception

3D First: What VGA and Spark 2.0 Suggest About the Next Representation Revolution in Embodied AI

2026.04.17 7 days ago perception

VGA: Robot Manipulation Is a Vision-to-Geometry Mapping, Not Vision-to-Language-to-Action

2026.04.17 7 days ago vla core

π0.7: A Steerable Generalist Robot Foundation Model with Emergent Compositional Generalization

2026.04.16 8 days ago vla core

BLaDA: Bridging Language to Functional Dexterous Actions within 3DGS Fields

2026.04.16 8 days ago vla core

Iterative Compositional Data Generation for Robot Control

2026.04.16 8 days ago vla core

HazardArena: Evaluating Semantic Safety in Vision-Language-Action Models

2026.04.16 8 days ago perception

Spark 2.0: The Open-Source 3DGS Web Rendering Engine from Fei-Fei Li's World Labs, Opening 100-Million-Point Clouds in Seconds on a Phone

2026.04.15 9 days ago vla core

StaMo: Unsupervised Learning of Generalizable Robot Motion from Compact State Representation

2026.04.15 9 days ago vla core

StaMo: Unsupervised Learning of Generalizable Robot Motion from Compact State Representation

2026.04.15 9 days ago foundation

StarVLA-α: Reducing Complexity in Vision-Language-Action Systems

2026.04.14 10 days ago vla core

Dejavu: Towards Experience Feedback Learning for Embodied Intelligence

2026.04.14 10 days ago vla core

You've Got a Golden Ticket: Improving Generative Robot Policies With A Single Noise Vector

2026.04.14 10 days ago vla core

2D or 3D: Who Governs Salience in VLA Models? -- Tri-Stage Token Pruning Framework with Modality Salience Awareness

2026.04.14 10 days ago vla core

Commanding Humanoid by Free-form Language: A Large Language Action Model with Unified Motion Vocabulary

2026.04.13 11 days ago vla core

Reflection-Based Task Adaptation for Self-Improving VLA

2026.04.13 11 days ago vla core

Towards Provable Probabilistic Safety for Scalable Embodied AI Systems

2026.04.13 11 days ago foundation

HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents

2026.04.12 12 days ago vla core

DailyArt: Discovering Articulation from Single Static Images via Latent Dynamics

2026.04.12 12 days ago vla core

Orion-Lite: Distilling LLM Reasoning into Efficient Vision-Only Driving Models

2026.04.11 13 days ago vla core

HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models

2026.04.11 13 days ago vla core

UniLACT: Depth-Aware RGB Latent Action Learning for VLA Models

2026.04.10 14 days ago vla core

RoSHI: A Versatile Robot-oriented Suit for Human Data In-the-Wild

2026.04.10 14 days ago tactile

TAMEn: Tactile-Aware Manipulation Engine for Closed-Loop Data Collection in Contact-Rich Tasks

2026.04.10 14 days ago vla core

Genie Sim PanoRecon: Fast Immersive Scene Generation from Single-View Panorama

More articles · all on GitHub · 154 articles

🏗️ Foundation & Training · 30 articles

2026.04.09 Co-training
2026.04.09 Data Processing
2026.04.09 Embodied AI Deep Dive: Data Flywheel & Cross-modal Transfer
2026.04.09 DCP: Convexity Verification Rules and CVX/CVXPY Modeling Practice (Disciplined Convex Programming)
2026.04.09 DoRA: Weight-Decomposed Low-Rank Adaptation
2026.04.09 Evaluation Protocols Deep Dive
2026.04.09 Flash Attention: The Key to Efficient Transformer Inference
2026.04.09 🏗️ Foundations: ML Toolbox Track Overview
2026.04.09 Amortizing Update Cost: Doc-to-LoRA / Text-to-LoRA for "Instant Internalization" in LLMs (Cost Amortization for Instant LLM Updates)
2026.04.09 Knowledge Distillation
2026.04.09 Knowledge Insulation: Preventing Catastrophic Forgetting
2026.04.09 What We Talk About When We Talk About the KV Cache in AI Inference (KV Cache in LLM Inference)
2026.04.09 Lifelong Imitation Learning with Multimodal Latent Replay and Incremental Adjustment
2026.04.09 VLA Literature Technical Review
2026.04.09 VLA Math Essentials: From Intuition to Implementation
2026.04.09 Flexible Modular Architecture Table Generator (VLA Modular Pipelines)
2026.04.09 NeurIPS 2025 Best Papers: A Reading from the Embodied AI Perspective
2026.04.09 VLA Paper Index
2026.04.09 Parameter-Efficient Fine-Tuning Theory (PEFT & LoRA)
2026.04.09 Quantization Theory
2026.04.09 QVLA: Not All Channels Are Equal in VLA Quantization
2026.04.09 RDT2: Exploring the Scaling Limit of UMI Data Towards Zero-Shot Cross-Embodiment Generalization
2026.04.09 RoboGene: Boosting VLA Pre-training via Diversity-Driven Agentic Framework for Real-World Task Generation
2026.04.09 Self-Supervised Learning
2026.04.09 Shallow-π: Knowledge Distillation for Flow-based VLAs
2026.04.09 Transfer Learning
2026.04.09 Transformer vs CNN: Core Architecture Comparison
2026.04.09 VideoWeaver: Multimodal Multi-View Video-to-Video Transfer for Embodied Agents
2026.04.09 VLA Loss Functions Handbook (A Practical Guide to VLM-Robot Policy Training Objectives)
2026.04.08 VLA-Forget: Vision-Language-Action Unlearning for Embodied Foundation Models

🧠 Planning & Reasoning · 26 articles

2026.04.09 Safety, Alignment, and Constrained Decision-Making for VLA
2026.04.09 BEHAVIOR-1K: Why It Is Not "a Benchmark with More Tasks" but a More Realistic Demand on General-Purpose Robots (BEHAVIOR-1K: A Human-Centered Embodied AI Benchmark with OmniGibson)
2026.04.09 2025 BEHAVIOR Challenge Winning Solution: When the Benchmark Is Hard Enough, What Does a VLA Ultimately Win With? (Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge)
2026.04.09 Benchmark Mainline Overview: From Task Worlds to Safety Constraints to World-Model Evaluators
2026.04.09 BeSafe-Bench: Unveiling Behavioral Safety Risks of Situated Agents in Functional Environments
2026.04.09 Chain-of-Thought Reasoning
2026.04.09 Inference-Time Safety Control for VLA via Concept Dictionary Learning (SAFE-Dict)
2026.04.09 DAC-RL: Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability
2026.04.09 Diagnose, Correct, and Learn from Manipulation Failures via Visual Symbols
2026.04.09 Embodied Chain of Thought: Letting a VLA "Think Before Acting" (Robotic Control via Embodied Chain-of-Thought Reasoning, 2024)
2026.04.09 ENACT: Not Just Another Benchmark, but Asking Whether VLMs Have "Embodied Cognition" (ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction)
2026.04.09 AgiBot: ERIQ + FACT + GenieReasoner, Quantifying the Transfer Loss from Reasoning to Action
2026.04.09 How to Peel with a Knife: Aligning Fine-Grained Manipulation with Human Preference
2026.04.09 IAIL: Cross-robot Behavior Adaptation through Intention Alignment
2026.04.09 IS-Bench: It Tests Not "Is It Safe" but "Does It Make Things Dangerous During Interaction" (IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks)
2026.04.09 Motion Planning
2026.04.09 RoboClaw: An Agentic Framework for Scalable Long-Horizon Robotic Tasks
2026.04.09 SOMA: Strategic Orchestration and Memory-Augmented System for VLA Robustness via In-Context Adaptation
2026.04.09 Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models
2026.04.09 Thinker: A Vision-Language Foundation Model for Embodied Intelligence
2026.04.09 TinyLoRA: Learning to Reason in 13 Parameters
2026.04.09 Uni-Skill: Building Self-Evolving Skill Repository for Generalizable Robotic Manipulation
2026.04.09 10 Open Challenges for VLA
2026.04.09 Intrinsic Safety for VLA: From Gradient Masking to Physical "Lobotomy" (SGTM)
2026.04.09 VLM Promptable Representations: Injecting Common Sense into RL with "Promptable Representations" (PR2L, 2024/2025)
2026.04.09 When to Act, Ask, or Learn: Uncertainty-Aware Policy Steering

🏛️ VLA Core · 21 articles

2026.04.09 SnapFlow: One-Step Action Generation for Flow-Matching VLAs via Progressive Self-Distillation
2026.04.09 On-the-Fly VLA Adaptation via Test-Time Reinforcement Learning
2026.04.09 GeoPredict: Leveraging Predictive Kinematics and 3D Gaussian Geometry for Precise VLA Manipulation
2026.04.09 ABot-M0: VLA Foundation Model with Action Manifold Learning
2026.04.09 ACT: Action Chunking with Transformers
2026.04.09 Beyond Attention Magnitude: Leveraging Inter-layer Rank Consistency for Efficient Vision-Language-Action Models
2026.04.09 FAST: Efficient Action Tokenization
2026.04.09 Figure Helix 02: A Unified Locomotion-Manipulation Architecture for Whole-Body End-to-End VLA (Helix 02: Full-Body Autonomy)
2026.04.09 FocusVLA: Focused Visual Utilization for Vision-Language-Action Models
2026.04.09 Galaxea G0: A Dual-System VLA Framework
2026.04.09 Dissecting GR00T-N1.6
2026.04.09 InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation
2026.04.09 LangGap: Diagnosing and Closing the Language Gap in Vision-Language-Action Models
2026.04.09 LingBot-VLA: A Pragmatic VLA Foundation Model and High-Throughput Training Stack
2026.04.09 Mash, Spread, Slice! Learning to Manipulate Object States via Visual Spatial Progress
2026.04.08 Learning Additively Compositional Latent Actions for Embodied AI
2026.04.08 PALM: Progress-Aware Policy Learning via Affordance Reasoning for Long-Horizon Robotic Manipulation
2026.04.08 Red-Teaming Vision-Language-Action Models via Quality Diversity Prompt Generation for Robust Robot Policies
2026.04.07 Open-Loop Planning, Closed-Loop Verification: Speculative Verification for VLA
2026.04.07 Look, Zoom, Understand: The Robotic Eyeball for Embodied Perception (EyeVLA)
2026.04.07 Learning Structured Robot Policies from Vision-Language Models via Synthetic Neuro-Symbolic Supervision

🔧 Deployment & Hardware · 19 articles

2026.04.09 🔧 Deployment & Hardware: Real-World Deployment Track Overview
2026.04.09 DexGrasp-Zero: A Morphology-Aligned Policy for Zero-Shot Cross-Embodiment Dexterous Grasping
2026.04.09 CICC Human-Machine Series 05 (Dexterous Hands) → A "Computable Constraints" Framework for VLA/Control/Hardware (Theory Notes)
2026.04.09 Dexterous Hand Mechanics Deep Dive (Revised and Consolidated v2)
2026.04.09 How Hard Is It for a Robot to Open a Cola or Deal Cards? Dexterous Hands: Hardware Routes × Contact Mathematics × Data Pyramid (Interview Excerpts)
2026.04.09 EquiBim: Learning Symmetry-Equivariant Policy for Bimanual Manipulation
2026.04.09 GR-Dexter (ByteDance Seed): A Full-Stack "Hardware-Data-Model" Framework for Extending VLA to High-DoF Dexterous Hands
2026.04.09 Grasp Algorithms & Simulation Platforms
2026.04.09 House of Dextra: Cross-embodied Co-design for Dexterous Hands
2026.04.09 Industry Perspective: Generality and the "Meta-Learning" Path (Starting from a Roadmap)
2026.04.09 Isaac Lab: A GPU-Accelerated Simulation Framework for Multimodal Robot Learning
2026.04.09 Lightning Grasp: Procedural Grasp Synthesis with Contact Fields
2026.04.09 NVIDIA's AI Five-Layer Cake: An Infrastructure View from Energy to Robot Applications (AI Is a 5-Layer Cake)
2026.04.09 Why NVIDIA's First Physical AI Wedge Hits Cars First
2026.04.09 The Physical Intelligence Layer: A Productization Paradigm for Robot Foundation Model APIs
2026.04.09 RoboPocket: Improve Robot Policies Instantly with Your Phone
2026.04.09 Robot Arm Kinematics, Dynamics & Control
2026.04.09 Classification of Robot Dynamical Systems
2026.04.09 Three Kinds of Robot "Open-Source Infrastructure": Showcase / Ecosystem Lock-In / Infrastructure (Using RoboParty Roboto_Origin as an Example)

🎮 Reinforcement Learning · 16 articles

2026.04.09 CausalGDP: Causality-Guided Diffusion Policies for Reinforcement Learning
2026.04.09 Evo-RL: Making π*0.6 / RECAP Real-Robot RL Reproducible on Low-Cost Arms (Evo-RL for Open Real-World RL on SO101 and Beyond)
2026.04.09 Dissecting GR-RL
2026.04.09 π*0.6 / RECAP: Supervised Learning in RL Clothing? From the "Supervision" of Offline RL to a New Paradigm for VLA Post-training
2026.04.09 π-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs
2026.04.09 Posterior Optimization with Clipped Objective for Bridging Efficiency and Stability in Generative Policy Learning
2026.04.09 PGR: Replacing PER's Rare-Sample Overfitting with Conditional-Diffusion "Generative Replay" (Prioritized Generative Replay, ICLR 2025)
2026.04.09 Reinforcement Learning
2026.04.09 Discovery of Reward Function for Embodied RL
2026.04.09 🎮 Reinforcement Learning: VLA Post-Training Track Overview
2026.04.09 RLinf: RL Training Infrastructure for Embodied / Agentic AI (and What It Means for VLA+RL)
2026.04.09 Scaling Verification Can Be More Effective than Scaling Policy Learning for VLA Alignment
2026.04.09 U2O RL: Replacing "Task-Reward Offline Pretraining" with Unsupervised Offline Skill Pretraining (Unsupervised-to-Online Reinforcement Learning, 2024)
2026.04.09 VLA-OPD: Bridging Offline SFT and Online RL for Vision-Language-Action Models via On-Policy Distillation
2026.04.09 Practical VLA+RL Guide: Architecture, Algorithms, and Toolchain
2026.04.09 VLGOR: Visual-Language Knowledge Guided Offline Reinforcement Learning for Generalizable Agents

👁️ Perception & 3D · 15 articles

2026.04.09 ArtPro: Self-Supervised Articulated Object Reconstruction with Adaptive Integration of Mobility Proposals
2026.04.09 DKT: Transparent Object Perception with Video Diffusion Priors (Diffusion Knows Transparency)
2026.04.09 DVGT-2: Vision-Geometry-Action Model for Autonomous Driving at Scale
2026.04.09 EgoDemoGen: Egocentric Demonstration Generation for Viewpoint Generalization in Robotic Manipulation
2026.04.09 Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching
2026.04.09 How Language Shapes Vision: From "Bananas Are Yellow" to VLA
2026.04.09 Multimodal Model Fundamentals (Multimodal Models)
2026.04.09 PAM: A Pose–Appearance–Motion Engine for Sim-to-Real HOI Video Generation
2026.04.09 👁️ Visual Perception: 3D Understanding Track Overview
2026.04.09 Visual & Multimodal Perception Techniques
2026.04.09 Point Cloud Intelligence & SLAM
2026.04.09 Spatial Intelligence & Coordinate Systems
2026.04.09 State Estimation & Sensor Fusion
2026.04.09 WaveFormer: Frequency-Time Decoupled Vision Modeling with Wave Equation
2026.04.09 Zero-1-to-3: Zero-shot One Image to 3D Object

🌊 Diffusion & Flow · 13 articles

2026.04.09 Action Representations & Generation
2026.04.09 Characterizing VLA Models: Identifying the Action Generation Bottleneck for Edge AI Architectures
2026.04.09 Closed-Loop Action Chunks with Dynamic Corrections for Training-Free Diffusion Policy
2026.04.09 CoMo: Learning Continuous Latent Motion from Internet Videos for Scalable Robot Learning
2026.04.09 Contractive Diffusion Policies: Robust Action Diffusion via Contractive Score-Based Sampling with Differential Equations
2026.04.09 🌊 Diffusion & Flow Matching: Action Generation Track Overview
2026.04.09 Diffusion Policy Explained (Diffusion Policy)
2026.04.09 Neural Implicit Action Fields: From Discrete Waypoints to Continuous Functions for VLA Models
2026.04.09 Pi0 (π0) Code Walkthrough: Flow Matching for VLA
2026.04.09 Pixel Motion Diffusion is What We Need for Robot Control (DAWN)
2026.04.09 The Compression Gap: Why Discrete Tokenization Limits Vision-Language-Action Model Scaling
2026.04.09 Traditional Action Generation
2026.04.09 A Shared Token Space Across All Modalities: The Evolution of VLA, with MM-ACT as an Example

🤚 Tactile Perception · 13 articles

2026.04.09 Learning When to See and When to Feel: Adaptive Vision-Torque Fusion for Contact-Aware Manipulation
2026.04.09 Policy Consensus for Multi-Modal Manipulation: So That "Touch No Longer Holds Things Back" (Multi-Modal Manipulation via Policy Consensus)
2026.04.09 Soft Robot Proprioception with GVS + Sensitivity Ellipsoids
2026.04.09 SuperTac: Multimodal "Electronic Skin" + the Tactile Language Model DOVE (Nature Sensors 2025)
2026.04.09 TacRefineNet: Tactile-Only Grasp Refinement
2026.04.09 Touch: Why the Seemingly "Lowest-Level" Sense Is the Most Irreplaceable in Embodied AI (Tactile, the Irreplaceable Modality)
2026.04.09 🤚 Tactile Perception: Multimodal Tactile Track Overview
2026.04.09 Tactile Perception and VLA (Tactile VLA)
2026.04.09 TaF-VLA: Tactile-Force Alignment for VLA
2026.04.09 TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance
2026.04.09 UniTacHand: Unifying Touch with the MANO UV Map for Zero-Shot Human-to-Robot Skill Transfer (Unified Spatio-Tactile Representation)
2026.04.09 The Neural Basis of Visuo-Tactile "Empathy" (Vicarious Body Maps)
2026.04.09 Visual-Tactile Pretraining + Online Multitask Learning: Unlocking "Human-Like Dexterous Manipulation" with Monocular Vision + Binary Touch

🔬 Frontier Research · 1 article

2026.04.09 🔬 Frontier Cross-Domain: Sources of Inspiration Track Overview

🏆 SOTA Leaderboard

Evo-SOTA full leaderboard → 30

CALVIN ABCD-D saturated avg_len

#	Model	Score	vs Prev	Date	Paper
1	Xiaomi-Robotics-0	4.8	Flower VLA +0.13	2026-04-24	arxiv →
2	Xiaomi-Robotics-0	4.8	Flower VLA +0.13	2026-04-17	arxiv →
3	MMaDA-VLA	4.78	Xiaomi-Robotics-0 +0.03	2026-04-24	arxiv →
4	MMaDA-VLA	4.78	Xiaomi-Robotics-0 +0.03	2026-04-17	arxiv →
5	AVA-VLA	4.65	TriVLA +0.28	2026-04-24	arxiv →
6	AVA-VLA	4.65	TriVLA +0.28	2026-04-17	arxiv →
7	GR-2	4.64	DFM-VLA +0.20	2026-04-24	arxiv →
8	GR-2	4.64	DFM-VLA +0.20	2026-04-17	arxiv →
9	NS-VLA	4.56	AtomicVLA +0.29	2026-04-24	arxiv →
10	NS-VLA	4.56	AtomicVLA +0.29	2026-04-17	arxiv →
11	Flower VLA	4.35	RoboUniview +0.49	2026-04-24	arxiv →
12	Flower VLA	4.35	RoboUniview +0.49	2026-04-17	arxiv →
13	MCIL	1.82		2026-04-24	arxiv →
14	MCIL	1.82		2026-04-17	arxiv →

LIBERO standard-opensource saturated average

#	Model	Score	vs Prev	Date	Paper
1	CORAL	99.3	SRPO +0.10	2026-04-24	arxiv →
2	PLD	99.17	NS-VLA +0.57	2026-04-24	arxiv →
3	PLD	99.17	NS-VLA +0.57	2026-04-17	arxiv →
4	Dual-CoT VLA	98.8	FocusVLA +0.10	2026-04-24	arxiv →
5	Dual-CoT VLA	98.8	FocusVLA +0.10	2026-04-17	arxiv →

LIBERO Plus standard-closed total

#	Model	Score	vs Prev	Date	Paper
1	TAG	87.24	ProGAL-VLA +1.74	2026-04-24	arxiv →
2	ACoT-VLA	86.6	pi0.5 +0.90	2026-04-24	arxiv →
3	NS-VLA	79.4		2026-04-24	arxiv →

MetaWorld non-standard average

#	Model	Score	vs Prev	Date	Paper
1	MPI	86	iRe-VLA +3.00	2026-04-24	arxiv →
2	pi-RL	85.8	Evo-1 +5.20	2026-04-24	arxiv →

RoboCasa-GR1-Tabletop standard-opensource avg_success_rate

#	Model	Score	vs Prev	Date	Paper
1	ABot-M0	58.3	TwinBrainVLA +3.70	2026-04-24	arxiv →
2	StarVLA-alpha (generalist)	57.3	Dual-CoT VLA +2.20	2026-04-24	arxiv →
3	StarVLA-alpha (generalist)	57.3	Dual-CoT VLA +2.20	2026-04-18	arxiv →

RoboChallenge standard-opensource score

#	Model	Score	vs Prev	Date	Paper
1	DM0	72.25	Giga-Brain-0.1 +3.91	2026-04-24	arxiv →
2	StarVLA-alpha (generalist)	54.5		2026-04-24	arxiv →
3	StarVLA-alpha (generalist)	54.5		2026-04-18	arxiv →