2026-05-28

VLA Research Daily Pulsar



17 papers · 9 papers · 26 papers in total

🔧 Technical

Practical VLA [NVIDIA] 2026-05-28

Efficient On-policy Visual-RL via Stochastic Decoupled Policy Gradient

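Haoxiang You et al. · Proposes the SDPG algorithm, which trains visual reinforcement learning policies in a few hours on a single RTX 4080. The highlights are its high compute efficiency and lightweight design, which suit VLA policy fine-tuning in resource-constrained settings.

cs.RO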

Practical VLA 2026-05-28

Can VLA Models Learn from Real-World Data Continually without Forgetting?

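Jiarun Zhu et al. · Studies whether VLA models can learn continually from real-world data while avoiding catastrophic forgetting. Offers an approach to dynamic adaptation during VLA deployment, with a clear path to practical use.

cs.RO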

Practical VLA 2026-05-28

VR-DAgger: Immersive VR for Dexterous Data Collection and Uncertainty-Guided On-Policy Correction

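René Zurbrügg et al. · Proposes the VR-DAgger framework, which uses immersive VR to collect dexterous-manipulation data and combines it with uncertainty-guided on-policy correction. Provides an efficient, directly reusable tool for VLA data collection.

cs.RO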

Practical VLA 2026-05-28

FineVLA: Fine-Grained Instruction Alignment for Steerable Vision-Language-Action Policies

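Xintong Hu et al. · Proposes FineVLA, which aligns policies with fine-grained instructions to make VLA policies more steerable. Addresses the problem that existing datasets match only coarse-grained goals, and improves adherence to instructions about how a task is executed.

cs.RO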

Practical VLA 2026-05-28

SOLE-R1: Video-Language Reasoning as the Sole Reward for On-Robot Reinforcement Learning

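Philip Schroeder et al. · Proposes SOLE-R1, which uses video-language reasoning as the only reward signal for on-robot reinforcement learning. No manually annotated reward function is needed, which simplifies applying RL to VLA; a novel contribution.

cs.RO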

Practical VLA 2026-05-28

Bridging the Semantic-Action Gap in Visual Token Pruning for Efficient VLA Inference

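Ziyan Liu et al. · Proposes a visual token pruning method that speeds up VLA inference by bridging the gap between semantics and action. Targets the efficiency bottleneck in real-time deployment and offers a reusable engineering optimization.

cs.AI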

Practical VLA 2026-05-28

MVISTA-4D: View-Consistent 4D World Model with Test-Time Action Inference for Robotic Manipulation

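Jiaxu Wang et al. · Proposes MVISTA-4D, a view-consistent 4D world model that supports test-time action inference. Combines a world model with VLA to improve reasoning over complex geometry; of significant technical value.

cs.CV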

Practical VLA 2026-05-28

Scaling World-Model Reinforcement Learning Through Diffusion Policy Optimization

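Xiaoyuan Cheng et al. · Scales world-model reinforcement learning through diffusion policy optimization, addressing model bias and error accumulation. Combines Diffusion Policy with world models and offers a new route to large-scale MBRL.

cs.LG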

Practical VLA [LIBERO Team] 2026-05-28

Capability and Robustness Cannot Both Be Free: An Information-Theoretic Bound for Vision-Language-Action Models

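Jianwei Tai · Derives, from an information-theoretic perspective, a bound on the trade-off between capability and robustness in VLA models, and identifies the root cause of their vulnerability to adversarial attacks. Provides theoretical guidance for understanding the safety limits of VLA.

cs.LG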

📖 Background Reading

Background VLA 2026-05-28

PhyPush: One Push is All You Need for Sensorless Physical Property Estimation with Physics-Guided Transformers

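Koyo Fujii et al. · Proposes a sensorless method for estimating object properties with a physics-guided Transformer, inferring mass and friction coefficient from a single push. Involves interactive perception, but focuses on property estimation rather than an end-to-end VLA control policy.

cs.RO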

Background VLA 2026-05-28

HyperSim: A Holistic Sim-To-Real Framework For Robust Robotic Manipulation

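Junyi Dong et al. · Proposes a sim-to-real framework for robust robotic manipulation, aimed at the transfer from synthetic data to the real world. An engineering-framework paper; the full text needs to be checked to confirm any specific VLA architecture contribution.

cs.RO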

Background VLA 2026-05-28

On the Generalization Capabilities, Design Choices and Limitations of Keypoint Imitation Learning

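Thomas Lips et al. · Analyzes the generalization capabilities, design choices, and limitations of keypoint imitation learning, and examines the role of vision foundation models in few-shot generalization. A methodological analysis and ablation study rather than a new architecture.

cs.RO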

Background VLA 2026-05-28

Towards Shared Embodied Intelligence in Humanoid Robots through Optimization Development and Testing of the Human Aware ergoCub Robot

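Carlotta Sartore et al. · Presents the optimization, development, and testing of the ergoCub humanoid robot, aimed at shared embodied intelligence. Focuses on the hardware platform and a proof of concept for collaboration; lacks specific detail on VLA algorithmic innovation.

cs.RO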

Background VLA 2026-05-28

Riding the Shifting Potential: When Reactive Control Suffices for Multi-Goal Behavior

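Vito Mengers et al. · Argues that reactive control is sufficient for multi-goal behavior and proposes a dynamic potential encoding to overcome local minima. Focuses on control-theoretic analysis and does not explicitly tie in VLA large-model architectures.

cs.RO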

Background VLA 2026-05-28

RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies

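Yinpei Dai et al. · Releases the RoboMME benchmark, which evaluates the memory capabilities of generalist robot policies, particularly on long-horizon and history-dependent tasks. As a benchmark paper it is important but not urgent; a reference for follow-up research.

cs.RO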

Background VLA 2026-05-28

Continual Model-Based Reinforcement Learning with Hypernetworks

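Yizhou Huang et al. · Combines hypernetworks with continual model-based reinforcement learning to handle non-stationarity in the dynamics model. The method is fairly general and not specific to VLA architectures, and this is an update to an older paper, so it is lower priority.

cs.RO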

Background VLA [Physical Intelligence] 2026-05-28

Advancing Creative Physical Intelligence in Large Multimodal Models

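Cheng Qian et al. · Explores the ability of large multimodal models to discover visually grounded solutions in open-ended environments, going beyond pattern recognition. The concept is fairly broad, and there is no concrete experimental evidence on VLA control.

cs.AI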

Background VLA 2026-05-28

Neuro-Inspired Inverse Learning for Planning and Control

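Maryna Kapitonova et al. · An embodied planning and control framework inspired by the mammalian brain that combines forward and inverse internal models. Focuses on bio-inspired control theory and is not directly tied to the current mainstream VLA large-model paradigm.

cs.AI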

Background VLA 2026-05-28

TrackRef3D: Multi-View Consistent Track-then-Label for Open-World Referring Segmentation in 3D Gaussian Splatting

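Yuyang Tan et al. · Proposes TrackRef3D, which performs multi-view-consistent open-world referring segmentation in 3D Gaussian Splatting. It serves embodied-AI perception, but the core contribution is a 3D vision algorithm, not the VLA decision layer.

cs.CV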

Background VLA 2026-05-28

Probabilistic Recurrent Intention Switching Model

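Wenyuan Sheng et al. · Proposes a probabilistic recurrent intention switching model that recovers non-stationary reward functions from observed behavior. Focuses on inverse reinforcement learning theory and is not explicitly applied to end-to-end VLA control.

cs.LG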

Background VLA 2026-05-28

Physically Native World Models: A Hamiltonian Perspective on Generative World Modeling

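Sen Cui et al. · Proposes physically native world models from a Hamiltonian-mechanics perspective, aiming to improve the physical consistency of generative models. Strongly theoretical, but lacks concrete experimental validation on robotic manipulation.

cs.RO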

Background VLA 2026-05-28

Olaf-World: Orienting Latent Actions for Video World Modeling

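Yuxin Jiang et al. · Proposes Olaf-World, which extracts a latent action interface from unlabeled video to build world models. Focuses on video prediction and does not clearly demonstrate use in a closed robot control loop.

cs.AI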

Background VLA 2026-05-28

Identifiable Token Correspondence for World Models

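Youngin Kim et al. · Addresses temporal inconsistency in long-horizon rollouts of token-based world models by proposing an identifiable token correspondence mechanism. A low-level world-model improvement rather than a high-level VLA policy.

cs.AI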

Background VLA 2026-05-28

EgoExo-WM: Unlocking Exo Video for Ego World Models

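Danny Tran et al. · Uses third-person (exo) video to enhance first-person (ego) world models and mitigate partial observability. Focuses on exploiting video data and does not go deep into VLA control policies.

cs.CV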

Background VLA 2026-05-28

When Does LeJEPA Learn a World Model?

David Klindt et al. · arXiv:2605.26379v1 (cross-listed). A representation that scrambles the true degrees of freedom of the world cannot support reliable planning or compositional generalization. We prove that LeJEPA (alignment plus Gaussian regularization) linearly recovers the world's latent variables from nonlinear observations, a property known as linear identifiability, in a broad class of worlds where latents evolve under stationary, additive-noise transitions. Our main result is that among all s…

cs.LG
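
The abstract above describes linear identifiability as linearly recovering the world's latent variables from a learned representation. As a toy illustration only (not the paper's method, model, or experiments), the sketch below scores a one-dimensional representation by the R² of the best affine map back to the true latent. The helper `linear_r2` and both encoders are hypothetical names and choices introduced here for illustration.

```python
import random


def linear_r2(reps, latents):
    """R^2 of the best affine (least-squares) map from a 1-D representation to a 1-D latent."""
    n = len(reps)
    mean_r = sum(reps) / n
    mean_z = sum(latents) / n
    cov = sum((r - mean_r) * (z - mean_z) for r, z in zip(reps, latents))
    var_r = sum((r - mean_r) ** 2 for r in reps)
    var_z = sum((z - mean_z) ** 2 for z in latents)
    slope = cov / var_r
    intercept = mean_z - slope * mean_r
    residual = sum((z - (slope * r + intercept)) ** 2 for r, z in zip(reps, latents))
    return 1.0 - residual / var_z


rng = random.Random(0)
# Toy "world": a single latent variable drawn uniformly from [-2, 2].
latents = [rng.uniform(-2.0, 2.0) for _ in range(1000)]

# Hypothetical encoder A: an affine function of the latent,
# so the latent is recoverable by a linear readout.
aligned = [3.0 * z - 1.0 for z in latents]

# Hypothetical encoder B: squares the latent, discarding its sign,
# so no linear readout can recover it.
scrambled = [z * z for z in latents]

r2_aligned = linear_r2(aligned, latents)
r2_scrambled = linear_r2(scrambled, latents)
print(f"aligned R^2:   {r2_aligned:.4f}")
print(f"scrambled R^2: {r2_scrambled:.4f}")
```

An R² near 1 means a linear readout recovers the latent, and an R² near 0 means it does not. In more than one dimension the same check would use a multivariate least-squares fit.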

Background VLA 2026-05-28

Flow Matching Policy Optimization with Mirror Descent and Entropy Constraints

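Ting Gao et al. · A flow matching policy optimization method that combines mirror descent and entropy constraints to balance exploration and exploitation. An RL algorithm improvement, not specific to integration with VLA architectures.

cs.LG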
