<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
  xmlns:atom="http://www.w3.org/2005/Atom"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
  <title>Pulsar 照見 · VLA Daily Signal</title>
  <link>https://sou350121.github.io/pulsar-web/daily</link>
  <atom:link href="https://sou350121.github.io/pulsar-web/rss/vla-daily.xml" rel="self" type="application/rss+xml" />
  <description>Daily VLA-focused selection of ⚡/🔧-rated papers plus SOTA leaderboard changes. Only work the Pulsar rating deems worth reading is kept; ❌ and 📖 items are excluded from this feed.</description>
  <language>en</language>
  <ttl>60</ttl>
  <lastBuildDate>Mon, 20 Apr 2026 04:53:39 GMT</lastBuildDate>
  <generator>Pulsar 照見 · https://github.com/sou350121/pulsar-web</generator>
  <copyright>Content under CC BY 4.0 — sou350121</copyright>
  <item>
    <title>⚡ Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion</title>
    <link>https://arxiv.org/abs/2511.14178</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2511.14178</guid>
    <pubDate>Sat, 18 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <description><![CDATA[Why: Proposes a plug-and-play, inference-time policy steering framework that requires no fine-tuning, using embodied evolutionary diffusion to address the core pain point of performance degradation at deployment. If it truly eliminates fine-tuning and achieves zero-shot deployment, it would reshape how VLA models reach production.

Abstract: arXiv:2511.14178v2 Announce Type: replace Abstract: Vision-Language-Action (VLA) models have demonstrated significant potential in real-world robotic manipulation. However, pre-trained VLA policies still suffer from substantial performance degradation during downstream deployment. Although fine-tuning can mitigate this issue, its reliance on costly demonstration collection and intensive computation makes it impractical in real-world settings. In this work, we introduce VLA-Pilot, a plug-and-play…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>🔧 Flow with the Force Field: Learning 3D Compliant Flow Matching Policies from Force and Demonstration-Guided Simulation Data</title>
    <link>https://arxiv.org/abs/2510.02738</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2510.02738</guid>
    <pubDate>Sat, 18 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <description><![CDATA[Why: Combines force-field guidance with flow matching to learn compliant policies, targeting contact-rich manipulation. Offers a sim-to-real transfer path for force-controlled policies; tactile manipulation teams can directly reuse its force-data processing pipeline.

Abstract: arXiv:2510.02738v3 Announce Type: replace Abstract: While visuomotor policy has made advancements in recent years, contact-rich tasks still remain a challenge. Robotic manipulation tasks that require continuous contact demand explicit handling of compliance and force. However, most visuomotor policies ignore compliance, overlooking the importance of physical interaction with the real world, often leading to excessive contact forces or fragile behavior under uncertainty. Introducing force informa…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>🔧 Multi-Modal Manipulation via Multi-Modal Policy Consensus</title>
    <link>https://arxiv.org/abs/2509.23468</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2509.23468</guid>
    <pubDate>Sat, 18 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <description><![CDATA[Why: Proposes a multi-modal policy consensus mechanism to replace naive feature concatenation, addressing the dominance imbalance between tactile and visual signals. Provides a new fusion architecture for contact-rich manipulation tasks; well suited for tactile-VLA researchers to integrate and test this week.

Abstract: arXiv:2509.23468v3 Announce Type: replace Abstract: Effectively integrating diverse sensory modalities is crucial for robotic manipulation. However, the typical approach of feature concatenation is often suboptimal: dominant modalities such as vision can overwhelm sparse but critical signals like touch in contact-rich tasks, and monolithic architectures cannot flexibly incorporate new or missing modalities without retraining. Our method factorizes the policy into a set of diffusion models, each…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>🔧 A Hierarchical Spatiotemporal Action Tokenizer for In-Context Imitation Learning in Robotics</title>
    <link>https://arxiv.org/abs/2604.15215</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2604.15215</guid>
    <pubDate>Sat, 18 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <description><![CDATA[Why: Designs a hierarchical spatiotemporal action tokenizer for in-context imitation learning, compressing action sequences through two levels of vector quantization. Serves as a plug-and-play module for ACT/Diffusion Policy, markedly improving inference speed and accuracy on long-sequence tasks.

Abstract: arXiv:2604.15215v1 Announce Type: new Abstract: We present a novel hierarchical spatiotemporal action tokenizer for in-context imitation learning. We first propose a hierarchical approach, which consists of two successive levels of vector quantization. In particular, the lower level assigns input actions to fine-grained subclusters, while the higher level further maps fine-grained subclusters to clusters. Our hierarchical approach outperforms the non-hierarchical counterpart, while mainly exploi…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>🔧 DockAnywhere: Data-Efficient Visuomotor Policy Learning for Mobile Manipulation via Novel Demonstration Generation</title>
    <link>https://arxiv.org/abs/2604.15023</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2604.15023</guid>
    <pubDate>Sat, 18 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <description><![CDATA[Why: Proposes a novel demonstration-generation method to improve data efficiency for mobile manipulation, addressing the pain points of the two-stage navigate-then-manipulate paradigm. Once the code is open-sourced it can be used directly for mobile-manipulator policy training; the data-augmentation pipeline should be reproducible within the week.

Abstract: arXiv:2604.15023v1 Announce Type: new Abstract: Mobile manipulation is a fundamental capability that enables robots to interact in expansive environments such as homes and factories. Most existing approaches follow a two-stage paradigm, where the robot first navigates to a docking point and then performs fixed-base manipulation using powerful visuomotor policies. However, real-world mobile manipulation often suffers from the view generalization problem due to shifts of docking points. To address…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>⚡ World-Value-Action Model: Implicit Planning for Vision-Language-Action Systems</title>
    <link>https://arxiv.org/abs/2604.14732</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2604.14732</guid>
    <pubDate>Sat, 18 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <description><![CDATA[Why: Proposes the WVA paradigm, bringing implicit value-function planning into VLA to overcome the bottleneck of direct action prediction lacking reasoning ability. The title points squarely at VLA system improvement; if experiments confirm multi-task generalization, it would clearly surpass current OpenVLA/RT-2 architectures.

Abstract: arXiv:2604.14732v1 Announce Type: new Abstract: Vision-Language-Action (VLA) models have emerged as a promising paradigm for building embodied agents that ground perception and language into action. However, most existing approaches rely on direct action prediction, lacking the ability to reason over long-horizon trajectories and evaluate their consequences, which limits performance in complex decision-making tasks. In this work, we introduce World-Value-Action (WVA) model, a unified framework t…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>🔧 X-Diffusion: Training Diffusion Policies on Cross-Embodiment Human Demonstrations</title>
    <link>https://arxiv.org/abs/2511.04671</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2511.04671</guid>
    <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <category>[Berkeley]</category>
    <description><![CDATA[Why: Trains diffusion policies on cross-embodiment human demonstrations, tackling the data-utilization problem caused by the human-robot embodiment gap. The method allows direct use of massive human video data, sharply cutting robot data-collection costs; highly practical.

Abstract: arXiv:2511.04671v2 Announce Type: replace Abstract: Human videos are a scalable source of training data for robot learning. However, humans and robots significantly differ in embodiment, making many human actions infeasible for direct execution on a robot. Still, these demonstrations convey rich object-interaction cues and task intent. Our goal is to learn from this coarse guidance without transferring embodiment-specific, infeasible execution strategies. Recent advances in generative modeling t…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>⚡ HAMLET: Switch your Vision-Language-Action Model into a History-Aware Policy</title>
    <link>https://arxiv.org/abs/2510.00695</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2510.00695</guid>
    <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <category>[CMU]</category>
    <description><![CDATA[Why: First to explicitly fold historical context into the VLA architecture, fixing the core flaw of conventional VLAs ignoring temporal dependencies. Significantly outperforms SOTA on several long-horizon manipulation benchmarks and ships a plug-and-play history-aware module; high strategic value.

Abstract: arXiv:2510.00695v3 Announce Type: replace Abstract: Inherently, robotic manipulation tasks are history-dependent: leveraging past context could be beneficial. However, most existing Vision-Language-Action models (VLAs) have been designed without considering this aspect, i.e., they rely solely on the current observation, ignoring preceding context. In this paper, we propose HAMLET, a scalable framework to adapt VLAs to attend to the historical context during action prediction. Specifically, we in…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>🔧 UMI-3D: Extending Universal Manipulation Interface from Vision-Limited to 3D Spatial Perception</title>
    <link>https://arxiv.org/abs/2604.14089</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2604.14089</guid>
    <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <category>[Stanford]</category>
    <description><![CDATA[Why: Extends the UMI interface to 3D spatial perception, overcoming the limitations of monocular SLAM during data collection. Offers a new multimodal data-collection scheme and hardware-integration ideas, directly useful for building high-quality datasets.

Abstract: arXiv:2604.14089v1 Announce Type: new Abstract: We present UMI-3D, a multimodal extension of the Universal Manipulation Interface (UMI) for robust and scalable data collection in embodied manipulation. While UMI enables portable, wrist-mounted data acquisition, its reliance on monocular visual SLAM makes it vulnerable to occlusions, dynamic scenes, and tracking failures, limiting its applicability in real-world environments. UMI-3D addresses these limitations by introducing a lightweight and low…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>🔧 Goal2Skill: Long-Horizon Manipulation with Adaptive Planning and Reflection</title>
    <link>https://arxiv.org/abs/2604.13942</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2604.13942</guid>
    <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <description><![CDATA[Why: Targets VLA brittleness on long-horizon tasks with a hierarchical system combining adaptive planning and reflection. Dynamically adjusting the observation window and adding a reflection mechanism markedly raises completion rates on complex tasks; the architecture is directly worth borrowing.

Abstract: arXiv:2604.13942v1 Announce Type: new Abstract: Recent vision-language-action (VLA) systems have demonstrated strong capabilities in embodied manipulation. However, most existing VLA policies rely on limited observation windows and end-to-end action prediction, which makes them brittle in long-horizon, memory-dependent tasks with partial observability, occlusions, and multi-stage dependencies. Such tasks require not only precise visuomotor control, but also persistent memory, adaptive task decom…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>🔧 A Mechanistic Analysis of Sim-and-Real Co-Training in Generative Robot Policies</title>
    <link>https://arxiv.org/abs/2604.13645</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2604.13645</guid>
    <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <description><![CDATA[Why: A deep analysis of the mechanisms behind sim-and-real co-training, revealing the key factors that determine generalization performance. Provides theoretical grounding and tuning guidance for mixed-data training strategies, with clear engineering reference value.

Abstract: arXiv:2604.13645v1 Announce Type: new Abstract: Co-training, which combines limited in-domain real-world data with abundant surrogate data such as simulation or cross-embodiment robot data, is widely used for training generative robot policies. Despite its empirical success, the mechanisms that determine when and why co-training is effective remain poorly understood. We investigate the mechanism of sim-and-real co-training through theoretical analysis and empirical study, and identify two intrin…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>🔧 ESCAPE: Episodic Spatial Memory and Adaptive Execution Policy for Long-Horizon Mobile Manipulation</title>
    <link>https://tldr.takara.ai/p/2604.13633?utm_source=rss&amp;utm_medium=feed&amp;utm_campaign=vla-daily</link>
    <guid isPermaLink="false">hotspot:https://tldr.takara.ai/p/2604.13633</guid>
    <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <description><![CDATA[Why: Addresses long-horizon forgetting in mobile manipulation with an adaptive execution policy built on episodic spatial memory. An explicit memory module maintains spatial consistency, offering a reusable memory-augmented architecture reference for mobile grasping tasks.

Abstract: Coordinating navigation and manipulation with robust performance is essential for embodied AI in complex indoor environments. However, as tasks extend over long horizons, existing methods often struggle due to catastrophic forgetting, spatial inconsistency, and rigid execution. To address these issues, we propose ESCAPE (Episodic Spatial Memory Coupled with an Adaptive Policy for Execution), operating through a tightly coupled perception-grounding-execution workflow. For robust perception, ESCAP…

Source: hf-papers]]></description>
  </item>
  <item>
    <title>🔧 Jump-Start Reinforcement Learning with Vision-Language-Action Regularization</title>
    <link>https://tldr.takara.ai/p/2604.13733?utm_source=rss&amp;utm_medium=feed&amp;utm_campaign=vla-daily</link>
    <guid isPermaLink="false">hotspot:https://tldr.takara.ai/p/2604.13733</guid>
    <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <description><![CDATA[Why: Uses a VLA model as a regularization term to guide RL exploration, tackling sparse rewards in long-horizon tasks. The core highlight is folding VLA prior knowledge into the RL training loop; once the code is released it can directly improve the sample efficiency of existing RL policies.

Abstract: Reinforcement learning (RL) enables high-frequency, closed-loop control for robotic manipulation, but scaling to long-horizon tasks with sparse or imperfect rewards remains difficult due to inefficient exploration and poor credit assignment. Vision-Language-Action (VLA) models leverage large-scale multimodal pretraining to provide generalist, task-level reasoning, but current limitations hinder their direct use in fast and precise manipulation. In this paper, we propose Vision-Language-Action Ju…

Source: hf-papers]]></description>
  </item>
  <item>
    <title>🔧 Iterative Compositional Data Generation for Robot Control</title>
    <link>https://arxiv.org/abs/2512.10891</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2512.10891</guid>
    <pubDate>Thu, 16 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <description><![CDATA[Why: Proposes iterative compositional data generation to combat the high cost of collecting robot manipulation data across multi-object, multi-environment settings. Directly usable for expanding training datasets and cutting data-collection costs.

Abstract: arXiv:2512.10891v4 Announce Type: replace Abstract: Collecting robotic manipulation data is expensive, making it impractical to acquire demonstrations for the combinatorially large space of tasks that arise in multi-object, multi-robot, and multi-environment settings. While recent generative models can synthesize useful data for individual tasks, they do not exploit the compositional structure of robotic domains and struggle to generalize to unseen task combinations. We propose a semantic compos…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>🔧 XRZero-G0: Pushing the Frontier of Dexterous Robotic Manipulation with Interfaces, Quality and Ratios</title>
    <link>https://arxiv.org/abs/2604.13001</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2604.13001</guid>
    <pubDate>Thu, 16 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <description><![CDATA[Why: Targets the dexterous-manipulation data bottleneck with a new high-quality collection method built around XR interfaces, aiming to scale foundation-model training data. Offers a clear, reusable engineering path for the data-scarcity problem.

Abstract: arXiv:2604.13001v1 Announce Type: new Abstract: The acquisition of high-quality, action-aligned demonstration data remains a fundamental bottleneck in scaling foundation models for dexterous robot manipulation. Although robot-free human demonstrations (e.g., the UMI paradigm) offer a scalable alternative to traditional teleoperation, current systems are constrained by sub-optimal hardware ergonomics, open-loop workflows, and a lack of systematic data-mixing strategies. To address these limitatio…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>🔧 HazardArena: Evaluating Semantic Safety in Vision-Language-Action Models</title>
    <link>https://arxiv.org/abs/2604.12447</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2604.12447</guid>
    <pubDate>Thu, 16 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <description><![CDATA[Why: Builds the HazardArena benchmark to evaluate VLA models specifically on semantic safety, filling the gap left by evaluations that only track execution success rates. Provides a reusable safety-evaluation framework and dataset.

Abstract: arXiv:2604.12447v1 Announce Type: new Abstract: Vision-Language-Action (VLA) models inherit rich world knowledge from vision-language backbones and acquire executable skills via action demonstrations. However, existing evaluations largely focus on action execution success, leaving action policies loosely coupled with visual-linguistic semantics. This decoupling exposes a systematic vulnerability whereby correct action execution may induce unsafe outcomes under semantic risk. To expose this vulne…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>🔧 Dejavu: Towards Experience Feedback Learning for Embodied Intelligence</title>
    <link>https://arxiv.org/abs/2510.10181</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2510.10181</guid>
    <pubDate>Tue, 14 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <description><![CDATA[Why: A post-deployment experience-feedback learning framework that lets embodied agents keep acquiring new knowledge and improving in real environments. Solves the pain point of VLA models being unable to evolve after deployment; clear applied value for online learning.

Abstract: arXiv:2510.10181v3 Announce Type: replace Abstract: Embodied agents face a fundamental limitation: once deployed in real-world environments, they cannot easily acquire new knowledge to improve task performance. In this paper, we propose Dejavu, a general post-deployment learning framework that augments a frozen Vision-Language-Action (VLA) policy with retrieved execution memories through an Experience Feedback Network (EFN). EFN identifies contextually relevant prior action experiences and condi…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>🔧 2D or 3D: Who Governs Salience in VLA Models? -- Tri-Stage Token Pruning Framework with Modality Salience Awareness</title>
    <link>https://arxiv.org/abs/2604.09244</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2604.09244</guid>
    <pubDate>Tue, 14 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <description><![CDATA[Why: A tri-stage token-pruning framework that dynamically assesses 2D/3D modality salience to optimize MVLA model efficiency. Provides a concrete inference-acceleration recipe for multi-visual-modal VLA deployment; the code should be quick to integrate.

Abstract: arXiv:2604.09244v1 Announce Type: cross Abstract: Vision-Language-Action (VLA) models have emerged as the mainstream of embodied intelligence. Recent VLA models have expanded their input modalities from 2D-only to 2D+3D paradigms, forming multi-visual-modal VLA (MVLA) models. Despite achieving improved spatial perception, MVLA faces a greater acceleration demand due to the increased number of input tokens caused by modal expansion. Token pruning is an effective optimization method tailored to M…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>🔧 TouchAnything: Diffusion-Guided 3D Reconstruction from Sparse Robot Touches</title>
    <link>https://arxiv.org/abs/2604.08945</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2604.08945</guid>
    <pubDate>Tue, 14 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <description><![CDATA[Why: Uses diffusion models to guide 3D reconstruction from sparse tactile signals, solving geometry estimation under occlusion. Directly serves the tactile-VLA direction with a usable toolchain from sparse touches to complete geometry.

Abstract: arXiv:2604.08945v1 Announce Type: cross Abstract: Accurate object geometry estimation is essential for many downstream tasks, including robotic manipulation and physical interaction. Although vision is the dominant modality for shape perception, it becomes unreliable under occlusions or challenging lighting conditions. In such scenarios, tactile sensing provides direct geometric information through physical contact. However, reconstructing global 3D geometry from sparse local touches alone is fu…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>🔧 V-CAGE: Vision-Closed-Loop Agentic Generation Engine for Robotic Manipulation</title>
    <link>https://arxiv.org/abs/2604.09036</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2604.09036</guid>
    <pubDate>Tue, 14 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <description><![CDATA[Why: A vision-closed-loop agentic generation engine that synthesizes high-fidelity training data via context-aware scene generation to scale VLA models. Gives VLA researchers a new data-synthesis and augmentation path; the generation pipeline is worth trying to reproduce this week.

Abstract: arXiv:2604.09036v1 Announce Type: new Abstract: Scaling Vision-Language-Action (VLA) models requires massive datasets that are both semantically coherent and physically feasible. However, existing scene generation methods often lack context-awareness, making it difficult to synthesize high-fidelity environments embedded with rich semantic information, frequently resulting in unreachable target positions that cause tasks to fail prematurely. We present V-CAGE (Vision-Closed-loop Agentic Generatio…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>🔧 BLaDA: Bridging Language to Functional Dexterous Actions within 3DGS Fields</title>
    <link>https://arxiv.org/abs/2604.08410</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2604.08410</guid>
    <pubDate>Sat, 11 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <description><![CDATA[Why: Maps language to dexterous actions within 3D Gaussian splatting fields, coupling semantic understanding with 3D localization for functional grasping. Offers a new scene representation and execution framework for dexterous manipulation in unstructured environments.

Abstract: arXiv:2604.08410v1 Announce Type: cross Abstract: In unstructured environments, functional dexterous grasping calls for the tight integration of semantic understanding, precise 3D functional localization, and physically interpretable execution. Modular hierarchical methods are more controllable and interpretable than end-to-end VLA approaches, but existing ones still rely on predefined affordance labels and lack the tight semantic--pose coupling needed for functional dexterous manipulation. To a…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>🔧 ViVa: A Video-Generative Value Model for Robot Reinforcement Learning</title>
    <link>https://arxiv.org/abs/2604.08168</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2604.08168</guid>
    <pubDate>Sat, 11 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <description><![CDATA[Why: Combines video generation with value-function learning in the ViVa model to handle long-horizon feedback for VLA under partial observability. A new value-estimation paradigm for VLA+RL fine-tuning; directly reusable once the code is released.

Abstract: arXiv:2604.08168v1 Announce Type: new Abstract: Vision-language-action (VLA) models have advanced robot manipulation through large-scale pretraining, but real-world deployment remains challenging due to partial observability and delayed feedback. Reinforcement learning addresses this via value functions, which assess task progress and guide policy improvement. However, existing value models built on vision-language models (VLMs) struggle to capture temporal dynamics, undermining reliable value e…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>🔧 HEX: Humanoid-Aligned Experts for Cross-Embodiment Whole-Body Manipulation</title>
    <link>https://arxiv.org/abs/2604.07993</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2604.07993</guid>
    <pubDate>Sat, 11 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <description><![CDATA[Why: The HEX framework tackles high-DoF humanoid whole-body control via expert-alignment strategies, breaking past the independent per-limb modeling of conventional VLAs. Offers a new cross-embodiment transfer approach for humanoid VLA deployment.

Abstract: arXiv:2604.07993v1 Announce Type: new Abstract: Humans achieve complex manipulation through coordinated whole-body control, whereas most Vision-Language-Action (VLA) models treat robot body parts largely independently, making high-DoF humanoid control challenging and often unstable. We present HEX, a state-centric framework for coordinated manipulation on full-sized bipedal humanoid robots. HEX introduces a humanoid-aligned universal state representation for scalable learning across heterogeneou…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>🔧 A Physical Agentic Loop for Language-Guided Grasping with Execution-State Monitoring</title>
    <link>https://arxiv.org/abs/2604.07395</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2604.07395</guid>
    <pubDate>Sat, 11 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <description><![CDATA[Why: Targets execution failures in language-guided grasping with a physical agentic closed loop that includes execution-state monitoring, supporting retries and corrections. The reusable execution-monitoring module design fits well into existing VLA inference pipelines.

Abstract: arXiv:2604.07395v1 Announce Type: new Abstract: Robotic manipulation systems that follow language instructions often execute grasp primitives in a largely single-shot manner: a model proposes an action, the robot executes it, and failures such as empty grasps, slips, stalls, timeouts, or semantically wrong grasps are not surfaced to the decision layer in a structured way. Inspired by agentic loops in digital tool-using agents, we reformulate language-guided grasping as a bounded embodied agent o…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>🔧 TAMEn: Tactile-Aware Manipulation Engine for Closed-Loop Data Collection in Contact-Rich Tasks</title>
    <link>https://arxiv.org/abs/2604.07335</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2604.07335</guid>
    <pubDate>Fri, 10 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <description><![CDATA[Why: A tactile-aware closed-loop data-collection engine for the bimanual contact-rich tasks that handheld devices struggle to capture. Directly addresses the tactile-VLA training-data bottleneck with a concrete new collection paradigm.

Abstract: arXiv:2604.07335v1 Announce Type: new Abstract: Handheld paradigms offer an efficient and intuitive way for collecting large-scale demonstration of robot manipulation. However, achieving contact-rich bimanual manipulation through these methods remains a pivotal challenge, which is substantially hindered by hardware adaptability and data efficacy. Prior hardware designs remain gripper-specific and often face a trade-off between tracking precision and portability. Furthermore, the lack of online f…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>🔧 RoSHI: A Versatile Robot-oriented Suit for Human Data In-the-Wild</title>
    <link>https://arxiv.org/abs/2604.07331</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2604.07331</guid>
    <pubDate>Fri, 10 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <category>[CMU]</category>
    <description><![CDATA[Why: RoSHI is a portable full-body motion-capture suit for collecting long-horizon human-interaction data in the wild to scale robot-learning datasets. Resolves the portability-versus-robustness trade-off of existing setups; a new hardware path for VLA data collection.

Abstract: arXiv:2604.07331v1 Announce Type: new Abstract: Scaling up robot learning will likely require human data containing rich and long-horizon interactions in the wild. Existing approaches for collecting such data trade off portability, robustness to occlusion, and global consistency. We introduce RoSHI, a hybrid wearable that fuses low-cost sparse IMUs with the Project Aria glasses to estimate the full 3D pose and body shape of the wearer in a metric global coordinate frame from egocentric perceptio…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>🔧 SnapFlow: One-Step Action Generation for Flow-Matching VLAs via Progressive Self-Distillation</title>
    <link>https://arxiv.org/abs/2604.05656</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2604.05656</guid>
    <pubDate>Thu, 09 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <category>[Physical Intelligence]</category>
    <description><![CDATA[Why: Achieves one-step action generation for flow-matching VLAs via progressive self-distillation, sharply cutting inference latency; once the code is released it can drop in as a replacement for existing decoders.

Abstract: arXiv:2604.05656v1 Announce Type: cross Abstract: Vision-Language-Action (VLA) models based on flow matching -- such as pi0, pi0.5, and SmolVLA -- achieve state-of-the-art generalist robotic manipulation, yet their iterative denoising, typically 10 ODE steps, introduces substantial latency: on a modern GPU, denoising alone accounts for 80% of end-to-end inference time. Naively reducing the step count is unreliable, degrading success on most tasks due to the velocity field being uncalibrated for…

Source: cs.AI]]></description>
  </item>
  <item>
    <title>🔧 On-the-Fly VLA Adaptation via Test-Time Reinforcement Learning</title>
    <link>https://arxiv.org/abs/2601.06748</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2601.06748</guid>
    <pubDate>Thu, 09 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <description><![CDATA[Why: Uses test-time reinforcement learning for on-the-fly VLA adaptation, countering distribution shift with an integrable online fine-tuning algorithmic framework.

Abstract: arXiv:2601.06748v3 Announce Type: replace Abstract: Vision-Language-Action models have recently emerged as a powerful paradigm for general-purpose robot learning, enabling agents to map visual observations and natural-language instructions into executable robotic actions. Though popular, they are primarily trained via supervised fine-tuning or training-time reinforcement learning, requiring explicit fine-tuning phases, human interventions, or controlled data collection. Consequently, existing me…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>🔧 BiCoord: A Bimanual Manipulation Benchmark towards Long-Horizon Spatial-Temporal Coordination</title>
    <link>https://arxiv.org/abs/2604.05831</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2604.05831</guid>
    <pubDate>Thu, 09 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <description><![CDATA[Why: Releases BiCoord, a bimanual manipulation benchmark for long-horizon spatial-temporal coordination, filling the gap existing benchmarks leave on complex cooperative tasks; usable immediately.

Abstract: arXiv:2604.05831v1 Announce Type: new Abstract: Bimanual manipulation, i.e., the coordinated use of two robotic arms to complete tasks, is essential for achieving human-level dexterity in robotics. Recent simulation benchmarks, e.g., RoboTwin and RLBench2, have advanced data-driven learning for bimanual manipulation. However, existing tasks are short-horizon and only loosely coordinated, failing to capture the spatial-temporal coupling inherent in real-world bimanual behaviors. To address this g…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>🔧 A1: A Fully Transparent Open-Source, Adaptive and Efficient Truncated Vision-Language-Action Model</title>
    <link>https://arxiv.org/abs/2604.05672</link>
    <guid isPermaLink="false">hotspot:https://arxiv.org/abs/2604.05672</guid>
    <pubDate>Thu, 09 Apr 2026 00:00:00 GMT</pubDate>

    <category>VLA Daily Hotspot</category>
    <category>actionable</category>
    <description><![CDATA[Why: A1 is a truncated, efficient VLA architecture that sharply cuts inference cost, with open-source code; well suited for quick deployment trials in resource-constrained settings.

Abstract: arXiv:2604.05672v1 Announce Type: new Abstract: Vision--Language--Action (VLA) models have emerged as a powerful paradigm for open-world robot manipulation, but their practical deployment is often constrained by \emph{cost}: billion-scale VLM backbones and iterative diffusion/flow-based action heads incur high latency and compute, making real-time control expensive on commodity hardware. We present A1, a fully open-source and transparent VLA framework designed for low-cost, high-throughput infer…

Source: cs.RO]]></description>
  </item>
  <item>
    <title>🏆 SOTA: StarVLA-alpha (generalist) on RoboCasa-GR1-Tabletop — avg_success_rate 57.3</title>
    <link>https://sota.evomind-tech.com/benchmarks/robocasa_gr1_tabletop/?utm_source=rss&amp;utm_medium=feed&amp;utm_campaign=vla-daily-sota</link>
    <guid isPermaLink="false">sota:RoboCasa-GR1-Tabletop:StarVLA-alpha (generalist):2026-04-18</guid>
    <pubDate>Sat, 18 Apr 2026 00:00:00 GMT</pubDate>

    <category>SOTA Leaderboard</category>
    <category>RoboCasa-GR1-Tabletop</category>
    <description><![CDATA[Benchmark: RoboCasa-GR1-Tabletop · split=standard-closed
Metric: avg_success_rate = 57.3
Model: StarVLA-alpha (generalist) · 2604.11757]]></description>
  </item>
  <item>
    <title>🏆 SOTA: StarVLA-alpha (generalist) on RoboChallenge — score 54.5</title>
    <link>https://sota.evomind-tech.com/benchmarks/robochallenge/?utm_source=rss&amp;utm_medium=feed&amp;utm_campaign=vla-daily-sota</link>
    <guid isPermaLink="false">sota:RoboChallenge:StarVLA-alpha (generalist):2026-04-18</guid>
    <pubDate>Sat, 18 Apr 2026 00:00:00 GMT</pubDate>

    <category>SOTA 榜</category>
    <category>RoboChallenge</category>
    <description><![CDATA[Benchmark：RoboChallenge · split=non-standard
指标：score = 54.5
模型：StarVLA-alpha (generalist) · 2604.11757]]></description>
  </item>
  <item>
    <title>🏆 SOTA: AVA-VLA on CALVIN — avg_len 4.65</title>
    <link>https://sota.evomind-tech.com/benchmarks/calvin/?utm_source=rss&amp;utm_medium=feed&amp;utm_campaign=vla-daily-sota</link>
    <guid isPermaLink="false">sota:CALVIN:AVA-VLA:2026-04-17</guid>
    <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>

    <category>SOTA 榜</category>
    <category>CALVIN</category>
    <description><![CDATA[Benchmark：CALVIN · split=ABC-D
指标：avg_len = 4.65
模型：AVA-VLA · 2511.18960]]></description>
  </item>
  <item>
    <title>🏆 SOTA: MMaDA-VLA on CALVIN — avg_len 4.78</title>
    <link>https://sota.evomind-tech.com/benchmarks/calvin/?utm_source=rss&amp;utm_medium=feed&amp;utm_campaign=vla-daily-sota</link>
    <guid isPermaLink="false">sota:CALVIN:MMaDA-VLA:2026-04-17</guid>
    <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>

    <category>SOTA 榜</category>
    <category>CALVIN</category>
    <description><![CDATA[Benchmark：CALVIN · split=ABC-D
指标：avg_len = 4.78
模型：MMaDA-VLA · 2603.25406]]></description>
  </item>
  <item>
    <title>🏆 SOTA: NS-VLA on CALVIN — avg_len 4.56</title>
    <link>https://sota.evomind-tech.com/benchmarks/calvin/?utm_source=rss&amp;utm_medium=feed&amp;utm_campaign=vla-daily-sota</link>
    <guid isPermaLink="false">sota:CALVIN:NS-VLA:2026-04-17</guid>
    <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>

    <category>SOTA 榜</category>
    <category>CALVIN</category>
    <description><![CDATA[Benchmark：CALVIN · split=ABC-D
指标：avg_len = 4.56
模型：NS-VLA · 2603.09542]]></description>
  </item>
  <item>
    <title>🏆 SOTA: GR-2 on CALVIN — avg_len 4.64</title>
    <link>https://sota.evomind-tech.com/benchmarks/calvin/?utm_source=rss&amp;utm_medium=feed&amp;utm_campaign=vla-daily-sota</link>
    <guid isPermaLink="false">sota:CALVIN:GR-2:2026-04-17</guid>
    <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>

    <category>SOTA 榜</category>
    <category>CALVIN</category>
    <description><![CDATA[Benchmark：CALVIN · split=ABCD-D
指标：avg_len = 4.64
模型：GR-2 · 2410.06158]]></description>
  </item>
  <item>
    <title>🏆 SOTA: Xiaomi-Robotics-0 on CALVIN — avg_len 4.8</title>
    <link>https://sota.evomind-tech.com/benchmarks/calvin/?utm_source=rss&amp;utm_medium=feed&amp;utm_campaign=vla-daily-sota</link>
    <guid isPermaLink="false">sota:CALVIN:Xiaomi-Robotics-0:2026-04-17</guid>
    <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>

    <category>SOTA 榜</category>
    <category>CALVIN</category>
    <description><![CDATA[Benchmark：CALVIN · split=ABCD-D
指标：avg_len = 4.8
模型：Xiaomi-Robotics-0 · paper]]></description>
  </item>
  <item>
    <title>🏆 SOTA: Flower VLA on CALVIN — avg_len 4.35</title>
    <link>https://sota.evomind-tech.com/benchmarks/calvin/?utm_source=rss&amp;utm_medium=feed&amp;utm_campaign=vla-daily-sota</link>
    <guid isPermaLink="false">sota:CALVIN:Flower VLA:2026-04-17</guid>
    <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>

    <category>SOTA 榜</category>
    <category>CALVIN</category>
    <description><![CDATA[Benchmark：CALVIN · split=D-D
指标：avg_len = 4.35
模型：Flower VLA · 2509.04996]]></description>
  </item>
  <item>
    <title>🏆 SOTA: MCIL on CALVIN — avg_len 1.82</title>
    <link>https://sota.evomind-tech.com/benchmarks/calvin/?utm_source=rss&amp;utm_medium=feed&amp;utm_campaign=vla-daily-sota</link>
    <guid isPermaLink="false">sota:CALVIN:MCIL:2026-04-17</guid>
    <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>

    <category>SOTA 榜</category>
    <category>CALVIN</category>
    <description><![CDATA[Benchmark：CALVIN · split=D-D
指标：avg_len = 1.82
模型：MCIL · 2005.07648]]></description>
  </item>
  <item>
    <title>🏆 SOTA: PLD on LIBERO — average 99.17</title>
    <link>https://sota.evomind-tech.com/benchmarks/libero/?utm_source=rss&amp;utm_medium=feed&amp;utm_campaign=vla-daily-sota</link>
    <guid isPermaLink="false">sota:LIBERO:PLD:2026-04-17</guid>
    <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>

    <category>SOTA 榜</category>
    <category>LIBERO</category>
    <description><![CDATA[Benchmark：LIBERO · split=non-standard
指标：average = 99.17
模型：PLD · 2511.00091]]></description>
  </item>
  <item>
    <title>🏆 SOTA: Dual-CoT VLA on LIBERO — average 98.8</title>
    <link>https://sota.evomind-tech.com/benchmarks/libero/?utm_source=rss&amp;utm_medium=feed&amp;utm_campaign=vla-daily-sota</link>
    <guid isPermaLink="false">sota:LIBERO:Dual-CoT VLA:2026-04-17</guid>
    <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>

    <category>SOTA 榜</category>
    <category>LIBERO</category>
    <description><![CDATA[Benchmark：LIBERO · split=standard-closed
指标：average = 98.8
模型：Dual-CoT VLA · 2603.22280]]></description>
  </item>
  <item>
    <title>🏆 SOTA: CORAL on LIBERO — average 99.3</title>
    <link>https://sota.evomind-tech.com/benchmarks/libero/?utm_source=rss&amp;utm_medium=feed&amp;utm_campaign=vla-daily-sota</link>
    <guid isPermaLink="false">sota:LIBERO:CORAL:2026-04-17</guid>
    <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>

    <category>SOTA 榜</category>
    <category>LIBERO</category>
    <description><![CDATA[Benchmark：LIBERO · split=standard-opensource
指标：average = 99.3
模型：CORAL · 2603.09298]]></description>
  </item>
  <item>
    <title>🏆 SOTA: NS-VLA on LIBERO Plus — total 79.4</title>
    <link>https://sota.evomind-tech.com/benchmarks/liberoplus/?utm_source=rss&amp;utm_medium=feed&amp;utm_campaign=vla-daily-sota</link>
    <guid isPermaLink="false">sota:LIBERO Plus:NS-VLA:2026-04-17</guid>
    <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>

    <category>SOTA 榜</category>
    <category>LIBERO Plus</category>
    <description><![CDATA[Benchmark：LIBERO Plus · split=non-standard
指标：total = 79.4
模型：NS-VLA · 2603.09542]]></description>
  </item>
  <item>
    <title>🏆 SOTA: TAG on LIBERO Plus — total 87.24</title>
    <link>https://sota.evomind-tech.com/benchmarks/liberoplus/?utm_source=rss&amp;utm_medium=feed&amp;utm_campaign=vla-daily-sota</link>
    <guid isPermaLink="false">sota:LIBERO Plus:TAG:2026-04-17</guid>
    <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>

    <category>SOTA 榜</category>
    <category>LIBERO Plus</category>
    <description><![CDATA[Benchmark：LIBERO Plus · split=standard-closed
指标：total = 87.24
模型：TAG · 2603.24584]]></description>
  </item>
  <item>
    <title>🏆 SOTA: ACoT-VLA on LIBERO Plus — total 86.6</title>
    <link>https://sota.evomind-tech.com/benchmarks/liberoplus/?utm_source=rss&amp;utm_medium=feed&amp;utm_campaign=vla-daily-sota</link>
    <guid isPermaLink="false">sota:LIBERO Plus:ACoT-VLA:2026-04-17</guid>
    <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>

    <category>SOTA 榜</category>
    <category>LIBERO Plus</category>
    <description><![CDATA[Benchmark：LIBERO Plus · split=standard-opensource
指标：total = 86.6
模型：ACoT-VLA · 2601.11404]]></description>
  </item>
</channel>
</rss>