VLA Track · Vision-Language-Action
VLA Research Daily
VISION-LANGUAGE-ACTION · cs.RO + cs.AI + cs.LG
Vision-Language-Action (VLA) robot systems: integrates three arXiv streams (cs.RO, cs.AI, cs.LG). The digest focuses on frontier directions such as flow matching, world models, and embodied reasoning, and papers are automatically rated by Qwen3.5-Plus every day at 09:00 CST.
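A paper can be cross-listed in more than one of the three categories, so merging the streams needs a deduplication step. The following is a minimal sketch, not the digest's actual pipeline. It assumes each stream is an ordered list of titles and that a whitespace- and case-normalized title is a good enough key. A real pipeline would more likely key on the arXiv identifier. The function name `merge_streams` is hypothetical.

```python
from typing import Iterable


def merge_streams(*streams: Iterable[str]) -> list[str]:
    """Merge ordered title lists (e.g. cs.RO, cs.AI, cs.LG), keeping each title's first occurrence.

    Hypothetical helper: deduplicates on a normalized title, an assumption
    standing in for a proper arXiv-ID key.
    """
    seen: set[str] = set()
    merged: list[str] = []
    for stream in streams:
        for title in stream:
            # Collapse runs of whitespace and ignore case so near-identical titles collide.
            key = " ".join(title.split()).casefold()
            if key not in seen:
                seen.add(key)
                merged.append(title)
    return merged
```

Keeping the first occurrence preserves the order of the earliest stream, so a paper cross-listed in cs.RO and cs.LG appears once, at its cs.RO position.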
VoLo: A Physical Orchestrator for Open-Vocabulary Long-Horizon Manipulation
Robots Need More than VLA and World Models
Let It Be Simple: One-Step Action Generation for Vision-Language-Action Models
VISTA: Vision-Grounded and Physics-Validated Adaptation of UMI data for VLA Training
See Less, Specify More: Visual Evidence Budgets for Generalizable VLAs
World-Task Factorization for Robot Learning
ELAN4D: Embodiment-Centric 4D Supervision for Vision-Language-Action Models via Plug-and-Play Adaptation
GEM: Generative Supervision Helps Embodied Intelligence
PhyPush: One Push is All You Need for Sensorless Physical Property Estimation with Physics-Guided Transformers
EXPO-FT: Sample-Efficient Reinforcement Learning Finetuning for Vision-Language-Action Models
Agentic-VLA: Efficient Online Adaptation for Vision-Language-Action Models
Bioinspired ionic thermoreceptors with anisotropic architecture for thermotactile perception in robots
Learning Structural Latent Points for Efficient Visual Representations in Robotic Manipulation
Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR
Key-Gram: Extensible World Knowledge for Embodied Manipulation
PhysBrain 1.0 Technical Report
SECOND-Grasp: Semantic Contact-guided Dexterous Grasping
StereoPolicy: Improving Robotic Manipulation Policies via Stereo Perception
BioProVLA-Agent: An Affordable, Protocol-Driven, Vision-Enhanced VLA-Enabled Embodied Multi-Agent System with Closed-Loop-Capable Reasoning for Biological Laboratory Manipulation
VLA-GSE: Boosting Parameter-Efficient Fine-Tuning in VLA with Generalized and Specialized Experts
From Pixels to Tokens: A Systematic Study of Latent Action Supervision for Vision-Language-Action Models
RLDX-1 Technical Report
Online Safety Filter for Deformable Object Manipulation with Horizon Agnostic Neural Operators
Being-H0.7: A Latent World-Action Model from Egocentric Videos
Continuum tactile sensing via an amplified liquid metal interface
Graph World Models: Concepts, Taxonomy, and Future Directions
World2Minecraft: Occupancy-Driven Simulated Scenes Construction
Demonstrate once, execute on many: Kinematic intelligence for cross-robot skill transfer
AgenticCache: Cache-Driven Asynchronous Planning for Embodied AI Agents
CodeGraphVLP: Code-as-Planner Meets Semantic-Graph State for Non-Markovian Vision-Language-Action Models
Closed-loop tactile-visual interactivity via chip-free luminescent fibers enabled by capacitive coupling
From Noise to Intent: Anchoring Generative VLA Policies with Residual Bridges
From embodied intelligence to physical AI
Object-centric task representation and transfer using diffused orientation fields
FASTER: Value-Guided Sampling for Fast RL
Model-Based Reinforcement Learning Exploits Passive Body Dynamics for High-Performance Biped Robot Locomotion
Jump-Start Reinforcement Learning with Vision-Language-Action Regularization
Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting
AssemLM: Spatial Reasoning Multimodal Large Language Models for Robotic Assembly
LAMP: Lift Image-Editing as General 3D Priors for Open-world Manipulation
RichMap: A Reachability Map Balancing Precision, Efficiency, and Flexibility for Rich Robot Manipulation Tasks
ICR-Drive: Instruction Counterfactual Robustness for End-to-End Language-Driven Autonomous Driving
Diffusion Policy with Bayesian Expert Selection for Active Multi-Target Tracking
F2F-AP: Flow-to-Future Asynchronous Policy for Real-time Dynamic Manipulation
Causal Scene Narration with Runtime Safety Supervision for Vision-Language-Action Driving
Multi-Camera View Scaling for Data-Efficient Robot Imitation Learning
Object Affordance Recognition and Grounding via Multi-scale Cross-modal Representation Learning
ViPRA: Video Prediction for Robot Actions