
VLA Track · Vision-Language-Action

VLA Research Daily

VISION-LANGUAGE-ACTION · cs.RO + cs.AI + cs.LG


Vision-Language-Action (VLA) robot systems: a digest that merges three arXiv streams, cs.RO, cs.AI, and cs.LG. It tracks frontier directions such as flow matching, world models, and embodied reasoning, with papers rated automatically by Qwen3.5-Plus every day at 09:00 CST.

— June 2026 —
Today
🔧 14 📖 16

VoLo: A Physical Orchestrator for Open-Vocabulary Long-Horizon Manipulation

30 papers
Yesterday
🔧 10 📖 18

Robots Need More than VLA and World Models

28 papers
3 days ago
🔧 2

Let It Be Simple: One-Step Action Generation for Vision-Language-Action Models

2 papers
5 days ago
🔧 11 📖 13

VISTA: Vision-Grounded and Physics-Validated Adaptation of UMI data for VLA Training

24 papers
6 days ago
🔧 12 📖 15

See Less, Specify More: Visual Evidence Budgets for Generalizable VLAs

27 papers
7 days ago
🔧 10 📖 19

World-Task Factorization for Robot Learning

29 papers
🔧 18 📖 9

ELAN4D: Embodiment-Centric 4D Supervision for Vision-Language-Action Models via Plug-and-Play Adaptation

27 papers
— May 2026 —
⚡ 1 🔧 8 📖 20

GEM: Generative Supervision Helps Embodied Intelligence

29 papers
🔧 13 📖 13

GEM: Generative Supervision Helps Embodied Intelligence

26 papers
🔧 9 📖 17

PhyPush: One Push is All You Need for Sensorless Physical Property Estimation with Physics-Guided Transformers

26 papers
⚡ 1 🔧 12 📖 17

EXPO-FT: Sample-Efficient Reinforcement Learning Finetuning for Vision-Language-Action Models

30 papers
🔧 10 📖 11

Agentic-VLA: Efficient Online Adaptation for Vision-Language-Action Models

21 papers
📖 1

Bioinspired ionic thermoreceptors with anisotropic architecture for thermotactile perception in robots

1 paper
🔧 12 📖 11

Learning Structural Latent Points for Efficient Visual Representations in Robotic Manipulation

23 papers
⚡ 1 🔧 15 📖 26

Learning Structural Latent Points for Efficient Visual Representations in Robotic Manipulation

42 papers
🔧 9 📖 25

Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR

34 papers
⚡ 1 🔧 9 📖 18

Key-Gram: Extensible World Knowledge for Embodied Manipulation

28 papers
🔧 11 📖 20

PhysBrain 1.0 Technical Report

31 papers
🔧 10 📖 23

SECOND-Grasp: Semantic Contact-guided Dexterous Grasping

33 papers
🔧 11 📖 17

StereoPolicy: Improving Robotic Manipulation Policies via Stereo Perception

28 papers
🔧 12 📖 15

StereoPolicy: Improving Robotic Manipulation Policies via Stereo Perception

27 papers
⚡ 1 🔧 14 📖 15

BioProVLA-Agent: An Affordable, Protocol-Driven, Vision-Enhanced VLA-Enabled Embodied Multi-Agent System with Closed-Loop-Capable Reasoning for Biological Laboratory Manipulation

30 papers
🔧 10 📖 17

VLA-GSE: Boosting Parameter-Efficient Fine-Tuning in VLA with Generalized and Specialized Experts

27 papers
⚡ 1 🔧 7 📖 11

From Pixels to Tokens: A Systematic Study of Latent Action Supervision for Vision-Language-Action Models

19 papers
🔧 11 📖 13

RLDX-1 Technical Report

24 papers
⚡ 1 🔧 11 📖 17

Online Safety Filter for Deformable Object Manipulation with Horizon Agnostic Neural Operators

29 papers
🔧 9 📖 9

Being-H0.7: A Latent World-Action Model from Egocentric Videos

18 papers
📖 1

Continuum tactile sensing via an amplified liquid metal interface

1 paper
📖 1

Graph World Models: Concepts, Taxonomy, and Future Directions

1 paper
🔧 9 📖 15

World2Minecraft: Occupancy-Driven Simulated Scenes Construction

24 papers
🔧 8 📖 10

Demonstrate once, execute on many: Kinematic intelligence for cross-robot skill transfer

18 papers
— April 2026 —
🔧 13 📖 19

Demonstrate once, execute on many: Kinematic intelligence for cross-robot skill transfer

32 papers
🔧 12 📖 13

AgenticCache: Cache-Driven Asynchronous Planning for Embodied AI Agents

25 papers
🔧 6 📖 8

CodeGraphVLP: Code-as-Planner Meets Semantic-Graph State for Non-Markovian Vision-Language-Action Models

14 papers
📖 2

Closed-loop tactile-visual interactivity via chip-free luminescent fibers enabled by capacitive coupling

2 papers
🔧 1

From Noise to Intent: Anchoring Generative VLA Policies with Residual Bridges

1 paper
🔧 12 📖 16

From embodied intelligence to physical AI

28 papers
🔧 15 📖 24

Object-centric task representation and transfer using diffused orientation fields

39 papers
🔧 6 📖 14

FASTER: Value-Guided Sampling for Fast RL

20 papers
🔧 21 📖 8

Demonstrate once, execute on many: Kinematic intelligence for cross-robot skill transfer

29 papers
⚡ 2 🔧 4 📖 9

Model-Based Reinforcement Learning Exploits Passive Body Dynamics for High-Performance Biped Robot Locomotion

15 papers
⚡ 1 🔧 7 📖 12

Jump-Start Reinforcement Learning with Vision-Language-Action Regularization

20 papers
⚡ 1 🔧 5 📖 11

Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting

17 papers
🔧 5 📖 12

AssemLM: Spatial Reasoning Multimodal Large Language Models for Robotic Assembly

17 papers
📖 2

LAMP: Lift Image-Editing as General 3D Priors for Open-world Manipulation

2 papers
🔧 5 📖 23

LAMP: Lift Image-Editing as General 3D Priors for Open-world Manipulation

28 papers
⚡ 1 🔧 5 📖 14

RichMap: A Reachability Map Balancing Precision, Efficiency, and Flexibility for Rich Robot Manipulation Tasks

20 papers
🔧 8 📖 18

ICR-Drive: Instruction Counterfactual Robustness for End-to-End Language-Driven Autonomous Driving

26 papers
🔧 6 📖 31

Diffusion Policy with Bayesian Expert Selection for Active Multi-Target Tracking

37 papers
🔧 3 📖 12

F2F-AP: Flow-to-Future Asynchronous Policy for Real-time Dynamic Manipulation

15 papers
🔧 6 📖 19

Causal Scene Narration with Runtime Safety Supervision for Vision-Language-Action Driving

25 papers
🔧 5 📖 5

Multi-Camera View Scaling for Data-Efficient Robot Imitation Learning

10 papers
🔧 2 📖 6

Object Affordance Recognition and Grounding via Multi-scale Cross-modal Representation Learning

8 papers
🔧 5 📖 23

ViPRA: Video Prediction for Robot Actions

28 papers
— March 2026 —
🔧 3 📖 14

MPDiT: Multi-Patch Global-to-Local Transformer Architecture For Efficient Flow Matching and Diffusion Model

17 papers