VLA Line · Vision-Language-Action
VLA Research Daily
VISION-LANGUAGE-ACTION · cs.RO + cs.AI + cs.LG
Vision-Language-Action (VLA) robotic systems — aggregating three arXiv streams: cs.RO, cs.AI, and cs.LG. The feed tracks frontier directions such as flow matching, world models, and embodied reasoning, with papers rated automatically by Qwen3.5-Plus daily at 09:00 CST.
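The blurb above says the feed merges three arXiv category streams into one daily list. A minimal sketch of such a query-and-merge step, using the public arXiv Atom API (the function names and the dedup strategy here are illustrative assumptions, not the site's actual pipeline):

```python
from urllib.parse import urlencode

# Public arXiv API endpoint and the three categories this feed tracks.
ARXIV_API = "http://export.arxiv.org/api/query"
CATEGORIES = ["cs.RO", "cs.AI", "cs.LG"]

def build_query_url(max_results: int = 50) -> str:
    """Build an arXiv API URL covering all three tracked categories."""
    search = " OR ".join(f"cat:{c}" for c in CATEGORIES)
    params = {
        "search_query": search,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"

def merge_streams(*streams: list[str]) -> list[str]:
    """Merge per-category title lists, dropping exact duplicates
    (the same paper is often cross-listed in several categories)."""
    seen: set[str] = set()
    merged: list[str] = []
    for stream in streams:
        for title in stream:
            if title not in seen:
                seen.add(title)
                merged.append(title)
    return merged
```

Cross-listed papers are the reason a dedup pass matters: without it, a single submission listed under both cs.RO and cs.LG would appear twice in the daily digest.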
Object-centric task representation and transfer using diffused orientation fields
FASTER: Value-Guided Sampling for Fast RL
Demonstrate once, execute on many: Kinematic intelligence for cross-robot skill transfer
Model-Based Reinforcement Learning Exploits Passive Body Dynamics for High-Performance Biped Robot Locomotion
Jump-Start Reinforcement Learning with Vision-Language-Action Regularization
Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting
AssemLM: Spatial Reasoning Multimodal Large Language Models for Robotic Assembly
LAMP: Lift Image-Editing as General 3D Priors for Open-world Manipulation
RichMap: A Reachability Map Balancing Precision, Efficiency, and Flexibility for Rich Robot Manipulation Tasks
ICR-Drive: Instruction Counterfactual Robustness for End-to-End Language-Driven Autonomous Driving
Diffusion Policy with Bayesian Expert Selection for Active Multi-Target Tracking
F2F-AP: Flow-to-Future Asynchronous Policy for Real-time Dynamic Manipulation
Causal Scene Narration with Runtime Safety Supervision for Vision-Language-Action Driving
Multi-Camera View Scaling for Data-Efficient Robot Imitation Learning
Object Affordance Recognition and Grounding via Multi-scale Cross-modal Representation Learning
ViPRA: Video Prediction for Robot Actions
MPDiT: Multi-Patch Global-to-Local Transformer Architecture For Efficient Flow Matching and Diffusion Model
FODMP: Fast One-Step Diffusion of Movement Primitives Generation for Time-Dependent Robot Actions
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
Model Predictive Control with Differentiable World Models for Offline Reinforcement Learning
Causal World Modeling for Robot Control
R2-Dreamer: Redundancy-Reduced World Models without Decoders or Augmentation
FASTER: Rethinking Real-Time Flow VLAs
Grounding Robot Generalization in Training Data via Retrieval-Augmented VLMs
DexGrasp-Zero: A Morphology-Aligned Policy for Zero-Shot Cross-Embodiment Dexterous Grasping
Panoramic Affordance Prediction
MoE-ACT: Scaling Multi-Task Bimanual Manipulation with Sparse Language-Conditioned Mixture-of-Experts Transformers
Robot-mediated haptic feedback outperforms vision in violin duo coordination
Cross-embodied Co-design for Dexterous Hands
RL-100: Performant Robotic Manipulation with Real-World Reinforcement Learning
Evolving Diffusion and Flow Matching Policies for Online Reinforcement Learning
What if? Emulative Simulation with World Models for Situated Reasoning
Beyond the Patch: Exploring Vulnerabilities of Visuomotor Policies via Viewpoint-Consistent 3D Adversarial Object
Beyond Pixel Histories: World Models with Persistent 3D State
Next Embedding Prediction Makes World Models Stronger
Robometer: Scaling General-Purpose Robotic Reward Models via Trajectory Comparisons
Beyond State-Wise Mirror Descent: Offline Policy Optimization with Parametric Policies
UCM: Unifying Camera Control and Memory with Time-aware Positional Encoding Warping for World Models
When to Act, Ask, or Learn: Uncertainty-Aware Policy Steering