We introduce ManiFeel, a reproducible and scalable simulation benchmark for studying supervised visuotactile manipulation policies. ManiFeel spans a diverse set of manipulation tasks and scenarios and evaluates a range of policies, input modalities, and tactile representation methods. Through extensive experiments, our analysis reveals key factors that influence supervised visuotactile policy learning, identifies the types of tasks where tactile sensing is most beneficial, and highlights promising directions for future research in visuotactile policy learning. ManiFeel aims to establish a reproducible benchmark for supervised visuotactile policy learning, supporting progress in visuotactile manipulation and perception. To facilitate future research and ensure reproducibility, we will release our codebase, datasets, training logs, and pretrained checkpoints.
We evaluate visuotactile and vision-only policies in a real-world tactile sorting task. The robot must distinguish between two visually similar objects—a table tennis ball and a golf ball—and place them into designated bowls.
Under human disturbances during rollouts, the visuotactile policy demonstrates adaptive behavior: when a ball is removed, the robot detects the loss and actively re-initiates the search. In contrast, the vision-only policy fails to respond and cannot recover.
We present a diverse suite of manipulation tasks to systematically evaluate supervised visuotactile policy learning across varying sensory conditions, ranging from clear vision to visually degraded or fully occluded scenarios. The tasks are organized into six categories based on their reliance on visual and tactile feedback.
These tasks require strong tactile sensing to handle scenarios with visual occlusion or low lighting. They highlight how tactile feedback enables robust manipulation when visual input is unreliable or unavailable.
These tasks represent standardized assembly and insertion scenarios that are supported by our benchmark. They help identify key challenges in visual-tactile integration and highlight the potential of visuotactile policies in complex manipulation settings.
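Regardless of task category, the supervised policies studied in ManiFeel consume synchronized visual and tactile observations. As a rough illustration only (a minimal PyTorch sketch of our own, not the ManiFeel implementation; all module names and layer sizes are arbitrary assumptions), a visuotactile policy might encode each modality separately and concatenate the features before the policy head:

import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Tiny convolutional encoder shared by both modalities (illustrative sizes)."""
    def __init__(self, out_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class VisuotactileEncoder(nn.Module):
    """Encode RGB and tactile images separately, then fuse by concatenation."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.rgb_enc = SmallCNN(feat_dim)
        self.tactile_enc = SmallCNN(feat_dim)

    def forward(self, rgb: torch.Tensor, tactile: torch.Tensor) -> torch.Tensor:
        return torch.cat([self.rgb_enc(rgb), self.tactile_enc(tactile)], dim=-1)

# Example: a batch of 4 RGB frames and 4 tactile images -> a (4, 256) observation embedding
# that a downstream policy head could map to actions.
enc = VisuotactileEncoder()
obs = enc(torch.rand(4, 3, 96, 96), torch.rand(4, 3, 96, 96))
print(obs.shape)  # torch.Size([4, 256])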
We evaluate the effectiveness of visuotactile and vision-only policies under different lighting conditions in a real-world tactile sorting task. The robot must distinguish between two visually similar objects—a table tennis ball and a golf ball—and place them into the correct target bowls. Please refer to our paper for detailed results.
Success Rates:
Visuotactile policy: 13/15 under normal lighting, 5/15 under dim lighting
Vision-only policy: 5/15 under normal lighting, 1/15 under dim lighting
Failure cases in the visuotactile policy often stem from controller overshoot: although the object is correctly identified, it may be dropped slightly outside the target bowl. In contrast, due to the absence of tactile feedback, the vision-only policy frequently misidentifies the object or fails to determine the correct placement location, resulting in incorrect sorting.
We compare tactile images generated in the TacSL simulator with those captured using the GelSight Mini sensor. The examples below feature several representative objects from the ManiFeel benchmark and demonstrate the visual and structural similarities between simulated and real tactile feedback.
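As a simple illustration of how such sim-to-real similarity could be quantified (our own sketch, not part of the ManiFeel evaluation; the file names are placeholders), one could compute the structural similarity (SSIM) between a simulated tactile image and a GelSight Mini capture with scikit-image:

# Illustrative sketch only: compare a simulated tactile image with a real
# GelSight Mini capture using SSIM. File names below are placeholders.
from skimage import img_as_float
from skimage.io import imread
from skimage.transform import resize
from skimage.metrics import structural_similarity as ssim

sim_img = img_as_float(imread("sim_tactile.png"))    # rendered in simulation (e.g., TacSL)
real_img = img_as_float(imread("real_tactile.png"))  # captured by a GelSight Mini

# Resize the real capture to the simulated resolution so the two are directly comparable.
real_img = resize(real_img, sim_img.shape)

# channel_axis=-1 treats the last axis as RGB channels; pixel values are in [0, 1].
score = ssim(sim_img, real_img, channel_axis=-1, data_range=1.0)
print(f"SSIM (simulated vs. real tactile image): {score:.3f}")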
Visit our lab website for more information about our research: Purdue MARS Lab.
@misc{luu2025manifeelbenchmarkingunderstandingvisuotactile,
title = {ManiFeel: Benchmarking and Understanding Visuotactile Manipulation Policy Learning},
author = {Quan Khanh Luu and Pokuang Zhou and Zhengtong Xu and Zhiyuan Zhang and Qiang Qiu and Yu She},
year = {2025},
eprint = {2505.18472},
archivePrefix = {arXiv},
primaryClass = {cs.RO},
url = {https://arxiv.org/abs/2505.18472}
}