We introduce ManiFeel, a reproducible and scalable simulation benchmark for studying supervised visuotactile manipulation policies. ManiFeel spans a diverse set of manipulation tasks and scenarios and evaluates a range of policies, input modalities, and tactile representation methods. Through extensive experiments, our analysis reveals key factors that influence supervised visuotactile policy learning, identifies the types of tasks where tactile sensing is most beneficial, and highlights promising directions for future research in visuotactile policy learning. ManiFeel aims to establish a reproducible benchmark for supervised visuotactile policy learning, supporting progress in visuotactile manipulation and perception. To facilitate future research and ensure reproducibility, we will release our codebase, datasets, training logs, and pretrained checkpoints.
We evaluate visuotactile and vision-only policies in a real-world tactile sorting task. The robot must distinguish between two visually similar objects—a table tennis ball and a golf ball—and place them into designated bowls.
Under human disturbances during rollouts, the visuotactile policy demonstrates adaptive behavior: when a ball is removed, the robot detects the loss and actively re-initiates the search. In contrast, the vision-only policy fails to respond and cannot recover.
We present a diverse suite of manipulation tasks to systematically evaluate supervised visuotactile policy learning across varying sensory conditions, ranging from clear vision to visually degraded or fully occluded scenarios. The tasks are organized into six categories based on their reliance on visual and tactile feedback.
These tasks require strong tactile sensing to handle scenarios with visual occlusion or low lighting. They highlight how tactile feedback enables robust manipulation when visual input is unreliable or unavailable.
These tasks represent standardized assembly and insertion scenarios that are supported by our benchmark. They help identify key challenges in visual-tactile integration and highlight the potential of visuotactile policies in complex manipulation settings.
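Regardless of task category, the supervised policies studied in ManiFeel consume synchronized visual and tactile observations. As a rough illustration only (a minimal PyTorch sketch of our own, not the ManiFeel implementation; all module names and layer sizes are arbitrary assumptions), a visuotactile policy might encode each modality separately and concatenate the features before the policy head:

import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Tiny convolutional encoder shared by both modalities (illustrative sizes)."""
    def __init__(self, out_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class VisuotactileEncoder(nn.Module):
    """Encode RGB and tactile images separately, then fuse by concatenation."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.rgb_enc = SmallCNN(feat_dim)
        self.tactile_enc = SmallCNN(feat_dim)

    def forward(self, rgb: torch.Tensor, tactile: torch.Tensor) -> torch.Tensor:
        return torch.cat([self.rgb_enc(rgb), self.tactile_enc(tactile)], dim=-1)

# Example: a batch of 4 RGB frames and 4 tactile images -> a (4, 256) observation embedding
# that a downstream policy head could map to actions.
enc = VisuotactileEncoder()
obs = enc(torch.rand(4, 3, 96, 96), torch.rand(4, 3, 96, 96))
print(obs.shape)  # torch.Size([4, 256])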
We evaluate the effectiveness of visuotactile and vision-only policies under different lighting conditions in a real-world tactile sorting task. The robot must distinguish between two visually similar objects—a table tennis ball and a golf ball—and place them into the correct target bowls. Please refer to our paper for detailed results.
Success Rates:
Visuotactile policy: 13/15 under normal lighting, 5/15 under dim lighting
Vision-only policy: 5/15 under normal lighting, 1/15 under dim lighting
Failure cases in the visuotactile policy often stem from controller overshoot: although the object is correctly identified, it may be dropped slightly outside the target bowl. In contrast, due to the absence of tactile feedback, the vision-only policy frequently misidentifies the object or fails to determine the correct placement location, resulting in incorrect sorting.
We compare tactile images generated in the TacSL simulator with those captured using the GelSight Mini sensor. The examples below feature several representative objects from the ManiFeel benchmark and demonstrate the visual and structural similarities between simulated and real tactile feedback.
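As a simple illustration of how such sim-to-real similarity could be quantified (our own sketch, not part of the ManiFeel evaluation; the file names are placeholders), one could compute the structural similarity (SSIM) between a simulated tactile image and a GelSight Mini capture with scikit-image:

# Illustrative sketch only: compare a simulated tactile image with a real
# GelSight Mini capture using SSIM. File names below are placeholders.
from skimage import img_as_float
from skimage.io import imread
from skimage.transform import resize
from skimage.metrics import structural_similarity as ssim

sim_img = img_as_float(imread("sim_tactile.png"))    # rendered in simulation (e.g., TacSL)
real_img = img_as_float(imread("real_tactile.png"))  # captured by a GelSight Mini

# Resize the real capture to the simulated resolution so the two are directly comparable.
real_img = resize(real_img, sim_img.shape)

# channel_axis=-1 treats the last axis as RGB channels; pixel values are in [0, 1].
score = ssim(sim_img, real_img, channel_axis=-1, data_range=1.0)
print(f"SSIM (simulated vs. real tactile image): {score:.3f}")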
Visit our lab website for more information about our research: Purdue MARS Lab.
@misc{luu2025manifeelbenchmarkingunderstandingvisuotactile,
title = {ManiFeel: Benchmarking and Understanding Visuotactile Manipulation Policy Learning},
author = {Quan Khanh Luu and Pokuang Zhou and Zhengtong Xu and Zhiyuan Zhang and Qiang Qiu and Yu She},
year = {2025},
eprint = {2505.18472},
archivePrefix = {arXiv},
primaryClass = {cs.RO},
url = {https://arxiv.org/abs/2505.18472}
}