
Reactive Diffusion Policy:
Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation

In Submission

Han Xue1*, Jieji Ren1*, Wendi Chen1*,
Gu Zhang2,3,4†, Yuan Fang1†, Guoying Gu1, Huazhe Xu2,3,4‡, Cewu Lu1‡
1Shanghai Jiao Tong University  2Tsinghua University, IIIS  3Shanghai Qi Zhi Institute  4Shanghai AI Lab
*Equal contribution  †Equal contribution  ‡Equal advising

In three challenging contact-rich tasks, RDP performs real-time closed-loop tactile / force control and successfully completes the tasks, even when faced with human perturbations.

Abstract

Humans can accomplish complex contact-rich tasks using vision and touch, with highly reactive capabilities such as quick adjustments to environmental changes and adaptive control of contact forces; however, this remains challenging for robots. Existing visual imitation learning (IL) approaches rely on action chunking to model complex behaviors, which lacks the ability to respond instantly to real-time tactile feedback during the chunk execution. Furthermore, most teleoperation systems struggle to provide fine-grained tactile / force feedback, which limits the range of tasks that can be performed.

To address these challenges, we introduce TactAR, a low-cost teleoperation system that provides real-time tactile feedback through Augmented Reality (AR), along with Reactive Diffusion Policy (RDP), a novel slow-fast visual-tactile imitation learning algorithm for learning contact-rich manipulation skills. RDP employs a two-level hierarchy:
(1) a slow latent diffusion policy that predicts high-level action chunks in latent space at low frequency, and (2) a fast asymmetric tokenizer that performs closed-loop tactile feedback control at high frequency. This design enables both complex trajectory modeling and quick reactive behavior within a unified framework.

In extensive evaluations across three challenging contact-rich tasks, RDP significantly outperforms state-of-the-art visual IL baselines by responding rapidly to tactile / force feedback. Furthermore, experiments show that RDP is applicable across different tactile / force sensors.

Method

Overview

Overview of TactAR and RDP

Fig. 1: TactAR is a low-cost and versatile teleoperation system which can provide real-time tactile / force feedback via Augmented Reality (AR). Reactive Diffusion Policy (RDP) is a slow-fast imitation learning algorithm that can model complex behaviors with a slow policy network and achieve closed-loop control based on tactile / force feedback with a fast policy network.


Data Collection System (TactAR)

Overview of TactAR

Fig. 2: Overview of the TactAR teleoperation system. It provides real-time tactile / force feedback via Augmented Reality (AR). The tactile feedback is represented as a 3D deformation field, a universal representation applicable to multiple different tactile / force sensors. The 3D deformation field is rendered and "attached" to the robot end-effector in AR, which lets the user perceive rich contact information in 3D space. TactAR also supports real-time streaming from multiple RGB cameras and optical tactile sensors.
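As a rough illustration of this representation, the deformation field can be viewed as a set of per-marker 3D displacement vectors between a reference (undeformed) frame and the current frame. The sketch below is only a minimal example under that assumption; the helper name deformation_field and the NumPy interface are ours, not the system's actual implementation.

```python
import numpy as np

def deformation_field(ref_markers: np.ndarray, cur_markers: np.ndarray) -> np.ndarray:
    """Illustrative 3D deformation field from tracked tactile markers.

    ref_markers / cur_markers: (N, 3) marker positions in the sensor frame,
    at rest and at the current frame. Returns an (N, 6) array of
    [origin, displacement] rows that can be rendered as arrows in AR.
    """
    displacement = cur_markers - ref_markers        # per-marker 3D offset
    return np.concatenate([ref_markers, displacement], axis=1)
```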

Calibration

The user adjusts the translation and rotation of the virtual coordinate system so that it aligns with the pre-defined TCP position (the white sphere) and the origin of the world coordinate system.

Realtime Tactile Feedback in AR

The 3D deformation / force field of the tactile / force sensors can be rendered in real time with Augmented Reality (AR).



Attachment to Robot TCP

The 3D deformation field is attached to the robot end-effector in virtual space.
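Conceptually, "attaching" the field means expressing the arrows relative to the end-effector and mapping them into the world frame with the current TCP pose, so they follow the gripper as it moves. A minimal sketch, assuming a 4x4 homogeneous TCP pose and the hypothetical deformation_field output shown above:

```python
import numpy as np

def attach_to_tcp(field: np.ndarray, T_world_tcp: np.ndarray) -> np.ndarray:
    """Map [origin, displacement] arrows from the TCP/sensor frame to the world frame.

    field: (N, 6) output of deformation_field(); T_world_tcp: 4x4 TCP pose.
    """
    R, t = T_world_tcp[:3, :3], T_world_tcp[:3, 3]
    origins = field[:, :3] @ R.T + t                 # points transform with R and t
    displacements = field[:, 3:] @ R.T               # vectors only rotate
    return np.concatenate([origins, displacements], axis=1)
```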


Camera Streaming

The system supports real-time streaming of multi-view RGB cameras and tactile cameras for a more immersive teleoperation experience.

Example Tasks

When collecting contact-rich task data through teleoperation, the system can provide intuitive tactile / force information.


Learning Algorithm (RDP)

Overview of RDP

Fig. 3: Overview of Reactive Diffusion Policy (RDP) framework. (a) The training pipeline of RDP, comprising the first stage for training the fast policy (Asymmetric Tokenizer) and the second stage for training the slow policy (Latent Diffusion Policy). (b) The inference pipeline of RDP. The slow policy leverages low-frequency observations for modeling complex behaviors with diffusion and action chunking. The fast policy enables closed-loop control by using high-frequency tactile / force input and fine-tuning the latent action chunk predicted by the slow policy in an auto-regressive manner.
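To make the slow-fast interplay concrete, the pseudocode below sketches one possible inference loop. The slow_policy / fast_decoder interfaces, the observation fields, and the ratio of fast to slow steps are illustrative assumptions, not the released implementation.

```python
FAST_STEPS_PER_CHUNK = 8   # fast-policy steps executed per slow-policy prediction (assumed)

def run_episode(env, slow_policy, fast_decoder, max_steps=1000):
    """Sketch of slow-fast inference: diffusion at low frequency,
    tactile feedback control at high frequency."""
    obs = env.reset()
    step = 0
    while step < max_steps:
        # Slow path: predict a latent action chunk from low-frequency observations.
        latent_chunk = slow_policy.predict(obs["images"], obs["proprio"])

        # Fast path: decode and correct actions auto-regressively with
        # high-frequency tactile / force feedback.
        for _ in range(FAST_STEPS_PER_CHUNK):
            tactile = env.read_tactile()              # latest high-frequency reading(s)
            action = fast_decoder.decode(latent_chunk, tactile)
            obs = env.step(action)
            step += 1
    return obs
```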

Comparison among Various Pipelines

Comparison among Various Pipelines

Fig. 4: Comparison among various pipelines. (a) Vanilla action chunking with open-loop control during the chunk execution. (b) Action chunking enhanced with temporal ensembling for semi-closed-loop control. (c) Our slow-fast inference pipeline, showcasing closed-loop capabilities with fast responsive adjustments. (d) Human control patterns in contact-rich tasks.
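For reference, pipeline (b) corresponds to the temporal-ensembling trick used by action-chunking baselines: at each timestep, the actions predicted for that timestep by several overlapping chunks are blended with exponential weights. The sketch below is a generic version under our own assumptions (in particular, which end of the history receives larger weights is a baseline design choice), not the exact baseline code.

```python
import numpy as np

def temporal_ensemble(predictions, tau=0.8):
    """Blend overlapping chunk predictions for the current timestep.

    predictions: list of action vectors predicted for the current timestep,
    ordered from the oldest chunk to the newest. Here newer predictions get
    larger weights (tau ** age); the actual weighting direction and decay
    are configuration choices of the baseline.
    """
    preds = np.asarray(predictions, dtype=np.float64)
    ages = np.arange(len(preds))[::-1]        # newest prediction has age 0
    weights = tau ** ages
    return (weights[:, None] * preds).sum(axis=0) / weights.sum()
```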

Experiments

Main Results

Task1: Peeling

Task Description: The robot needs to grasp the peeler, approach a cucumber held midair by a human hand, and then begin peeling. This task requires the following capabilities: (1) Precision. The robot needs to finish the task under environmental uncertainties (e.g., different tool grasp locations, different cucumber poses) with high (millimeter-level) precision. (2) Fast response. The robot needs to react instantly to human perturbations.

Evaluation Protocol: There are three test-time variations and we run 10 trials for each variation: (1) No perturbation. The object is fixed with a random 6D pose in the air. (2) Perturbation before contact. The human evaluator will move the object right before the tool makes contact. (3) Perturbation after contact. The human evaluator will move the object after the tool makes contact to break the contact state.

Score Metric: We calculate the score based on the proportion of the peeled cucumber skin to the total length of the cucumber, normalized by the average score of the demonstration data.
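In other words (with notation introduced here for clarity), a trial that peels length \( l_{\text{peeled}} \) from a cucumber of total length \( l_{\text{total}} \) receives

\( \text{score} = \dfrac{l_{\text{peeled}} / l_{\text{total}}}{\frac{1}{N}\sum_{i=1}^{N} l^{(i)}_{\text{peeled}} / l^{(i)}_{\text{total}}} \),

where the denominator averages the same ratio over the \( N \) demonstration episodes.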

TABLE I: Policy Performance for Peeling Task

Method               No Perturb.   Perturb. before Contact   Perturb. after Contact   All
DP                   0.56          0.58                      0.19                     0.44
DP w. tactile img.   0.00          0.08                      0.00                     0.03
DP w. tactile emb.   0.48          0.55                      0.15                     0.39
RDP (GelSight)       0.98          0.93                      0.80                     0.90
RDP (MCTac)          1.00          0.84                      0.79                     0.88
RDP (Force)          0.99          0.98                      0.88                     0.95

Video of all trials:

Task2: Wiping

Task Description: The robot needs to grasp the eraser, approach the vase held midair by a human hand, and then begin wiping. This task requires the following capabilities: (1) Adaptive force control with rotation. The robot needs to adaptively track the curved vase surface under different environmental uncertainties (e.g., tool grasp locations, vase pose). (2) Fast response. The robot needs to react instantly to human perturbations.

Evaluation Protocol: The same as Task1: Peeling.

Score Metric: We calculate the score based on the amount of handwriting residue remaining, compared to the demonstration data. If the result matches the human demonstration level, the score is 1; if there is minor residue (less than one third of the handwriting length), the score is 0.5; if significant residue remains, the score is 0.

TABLE II: Policy Performance for Wiping Task

Method               No Perturb.   Perturb. before Contact   Perturb. after Contact   All
DP                   0.75          0.70                      0.25                     0.57
DP w. tactile emb.   0.60          0.75                      0.15                     0.50
RDP (GelSight)       0.85          0.95                      0.50                     0.77
RDP (Force)          0.95          0.85                      0.80                     0.87

Video of all trials:

Task3: Bimanual Lifting

Task Description: The two robot arms need to grasp the handles, approach the paper cup, clamp the paper cup between the two handles, and carefully lift the cup along the curved trajectory without squeezing it. This task requires the following capabilities: (1) Precise force control. The two robots must apply precise force during task execution: the force must not be so large that it squeezes the cup, yet it must be sufficient to keep the cup from slipping. (2) Bimanual coordination. (3) Multi-modality. The expert data contain two distinct upward lifting trajectories.

Evaluation Protocol: There are two test-time variations and we run 10 trials for each variation: (1) soft paper cup. (2) hard paper cup.

Score Metric: If the paper cup is lifted into the air along the designated trajectory without significant compression, the score is 1; if the paper cup is partially compressed while in the air, the score is 0.5; if the cup is not lifted, or is dropped in the air, the score is 0.

TABLE III: Policy Performance for Bimanual Lifting Task

Method                    Soft Paper Cup              Hard Paper Cup              All
                          Clamp    Lift    Score      Clamp    Lift    Score      Score
DP                        0%       0%      0.00       0%       0%      0.00       0.00
DP w. tactile emb.        10%      10%     0.10       20%      10%     0.05       0.08
RDP (GelSight + MCTac)    100%     100%    0.55       90%      80%     0.40       0.48
RDP (Force)               100%     90%     0.80       90%      90%     0.60       0.70

Video of all trials:


Visualization of RDP Inference Process

We visualize the original action predicted by the slow policy (red and blue dots) and the corrected action (green dots) predicted by the fast policy with tactile feedback. The results show that the fast policy can use tactile information to make quick and accurate corrections, allowing contact-rich tasks to be completed smoothly.

We obtain the original action by padding with the initial tactile / force signal. The corrected actions are scaled up for clearer visualization.
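A minimal sketch of what this padding could look like, reusing the hypothetical fast_decoder interface from the inference sketch above (not the project's actual visualization code):

```python
import numpy as np

def original_vs_corrected(fast_decoder, latent_chunk, tactile_history):
    """Return the uncorrected ("original") and corrected actions for visualization.

    The original action is obtained by padding the sequence with its initial
    tactile / force reading, so no new feedback is incorporated; the corrected
    action uses the real high-frequency history.
    """
    padded = np.repeat(tactile_history[:1], len(tactile_history), axis=0)
    original = fast_decoder.decode(latent_chunk, padded)
    corrected = fast_decoder.decode(latent_chunk, tactile_history)
    return original, corrected
```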

Case 1

Case 2

Case 3


Failure Cases Analysis of Baseline Policies

Lose Contact

DP w. tactile emb.

DP w. tactile emb.

Large Force

DP w. tactile img.

DP

DP w. tactile emb. (temp. ens., \( \tau = 0.8 \))

Get Stuck

DP w. tactile emb.

DP w. tactile emb. (chunk size \(= 2\))

DP w. tactile emb.