Closing the sim-to-real gap with neural retargeting

Train agents on reality. Not theory.

RL Environments-as-a-Service (RLEaaS)

Translation & Localization

Multilingual AI

Platforms

Data Marketplace

Data Canvas

AI Data Foundry

OneForma

AI Localization

Expert Network

Join our Expert Network

Build & Train AI

Data Collection & Creation

RLHF & Preference Optimization

Supervised Fine Tuning

Model Safety & Evaluation

Internationalization

Vertical AI

Physical AI

Healthcare

Vision AI

Explore our full suite of AI platforms, data marketplaces, and expert services designed to build, train, fine-tune, and deploy reliable, production-grade AI systems at scale.

Platforms

Data Marketplace

Data Canvas

AI Data Foundry

OneForma

AI Localization

Expert Network

Join our Expert Network

Build & Train AI

Data Collection & Creation

RLHF & Preference Optimization

Supervised Fine Tuning

Model Safety & Evaluation

Internationalization

Vertical AI

Physical AI

Healthcare

Vision AI

Platforms

Data Marketplace

Data Canvas

AI Data Foundry

OneForma

AI Localization

Expert Network

Join our Expert Network

Build & Train AI

Data Collection & Creation

RLHF & Preference Optimization

Supervised Fine Tuning

Model Safety & Evaluation

Internationalization

Vertical AI

Physical AI

Healthcare

Vision AI

Explore our full suite of AI platforms, data marketplaces, and expert services designed to build, train, fine-tune, and deploy reliable, production-grade AI systems at scale.

Platforms

Data Marketplace

Data Canvas

AI Data Foundry

OneForma

AI Localization

Expert Network

Join our Expert Network

Build & Train AI

Data Collection & Creation

RLHF & Preference Optimization

Supervised Fine Tuning

Model Safety & Evaluation

Internationalization

Vertical AI

Physical AI

Healthcare

Vision AI

Explore our full suite of AI platforms, data marketplaces, and expert services designed to build, train, fine-tune, and deploy reliable, production-grade AI systems at scale.

Book a Demo

Article

Closing the sim-to-real gap with neural retargeting

Explore how Centific and N1 Robotics measured the sim-to-real dexterity gap across 1,400+ teleoperated robot episodes, and why neural retargeting may redefine how robotics foundation models are trained.

Published on May 6, 2026

•

16 min read time

Table of contents

Summarize

AI Summary by Centific

Turn this article into insights

with AI-powered summaries

Summarize article

Give me key takeaways

Topics

Robotics AI

Physical AI

Teleoperation

Neural Retargeting

Humanoid Robotics

Robotics AI

Physical AI

Teleoperation

Neural Retargeting

Humanoid Robotics

Author(s)

Eivy Cedeno Torres

Kamwai Chan

Kriti Banka

Leela Krishna

Mangesh Damre

Ahmed Abdellah

Alex Cho

Emily Shen

Every robotics team has heard this story:

Months of work go into building a simulation environment — physics engines tuned to perfection; domain randomization cranked up, reward functions carefully shaped.
The policy achieves near-perfect task success in sim.
Then it’s transferred to real hardware.
The robot drops the object on the first try.

We measured this pattern directly at Centific. We built a Dexterous Manipulation Benchmark using simulated multi-fingered hands (16 degrees of freedom, 4 articulated fingers), then compared it against over 1,400 real teleoperated episodes from Unitree’s G1 humanoid robot across 29 task datasets — stacking blocks, pouring liquids, folding towels, cleaning tables, organizing tools, packaging cameras, and more.

1,400+ real episodes: the dataset that exposed the gap

Unitree Robotics, the manufacturer of the G1 humanoid, published a large-scale teleoperation dataset (DiverseManip) on HuggingFace, collected by human operators wearing VR headsets and controlling the G1’s arms and grippers in real time. This dataset spans single-arm and dual-arm tasks across three different end-effectors:

End-Effector	DOF	Type	Tasks Covered
Unitree Dex1	1-DOF	Simple parallel gripper (open/close)	Towel folding, table cleaning, tool organizing, block stacking
Unitree Dex3	28-DOF	Multi-fingered dexterous hand	Block stacking, pouring, camera packaging, object placement
BrainCo Revo 2	6-DOF per hand	Anthropomorphic dexterous hand	Rubik's cube grasping, Oreo pickup, precision tasks

This dataset gave us the real-world ground truth to benchmark against simulation. But it also revealed a critical limitation: the data was collected using standard VR controllers with basic IK retargeting. This method systematically destroys the most valuable signals in teleoperation data. This limitation is exactly what N1 Robotics’ neural retargeting technology is designed to solve.

Figure 1: Unitree G1 dexterous manipulation — dual-arm robot performing Rubik's Cube grasping and precision object manipulation tasks across the three end-effector configurations used in Centific’s benchmark.

Figure 2: Unitree G1 humanoid robot performing block manipulation tasks — the same robot platform used across all 29 datasets in Centific’s B4 Dexterous Manipulation Benchmark.

The numbers that opened our eyes

We measured four core metrics across simulation and real-world teleoperation. The results were unambiguous:

Metric	What it measures	Simulation	Real teleoperation	Gap
Task Success Rate	Did the robot complete the task?	~95%	~83%	~1.1x
Manipulation Accuracy	How precise was object placement?	~99%	~68%	~1.5x
Grasp Quality	Was the grasp stable and functional?	~68%	~47%	~1.5x
Grasp Adaptiveness	Did the robot adjust mid-task?	~100%	~2%	~50x

Across these metrics, simulation consistently reports near-perfect performance, while real-world results diverge by 1.5x to over 50x on the measures that determine whether a task succeeds in deployment.

The largest difference appears in grasp adaptiveness. Simulation reports constant adjustment, while real operators adapt roughly 2 out of every 100 frames. That result is not a weakness; it reflects how operators anticipate failure before it occurs, using wrist orientation, approach angle, and timing. A policy trained only in simulation does not learn those anticipatory strategies. Recent work from NVIDIA points in a similar direction. In introducing the CaP-X framework, Jim Fan, NVIDIA Director of AI & Distinguished Scientist, describes API-driven robotic systems in which perception, planning, and control are composed at runtime, allowing robots to solve certain manipulation tasks zero-shot without task-specific policy training. This approach shifts part of the problem from learning behavior in advance to assembling it during execution, reinforcing the limits of training-only approaches.

When expanded our analysis across all 29 datasets spanning three different end-effectors (Dex1, Dex3, and BrainCo Revo 2 hands), the strongest dataset showed simulation overestimating grasp adaptiveness by over 80x. Across the full benchmark, 26 out of 29 datasets passed all quality gates, confirming the robustness of our methodology.

Why teleoperation teaches what simulation cannot

These results point to a broader issue: simulation captures outcomes, but not how those outcomes are achieved. When a human teleoperates a robot through a manipulation task, like picking up a slippery bottle, threading a cable, handing a tool to a colleague, they capture details that simulation does not reproduce: the physics of contact as experienced through imperfect sensing and actuation.

Real contact is messy, and that’s the point

In simulation, when a fingertip contacts an object, the force is modeled as a clean vector computed from a friction cone. In reality, the fingertip deforms, surfaces vary at a micro level, and sensors introduce noise, drift, and delay. The operator responds to these conditions by increasing pressure or adjusting wrist orientation to stabilize the grasp.

Our benchmark showed this clearly: simulated grasps scored higher by the simulator’s own metric, but real teleoperated grasps, though “messier,” were perfectly functional. The teleoperator learned to work with imperfections. A sim-trained policy optimized for the wrong objective would fail.

Recovery is the most valuable signal

The most important moments in any manipulation episode are the failures: the fumbled pickup, the object sliding mid-transfer, the awkward re-grasp. The most important moments in any manipulation episode are the failures: the fumbled pickup, the object sliding mid-transfer, the awkward re-grasp. In simulation, failures are rare. In real tasks, recovery happens frequently and provides useful training data. In our teleoperation dataset, nearly one in four episodes included at least one grasp-regrasp cycle. Every one of those is a lesson simulation cannot provide.

In our teleoperation dataset, nearly one in four episodes included at least one grasp-regrasp cycle. Every one of those is a lesson simulation cannot provide.

Teleoperation is grounded in physical reality

Two of our benchmark task types, in-hand rotation and self-correction via finger gaiting, had zero matching episodes in the real dataset. Why? Not because the robot couldn’t attempt them, but because a simple gripper physically cannot rotate an object in-hand using finger gaiting. Simulation lets you define any task; reality constrains which tasks are possible with a given end-effector. Teleoperation data is honest. It contains only what the hardware can actually do.

These results show that teleoperation data is not a substitute for simulation, but a reference point for it. Simulation approximates task outcomes, but it misses the contact dynamics that determine real-world success, often by an order of magnitude on key metrics.

What Centific built: the most comprehensive dexterity benchmark in the industry

To quantify this mismatch, Centific built a benchmark that compares simulation and real-world performance directly. The benchmark measures how large the difference is, where it appears, and what drives it across tasks, datasets, and end-effectors.

A benchmark engineered from the ground up

Centific’s Robotics & Physical AI team designed and executed the B4 Dexterous Manipulation Benchmark spanning 29 real-world task datasets, over 1,400 teleoperated episodes, three different dexterous end-effectors (Unitree Dex1, Dex3, and BrainCo Revo 2), and a simulated Allegro hand baseline. This experiment covered:

Household tasks (folding towels, cleaning tables).
Precision tasks (pouring medicine, mounting cameras).
Logistics tasks (packing boxes, organizing tools).
Bimanual coordination (dual-arm cleaning, dishwasher loading).

No one in the industry had systematically measured the sim-to-real dexterity gap across so many tasks, end-effectors, and episodes. Centific’s benchmark is the first to put hard numbers on what the field has long suspected.

Centific’s technical capabilities

Building this benchmark required deep robotics engineering across multiple disciplines. Here is what the Centific team designed, built, and validated:

Capability	What Centific built	Why it matters
4-Metric Evaluation Framework	TSR, DMS, GQS, GAR — four complementary metrics with pass/fail gates	Industry-first standardized way to measure dexterity quality
29-Dataset Benchmark Pipeline	Ingest HuggingFace datasets, extract joint trajectories, compute metrics, generate reports	Reproducible, scalable evaluation across any teleoperation dataset
IK Retargeting Engine	Cross-embodiment retargeting that normalizes Dex1/Dex3/BrainCo data to a common 16-DOF Allegro joint space	Enables fair comparison across different robot hands — revealed the 50x to 1.6x insight
Neural Retargeting Architecture	Production-ready infrastructure for ML-based retargeting (WaldoRT-compatible)	Ready to integrate with n1 Robotics or any neural retargeting model
Sim-to-Real Fine-tuning Pipeline	Alpha-annealing data mixer, asymmetric actor-critic (SAC), dual replay buffer with cross-trial recycling	Implements latest RL research for stable real-world policy fine-tuning
Multi-Domain Benchmark Framework	4 benchmark domains: Airport Contact (B1), PCB Assembly (B2), Egocentric Transfer (B3), Dexterous Manipulation (B4)	Extensible architecture — new domains plug in without breaking existing ones
NVIDIA Cosmos Reason 2 Integration	Verification and error-correction pipeline powered by Cosmos Reason 2 video understanding	AI-powered quality assurance for robotic manipulation episodes

The insight only this benchmark could reveal

The most important finding from Centific’s benchmark was not the ~50x gap itself. It was what happened when we applied retargeting: the gap collapsed from ~50x to under 2x. This proved that the massive raw gap was largely a measurement artifact caused by comparing 1-DOF grippers against 16-DOF simulated fingers. The true dexterity gap is much smaller, but it can only be seen through proper retargeting.

This result shows what retargeting changes. When joint spaces are aligned, the observed difference reflects actual behavior rather than differences in embodiment. It also shows that teleoperated data contains more useful information than raw comparisons suggest, because human operators encode stable grasp strategies before failure occurs.

This insight has direct commercial implications: it means the problem is solvable. With the right retargeting technology, like N1 Robotics’ WaldoRT, the remaining gap can be closed with far less data than anyone previously assumed. Centific's benchmark is the evidence base that makes this case quantitatively.

Figure 3: Centific’s data collection pipeline — robot joint state recordings showing State and Action telemetry across manipulation tasks in the B4 benchmark.

Figure 4: Centific benchmark episode — robot arm performing block manipulation task with real-time joint trajectory capture, part of Centific's 29-dataset evaluation framework.

The teleoperation bottleneck: why collecting data is still too slow

If teleoperation data is widely valuable, why isn’t everyone collecting it at scale? Because the current process is painfully slow.

Challenge	Industry Reality
Setup time	2-3 weeks of custom engineering to integrate XR trackers with a new robot
Collection speed	~50 episodes per hour after accounting for resets and operator fatigue
Retargeting	Linear IK mapping that destroys the grasp intent the operator was trying to convey
Cross-embodiment	Every new robot hand requires a custom retargeting pipeline from scratch

The industry standard is linear Inverse Kinematics (IK) retargeting — mapping human joint angles to robot joint angles through geometric correspondence. This sounds reasonable, but it systematically destroys exactly the signals that make teleoperation data valuable.

When a human pinches an object between thumb and index finger, the IK mapper sees two joint angles. It maps them independently to the robot’s two corresponding joints. But the pinch isn’t about individual joint angles; it’s about the coordinated closure that creates force closure on the object. Linear IK strips away this coordination, the grasp intent, and the precision geometry. What remains is a lossy approximation that requires many more demonstrations to learn the same policy.

Neural retargeting: the breakthrough that changes everything

Here is where the pieces connect. Centific’s benchmark used Unitree’s G1 robot dataset, collected with VR headsets and basic IK retargeting. We found massive gaps, especially in grasp adaptiveness. What if the same Unitree G1 data was collected with a system that preserves grasp intent instead of destroying it?

That system exists. N1 Robotics’ Waldo is built for the same Unitree G1 humanoid that our benchmark is based on. But replaces the VR + IK retargeting pipeline with neural retargeting that faithfully captures what the human operator intended.

Same robot, better data: Waldo on the Unitree G1

Waldo is a complete dexterous teleoperation platform designed specifically for the Unitree G1 humanoid (29-DOF) with BrainCo Revo 2 hands (6-DOF per hand), the same robot and one of the same end-effectors in our benchmark. It includes finger-tracking gloves with EMF sensors, Vive trackers for 6-DOF wrist tracking, a pre-configured inference PC, and software that handles calibration, recording, and export out of the box.

Capability	Industry baseline	Waldo
Setup time	2-3 weeks	15 minutes
Collection speed	~50 episodes/hr	~200 episodes/hr
End-to-end latency	Variable, often >200ms	<100ms
Retargeting method	Linear IK (lossy)	WaldoRT neural mapping (preserves intent)

Using Waldo for teleoperation results in a 4x speedup in data collection and a 100x reduction in setup time. For example, Unitree’s DiverseManip dataset, the same dataset Centific benchmarked, took an estimated 20 hours to collect using VR headsets. With Waldo on the same G1 robot, an equivalent dataset can be collected in roughly 5 hours, with higher fidelity per episode due to neural retargeting.

Figure 5: Waldo vs Meta Quest — side-by-side comparison. Waldo (left): Episode Count 4, Avg 42.86s per episode, 93.3% success rate, 14/15 successful. Meta Quest (right): Episode Count 1, Avg 300s per episode, 18.2% success rate, 2/11 successful. Same robot, same task, same time window.

Neural retargeting in action

Neural retargeting is the mechanism that makes teleoperation data usable for training.

Figure 6: WaldoRT Neural Retargeting in action — human hand pose (left) mapped to robot hand pose (right) using neural network inference at 1 kHz. The model preserves finger coordination and grasp intent that IK mapping destroys. This is the core technology that collapses the 50× gap to under 2×.

This figure shows how neural retargeting maps human hand motion to robot control in real time. The mapping preserves coordination across fingers and maintains grasp intent during execution.

N1 Robotics partnership: what this means for Centific

Centific is partnering with N1 Robotics to bring Waldo into our large-scale data collection pipeline. Key details from our partnership:

WaldoRT runs inference at 1 kHz: lightweight, fast, and scales identically from 6-DOF to higher DOF hands with zero latency increase
Setup takes under 30 seconds: push a pedal and data collection begins immediately
Time between episodes can be as short as 5 seconds, enabling rapid rinse-and-repeat collection loops
Inference PC ships pre-configured with an NVIDIA GPU, which is compatible with Centific’s Jetson Thor infrastructure
Multi-operator profiles are pre-saved: operators clock in ready to collect without hand retraining
Enterprise plan includes 100 hours per month of N1 Robotics managed operator time on Centific’s systems
Custom integration builds available for additional end-effectors and robot embodiments beyond humanoids

These capabilities make it possible to collect high-fidelity manipulation data quickly and consistently across operators, tasks, and robot configurations.

CaP-X: when frontier LLMs become robot controllers

A parallel development is reshaping how we think about robot control entirely. CaP-X (Coding as Policies), available at capgym.github.io, demonstrates that frontier large language models can control robots, without any robot-specific training, simply by writing executable control code from natural language task descriptions.

CaP-Agent0: zero-shot manipulation

CaP-Agent0 is a training-free coding agent evaluated across 100+ manipulation tasks spanning LIBERO-PRO, Robosuite, and BEHAVIOR. The findings:

Frontier models achieved over 30% average zero-shot success on manipulation, with no task-specific training
On LIBERO-PRO with position and instruction perturbations, state-of-the-art VLAs like OpenVLA and pi0 scored 0% across the board
Even the best VLA (pi0.5) reached only 13% average success on perturbed tasks
CaP-Agent0 reached 18% (better than the best VLA) without any training

These results show that training alone does not guarantee robustness, and that alternative approaches can outperform learned policies even without task-specific optimization.

Figure 7: CaP-Agent0 in action: robot recognizes the affordance of stacking order, placing cubes before the round apple. The instruction: “Stack everything as high as you can.” Autonomous execution (4x speed). Source: capgym.github.io

Figure 8: CaP-Agent0 task completed. Cubes stacked with apple on top, demonstrating embodied reasoning about object geometry without any task-specific training. Source: capgym.github.io

CaP-RL: code-based post-training

CaP-RL applies reinforcement learning directly to the coding agent. A 7B-parameter model (Qwen 2.5 Coder) jumped from 20% to 72% average success in simulation after just 50 training iterations. The learned policies then transferred to a real Franka robot (84% on cube lifting, 76% on cube stacking) with minimal sim-to-real gap.

Why CaP-X matters for this pipeline

CaP-X addresses the task generalization layer, allowing robots to handle novel instructions without retraining. When CaP-Agent0 generates task code and WaldoRT-collected episodes provide the fine-grained manipulation signal, they cover complementary failure modes. CaP-X handles novel instructions and task reasoning; WaldoRT handles the physical contact fidelity that code alone cannot capture.

The complete pipeline: from simulation to deployment

Our benchmark, combined with recent advances in data-efficient reinforcement learning, points to a five-phase architecture for closing the sim-to-real dexterity gap:

Figure 9: Five-phase pipeline for closing the sim-to-real dexterity gap — Phase 1: Portable Data Collection → Phase 2: Neural Retargeting (WaldoRT) → Phase 3: Simulation Pre-training → Phase 4: Real-World Fine-tuning → Phase 5: Deployment with Human-in-the-Loop Correction.

Phase 1: portable data collection

Capture human manipulation with wearable motion-capture rigs (finger tracking gloves + wrist trackers). No robot needed during collection. Approaches like DexCap (RSS 2024) achieve 3x the speed of traditional teleoperation at a fraction of the cost. Critically, store all attempts, including failures, because mixed-quality data provides better state-space coverage than expert-only demonstrations.

Phase 2: neural retargeting

Map human hand motions to robot hand commands using learned models like WaldoRT. This preserves grasp intent, force closure, and finger coordination at 1-2ms per frame, compatible across different end-effectors without custom engineering per robot.

Phase 3: simulation pre-training

Train policies in massively parallel simulation, with thousands of environments running simultaneously, to learn task structure such as approach trajectories, finger coordination patterns, and basic manipulation sequences. Our benchmark shows that simulation transfers task-level structure well, with an approximately 1.1x gap in success rate, even though contact-level details remain poor.

Phase 4: real-world fine-tuning

Fine-tune sim-pretrained policies on real teleoperation data using stabilized RL techniques: alpha-annealing data mixing (gradually shifting from sim to real data), asymmetric actor-critic updates, and warm-start episodes. Research shows these techniques prevent the catastrophic forgetting that typically destroys sim-pretrained policies during real-world fine-tuning.

Phase 5: deployment + human-in-the-loop correction

Deploy the fine-tuned policy with live Waldo teleoperation for human-in-the-loop correction. When the policy fails, the operator takes over, generating the hardest and most valuable training data. At 200 episodes per hour, correction data accumulates rapidly. Each cycle closes the gap further.

Centific + neural retargeting: the roadmap

Centific’s benchmark provides evidence of where simulation falls short and how large the gap remains in real-world performance. The five-phase pipeline defines the architecture. Neural retargeting platforms like n1 Robotics’ Waldo provide the tooling. Together, they form a concrete roadmap for building the data infrastructure that next-generation robotics demands.

Why this matters for foundation models

The next frontier in robotics is dexterous foundation models: vision-language-action (VLA) models that can reason about finger placement, contact forces, and in-hand manipulation from natural language instructions. Models like RT-2, Octo, and pi0 have shown this is possible for simple grippers. But extending to multi-fingered dexterity requires millions of teleoperated episodes with rich grasp data, which is the kind of data that Centific’s pipeline is designed to produce and that neural retargeting makes feasible to collect at scale.

The roadmap

Implications for Dexterous Manipulation Training

The gap between simulated and real-world dexterous manipulation is a fundamental measure of how much real-world contact physics matters and how much of that knowledge can only come from human-guided teleoperation.

Simulation gives you task structure. Teleoperation gives you contact truth. Neural retargeting ensures that contact truth is preserved when mapping human motion to robot control.

Dimension	Simulation	Teleoperation	Winner
Task success	Higher scores	Harder, but grounded in reality	Teleop
Manipulation accuracy	Near-perfect	Imperfect but true	Teleop
Grasp quality	Optimized for wrong metric	Messier, but functional	Teleop
Grasp adaptiveness	Scripted, artificial	Anticipatory, human-intelligent	Teleop
Recovery behaviors	Grasps don't fail in sim	1 in 4 episodes includes recovery	Teleop
Data fidelity	Perfect but wrong	Imperfect but true	Teleop
Setup cost (with Waldo)	GPU cluster + months	~$12K hardware + 15 min	Teleop

The implications extend beyond model performance. The constraint is not simulation quality alone, but the data used to train policies. Robotics companies that invest in teleoperation infrastructure and collect high-fidelity manipulation data will have an advantage, because their models are trained on the contact behavior that determines real-world outcomes.

Learn more about Centific’s Physical AI capabilities.

Are your ready to get

modular

AI solutions delivered?

Centific offers a plugin-based architecture built to scale your AI with your business, supporting end-to-end reliability and security. Streamline and accelerate deployment—whether on the cloud or at the edge—with a leading frontier AI data foundry.

Start Building

Connect data, models, and people — in one enterprise-ready platform.

Latest Insights

Ideas, insights, and

research from our team

From original research to field-tested perspectives—how leading organizations build, evaluate, and scale AI with confidence.

Explore

Article

From raw robotics data to training-ready AI datasets

May 22, 2026

Article

Why global ad campaigns fail at the creative-media intersection

May 22, 2026

Article

You’re already training AI with your localization data. Is it reinforcing your brand voice?

May 19, 2026

Connect with Centific

Stay ahead of what’s next

Stay ahead

Updates from the frontier of AI data.

Receive updates on platform improvements, new workflows, evaluation capabilities, data quality enhancements, and best practices for enterprise AI teams.

Book a Demo

Get a live walkthrough

Talk to our team

Careers

See all our open positions

Turn data into AI that works

Book a demo