From raw robotics data to training-ready AI datasets

Discover why annotated robotics data is critical for training reliable AI models and how structured labeling turns raw robot data into training-ready datasets at scale.

Topics

Robotics AI
AI Data
Data Annotation
Physical AI

Author(s)

Leela Krishna

Kriti Banka

Mangesh Damre

Bishwajit Pal

Modern robotics learning, whether imitation learning, reinforcement learning, or vision-language-action models, demands more than raw sensor streams. To train a robot that can reliably grasp a cup, a model needs to understand what is happening in each frame, when key events occur, and how the task succeeded or failed.

A single robot demonstration episode may contain:

  • Multiple camera feeds capturing the scene from different angles

  • Joint state and action telemetry, often with 26 or more degrees of freedom, recorded at every timestep

  • Tactile sensor data from fingertip sensors detecting contact and slip

  • Language instructions guiding the robot’s behavior

Without structured labels, such as bounding boxes around objects, temporal annotations marking grasp events, and quality assessments of each episode, this rich multimodal data remains difficult to use. Models trained on unlabeled data tend to mimic motion trajectories without capturing the semantics of manipulation.

Centific and Hugging Face: the annotation layer for open robotics data

Hugging Face hosts thousands of open robotics datasets, collected from diverse platforms including bipedal humanoids, dexterous arms, mobile manipulators, and more. These datasets arrive in a standardized packaging format: Parquet files for structured sensor data and MP4 video files for camera feeds. Centific has built a production pipeline that ingests datasets in this format directly from Hugging Face, converts them into the .rrd visualization format used by Data Canvas, and puts them in front of expert annotation teams. The result: open robotics datasets on Hugging Face can become fully annotated, training-ready data at enterprise scale.

Step 1: ingest from Hugging Face

Robotics datasets on Hugging Face are packaged as Parquet files for structured sensor data, including joint states, actions, and timestamps, and MP4 video files for camera feeds. The Data Canvas pipeline connects directly to the Hugging Face Hub, downloads episode data for compatible datasets, and decodes video frames using tools such as FFmpeg, with support for modern codecs including AV1. This works across robot platforms and dataset sizes, from small research collections to corpora with hundreds of thousands of episodes.
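
As a rough illustration of the video decoding step, the sketch below builds an FFmpeg command line that decodes one episode video into numbered JPEG frames. The `build_decode_command` helper, the file names, and the output naming scheme are hypothetical, and it is an assumption that the installed FFmpeg build can decode the input codec (for example AV1). The actual Data Canvas pipeline is not described at this level of detail.

```python
from pathlib import Path


def build_decode_command(video: Path, out_dir: Path, jpeg_quality: int = 2) -> list[str]:
    """Build an FFmpeg argv that decodes `video` into numbered JPEG frames.

    Hypothetical helper for illustration; file names are assumptions.
    """
    if not 2 <= jpeg_quality <= 31:
        raise ValueError("jpeg_quality should be between 2 (best) and 31 (worst)")
    return [
        "ffmpeg",
        "-hide_banner",
        "-i", str(video),             # input file; FFmpeg detects the codec itself
        "-q:v", str(jpeg_quality),    # JPEG quality scale, lower is better
        "-start_number", "0",         # name the first frame frame_000000.jpg
        str(out_dir / "frame_%06d.jpg"),
    ]


cmd = build_decode_command(Path("episode_000000.mp4"), Path("frames"))
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the output directory exists, since FFmpeg does not create it.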

Step 2: convert to Rerun (.rrd) format

Raw Parquet tables and video files are not designed for human review. Each episode is converted into the Rerun .rrd format, a recording format built for interactive visualization of multimodal robotics data. A single .rrd file packages four kinds of data on a shared timeline:

  • Camera imagery: JPEG-encoded frames from every angle, synchronized to a shared timeline

  • Time-series telemetry: joint states and action commands plotted as interactive charts

  • Tactile sensor feeds: fingertip contact images aligned frame-by-frame

  • Language embeddings: task instruction tokens and attention masks

This unified view makes it possible to inspect robot behavior across modalities, align sensor signals with actions, and annotate events with temporal precision.
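
Putting streams on a shared timeline means matching samples recorded at different rates. The sketch below shows one simple way to do that, nearest-timestamp matching, using only the Python standard library. It is not the Rerun converter itself; the function names and the 30 fps and 100 Hz rates are assumptions chosen for the example.

```python
from bisect import bisect_left


def nearest_index(sorted_times: list[float], t: float) -> int:
    """Index of the entry in `sorted_times` closest to `t` (ties go to the earlier one)."""
    if not sorted_times:
        raise ValueError("sorted_times must not be empty")
    i = bisect_left(sorted_times, t)
    if i == 0:
        return 0
    if i == len(sorted_times):
        return len(sorted_times) - 1
    before, after = sorted_times[i - 1], sorted_times[i]
    return i - 1 if t - before <= after - t else i


def align_to_frames(frame_times: list[float], telemetry_times: list[float]) -> list[int]:
    """For each video frame timestamp, pick the index of the nearest telemetry sample."""
    return [nearest_index(telemetry_times, t) for t in frame_times]


# Assumed rates: camera at 30 fps, joint telemetry at 100 Hz.
frame_times = [0.0, 1 / 30, 2 / 30]
telemetry_times = [i / 100 for i in range(8)]
print(align_to_frames(frame_times, telemetry_times))  # [0, 3, 7]
```

Nearest-neighbor matching keeps the original samples untouched; interpolating telemetry to frame times is an alternative when smooth signals matter more than exact recorded values.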

Step 3: annotate with Data Canvas

This is where the value of the pipeline becomes clear. The .rrd file is uploaded into Data Canvas, where annotation teams, including domain experts trained in robotics workflows, label the data with precision at the frame and event level. Annotators can:

  • Draw bounding boxes on objects, robot arms, grippers, and fingers across camera frames

  • Tag object states such as whether the object is in grasp, being manipulated, or falling

  • Label manipulation steps including approach, gripper alignment, pinch grasp, lift, and handover

  • Mark sensor events such as tactile contact, slip detection, force spikes, and joint limits

  • Classify episode outcomes as success, partial success, or failure

  • Annotate time-series segments to identify grasp tighten events, contact moments, and stable motion phases

These annotations convert raw episodes into training-ready data that captures both motion and manipulation intent.
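
To make the label types above concrete, here is a minimal sketch of how such annotations could be represented in code. This is a hypothetical schema for illustration, not the Data Canvas data model; every class and field name is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    PARTIAL_SUCCESS = "partial_success"
    FAILURE = "failure"


@dataclass(frozen=True)
class BoundingBox:
    frame: int
    camera: str      # e.g. "front" or "wrist"
    label: str       # e.g. "cup" or "gripper"
    x: float         # top-left corner, in pixels
    y: float
    width: float
    height: float


@dataclass(frozen=True)
class Segment:
    """A labeled span of frames, such as a manipulation step or a sensor event."""
    label: str
    start_frame: int
    end_frame: int   # inclusive

    def __post_init__(self) -> None:
        if self.end_frame < self.start_frame:
            raise ValueError("end_frame must not precede start_frame")


@dataclass
class EpisodeAnnotation:
    episode_id: str
    outcome: Outcome
    boxes: list[BoundingBox] = field(default_factory=list)
    segments: list[Segment] = field(default_factory=list)

    def labels_at(self, frame: int) -> list[str]:
        """All segment labels active at `frame`."""
        return [s.label for s in self.segments if s.start_frame <= frame <= s.end_frame]


ann = EpisodeAnnotation(
    episode_id="episode_000000",
    outcome=Outcome.SUCCESS,
    segments=[Segment("approach", 0, 40), Segment("pinch grasp", 41, 60),
              Segment("tactile contact", 45, 60)],
)
print(ann.labels_at(50))  # ['pinch grasp', 'tactile contact']
```

Keeping steps and sensor events in the same segment structure lets overlapping labels coexist on one timeline, which matches how a grasp phase and a contact event can cover the same frames.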


Figure 1: Data Canvas LeRobot episode labeling view, showing front and wrist camera feeds alongside State (26 joints) and Action (26 joints) telemetry charts synchronized to a shared timeline


Step 4: export training-ready datasets

Once annotated, the labeled data is available for downstream consumption in a structured, versioned, and quality-controlled form. Researchers and ML engineers can pull annotated episodes to train object detection models, grasp quality classifiers, manipulation policy models, and failure analysis systems.
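
The article does not specify an export format, so the following is only a sketch of what a structured, versioned export could look like: one JSON record per episode in JSON Lines form, each stamped with a schema version. The `SCHEMA_VERSION` value, the field names, and both helper functions are assumptions.

```python
import json

SCHEMA_VERSION = "0.1"  # hypothetical version tag for the export format


def to_jsonl(episodes: list[dict]) -> str:
    """Serialize episode records as JSON Lines, one versioned record per line."""
    lines = []
    for episode in episodes:
        record = {"schema_version": SCHEMA_VERSION, **episode}
        lines.append(json.dumps(record, sort_keys=True))
    return "\n".join(lines) + "\n"


def from_jsonl(text: str) -> list[dict]:
    """Parse JSON Lines text back into a list of records."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


episodes = [
    {"episode_id": "episode_000000", "outcome": "success"},
    {"episode_id": "episode_000001", "outcome": "failure"},
]
exported = to_jsonl(episodes)

# A downstream consumer might keep only successful episodes for policy training.
successes = [r["episode_id"] for r in from_jsonl(exported) if r["outcome"] == "success"]
print(successes)  # ['episode_000000']
```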


Figure 2: End-to-end pipeline from Hugging Face Hub to training-ready annotated data via Data Canvas


Already at scale: 29 complex dexterous datasets

Centific has already processed and annotated 29 complex dexterous manipulation datasets available on Hugging Face, spanning up to 36 degrees of freedom, multiple hand configurations, whole-body control, and long-horizon household tasks. These datasets include deformable object handling, bimanual coordination, multi-finger in-hand manipulation, and two-robot collaboration.

Dex3 multi-finger hand (28-DOF): 3 datasets

The Dex3 hand operates at 28 degrees of freedom, among the highest-DOF manipulation configurations in any open dataset. Tasks covered include precision block stacking, controlled liquid pouring, multi-step food preparation such as toasting bread, camera packaging assembly, geometric object grasping, and precision placement. These tasks demand millimeter-level spatial annotation and frame-accurate event labeling across multiple synchronized cameras.

BrainCo hand (26-DOF): 2 datasets

BrainCo datasets push annotation difficulty further with in-hand object reorientation (Rubik’s Cube, 26-DOF) and deformable or fragile object grasping (Oreo biscuit). Multi-face in-hand manipulation requires tracking fine finger contact states across every frame, a task that exposes the limits of automated labeling and demands expert human annotation.

Dex1 complex tasks (16-DOF): 13 datasets

The largest group covers the full range of real-world manipulation challenges at 16 DOF: deformable bimanual tasks (towel folding, clothes packing), tool use (eraser, cloth wiping), food preparation, multi-object sorting, precision insertion, camera assembly, and, most significantly, two-robot coordination for table cleaning. The dual-robot dataset introduces annotation complexity that single-arm datasets simply do not have: simultaneous action labeling across two independent manipulators sharing a workspace.

Whole-body teleoperation (36-DOF total): 5 datasets

The WBT (Whole-Body Teleoperation) datasets represent the frontier of humanoid robot learning: 36 total degrees of freedom, long-horizon household tasks, and deformable object handling at scale. Loading a dishwasher, making a bed, operating a washing machine, or collecting clothes are tasks that require understanding multi-step intent, object deformation, and interaction with real home appliances. Annotating this data correctly requires domain expertise that goes far beyond standard bounding box labeling.

Z1 bimanual arm (14-DOF): 3 datasets

The Z1 dual-arm datasets cover bimanual pouring, cloth folding, and box stacking, tasks that require tight coordination between two arms and precise temporal alignment of actions across both manipulators. Bimanual annotation is inherently more complex than single-arm annotation: every label must account for the relationship between both arms, not just individual motions.

Across these 29 datasets, at up to 36 DOF, the work spans precision manipulation, deformable objects, tool use, household tasks, and multi-robot coordination.

Why annotation matters more than you think

Annotation introduces the structure required to interpret multimodal robot data. It connects perception, action, and outcome in a way that supports training and evaluation.

Raw demonstrations are not ground truth

A robot arm moving from point A to point B tells you what happened. Annotations explain why it mattered. Was that motion a reach or an approach? Did the grasp succeed or merely appear to? Without labels, a model learns trajectories. With labels, it learns manipulation semantics.

Multimodal data demands multimodal labels

A camera frame alone might show a hand touching an object. But was there actual contact? The tactile sensor says yes, and the force telemetry shows a spike at that exact timestamp. Annotation across modalities creates the cross-modal supervision signals that produce more reliable AI policies. 
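
The cross-check described above can be sketched as a small function: a tactile contact onset counts as confirmed only if the force signal jumps within a few frames of it. The threshold, the window size, and the function name are assumptions for illustration, and real signals would typically need filtering first.

```python
def confirmed_contact_onsets(
    tactile_contact: list[bool],
    force: list[float],
    spike_threshold: float,
    window: int = 1,
) -> list[int]:
    """Frames where tactile contact begins and a force spike occurs within `window` frames."""
    if len(tactile_contact) != len(force):
        raise ValueError("both signals must have one sample per frame")
    # A spike is a frame-to-frame force increase of at least `spike_threshold`.
    spikes = [i for i in range(1, len(force)) if force[i] - force[i - 1] >= spike_threshold]
    # An onset is a frame where contact switches from off to on.
    onsets = [
        i for i, touching in enumerate(tactile_contact)
        if touching and (i == 0 or not tactile_contact[i - 1])
    ]
    return [i for i in onsets if any(abs(i - s) <= window for s in spikes)]


tactile = [False, False, True, True, True]
force = [0.1, 0.1, 0.2, 1.5, 1.6]
print(confirmed_contact_onsets(tactile, force, spike_threshold=1.0))  # [2]
```

Onsets that no force spike confirms are good candidates for human review, since they may be sensor noise or a light touch that the camera alone cannot resolve.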

Demonstration quality varies

Some episodes are clean demonstrations by expert operators; others contain recoveries from near-failures. Episode-level quality labels and frame-level event annotations let researchers curate training sets intelligently: upweighting successful demonstrations, mining hard examples from near-failure recoveries, and filtering out noisy or corrupt episodes.
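
One way to turn those labels into a training set is to map each episode to a sampling weight. The sketch below does this with made-up weights and hypothetical field names (`outcome`, `had_recovery`, `corrupt`); the right values depend on the learning method, and giving failures zero weight is an assumption suited to plain imitation learning.

```python
# Illustrative base weights per outcome label; these numbers are assumptions.
BASE_WEIGHTS = {"success": 1.0, "partial_success": 0.5, "failure": 0.0}
RECOVERY_BONUS = 1.5  # upweight successful episodes that recovered from a near-failure


def sample_weight(outcome: str, had_recovery: bool, corrupt: bool) -> float:
    """Sampling weight for one episode; 0.0 means the episode is filtered out."""
    if corrupt:
        return 0.0
    weight = BASE_WEIGHTS[outcome]
    if had_recovery and outcome == "success":
        weight *= RECOVERY_BONUS
    return weight


def curate(episodes: list[dict]) -> dict[str, float]:
    """Map episode id to weight, dropping episodes whose weight is zero."""
    weights = {
        ep["id"]: sample_weight(ep["outcome"], ep["had_recovery"], ep["corrupt"])
        for ep in episodes
    }
    return {episode_id: w for episode_id, w in weights.items() if w > 0.0}


episodes = [
    {"id": "ep0", "outcome": "success", "had_recovery": False, "corrupt": False},
    {"id": "ep1", "outcome": "success", "had_recovery": True, "corrupt": False},
    {"id": "ep2", "outcome": "failure", "had_recovery": False, "corrupt": False},
    {"id": "ep3", "outcome": "success", "had_recovery": False, "corrupt": True},
]
print(curate(episodes))  # {'ep0': 1.0, 'ep1': 1.5}
```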

Annotation enables evaluation

Annotated datasets become benchmarks. They let teams track whether a new policy grasps more reliably, detects slip earlier, or handles deformable objects more consistently across variations. Without annotations, evaluation is guesswork.
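
As a minimal sketch of what annotation-backed evaluation can look like, the snippet below compares two policies by the share of episodes annotated as success, partial success, or failure. The outcome lists are invented for the example, not measured results, and the function name is an assumption.

```python
from collections import Counter

OUTCOME_LABELS = ("success", "partial_success", "failure")


def outcome_rates(outcomes: list[str]) -> dict[str, float]:
    """Fraction of episodes carrying each outcome label."""
    if not outcomes:
        raise ValueError("need at least one annotated episode")
    counts = Counter(outcomes)
    return {label: counts[label] / len(outcomes) for label in OUTCOME_LABELS}


# Invented example data: annotated outcomes on the same four evaluation tasks.
baseline = ["success", "failure", "failure", "partial_success"]
candidate = ["success", "success", "failure", "partial_success"]

delta = outcome_rates(candidate)["success"] - outcome_rates(baseline)["success"]
print(f"success rate change: {delta:+.2f}")  # success rate change: +0.25
```

The same pattern extends to event-level metrics, for example comparing the annotated slip frame with the frame at which a policy reacts.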

Centific as a robotics data partner

The partnership between Centific and Hugging Face is about making open robotics data usable for AI training at scale. The track record includes 29 complex datasets already annotated, spanning some of the hardest manipulation challenges in the field. Whether you are a humanoid robotics company publishing datasets on Hugging Face, a research lab that needs expert annotation for collected demonstrations, or an enterprise building Physical AI applications from scratch, the infrastructure to go from raw robotics data to training-ready datasets is here.

The path from raw robotics data to reliable AI runs through high-quality annotation. Any compatible robotics dataset on Hugging Face can be transformed into structured, training-ready data through this pipeline. Centific and Hugging Face are building that infrastructure together, and it is already in use.
