

Centific today announced the general availability of Data Canvas, a next-generation Annotation Operating System (AoS) purpose-built to capture, synchronize, and annotate the most complex data in AI — spanning robotics, embodied AI, and autonomous systems. Unlike general-purpose annotation tools, Data Canvas is designed from the ground up for multi-sensor fusion, reasoning-traceable labeling, and the full Physical AI data lifecycle.
The global AI annotation market is forecast to grow from approximately $2.3 billion in 2024 to between $17 billion and $28 billion by 2034, compounding at CAGRs of 27–29% (Precedence Research, Market.us, Mordor Intelligence). The fastest-growing segments, LiDAR and point cloud annotation (30.9% CAGR), multimodal annotation (31.1% CAGR), and video annotation (32–34% CAGR), are precisely the domains where conventional platforms fall furthest short, and where Data Canvas was purpose-engineered to lead.
"The bottleneck in Physical AI is not algorithms, it is annotated data at the quality and scale that foundation models require," said Vasu Sundarababu, Chief AI Data Officer at Centific. "Data Canvas and our GAZE pre-annotation methodology together are built to remove that bottleneck."
A Decade of Multimodal Expertise — Now Applied to Physical AI
Centific has spent more than a decade building the data infrastructure behind some of the world's most capable AI systems. With over two million expert annotators globally, Centific has contributed to benchmark design and evaluation frameworks across image, video, audio, speech, and sensor modalities, not just providing people to execute annotation tasks, but helping shape how the field measures progress.
Centific's practice spans the full pipeline: data collection, automated pre-annotation, human-in-the-loop reasoning, automated QA, red-teaming, and cross-modal consistency evaluation. At NeurIPS 2025, the Centific AI Research (CAIR) Labs team published research on governance-aware pre-annotation for zero-shot world models, a direct bridge between frontier AI research and production annotation pipelines, and the foundation of the GAZE methodology now integrated into Data Canvas.
Centific's Physical AI practice enables machines to perceive, reason, and act reliably in complex real-world environments, combining computer vision, simulation, sensor fusion, and adaptive control across warehouses, factories, and urban environments. These workflows are purpose-aligned to world foundation model architectures, supporting Cosmos-style physics-aware video generation and V-JEPA-2-style latent-space predictive training, with reasoning traces that capture not just what happened, but why it happened and what should happen next. Strategic partnerships with NVIDIA and the University of San Diego Robotics Lab underpin the platform's physical AI capabilities.
Why Physical AI Data Is Uniquely Hard
A robot learning to pick up an irregularly shaped object from a moving conveyor — or follow a verbal instruction while avoiding a patient's IV line — requires far more than labeled images. It requires synchronized, multi-modal, temporally aligned ground truth captured at the moment of demonstration. A single task may span multi-camera video at precisely aligned timestamps, joint torque and proprioceptive sensor readings at 500Hz or higher, 3D point clouds from LiDAR or structured light, and operator intent labels with object state transitions across task phases.
Losing temporal alignment across any of these streams can render data unusable for policy training. Most annotation tools were built for a simpler era. Data Canvas was not.
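To make the alignment problem concrete, here is a minimal sketch of nearest-timestamp synchronization across heterogeneous sensor streams; the stream names, rates, and tolerance are illustrative, not Data Canvas internals:

```python
from bisect import bisect_left

def nearest_sample(timestamps, query_t):
    """Return the index of the timestamp closest to query_t (sorted input)."""
    i = bisect_left(timestamps, query_t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    return i if timestamps[i] - query_t < query_t - timestamps[i - 1] else i - 1

def align_streams(streams, query_t, tolerance_s):
    """For each named sensor stream (a sorted list of timestamps), find the
    sample nearest query_t. Reject the frame if any stream has no sample
    within tolerance_s -- misaligned data is unusable for policy training."""
    aligned = {}
    for name, ts in streams.items():
        i = nearest_sample(ts, query_t)
        if abs(ts[i] - query_t) > tolerance_s:
            return None  # temporal alignment lost; drop this frame
        aligned[name] = i
    return aligned

# Hypothetical capture: a 30 Hz camera alongside 500 Hz joint-torque readings
streams = {
    "camera": [k / 30.0 for k in range(300)],
    "torque": [k / 500.0 for k in range(5000)],
}
print(align_streams(streams, query_t=1.0, tolerance_s=0.02))
```

Returning `None` rather than the closest-available sample is the important design choice: for policy training, silently accepting a stale reading is worse than dropping the frame.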
Platform Capabilities
Unified Sensor Timeline — The Rerun Integration View
The Rerun-based multi-sensor temporal visualizer is the defining capability of Data Canvas for Physical AI teams. Every sensor modality — camera feeds, LiDAR point clouds, radar, IMU, GPS, proprioceptive sensor streams — is placed on a shared temporal timeline. Annotators can scrub through any recording and inspect what each sensor registered at any moment, synchronized rather than approximated. Real-time ingestion from live robotic systems means the most valuable annotation work — operator demonstrations, contact events, and novel navigation scenarios — can be captured and labeled as they happen, with no intermediate batch export step.
Universal Data Ingestion — Zero Preprocessing Required
Data Canvas natively accepts video, audio, images, 3D point clouds (PCD/PLY/LAS/LAZ/BIN), HDF5 scientific archives, DICOM medical imaging series, GeoTIFF and vector geospatial formats, binary sensor streams, and custom file types. Format detection, viewer selection, and annotation interface configuration are automatic. For teams where data preprocessing currently consumes 40–60% of project timelines, this represents a fundamental change in delivery speed and cost.
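As a sketch of how such automatic format detection might be structured — the extension table and viewer names below are illustrative assumptions, not Data Canvas's actual dispatch logic:

```python
from pathlib import Path

# Illustrative mapping from file extension to annotation viewer;
# the viewer names are hypothetical, not Data Canvas internals.
VIEWER_BY_EXT = {
    ".mp4": "video", ".avi": "video",
    ".wav": "audio", ".flac": "audio",
    ".png": "image", ".jpg": "image",
    ".pcd": "pointcloud", ".ply": "pointcloud",
    ".las": "pointcloud", ".laz": "pointcloud", ".bin": "pointcloud",
    ".h5": "scientific", ".hdf5": "scientific",
    ".dcm": "dicom",
    ".tif": "geospatial", ".geojson": "geospatial",
}

def select_viewer(path: str) -> str:
    """Pick an annotation viewer from the file extension, falling back
    to a generic binary-stream viewer for unrecognized formats."""
    return VIEWER_BY_EXT.get(Path(path).suffix.lower(), "binary")

print(select_viewer("run_042/lidar_front.pcd"))  # pointcloud
```

A production system would also sniff magic bytes rather than trust extensions alone, but the dispatch shape is the same.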
Advanced Video, 3D, and Point Cloud Annotation
Frame-accurate video annotation with object tracking and temporal segmentation captures not just what the robot saw, but what was happening — to which objects — during each phase of a task. This level of detail enables imitation learning systems to generalize effectively across novel environments. Point cloud annotation synchronized across all sensor streams provides spatially grounded labels encoding true world geometry, rather than simple projections. Stage-by-stage 3D annotation with height-normalized color mapping supports complex multi-view scenarios across thousands of points per frame.
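Height-normalized color mapping reduces to normalizing each point's elevation over the frame's height range and mapping it onto a color ramp. A minimal sketch (the blue-to-red ramp is illustrative; any perceptual ramp works the same way):

```python
def height_color(z, z_min, z_max):
    """Map a point's height to an RGB color on a blue-to-red ramp,
    normalized over the frame's observed height range."""
    if z_max == z_min:
        t = 0.0
    else:
        t = min(max((z - z_min) / (z_max - z_min), 0.0), 1.0)
    return (int(255 * t), 0, int(255 * (1.0 - t)))

# Three points at increasing height: lowest renders blue, highest red
points_z = [0.1, 0.8, 1.5]
colors = [height_color(z, min(points_z), max(points_z)) for z in points_z]
print(colors)
```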
Reasoning-Traceable Labeling
Data Canvas Physical AI datasets go where standard vision datasets stop — annotating causal structure, object state transitions, and goal-conditioned intent. Reasoning traces capture not just what happened, but why it happened and what should happen next. This level of semantic depth is required for training NVIDIA GR00T, Google RT-X, and custom manipulation policies, and for exporting RLDS-compatible training datasets.
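RLDS represents a demonstration as an episode of steps, each carrying an observation, an action, and boundary flags (`is_first`/`is_last`/`is_terminal`). A minimal sketch of packaging an annotated demonstration into that shape — note that the `reasoning_trace` field is a hypothetical per-step extension for intent and causal annotations, not part of the RLDS core schema:

```python
def to_rlds_episode(observations, actions, reasoning_traces):
    """Package one annotated demonstration as an RLDS-style episode:
    a list of step dicts with the standard boundary flags."""
    assert len(observations) == len(actions) == len(reasoning_traces)
    n = len(observations)
    steps = []
    for i in range(n):
        steps.append({
            "observation": observations[i],
            "action": actions[i],
            "reward": 0.0,            # demonstrations carry no reward signal
            "discount": 1.0,
            "is_first": i == 0,
            "is_last": i == n - 1,
            "is_terminal": i == n - 1,
            "reasoning_trace": reasoning_traces[i],  # hypothetical extension
        })
    return {"steps": steps}

episode = to_rlds_episode(
    observations=[{"gripper_open": True}, {"gripper_open": False}],
    actions=["approach_object", "close_gripper"],
    reasoning_traces=["object within reach", "grasp pose stable"],
)
print(len(episode["steps"]))
```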
Multi-Model AI Integration Pipeline
Any REST-accessible model can be connected to Data Canvas as a pre-labeler, pre-processor, or post-processing validator. Native adapters ship for leading detection and segmentation models (YOLO v8/v9/v10, SAM, SAM2, Grounding DINO), 3D LiDAR models (PointPillars, CenterPoint, VoxelNet), speech-to-text models (Whisper, Wav2Vec2), NLP frameworks (Hugging Face Transformers), and medical AI frameworks (MONAI) — alongside a generic REST adapter for any custom inference endpoint.
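The generic adapter pattern is straightforward: POST a frame reference to an inference endpoint and map the response into pre-labels. A sketch under stated assumptions — the endpoint URL and JSON schema below are hypothetical, not a documented Data Canvas contract:

```python
import json
from urllib import request

class RestPreLabeler:
    """Generic REST pre-labeler adapter sketch. The endpoint URL and
    request/response JSON schema here are illustrative assumptions."""

    def __init__(self, endpoint: str, task: str):
        self.endpoint = endpoint
        self.task = task

    def build_request(self, frame_uri: str) -> request.Request:
        """Construct the POST request for one frame (no network I/O here)."""
        body = json.dumps({"task": self.task, "frame": frame_uri}).encode()
        return request.Request(
            self.endpoint, data=body,
            headers={"Content-Type": "application/json"}, method="POST",
        )

    @staticmethod
    def parse_response(raw: bytes) -> list:
        # Assumed schema: {"detections": [{"label": ..., "bbox": [...]}, ...]}
        return json.loads(raw).get("detections", [])

adapter = RestPreLabeler("https://models.example.com/infer", task="detect")
req = adapter.build_request("s3://bucket/frame_0001.jpg")
print(req.get_method())
```

Separating request construction from transport keeps the adapter testable without a live model endpoint.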
Medical Robotics and DICOM Support
DICOM support extends Data Canvas workflows into medical robotics and surgical AI applications, with clinical-grade annotation tools, 3D volumetric segmentation, PHI de-identification, and MONAI model integration for organ segmentation and lesion detection pre-labeling.
CesiumJS Geospatial Intelligence
Full CesiumJS integration brings geo-referenced image annotation, multi-layer geospatial compositing, 3D stencil model placement, and temporal change detection to autonomous navigation, smart infrastructure, and aerial robotics workflows.
Enterprise-Grade Quality and Compliance
The Data Canvas quality engine implements inter-annotator agreement scoring (Fleiss' Kappa, Krippendorff's Alpha), Dawid-Skene probabilistic aggregation, per-annotator reliability scoring, honeypot injection, and disagreement-triggered expert routing. Compliance infrastructure — immutable audit trails, full annotator provenance metadata, multi-tier review pipelines, semantic dataset versioning, and SSO integration — satisfies requirements under FDA SaMD, ISO 26262, the EU AI Act, and NIST AI RMF.
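For reference, one of the agreement statistics named above, Fleiss' kappa, is compact enough to state in full: it compares observed per-item agreement against the agreement expected by chance from the category marginals.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for inter-annotator agreement.
    ratings: one row per item, each row a list of per-category rater
    counts; every item must have the same total number of raters."""
    N = len(ratings)
    n = sum(ratings[0])   # raters per item
    k = len(ratings[0])   # number of categories
    # Observed agreement: mean pairwise agreement per item
    P_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings
    ) / N
    # Chance agreement from category marginals
    p = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    P_e = sum(pj * pj for pj in p)
    if P_e == 1.0:
        return 1.0        # degenerate case: all raters used one category
    return (P_bar - P_e) / (1 - P_e)

# Three items, three raters, two categories, unanimous throughout -> kappa = 1
print(fleiss_kappa([[3, 0], [0, 3], [3, 0]]))
```

Scores near 1 indicate strong agreement; scores at or below 0 indicate agreement no better than chance, which is exactly the signal used to trigger expert routing.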
Built for the Frontier: Meta Aria Gen 2 and the A2PD Dataset
Data Canvas is engineered for the sensor-rich, temporally grounded data that defines the frontier of Physical AI. Meta's Aria Gen 2 glasses — featuring fully on-device 6DoF tracking, eye and hand tracking, speech recognition, a 120 dB HDR quad-camera suite, heart rate sensor, and sub-millisecond multi-device sync — represent exactly the kind of egocentric multimodal platform that Data Canvas is built to annotate at scale. The open Aria Gen 2 Pilot Dataset (A2PD), covering cooking, cleaning, and outdoor scenarios with fully synchronized sensor streams, is a natural fit for Data Canvas's unified sensor timeline and reasoning-traceable annotation workflows.
Teams training on NVIDIA GR00T, Google RT-X, or custom manipulation policies can move from raw sensor capture to annotated, RLDS-compatible training datasets without leaving the platform.
Strategic Market Context
The strategic urgency around annotation infrastructure was underscored in June 2025, when Meta invested approximately $15 billion for a 49% stake in Scale AI, valuing the company at more than $29 billion — one of the largest enterprise AI transactions in history. The signal is unambiguous: proprietary, high-quality annotation infrastructure has become a mission-critical asset. Data Canvas gives enterprise AI teams a powerful, self-hostable alternative — one backed by Centific's decade of multimodal annotation depth and active frontier research through CAIR Labs.
Availability
Data Canvas is available now in cloud-hosted and on-premises configurations, with enterprise SSO, role-based access control, and MLOps integrations including MLflow, Weights & Biases, and DVC. To request a demo or discuss enterprise deployment, visit: https://www.centific.com/contact/book-a-demo
About Centific
Centific is a global AI data and services company with over two million expert annotators worldwide, delivering the data infrastructure behind some of the world's most capable AI systems. Centific AI Research (CAIR) Labs drives frontier research in multimodal annotation, governance-aware pre-labeling, and physical AI data pipelines. Data Canvas is Centific's Annotation Operating System for modern AI, unifying universal data ingestion, multi-sensor Physical AI visualization, intelligent model-assisted labeling, live data streaming, geospatial and medical imaging annotation, and enterprise-grade ground truth certification in a single platform.

