Evaluating Robotics Foundation Models: From RT-2 to Groot in the Race for General Policy
Defining the Robotics Foundation Model
In the wake of Large Language Models (LLMs), the robotics industry is rapidly pivoting toward "Robotics Foundation Models" (RFMs). However, unlike text-generation models, RFMs must output actions, not just tokens. A true foundation model in robotics maps high-level natural language instructions into low-level motor commands, ideally generalizable across unseen environments. The critical distinction for RobotWale readers is between models trained in simulation versus those validated on shipping hardware. This article grades the leading contenders—Google DeepMind’s RT-2, Tesla’s Groot, and Figure AI’s Pi—based on tangible deployment data rather than whitepaper claims.
Google DeepMind: The RT-2 Vision-Language-Action Model
Google DeepMind introduced RT-2 (Robotic Transformer 2) in 2023, positioning it as a VLA (Vision-Language-Action) model. The core innovation lies in treating robotic control as a form of language modeling: robot actions are represented as tokens, so the same model that reads text and images can also emit actions. Instead of hand-coding behaviors for every task, the model interprets natural language instructions and image inputs to generate robot actions.
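To make "control as language modeling" concrete, here is a minimal sketch of how a continuous action vector can be discretized into integer tokens and decoded back. The 256-bin resolution follows what the RT-2 paper describes; the [-1, 1] action range, the 7-dimensional layout, and the function names are illustrative assumptions, not DeepMind's code.

```python
from typing import List

NUM_BINS = 256  # per-dimension resolution, as described for RT-2


def tokenize_action(action: List[float], low: float = -1.0, high: float = 1.0) -> List[int]:
    """Map each continuous action value to an integer bin in [0, NUM_BINS - 1]."""
    tokens = []
    for value in action:
        clipped = min(max(value, low), high)       # keep the value inside the assumed range
        fraction = (clipped - low) / (high - low)  # normalise to [0, 1]
        tokens.append(min(int(fraction * NUM_BINS), NUM_BINS - 1))
    return tokens


def detokenize_action(tokens: List[int], low: float = -1.0, high: float = 1.0) -> List[float]:
    """Map each bin index back to the centre of its bin."""
    width = (high - low) / NUM_BINS
    return [low + (t + 0.5) * width for t in tokens]


# Hypothetical 7-D action: xyz delta, roll/pitch/yaw delta, gripper command
action = [0.10, -0.25, 0.00, 0.50, -1.00, 1.00, 0.75]
tokens = tokenize_action(action)
recovered = detokenize_action(tokens)
print(tokens)
print([round(r, 3) for r in recovered])
```

The round trip is lossy: each recovered value can differ from the original by up to half a bin width, which is the price of expressing motion in a fixed vocabulary.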
Technical Claims vs. Reality
RT-2 claims to combine internet-scale vision-language data with robot demonstration data to generalize skills. Early demonstrations showed the robot manipulating objects based on text prompts like "pick up the red block." However, the robot data was collected in a narrow range of lab and office settings, not in production environments. While impressive for zero-shot inference in controlled settings, real-world deployment requires significant edge compute.
Deployment Status: As of late 2024, there is no public evidence of RT-2 powering a mass-produced commercial humanoid robot. It remains largely a research framework available via academic partnerships.
Hardware Implication: Running inference on the edge requires specialized accelerators. For an integrator in India, this implies adding NVIDIA Jetson Orin modules to the robot’s onboard computer. A single Jetson Orin unit costs approximately INR 45,000 to INR 60,000, not including the humanoid chassis integration costs.
Tesla: Groot and End-to-End Video Processing
Tesla introduced "Groot" at its AI Day events, framing it as a video foundation model for robotics. Unlike RT-2’s distinct action tokenization, Groot aims to process video data directly to predict future states and actions. The model operates on the premise that human video data can teach robots how to navigate and manipulate the physical world.
The Dojo Infrastructure
Tesla’s Groot is tightly coupled with its Dojo supercomputer infrastructure. This implies that training and inference are not purely edge-based but rely heavily on cloud connectivity or dedicated local training clusters. For a deployment in India, this creates a dependency on high-bandwidth infrastructure.
Deployment Status: Tesla Optimus is currently in pilot phases. While the Groot model is announced, the hardware required to run it at full inference speed is not yet standard across the supply chain. Tesla focuses on "ship hardware first" principles, meaning the model is often adapted to the hardware rather than the hardware being optimized for the model.
India Availability: As of now, Tesla has not announced a commercial launch date for Optimus in India. The landed cost estimate for a fully integrated Optimus unit is projected between INR 20 Lakhs and INR 30 Lakhs, assuming shipping to India. This excludes the cloud inference costs for Groot, which could add recurring OPEX.
Figure AI: Pi and the Humanoid Focus
Figure AI has gained traction with its Figure 01 humanoid, featuring the "Pi" model. Pi is designed to understand complex instructions and reason about the physical world. Figure AI has partnered with major tech players to integrate Pi into its hardware, emphasizing a vision-centric approach similar to Tesla but with a focus on high-level reasoning.
Real-World Demonstrations
Figure AI has moved beyond simulation more aggressively than many competitors. They have demonstrated Pi taking on tasks like folding laundry and moving objects in a warehouse setting. The key differentiator here is the explicit focus on "shipping hardware" alongside the model.
Deployment Status: Figure AI has secured pilot deployments in industrial settings (e.g., BMW manufacturing lines). This is a stronger signal than whitepaper claims. The Pi model is designed to run on the Figure 01’s onboard compute stack.
India Relevance: If Figure AI expands to India, the hardware cost will be comparable to other humanoid tiers. However, the software licensing for Pi may be bundled with the hardware. Independent estimates suggest a landed cost of INR 35 Lakhs to INR 45 Lakhs for a fully configured Figure 01 unit including basic software licensing.
The Hardware Bottleneck and India Context
Despite the sophistication of models like RT-2, Groot, and Pi, the robotics industry faces a hard constraint: compute latency on the edge. Foundation models are large. Running them on a robot’s onboard processor risks latency that exceeds the safety limits for physical motion.
Edge vs. Cloud Inference
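Many VLA models currently rely on cloud inference for heavy lifting. This introduces latency risks for physical robots, because every action then waits on a network round trip. Moving inference onboard removes the network dependency but has its own cost: a model running on a Jetson Orin (Edge) typically has to be compressed or quantized to fit, and may sacrifice accuracy compared to a model running on a Data Center GPU.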
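The trade-off comes down to a latency budget: producing one action has to fit inside one control period. The sketch below checks that condition. All numbers (control rate, inference times, network round trip) and the names `InferencePath` and `fits_budget` are hypothetical placeholders, not measurements of any model discussed here.

```python
from dataclasses import dataclass


@dataclass
class InferencePath:
    name: str
    inference_ms: float    # model forward-pass time
    network_rtt_ms: float  # 0 for onboard inference

    def total_ms(self) -> float:
        return self.inference_ms + self.network_rtt_ms


def fits_budget(path: InferencePath, control_hz: float, overhead_ms: float = 0.0) -> bool:
    """True if one action can be produced within a single control period."""
    period_ms = 1000.0 / control_hz
    return path.total_ms() + overhead_ms <= period_ms


# Hypothetical numbers for illustration only
edge = InferencePath("edge", inference_ms=80.0, network_rtt_ms=0.0)
cloud = InferencePath("cloud", inference_ms=25.0, network_rtt_ms=120.0)

for path in (edge, cloud):
    print(path.name, path.total_ms(), fits_budget(path, control_hz=10.0))
```

Under these assumed figures the cloud path has the faster forward pass but still misses a 10 Hz budget once the round trip is added, while the slower edge path fits.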
Cost Breakdown for Indian Integration:
- Compute Hardware: NVIDIA Jetson Orin (Edge) ~ INR 50,000.
- Humanoid Chassis: Generic humanoid platforms ~ INR 20 Lakhs+.
- Model Licensing: Per-unit fees or subscription models (Estimated INR 5 Lakhs+ annually).
- Maintenance: Calibration and model fine-tuning on site.
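Summing the itemized figures gives a rough first-year floor. The sketch below uses only the numbers from the list above. Maintenance has no stated figure, so it is left out rather than guessed, and the chassis and licensing values are lower bounds (the "+" in the list), so the result is a minimum and not a quote.

```python
LAKH = 100_000  # 1 lakh = 100,000 INR

# Figures taken from the list above; chassis and licensing are lower bounds
first_year_costs_inr = {
    "compute_jetson_orin": 50_000,
    "humanoid_chassis_min": 20 * LAKH,
    "model_licensing_annual_min": 5 * LAKH,
}


def total_inr(costs: dict) -> int:
    """Sum all cost items, in INR."""
    return sum(costs.values())


def to_lakhs(amount_inr: int) -> float:
    """Convert an INR amount to lakhs."""
    return amount_inr / LAKH


total = total_inr(first_year_costs_inr)
compute_share = first_year_costs_inr["compute_jetson_orin"] / total

print(f"First-year floor: INR {to_lakhs(total):.1f} Lakhs")
print(f"Compute share of total: {compute_share:.1%}")
```

On these figures the edge compute module is about 2% of the first-year floor; the chassis and the recurring license make up the rest.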
Regulatory and Infrastructure Constraints
India’s infrastructure variability poses a risk for cloud-based robotics models. In regions with unstable internet, edge inference becomes mandatory. This requires local deployment of the foundation model, increasing the cost of the onboard GPU. Manufacturers must balance the "general policy" promise with the physical reality of low-latency control loops.
Grading the Race: Shipping Hardware First
RobotWale grades these initiatives on a specific hierarchy: Shipping Hardware > Pilot Deployments > Announcements.
- Tesla Groot: Pilot Deployments (High). Hardware shipping (Medium). Model generalization (Medium-High).
- Figure Pi: Pilot Deployments (High). Hardware shipping (Medium). Edge inference (High).
- Google RT-2: Announcements (High). Pilot Deployments (Low). Real-world action (Low-Medium).
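The hierarchy above can be read as a lexicographic ordering: compare shipping-hardware grades first, and fall back to pilot deployments only on a tie. A minimal sketch, using only the grades listed. Where the list gives a contender no grade for a tier, the code marks it as ungraded and ranks it lowest, which is an assumption of this sketch and not a RobotWale rating.

```python
from typing import Dict, Optional, Tuple

# Ordinal scale for the qualitative grades; None means "not graded in the list"
GRADE_ORDER = {None: 0, "Low": 1, "Medium": 2, "High": 3}

# Grades copied from the list above
contenders: Dict[str, Dict[str, Optional[str]]] = {
    "Tesla Groot": {"shipping": "Medium", "pilot": "High", "announcement": None},
    "Figure Pi": {"shipping": "Medium", "pilot": "High", "announcement": None},
    "Google RT-2": {"shipping": None, "pilot": "Low", "announcement": "High"},
}


def rank_key(grades: Dict[str, Optional[str]]) -> Tuple[int, int, int]:
    """Shipping outranks pilots, which outrank announcements."""
    return (
        GRADE_ORDER[grades["shipping"]],
        GRADE_ORDER[grades["pilot"]],
        GRADE_ORDER[grades["announcement"]],
    )


# Tuples compare element by element, so sorting on them is lexicographic
ranking = sorted(contenders, key=lambda name: rank_key(contenders[name]), reverse=True)
for name in ranking:
    print(name, rank_key(contenders[name]))
```

Tesla and Figure tie on the first two tiers, and a high announcement grade cannot lift RT-2 above either of them, which is the point of ordering the tiers this way.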
Conclusion: The General Policy Horizon
The race to a general policy is not won by model architecture alone. It is won by the ability to deploy inference at the edge without latency penalties. For the Indian market, this means looking past the hype of foundation models and evaluating the specific hardware stack they require. Until a manufacturer ships a robot with a verified foundation model running on its edge compute stack, the "foundation" remains theoretical.
For investors and integrators in India, the priority should be partnerships that offer transparent hardware specifications alongside software models. The cost of hardware integration often dwarfs the cost of the model itself. We await a definitive release where the model and the hardware are sold as a single, verified unit.

