The Race for Robotics Foundation Models: Pi, RT-2, and Groot
Introduction: The Policy Shift in Robotics
The robotics industry has reached an inflection point. For decades, automation relied on pre-programmed paths and deterministic control theory. Today, the narrative is shifting toward Robotics Foundation Models (RFMs): large-scale neural networks that map sensor inputs directly to motor actions. This transition promises to cut the effort of teaching a robot a new task from weeks of coding to hours of data ingestion. However, RobotWale maintains that hype often outpaces hardware reality. We grade claims by shipping hardware first, pilot deployments second, and announcements last.
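The grading order described above can be written down as a small ranking. This is only an illustrative sketch: the `Evidence` enum, the `rank_claims` helper, and the placeholder vendor names are assumptions made for the example, not an official RobotWale scoring tool.

```python
from enum import IntEnum


class Evidence(IntEnum):
    """Evidence tiers; a lower value means stronger evidence."""
    SHIPPING_HARDWARE = 1
    PILOT_DEPLOYMENT = 2
    ANNOUNCEMENT = 3


def rank_claims(claims):
    """Sort (vendor, evidence) pairs from strongest to weakest evidence.

    sorted() is stable, so vendors in the same tier keep their input order.
    """
    return sorted(claims, key=lambda claim: claim[1])


# Placeholder vendors, not real companies.
claims = [
    ("Vendor A", Evidence.ANNOUNCEMENT),
    ("Vendor B", Evidence.SHIPPING_HARDWARE),
    ("Vendor C", Evidence.PILOT_DEPLOYMENT),
]

ranked = rank_claims(claims)
for vendor, evidence in ranked:
    print(f"{vendor}: {evidence.name}")
```

Using an ordered enum keeps the comparison explicit: a demo video can never outrank a unit that a customer can actually buy.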
Three models currently dominate the conversation about generalist robot policies: Google DeepMind's RT-2, Figure AI's Pi, and Tesla's Groot. While the media framing suggests a unified "intelligence" revolution, the technical architectures and deployment realities differ significantly. This article evaluates the current standing of these models, focusing on what is actually shipping versus what remains in research labs.
Google DeepMind: RT-2 and the Web-Scale Vision-Language-Action Model
Technical Architecture and Claims
Google DeepMind introduced RT-2 (Robotic Transformer 2) in July 2023. The model treats robot control as a sequence prediction task, much as a large language model processes text. It takes camera images and a natural language instruction as input and emits robot actions as discrete tokens, which are decoded into end-effector motion and gripper commands rather than raw joint commands. The core claim is "web-scale grounding": because the model is co-trained on internet-scale vision-language data as well as robot trajectories, it can relate instructions to objects and concepts that never appeared in the robot dataset.
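To make "control as sequence prediction" concrete, the sketch below shows one way a continuous action can be turned into integer tokens and back. The 256-bin discretization follows what the RT-2 paper describes; the action layout, the value ranges in `LOW` and `HIGH`, and the function names are illustrative assumptions, not Google's code.

```python
NUM_BINS = 256  # bins per action dimension, as described for RT-2


def tokenize(action, low, high, num_bins=NUM_BINS):
    """Map each continuous action value to an integer bin in [0, num_bins - 1]."""
    tokens = []
    for value, lo, hi in zip(action, low, high):
        clipped = min(max(value, lo), hi)       # keep the value inside its range
        fraction = (clipped - lo) / (hi - lo)   # 0.0 at lo, 1.0 at hi
        tokens.append(min(int(fraction * num_bins), num_bins - 1))
    return tokens


def detokenize(tokens, low, high, num_bins=NUM_BINS):
    """Map bin indices back to the centre value of each bin."""
    return [lo + (t + 0.5) * (hi - lo) / num_bins
            for t, lo, hi in zip(tokens, low, high)]


# Hypothetical 7-D action: xyz displacement (m), roll/pitch/yaw change (rad),
# gripper opening (0 = closed, 1 = open). The ranges are made up for this sketch.
LOW = [-0.1, -0.1, -0.1, -0.5, -0.5, -0.5, 0.0]
HIGH = [0.1, 0.1, 0.1, 0.5, 0.5, 0.5, 1.0]

action = [0.02, -0.05, 0.0, 0.1, 0.0, -0.25, 1.0]
tokens = tokenize(action, LOW, HIGH)
recovered = detokenize(tokens, LOW, HIGH)
print(tokens)
print(recovered)
```

The round trip loses at most half a bin width per dimension, which is the price of letting a language-model-style decoder emit actions as ordinary tokens.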
RT-2 has been evaluated on real physical hardware, not only in simulation: Google's published experiments ran on its own mobile manipulators in controlled environments. As of late 2024, however, deployment remains highly restricted. RT-2 is not a commercially available API for third-party manufacturers. Inference is also compute-heavy. The larger variants do not fit on the robot itself and are served from remote accelerators, which adds network latency to every control step and reportedly limits the largest model to a control rate of only a few hertz.
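The latency point can be checked with simple arithmetic. The sketch below assumes the robot waits for one model call per action; the 300 ms inference time and 200 ms network round trip are hypothetical numbers chosen for illustration, not measurements of RT-2.

```python
def control_rate_hz(inference_ms, network_rtt_ms=0.0):
    """Upper bound on control frequency when each action waits for one model call."""
    total_ms = inference_ms + network_rtt_ms
    if total_ms <= 0:
        raise ValueError("total latency must be positive")
    return 1000.0 / total_ms


# Hypothetical latencies, for illustration only.
on_robot = control_rate_hz(inference_ms=300)
via_cloud = control_rate_hz(inference_ms=300, network_rtt_ms=200)

print(f"on-robot inference: {on_robot:.1f} Hz")
print(f"cloud inference:    {via_cloud:.1f} Hz")
```

Under these assumptions a 200 ms round trip cuts the achievable rate by about 40 percent, which matters for any task that needs fast corrective motion.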
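Key limitation: The system still struggles with out-of-distribution cases. RT-2 handles unfamiliar objects and instructions better than models trained on robot data alone, but performance degrades when the robot meets something far from its training data (e.g., a novel tool shape), and the model does not acquire motor skills that are absent from its robot demonstrations. Web-scale pretraining narrows the "long tail" problem of physical robotics but does not solve it.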
Availability in India
There is no direct commercial availability for RT-2 in India. It is a proprietary model tied to Google's internal robotics research division. Indian manufacturers interested in this architecture must rely on open-source alternatives or wait for potential licensing partnerships. For a logistics provider in Mumbai, for example, replicating RT-2-class capabilities would mean substantial R&D spend on model architecture, training, and robot integration, not just access to model weights.
Figure AI: Pi and the Multimodal LLM Approach
Technical Architecture and Claims
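Figure AI is an independent US startup, not a joint venture: it has a commercial agreement with BMW Manufacturing and has collaborated with OpenAI, which is also an investor. The company has attracted significant attention with the model this article refers to as "Pi." (The name is more widely associated with π0 from Physical Intelligence, a separate company.) Pi is described as a multimodal large language model (MLLM) designed to control a humanoid robot. Unlike RT-2, which focuses on object manipulation, Pi emphasizes natural language interaction and high-level task planning. Figure's first humanoid, the Figure 01, stands roughly 1.7 meters tall and is described as capable of folding clothes and handling lithium batteries.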
The claim here is "zero-shot" generalization. Figure demonstrates the robot understanding verbal commands like "I want a water bottle" and executing the grasp without specific programming. The Figure 01 demo video shows the robot walking and grasping objects in a warehouse setting. This is a significant step up from scripted motion, but it remains a controlled demonstration.
Deployment Status: Figure AI has not announced mass production. The Figure 01 is currently in the pilot deployment phase with select partners like BMW. There is no public pricing for the Figure 01 robot, and no shipping schedule for general industrial adoption. The hardware relies on custom actuators and high-torque motors that are not yet standardized in the supply chain.
India Availability and Pricing
The Figure 01 is not available for purchase in India. As a prototype unit, its landed cost would be prohibitive: we estimate between $150,000 and $200,000 (approx. ₹1.25 to ₹1.65 Crore) for the hardware alone, excluding cloud compute costs for the Pi model. Indian manufacturing units would also face significant import duties on the custom actuators and sensors, which are currently sourced from US and European suppliers. Until Figure AI opens up its supply chain or partners with Indian OEMs, this hardware remains inaccessible.
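The rupee figures above can be reproduced with a short calculation. This is a rough sketch: the exchange rate of ₹83 per dollar is an assumption back-derived from the article's own conversions, and the `duty_rate` parameter is hypothetical, since the estimate above does not state which duty was applied.

```python
USD_TO_INR = 83.0    # assumed exchange rate, implied by the article's conversions
CRORE = 10_000_000   # 1 crore = 10 million rupees
LAKH = 100_000       # 1 lakh = 100 thousand rupees


def landed_cost_inr(price_usd, duty_rate=0.0, usd_to_inr=USD_TO_INR):
    """Convert a USD price to rupees and apply an import duty rate (hypothetical)."""
    return price_usd * usd_to_inr * (1.0 + duty_rate)


def format_inr(amount):
    """Format a rupee amount in crore or lakh, whichever reads better."""
    if amount >= CRORE:
        return f"₹{amount / CRORE:.2f} Crore"
    return f"₹{amount / LAKH:.2f} Lakh"


low_estimate = landed_cost_inr(150_000)
high_estimate = landed_cost_inr(200_000)
with_duty = landed_cost_inr(200_000, duty_rate=0.20)  # 20% is an illustrative rate

print(format_inr(low_estimate), "to", format_inr(high_estimate))
print("upper bound with a 20% duty:", format_inr(with_duty))
```

Keeping the exchange rate and duty as parameters makes it easy to rerun the estimate when either changes.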
Tesla: Groot and the Visual Transformer Pipeline
Technical Architecture and Claims
Tesla's Optimus robot is powered by what this article calls "Groot," described as a large visual transformer model. (The name GR00T is publicly attached to NVIDIA's humanoid foundation model project, announced at GTC 2024, not to a Tesla product.) Optimus itself was first announced at Tesla AI Day in 2021. The model is reportedly trained on video data from Tesla's vehicle fleet and from Optimus prototypes. The approach converts camera input into visual tokens and learns from human teleoperation data. The key differentiator is Tesla's scale of video data collection, which could give it a larger and more varied training set than competitors have.
The model's output feeds a network that produces motion commands for the robot's joints. Unlike Figure AI's Pi, which emphasizes language understanding, Groot emphasizes visual imitation. Tesla claims that the model can learn tasks by watching human demonstrations. The current prototype, Optimus Gen 2, has demonstrated walking and simple object manipulation, but complex dexterity tasks like folding laundry have so far been shown under teleoperation or partial scripting rather than full autonomy.
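Learning from demonstrations is, in its simplest form, behavior cloning: fit a policy that reproduces the operator's action for each observation. The sketch below uses a one-dimensional linear policy fitted by least squares. It illustrates the general technique only and is not Tesla's method; the demonstration data and names such as `fit_linear_policy` are hypothetical.

```python
def fit_linear_policy(demos):
    """Least-squares fit of action = w * observation + b from (observation, action) pairs."""
    n = len(demos)
    mean_obs = sum(obs for obs, _ in demos) / n
    mean_act = sum(act for _, act in demos) / n
    var_obs = sum((obs - mean_obs) ** 2 for obs, _ in demos)
    if var_obs == 0:
        raise ValueError("demonstrations need at least two distinct observations")
    cov = sum((obs - mean_obs) * (act - mean_act) for obs, act in demos)
    w = cov / var_obs
    b = mean_act - w * mean_obs
    return w, b


# Hypothetical teleoperation log:
# observed distance to target (m) -> operator's commanded speed (m/s).
demos = [(0.0, 0.0), (0.1, 0.05), (0.2, 0.1), (0.4, 0.2)]
w, b = fit_linear_policy(demos)


def policy(observation):
    """Cloned policy: predicts the action the operator would have taken."""
    return w * observation + b


print(f"w={w:.3f}, b={b:.3f}, action at 0.3 m: {policy(0.3):.3f} m/s")
```

A real system replaces the scalar observation with camera frames and the line with a large neural network, but the objective is the same: minimize the gap between predicted and demonstrated actions.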
Deployment Status: Tesla has not released the Optimus robot for commercial sale. The "shipping hardware" grade is currently zero for the general market. There are pilot deployments at Tesla's own factories, but these are internal use cases, not external industrial products. The company has not published a price list for the Optimus robot, though Elon Musk has suggested a future target of around $20,000 (approx. ₹16.5 Lakhs). That figure remains an unverified target, not a price.
India Availability and Pricing
Tesla Optimus is not available in India. The regulatory framework for autonomous mobile robots (AMRs) and humanoid robots in India is still under development. The Ministry of Electronics and Information Technology (MeitY) is drafting guidelines for AI safety, but no specific standards exist for humanoid liability yet. Importing a Tesla Optimus prototype would likely be classified as a "research prototype" rather than commercial machinery, attracting high duties. For Indian automotive suppliers, the Groot model represents a competitor rather than a tool, as it is intended to optimize Tesla's own production lines.
The India Context: Hardware Constraints and Economic Reality
The race for foundation models is not just about software; it is about the physical hardware required to execute the policy. In India, the robotics sector faces distinct challenges that foundation model announcements often ignore.
- Actuator Availability: High-torque servo motors and harmonic drives are primarily imported from Japan and Germany. A single humanoid robot requires 40 to 60 of these actuators. Customs duties on these components can add 20% to the landed cost before manufacturing even begins.
- Service Infrastructure: Foundation models are only as good as the hardware maintenance. Indian facilities lack a standardized supply chain for humanoid robot servicing. Unlike traditional industrial arms (e.g., Fanuc, ABB), humanoids require specialized software updates and sensor calibration that are not yet available in Tier-2 cities.
- Compute Costs: Running inference for models like Pi or Groot requires GPU-class compute. Cloud-based inference adds network latency to the control loop. On-premise hardware avoids that latency but is expensive: edge inference modules in the NVIDIA Jetson class can cost over ₹1.5 Lakhs per unit, and data-center GPUs such as the A100 cost many times more.
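The actuator point in the list above lends itself to a quick estimate. In the sketch below, the 40 to 60 actuator count and the 20% duty come from the text, while the unit price of ₹50,000 is a purely hypothetical placeholder; substitute a real supplier quote before relying on the totals.

```python
def actuator_cost_breakdown(count, unit_price_inr, duty_rate=0.20):
    """Return (base cost, duty, landed total) in rupees for a set of imported actuators."""
    base = count * unit_price_inr
    duty = base * duty_rate
    return base, duty, base + duty


UNIT_PRICE_INR = 50_000  # hypothetical price per actuator, for illustration only

for count in (40, 60):
    base, duty, total = actuator_cost_breakdown(count, UNIT_PRICE_INR)
    print(f"{count} actuators: base ₹{base:,.0f}, duty ₹{duty:,.0f}, total ₹{total:,.0f}")
```

Because the duty scales with the actuator count, a humanoid with 60 actuators pays 50 percent more duty than one with 40 at the same unit price.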
For Indian manufacturers, the immediate value of these models lies in what they reveal about data collection, not in inference capacity they can purchase. A logistics provider in Delhi, for example, can use the *concept* behind these models to shape its own data collection strategy, but it cannot yet buy a model to run its fleet.
Conclusion: The Reality Check
The race for robotics foundation models is real, but the timeline for general-purpose deployment is longer than the media suggests. Google's RT-2, Figure AI's Pi, and Tesla's Groot represent significant advancements in mapping language and vision to action. However, they are at best in the research or pilot deployment phase, not the mass shipping phase.
For the Indian market, the "general policy" promise remains theoretical. Until a vendor ships a robot that can work in a dusty Indian warehouse without human intervention, the foundation model is a research tool, not a commercial product. Investors and manufacturers should prioritize companies with hardware in the field, not just demo videos. The foundation model is the brain, but the body must be robust enough to handle the local environment.
References
- Google DeepMind: "RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control". Google DeepMind Blog. https://deepmind.google/discover/blog/rt-2-vision-language-action-models-transfer-web-knowledge-to-robotic-control/
- Figure AI: "Figure 01: The Next Generation of Humanoid Robots". Figure AI Official Site. https://www.figure.ai/
- Tesla: "Optimus: The Future of General Purpose Robotics". Tesla AI Day 2023/2024 Presentations. https://www.tesla.com/optimus
- RobotWale India: "Humanoid Robot Import Regulations in India 2024". RobotWale Editorial. https://robotwale.com/india-regulations-2024
Note: Pricing estimates are based on current exchange rates and standard import duties. Actual landed costs may vary based on bilateral trade agreements and component sourcing.

