- author: AI FOCUS
Google DeepMind Introduces RoboCat: A Self-Improving AI for General Purpose Robotics
Ladies and gentlemen, Google DeepMind has unveiled its latest innovation in the field of artificial intelligence (AI) - RoboCat. While it may not be as glamorous as RoboCop, RoboCat represents a significant step towards realizing the vision of independent and useful robots that many of us associate with futuristic cartoons like The Jetsons.
The Quest for General Purpose Robots
Robots that are programmed for specific tasks already play a role in certain areas of our lives and have proven to be quite useful. However, the development of a general-purpose robot that can adapt and perform a wide range of tasks has been a challenge. Training such robots on real-world data is time-consuming. This is the driving force behind Google DeepMind's new research, which introduces RoboCat, a self-improving AI agent capable of learning various tasks across multiple robot arms and generating self-training data to enhance its techniques.
This concept may seem familiar to those who follow Google's developments in robotics, as it is reminiscent of their other robotics model, PaLM-E. PaLM-E uses the PaLM language model to understand and complete tasks, demonstrating the ability to multitask and even continue tasks if interrupted. However, RoboCat stands out by accomplishing tasks across different physical robots, learning at a faster rate than state-of-the-art models, and quickly picking up new tasks with as few as 100 demonstrations. This remarkable capability is attributed to RoboCat's large and diverse dataset.
How RoboCat Improves Itself
To understand how RoboCat achieves its self-improvement, it's important to know that it is based on Gato, DeepMind's multi-modal model, which processes language, images, and actions in both physical and simulated environments. Gato's architecture was combined with a massive training dataset consisting of sequences of images and actions of robot arms completing hundreds of tasks. Once this initial training was complete, it was time to set RoboCat on a path of self-improvement with unseen tasks.
The learning process for each task follows five steps:
- Collect 100 to 1,000 demonstrations of a robotic arm performing a new task.
- Fine-tune RoboCat on this specific task, creating a spin-off agent.
- The spin-off agent practices the new task 10,000 times, generating new training data.
- Store the demonstration data and self-generated data in RoboCat's dataset.
- Train a new version of RoboCat using the updated training dataset.
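The five steps above can be sketched as a small loop. This is only an illustration under stated assumptions, not DeepMind's implementation: `Agent`, `fine_tune`, `practice`, and `self_improvement_cycle` are hypothetical names, the "model" is a placeholder that only records how much data it was trained on, and the 50% practice success rate is made up.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    """Placeholder for a trained model (hypothetical, for illustration only)."""
    version: int
    trained_on: int  # number of trajectories seen at training time


def fine_tune(agent, demos):
    # Step 2: create a task-specific spin-off agent from the generalist.
    return Agent(version=agent.version, trained_on=agent.trained_on + len(demos))


def practice(spin_off, task, episodes, rng):
    # Step 3: the spin-off practices the task; each episode yields one trajectory.
    # The 50% success rate is an arbitrary assumption.
    return [
        {"task": task, "source": "self", "success": rng.random() < 0.5}
        for _ in range(episodes)
    ]


def self_improvement_cycle(agent, dataset, task, demos, episodes=10_000, seed=0):
    rng = random.Random(seed)
    spin_off = fine_tune(agent, demos)                  # step 2
    self_data = practice(spin_off, task, episodes, rng)  # step 3
    dataset = dataset + demos + self_data               # step 4
    # Step 5: "train" a new generalist on the enlarged dataset.
    new_agent = Agent(version=agent.version + 1, trained_on=len(dataset))
    return new_agent, dataset


# Step 1: collect demonstrations (here, 100 placeholder records).
demos = [{"task": "lift_gear", "source": "demo", "success": True} for _ in range(100)]
agent, dataset = self_improvement_cycle(Agent(0, 0), [], "lift_gear", demos)
print(agent.version, len(dataset))  # 1 10100
```

Running the cycle again with a different task and the returned `agent` and `dataset` mirrors how each new version builds on everything gathered so far.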
Through this iterative process, the latest version of RoboCat is trained on a dataset containing millions of trajectories from robot arms, including self-generated data. Researchers used four different types of robots with various types of arms to collect vision-based data representing the tasks RoboCat would be trained to perform.
Examples of the training data include a real robot picking up gears, a simulated robot stacking blocks, and a self-generated example of RoboCat picking up a cucumber.
Rapid Learning and Adaptation
Thanks to the new training data, RoboCat demonstrated its ability to learn to control different types of robot arms within hours. Initially trained on arms with two-pronged grippers, RoboCat quickly adapted to arms with three-fingered grippers, effectively doubling the number of controllable outputs. This self-taught flexibility is showcased in a video where RoboCat is seen lifting fruit with Panda robots - a task it learned during training. It is worth noting that the Panda and the Sawyer, the two robot models used, have different action specifications, making this an impressive example of cross-embodiment transfer.
Impressive Results and Future Prospects
After observing 1,000 human-controlled demonstrations, RoboCat picked up gears successfully 86% of the time. With the same number of demonstrations, it also adapted to softer tasks that require both precision and understanding, such as correctly identifying and removing fruit from a bowl or solving shape-matching puzzles. The more tasks RoboCat learns, the better it becomes at learning additional ones, much as people build on skills they already have.
The initial version of RoboCat achieved a success rate of 36% on previously unseen tasks after learning from 500 demonstrations per task. However, the latest version of RoboCat, trained on a larger diversity of tasks, more than doubled the success rate. DeepMind believes that RoboCat's ability to autonomously learn skills and rapidly self-improve, especially when applied to different robotic devices, will contribute to the development of a new generation of more helpful general-purpose robotic agents.
Introducing Barkour: Teaching Robots Animal-Like Agility
In addition to RoboCat, Google Research has also made progress in the realm of animal-like agility for robots with their development of Barkour. The robotics community has long sought to equip robots with the agility and mobility of animals and humans, enabling them to navigate complex environments more effectively. However, until now, there have been no defined benchmarks or standards for evaluating robot agility within the AI community.
To address this, Google researchers have developed Barkour, an agility benchmark tailored specifically for quadruped robots. Inspired by dog agility competitions, this benchmark requires legged robots to demonstrate skills such as obstacle jumping, direction changing, and traversing uneven terrain within a specified time limit. By measuring the robot's performance against the agility of small dogs, the researchers created a clear goal for the development of machine learning-based locomotion controllers.
The Barkour course is designed with various obstacles, including poles for weaving, an A-frame, a broad jump, and a step onto an end table. These obstacles test different skills, and the course can be easily modified to suit different scenarios. Researchers trained a "teacher" robot on individual skills like walking and jumping using parallel simulation. The "student" robot then learned from the teacher's datasets, incorporating transitions between skills. By using a Transformer-based model, all the acquired skills were integrated into an overarching policy, allowing the robot to move across diverse terrains. The navigation controller guides the robot's movements, while a recovery policy helps the robot quickly regain stability should it fall. Finally, a return-to-start policy minimizes the need for human intervention during the completion of the course.
The Barkour scoring system establishes a target time for each obstacle and an overall course target time, both based on the agility of small dogs in novice competitions. To earn a full score, the robot must complete all obstacles within the course target time of roughly 10 seconds, without skipping or failing obstacles or moving too slowly. With this benchmark, researchers can measure the agility of quadruped robots and work towards improving their performance.
In a video demonstration, Google's robot completes the Barkour course in about 20 seconds, while a small dog typically accomplishes it in 10 seconds.
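To make the scoring idea concrete, here is a minimal sketch of a time-and-penalty score of the kind described above. It is an assumption-laden illustration rather than the official scoring code: the function name `agility_score`, the 0-to-1 scale, and the penalty sizes (0.1 per failed or skipped obstacle, 0.01 per second over the target) are illustrative values that the article does not state.

```python
def agility_score(course_time_s, target_time_s=10.0, failed_or_skipped=0,
                  obstacle_penalty=0.1, overtime_penalty_per_s=0.01):
    """Return a score in [0, 1]; 1.0 means every obstacle cleared within the target time.

    The penalty values are illustrative assumptions, not taken from the article.
    """
    # Only time beyond the target is penalized; finishing early earns no bonus.
    overtime_s = max(0.0, course_time_s - target_time_s)
    score = (1.0
             - obstacle_penalty * failed_or_skipped
             - overtime_penalty_per_s * overtime_s)
    # Clamp so heavy penalties cannot push the score below zero.
    return max(0.0, min(1.0, score))


# A small dog finishing in the 10-second target vs. a robot taking about 20 seconds.
print(round(agility_score(10.0), 2))  # 1.0
print(round(agility_score(20.0), 2))  # 0.9
```

Under these assumed penalties, a clean but slow run still scores well, while skipped or failed obstacles cost far more than a few extra seconds.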
AI Leaders Niche Down: Google, OpenAI, and Meta
As we witness Google DeepMind's advancements in self-improving AI with RoboCat and Google Research's pursuit of animal-like agility with Barkour, it becomes clear that different AI leaders are carving out their niches. OpenAI is quietly developing superintelligence, while Meta is pushing the boundaries of open-source models. The future direction of technology companies remains uncertain, but what is certain is that exciting innovations lie ahead.
RoboCat and Barkour demonstrate