- author: AI Explained
Google DeepMind's Gemini: A New Era in AI
In a recent eye-opening interview with Wired, Demis Hassabis, the head of Google DeepMind, made some thought-provoking statements about Gemini, the upcoming model expected to be released this winter. Hassabis boldly claims that Gemini will surpass OpenAI's ChatGPT in terms of capabilities. He further revealed that his team is working on combining the strengths of AlphaGo-type systems with the impressive language capabilities of large language models. This synthesis of skills aims to give Gemini new abilities such as planning and problem-solving.
The Context of the Gemini Announcement
Before delving into the potential workings of Gemini, let's take a step back and examine the context in which it was introduced by Sundar Pichai, CEO of Google. Pichai's message emphasized their commitment to developing more advanced and responsible AI systems. Gemini, currently in the training phase, has already displayed remarkable multimodal capabilities not observed in previous models. Moreover, Pichai hinted at exciting upcoming innovations that are set to capture the attention of many skeptics. It is worth noting that DeepMind, the driving force behind AlphaGo and AlphaZero, has a proven track record of groundbreaking achievements; its AlphaFold and AlphaFold 2 breakthroughs in protein structure prediction, for example, have contributed to efforts against plastic pollution and antibiotic resistance.
The Multi-modality of Gemini
According to recent reports, Gemini's multi-modality will be reinforced by training on YouTube videos. Interestingly, OpenAI has also mined YouTube for data, drawing not only on text transcripts but also on audio, images, and comments. This raises the question of whether Google DeepMind might explore YouTube's potential beyond data collection. Harnessing multimodal training data is clearly a significant step towards enhancing Gemini's capabilities.
Another notable development from DeepMind is their recent research paper on RoboCat. The paper introduces RoboCat, a self-improving foundation agent for robotic manipulation. Demonstrating impressive adaptability and generalization, RoboCat can learn new tasks from only a minimal number of examples and then improve through subsequent rounds of training. This highlights the potential for models to generate their own data for further training iterations, forming the basis of an autonomous improvement loop.
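To make that improvement loop concrete, here is a minimal Python sketch of the general pattern the paper describes: fine-tune on a handful of demonstrations, let the agent generate new trajectories, keep the successful ones, and fold them back into the training set. The callables `finetune`, `rollout`, and `is_success` are hypothetical placeholders for illustration, not RoboCat's actual interfaces.

```python
# Hypothetical sketch of a RoboCat-style self-improvement loop.
# `finetune`, `rollout`, and `is_success` are placeholder callables,
# not DeepMind's actual API.

def self_improvement_loop(agent, demos, finetune, rollout, is_success,
                          new_task, iterations=3, episodes=100):
    """Fine-tune on a few demonstrations, then let the agent generate its
    own successful trajectories as additional training data."""
    dataset = list(demos)                        # start from a small set of demonstrations
    for _ in range(iterations):
        agent = finetune(agent, dataset)         # adapt the foundation agent to the task
        new_data = []
        for _ in range(episodes):
            trajectory = rollout(agent, new_task)  # agent acts in simulation or on a robot
            if is_success(trajectory):             # keep only successful episodes
                new_data.append(trajectory)
        dataset.extend(new_data)                 # self-generated data feeds the next round
    return agent
```

The key design point is that each round's training data partly comes from the previous round's agent, which is what makes the loop self-improving rather than a one-off fine-tune.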
The Path to AGI and the Role of AlphaGo
Discussions about the timeline to Artificial General Intelligence (AGI) have recently gained attention. In several recent talks, Hassabis has shed light on a potential path to AGI by explaining the fundamental approach behind AlphaGo: the model acts as a guide for the search process, predicting the most probable moves and thereby making the search far more efficient. This combination of model-guided search and learning from both simulated and real data has proven successful in game settings.
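As a rough illustration of what "the model guides the search" means, the sketch below shows a policy model pruning a game tree so that only the few most probable moves are expanded, with a value estimate scoring the leaves. This is a simplified, hypothetical sketch for intuition, using a plain depth-limited search and placeholder `policy`, `value`, `legal_moves`, and `apply_move` callables; it is not AlphaGo's actual Monte Carlo tree search.

```python
# Simplified illustration of policy-guided search, loosely in the spirit of
# AlphaGo. The `policy` and `value` callables stand in for learned networks.

def guided_search(state, policy, value, legal_moves, apply_move,
                  depth=3, top_k=3):
    """Return (best_score, best_move), expanding only the top_k moves the
    policy considers most probable at each level of the tree."""
    if depth == 0:
        return value(state), None                 # leaf: ask the value model

    moves = legal_moves(state)
    if not moves:
        return value(state), None                 # terminal position

    # The policy model prunes the branching factor: keep only likely moves.
    ranked = sorted(moves, key=lambda m: policy(state, m), reverse=True)[:top_k]

    best_score, best_move = float("-inf"), None
    for move in ranked:
        child = apply_move(state, move)
        # Negamax-style: a position that is good for the opponent is bad for us.
        score, _ = guided_search(child, policy, value, legal_moves,
                                 apply_move, depth - 1, top_k)
        score = -score
        if score > best_score:
            best_score, best_move = score, move
    return best_score, best_move
```

Without the policy ranking, the search would have to expand every legal move at every level; with it, the effective branching factor drops to `top_k`, which is the efficiency gain Hassabis describes.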
Hassabis draws an interesting parallel between game playing and other problem-solving domains, suggesting that similar techniques could be applied in various fields. One such example is drug discovery, where chemical compounds could replace nodes in the search tree, leading to a near-optimal solution. This concept aligns with recent research demonstrating the advantage of sampling multiple plans and exploring different paths, reminiscent of the findings in the "Tree of Thoughts" paper.
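The same branching pattern can be sketched for language-model plans: sample several candidate next steps, score the partial plans, and only extend the most promising ones. The sketch below is an illustrative take on that Tree-of-Thoughts-style breadth-first search, with hypothetical `propose` and `score` callables standing in for LLM calls and evaluators; it is not the paper's released code.

```python
# Minimal sketch of Tree-of-Thoughts-style breadth-first search over partial
# plans. `propose(plan)` samples candidate next steps (e.g. from an LLM) and
# `score(plan)` rates a partial plan; both are placeholders.

def tree_of_thoughts(initial_plan, propose, score, depth=3, breadth=5, keep=2):
    """Expand several candidate plans in parallel and keep only the best few
    at each level, rather than committing to a single chain of thought."""
    frontier = [initial_plan]
    for _ in range(depth):
        candidates = []
        for plan in frontier:
            for step in propose(plan)[:breadth]:      # sample several continuations
                candidates.append(plan + [step])
        # Keep only the most promising partial plans for the next level.
        frontier = sorted(candidates, key=score, reverse=True)[:keep]
        if not frontier:
            break
    return max(frontier, key=score) if frontier else initial_plan
```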
Leveraging Language Models for Planning
The integration of a large language model, such as GPT-4, with AlphaGo-style branching mechanisms holds tremendous potential. Although a model's initial output may not yield the best result, techniques like SmartGPT and self-consistency have shown that the first answer is not necessarily the most accurate one. Hassabis predicts that, with advances in the fusion of language models and planning strategies, reaching 100% performance on the MMLU (Massive Multitask Language Understanding) benchmark within five years is within grasp. This optimistic outlook is supported by the growing evidence that models exhibit stronger planning abilities when paired with better search techniques.
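Self-consistency is the simplest concrete example of "the first output is not necessarily the best": sample several answers at non-zero temperature and take the majority vote. Here is a minimal sketch, assuming a generic `generate` callable that returns one answer string per call; it is an illustration of the idea, not any particular vendor's API.

```python
from collections import Counter

# Minimal self-consistency sketch: sample several answers and majority-vote.
# `generate(prompt)` is a placeholder for a call to any language model.

def self_consistent_answer(prompt, generate, samples=5):
    """Query the model several times and return the most common answer."""
    answers = [generate(prompt) for _ in range(samples)]
    most_common, _count = Counter(answers).most_common(1)[0]
    return most_common
```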
Implications and Challenges Ahead
As intriguing as the possibilities may be, Hassabis acknowledges the dual responsibility of accelerating AI technologies while managing the associated risks. He emphasizes the invaluable benefits AI could bring to scientific discovery, climate research, and healthcare. Mandating a pause or restriction on AI development is deemed impractical and unlikely to be enforceable. Hassabis firmly believes that, when developed responsibly, AI holds the potential to be the most beneficial technology for humanity. However, it is crucial to approach AI's future boldly, addressing potential risks with foresight and courage.
Ensuring the Safety of AI: A Collaborative Approach
In recent years, there has been a surge in the development of powerful AI models, leading to significant advancements across many fields. Progress on the MMLU (Massive Multitask Language Understanding) benchmark, which I discussed in my video on SmartGPT, has sparked discussion about the potential capabilities and risks of these models. One notable prediction is that we could witness a model hitting 100 on the MMLU within five years [^1^]. This bold prediction not only highlights the growing capabilities of AI but also raises questions about its implications.
According to experts like Hassabis, one of the current challenges lies in identifying the potential risks posed by increasingly capable AI systems. Hassabis emphasizes the need for urgent research on evaluation tests that determine how controllable these new AI models are [^1^]. This addresses concerns about the dangers that could arise if AI technology were to surpass human understanding and become uncontrollable.
To address these challenges and foster collaboration, there have been efforts to grant academia early access to cutting-edge AI models. DeepMind, OpenAI, and Anthropic, for instance, have taken the lead in providing their foundation models to the UK AI task force, led by Ian Hogarth [^1^]. This initiative reflects the concept of establishing a CERN-like organization, an idea proposed by Hogarth in his essay "We Must Slow Down the Race to God-Like AI" [^1^].
Surprisingly, Satya Nadella, the CEO of Microsoft, recently echoed Hogarth's idea, suggesting the need for a collaborative effort involving academics, corporations, and governments to prevent AI from going out of control [^1^]. The alignment problem, as Nadella refers to it, requires a joint effort to ensure that both the scientific understanding and practical engineering of AI systems are well-regulated.
In light of the interview with Hassabis, it becomes evident that there is a sense of urgency in addressing the potential dangers posed by AI. He suggests that, while the safeguards being implemented in the Gemini series seem promising, there is not much time left to develop comprehensive safety measures [^1^]. This statement raises concerns about the adequacy of ongoing efforts to ensure the safety and success of AI.
On a related note, it is essential to shed light on the level of commitment from organizations like DeepMind when it comes to evaluating and implementing preemptive safety measures. An article published a few months ago estimates that there might be fewer than 100 researchers focused on these critical areas out of their extensive workforce [^1^]. This raises questions about whether such a relatively small proportion of their workforce is sufficient to address safety concerns adequately.
In the upcoming AI Summit, scheduled to take place this autumn in the UK, safety will undoubtedly be a central focus. However, the significance of the commitments made during such summits may vary depending on the level of investment and workforce dedicated to safety research and development by organizations like DeepMind. If it is revealed that a substantial number of researchers are working on these vital aspects, the prospects of ensuring safety and success become more promising [^1^].
The imminent release of Gemini, with its promise of enhanced multimodal capabilities and its integration of language models with planning mechanisms, signals a new era in AI. As we venture further into the frontiers of AGI, the fusion of diverse technologies and research findings from various domains will continue to push the boundaries of what we thought possible.
At the same time, as the capabilities of AI models continue to grow, it is crucial to acknowledge the potential risks associated with their unprecedented power. Collaborative ventures, like the one proposed by Hogarth and endorsed by Nadella, represent a proactive approach to addressing the alignment problem and preventing AI from spiraling out of control. However, there remains a need for more clarity regarding the allocation of resources towards safety measures within organizations like DeepMind. Ultimately, a robust commitment to safety research and development is essential to pave the way for a secure and prosperous future powered by AI.
Thank you for taking the time to read this article, and I hope you have a wonderful day.