- author: Matthew Berman
Super Alignment: Preparing for the Arrival of Super Intelligence
In a recent blog post from OpenAI, the development of super intelligence is discussed, along with the imminent need to address the problem of alignment. OpenAI firmly believes that Artificial General Intelligence (AGI) will be a reality in the near future, which is why they have launched a new team dedicated to solving the alignment problem.
The Significance of Alignment
Alignment refers to ensuring that AGI or any AI system is aligned with the goals of humanity. While super intelligence has the potential to solve numerous global challenges, it also poses significant risks. OpenAI recognizes the need to prevent the disempowerment or even the extinction of humanity due to the immense power of super intelligence.
Current Techniques for Aligning Super Intelligence
OpenAI's current technique for addressing alignment is called reinforcement learning through human feedback (RLHF). This involves using human evaluators to provide feedback on the performance of AI models. Evaluators are tasked with rating the accuracy of different responses generated by the AI model. However, the challenge lies in the fact that humans won't be able to reliably supervise AI systems that are much smarter than us.
While RLHF has its limitations, with AI models sometimes failing to follow instructions or providing biased or toxic responses, OpenAI acknowledges the improvements in models like GPT-4. However, as these models continue to evolve and become more advanced, addressing the alignment problem becomes increasingly crucial.
To overcome these limitations, OpenAI envisions a future where human-like automated systems will conduct alignment research on our behalf. Due to the scalability limitations faced by human researchers, AI systems will increasingly take over alignment work while human researchers review and ensure alignment with human values.
The Challenges of Aligning AI Systems
Aligning AI systems with human values raises critical questions. Who determines what those human values are? OpenAI recognizes the potential amplification of inconsistencies or biases by AI systems tasked with evaluation. Furthermore, aligning AGI may require solving very different problems than aligning present-day AI systems, necessitating new scientific and technical breakthroughs.
The idea of using AI to monitor and align AI systems brings forth concerns about the possibility of being misled. How can humans ensure proper alignment when the systems designed to monitor alignment might themselves be misaligned?
OpenAI's Approach to Alignment
OpenAI's recent blog post highlights their new approach to alignment, divided into three main pillars:
- Training AI systems to handle tasks that are difficult for humans to evaluate.
- Leveraging AI systems to assist in the evaluation of other AI systems.
- Testing the entire alignment pipeline by deliberately training misaligned models and ensuring the techniques can detect the worst forms of misalignments.
OpenAI also discusses the use of self-critique, where an AI agent critiques its own responses, as a way to address the alignment problem. This method of self-evaluation has proven effective in coding problems and may have potential applications in aligning AI systems.
OpenAI's Commitment to Alignment
OpenAI's dedication to addressing the alignment problem is evident through their allocation of 20 percent of their total compute power (estimated to be worth two billion dollars) over the next four years specifically for the super intelligence alignment problem. Their commitment is strengthened by their agreement with Microsoft, who has invested 10 billion dollars, primarily in Azure credits for compute power.
OpenAI aims to be transparent about the effectiveness of their alignment techniques and encourages all AGI developers to adopt the world's best alignment practices. However, concerns arise from the fact that OpenAI, once an advocate for open-source AI, has transitioned into a closed-source organization. While they might offer alignment tools to developers, the inner workings and techniques used remain undisclosed.
The Uncertainty of Super Intelligence Arrival
Acknowledging the risks, OpenAI cautions that there is no guarantee of success in alignment. However, they express a stronger belief in the arrival of AGI within the next decade compared to their confidence in solving the alignment problem. This highlights the urgency of addressing alignment concerns and reinforces the need for continued research and development in this field.
OpenAI is not alone in recognizing the importance of alignment. Other companies, such as Google DeepMind, have also explored techniques like red teaming to identify harmful outputs from language models. These efforts demonstrate that the alignment challenge is being taken seriously across the AI community.
The Unknown Future
As we move closer to the potential arrival of super intelligence, it is both an exciting and slightly alarming time. The development of alignment techniques is essential to navigate the risks associated with super intelligence. However, the complexities of aligning AI systems with human values, determining who decides these values, and the possibility of AI misleading humans raise crucial questions.
The journey toward aligning super intelligence with human values requires ongoing research, scientific breakthroughs, and collaboration between AI systems and human researchers. OpenAI's significant allocation of resources and commitment to alignment demonstrate their dedication to addressing this vital challenge. Nonetheless, the ever-evolving nature of AI necessitates continuous scrutiny, evaluation, and improvement of alignment techniques.