OpenAI, the trailblazing artificial intelligence (AI) company behind ChatGPT, is taking its approach to the next level by pairing its human trainers with AI assistants. The effort builds on reinforcement learning from human feedback (RLHF), the technique that uses input from human trainers to make AI models smarter, more coherent, and more reliable.
By incorporating input from human testers, ChatGPT’s AI model was fine-tuned to produce outputs that were judged to be more coherent, less objectionable, and more accurate. However, OpenAI acknowledges that RLHF has its limitations. Human feedback can be inconsistent, and it can be challenging for humans to rate complex outputs, such as sophisticated software code. Additionally, the process can sometimes optimize a model to generate convincing but inaccurate output.
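To make the fine-tuning step concrete, the sketch below shows the reward-modelling stage of RLHF in miniature: a reward function is fit to pairwise human preferences, and its scores would then steer the model during a subsequent reinforcement-learning step (omitted here). The linear reward model, simulated annotator, and feature vectors are illustrative assumptions, not OpenAI's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each "response" is a feature vector, and a linear reward model
# r(x) = w . x is trained from pairwise preferences (a Bradley-Terry loss),
# which is the core of the RLHF reward-modelling step. The policy update
# against this reward model (e.g. PPO) is not shown.
dim = 8
true_w = rng.normal(size=dim)  # stands in for the hidden human judgment

def preferred(a, b):
    """Return whichever of two responses the simulated annotator prefers."""
    return a if true_w @ a > true_w @ b else b

# Collect (chosen, rejected) pairs from the simulated annotator.
pairs = []
for _ in range(500):
    a, b = rng.normal(size=dim), rng.normal(size=dim)
    chosen = preferred(a, b)
    rejected = b if chosen is a else a
    pairs.append((chosen, rejected))

# Fit the reward model by maximizing log sigmoid(r(chosen) - r(rejected)).
w = np.zeros(dim)
lr = 0.05
for _ in range(200):
    grad = np.zeros(dim)
    for chosen, rejected in pairs:
        margin = w @ chosen - w @ rejected
        p = 1.0 / (1.0 + np.exp(-margin))   # P(chosen is preferred)
        grad += (1.0 - p) * (chosen - rejected)
    w += lr * grad / len(pairs)

agreement = np.mean([w @ c > w @ r for c, r in pairs])
print(f"reward model agrees with the preference labels on {agreement:.1%} of pairs")
```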
To overcome these limitations, OpenAI has developed a new model, CriticGPT, by fine-tuning its most powerful offering, GPT-4. CriticGPT assists human trainers in assessing code and has shown promising results: it caught bugs that humans missed, and human judges preferred its code critiques over human-written ones 63 percent of the time. OpenAI plans to extend the approach beyond code assessment in the future.
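As a rough illustration of how a critic model can assist a human trainer, the sketch below asks a chat model to produce a structured critique of a code snippet, which a trainer would then verify, edit, or discard before it feeds into training data. The workflow and prompt are assumptions made for illustration; CriticGPT itself is not publicly exposed, so a standard OpenAI model name stands in for it here.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SNIPPET = '''
def average(xs):
    return sum(xs) / len(xs)   # crashes on an empty list
'''

# Ask a critic model to list concrete problems for a human trainer to verify.
# "gpt-4o" is a placeholder model name, not CriticGPT, which is internal to OpenAI.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "You are a code reviewer. List concrete bugs or risks, "
                    "each with the line it occurs on and a one-line fix."},
        {"role": "user", "content": f"Review this code:\n{SNIPPET}"},
    ],
)

print(response.choices[0].message.content)  # critique shown to the human trainer
```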
Nat McAleese, a researcher at OpenAI involved in the work, acknowledges that CriticGPT is not perfect and can make mistakes. However, he believes that integrating the technique into OpenAI's RLHF chat stack will reduce errors in human training, making models and tools such as ChatGPT more accurate. He also suggests the approach could become crucial as models grow more capable, because it may allow humans to help train AIs that exceed their own abilities.
This new technique is part of a broader effort to enhance large language models and unleash their full potential, while also ensuring AI behaves responsibly and aligns with human values. Anthropic, a rival company founded by ex-OpenAI employees, recently announced a more capable version of its own chatbot, Claude, citing advances in its training and the data it is fed. Both Anthropic and OpenAI are also exploring new ways of inspecting AI models to understand how they arrive at their outputs, in order to minimize unwanted behavior such as deception.
OpenAI is training its next major AI model and has emphasized efforts to ensure the model's behavior is trustworthy. This comes after the company disbanded a team dedicated to assessing long-term AI risks, a move that drew criticism from some former team members, who accused OpenAI of rushing the development of powerful AI algorithms without adequate precautions.
Dylan Hadfield-Menell, a professor at MIT who specializes in AI alignment, sees OpenAI's technique as a natural development. He suggests it could lead to significant advances in the capabilities of individual models and pave the way for more effective feedback in the long run, though how broadly the approach applies, and how powerful it ultimately proves, remains to be examined.
OpenAI's approach of combining AI with human trainers reflects its commitment to improving its models and ensuring they align with human values. By addressing the limitations of RLHF, the company moves closer to building powerful, reliable AI models that could exceed human abilities while remaining accountable and trustworthy. As AI continues to evolve, collaboration between AI systems and human trainers may prove essential in shaping the future of the technology.