Google’s highly anticipated generative AI model, Gemini, has finally made its debut. However, it’s not quite the revolutionary model we were expecting. During a virtual press briefing, members of the Google DeepMind team unveiled Gemini 1.0, which is actually a family of AI models rather than a single model. Let’s dive deeper into the different variations of Gemini and their capabilities.
Key Takeaway
Google has unveiled its next-gen AI model, Gemini, a family of models that includes Gemini Pro and Gemini Ultra. Gemini Pro offers improved capabilities over its predecessor, while Gemini Ultra is the flagship model and is natively multimodal. However, both versions have limitations, and questions about their training data and environmental impact remain unanswered.
Gemini Pro: The Lite Version
Gemini Pro is a lightweight version of the more powerful flagship model, Gemini Ultra, which is set to launch next year. Gemini Pro offers improved reasoning, planning, and understanding capabilities compared to its predecessor, making it competitive with OpenAI’s GPT-3.5. However, GPT-3.5 is over a year old, so matching or even beating its performance is not a groundbreaking achievement.
Gemini Pro will be integrated into Google’s ChatGPT competitor, Bard, and will be available for enterprise customers using Vertex AI, Google’s machine learning platform. It will also be incorporated into various Google products like Duet AI, Chrome, Ads, and the Search Generative Experience.
Gemini Ultra: Advancing Multimodal Capabilities
Gemini Ultra, the flagship model of the Gemini family, offers a more impressive range of capabilities. It is designed to be “natively multimodal,” meaning it can comprehend and generate text, images, audio, and code simultaneously. Gemini Ultra outperforms OpenAI’s GPT-4 with Vision in several benchmarks, showcasing its proficiency in understanding and generating complex information across different modalities.
Unlike GPT-4 with Vision, which handles only text and images, Gemini Ultra can also transcribe speech, answer questions about audio and video, and interpret art and photos. It excels in fields such as math and physics, making it a valuable tool for tasks requiring nuanced comprehension.
Concerns and Limitations
While Gemini Pro and Gemini Ultra demonstrate advancements in generative AI, there are still notable limitations and concerns. Gemini Ultra’s benchmark performance shows only marginal improvements over existing models such as GPT-4 with Vision. Gemini’s training datasets were not discussed in detail, raising questions about data collection, sources, and potential licensing issues. Google’s decision not to address these questions may invite speculation regarding fair use and copyright concerns.
Google claims that Gemini is its most efficient large generative AI model to date, but it provided no specific information about the number of chips used during training or the associated environmental impact. Training such models produces significant carbon emissions, and users are increasingly concerned about the sustainability of AI technologies.