Last October, a research paper co-authored by a Google data scientist, Databricks CTO Matei Zaharia and UC Berkeley professor Pieter Abbeel posited a way to let GenAI models (models along the lines of OpenAI's GPT-4 and ChatGPT) ingest far more data than was previously possible. In the study, the co-authors demonstrated that, by removing a major memory bottleneck, they could enable models to process millions of words rather than hundreds of thousands, the maximum of the most capable models at the time.
Key Takeaway
Google’s Gemini 1.5 Pro is a groundbreaking GenAI model that can process a significantly larger amount of data, marking a major advancement in AI technology.
Google’s Gemini 1.5 Pro: A Breakthrough in GenAI Models
Today, Google announced the release of Gemini 1.5 Pro, the newest member of its Gemini family of GenAI models. Designed to be a drop-in replacement for Gemini 1.0 Pro, Gemini 1.5 Pro is improved in a number of areas compared with its predecessor, perhaps most significantly in the amount of data that it can process.
- Gemini 1.5 Pro can take in ~700,000 words, or ~30,000 lines of code — 35x the amount Gemini 1.0 Pro can handle.
- Because the model is multimodal, it's not limited to text: Gemini 1.5 Pro can ingest up to 11 hours of audio or an hour of video in a variety of languages.
Unlocking Long Context in a Massive Way
Google characterizes the large-data-input Gemini 1.5 Pro as "experimental," allowing only developers approved as part of a private preview to pilot it via the company's GenAI dev tool AI Studio. Some, but not all, customers using Google's Vertex AI platform also have access to the large-data-input Gemini 1.5 Pro.
Oriol Vinyals, VP of research at Google DeepMind, heralded it as an achievement: "We've unlocked long context in a pretty massive way."
Big Context and Its Implications
A model’s context, or context window, refers to input data that the model considers before generating output. Models with large context windows can better grasp the narrative flow of data they take in and generate more contextually rich responses.
Gemini 1.5 Pro's maximum context window is 1 million tokens, enabling it to perform tasks such as analyzing a whole code library, reasoning across lengthy documents like contracts, holding long conversations with a chatbot, and analyzing and comparing content in videos.
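To give a feel for what a 1 million-token window means in practice, here is a rough back-of-the-envelope sketch. It does not use Google's actual tokenizer; it assumes a common rule of thumb of roughly 0.75 words per token, and the `WORDS_PER_TOKEN` constant and helper functions are illustrative names, not part of any Gemini API.

```python
# Rough illustration only (not Google's tokenizer): estimate whether a
# document fits in a given context window, assuming ~0.75 words per token.
WORDS_PER_TOKEN = 0.75  # rule-of-thumb assumption; real ratios vary by tokenizer

def estimated_tokens(text: str) -> int:
    """Estimate token count from word count using the ratio above."""
    return round(len(text.split()) / WORDS_PER_TOKEN)

def fits_in_context(text: str, context_window: int = 1_000_000) -> bool:
    """Check whether the estimated token count fits the model's window."""
    return estimated_tokens(text) <= context_window

# A ~700,000-word input, close to the figure cited for Gemini 1.5 Pro:
doc = "word " * 700_000
print(estimated_tokens(doc))   # 933333 tokens under this assumption
print(fits_in_context(doc))    # True: within a 1 million-token window
```

Under this rough ratio, the article's ~700,000-word figure lands comfortably inside the 1 million-token window, which is consistent with the numbers Google cites.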
Other Improvements and Pricing
Beyond the expanded context window, Gemini 1.5 Pro brings other quality-of-life upgrades to the table. Google claims that Gemini 1.5 Pro is "comparable" to the current version of Gemini Ultra, Google's flagship GenAI model, thanks to a new architecture composed of smaller, specialized "expert" models.
During the private preview, Gemini 1.5 Pro with the 1 million-token context window will be free to use, Google says. But the company plans to introduce pricing tiers in the near future that start at the standard 128,000-token context window and scale up to 1 million tokens.