Google launched its family of Gemini 2.0 models that includes a version of Gemini 2.0 Flash as its latest Trillium TPUs become generally available.
The new models--Gemini 2.0 Flash will be available in AI Studio and Vertex AI--ride shotgun with multiple Google services as the search and cloud giant revs its agentic AI plans.
Google said that it will launch new features for Project Astra, add an agentic web exploration prototype called Project Mariner to automate browser-based tasks, and roll out Jules, an AI coding agent, to trusted testers. For good measure, Google is exploring Gemini 2.0 for Games.
The launches today are consumer focused in many ways, but make their way to Google Cloud and enterprises as well. Google's Trillium TPUs have been used to train Gemini 2.0 and offer a 4x increase in training performance and 3x gain in inference throughput relative to the previous generation TPU v5e instances. Google said Trillium instances are optimized for price and performance.
In a blog post, Alphabet CEO Sundar Pichai said Gemini 2.0 "is our most capable model yet." "With new advances in multimodality — like native image and audio output — and native tool use, it will enable us to build new AI agents that bring us closer to our vision of a universal assistant," said Pichai.
Among the key items:
- Gemini 2.0 Flash will be available for Gemini and Gemini Advanced users on desktop and mobile Web.
- Gemini 2.0 Flash is as fast as Gemini 1.5 Pro with gains in coding, reasoning and visual understanding.
- Gemini 2.0 will power AI Overviews in Search this week.
- Google is launching Deep Research, an agentic feature in Gemini Advanced on desktop and mobile web.
- Gemini 2.0 Flash will be generally available in January with more model sizes to follow.
Google Cloud Q3 revenue up 35% from a year ago, Alphabet results shine | Google Cloud Vertex AI updates focus on the practical with Context Caching, grounding services | Google Cloud Next 2024: Google Cloud aims to be data, AI platform of choice | Google Cloud Next: The role of genAI agents, enterprise use cases
Google's announcements land as hyperscalers are racing to release their own workhorse models. Gemini 2.0 Flash is designed to be a workhorse. Amazon Web Services last week launched its Nova family of large language models and Trainium2. Meanwhile, Microsoft has been busy launching its own models as it diversifies away from OpenAI.
Here are a few other details about Gemini 2.0 Flash.
- The model is multimodal for audio and inline image output.
- Gemini 2.0 Flash has a bidirectional streaming API, real-time voice interactions and conversation mechanics like interruptions.
- Google said Gemini 2.0 flash can access up-to-date information, perform calculations and interact with data sources.
- It also has a single interface and unified SDK across AI Studio and Vertex AI.
Google DeepMind CEO Demis Hassabis and CTO Koray Kavukcuoglu wrote:
"In addition to supporting multimodal inputs like images, video and audio, 2.0 Flash now supports multimodal output like natively generated images mixed with text and steerable text-to-speech (TTS) multilingual audio. It can also natively call tools like Google Search, code execution as well as third-party user-defined functions."
Other items worth noting include:
- Project Astra, which was outlined at Google I/O, is designed to be a universal AI assistant. With Gemini 2.0, Project Astra can converse in multiple languages and mix them, leverage Google Search, Lens and Maps, and has 10 minutes of in-session memory.
- Project Mariner, an early prototype built with Gemini 2.0 available via a Chrome extension. Project Mariner is 83.5% accurate working on web tasks as a single agent, but is slow. Google said the big takeaway is that it's technically possible to navigate within a browser with Project Mariner.
- Jules, an agent for developers, is an experimental AI-driven code agent that integrates into GitHub workflows. Jules can handle an issue, plan and execute for Python and JavaScript coding.
- Colab, a data science agent, creates notebooks and insights for anyone who uploads a dataset. With Gemini 2.0, a user can describe analysis goals in plain language and a notebook will be created. Colab will be in the trusted tester program before rolling out broadly in the first half of 2025.
- Deep Research launched in Gemini Advanced, which is upgraded with Gemini 2.0 Flash. Deep Research is an agent that explores topics and generates a report based on a multi-step research plan you revise or approve.
- Gemini 2.0 for Games, an effort to have agents navigate video games and reason based on action on the screen and offer suggestions. Google said these experiments can build on Gemini 2.0's spatial reasoning and apply them to robotics.