Google's Gemini AI models are designed for multimodal tasks, including understanding and generating text, images, audio, and video. They can write code, interpret images, generate content, and answer questions [2]. The models can also analyze large datasets and assist with tasks across various Google products and services.
Google has claimed that Gemini offers sophisticated multimodal reasoning and advanced coding capabilities [5]. According to the company, it can generalize and seamlessly understand, operate across, and combine different types of information, including text, images, audio, video, and code [6]. Available in three sizes (Ultra, Pro, and Nano), Gemini is meant to run on everything from data centers to mobile devices [5].
Gemini 1.5 Pro performed poorly on true/false tests about works of fiction. When tested on a 260,000-word book, it answered the true/false statements correctly only 46.7% of the time, while its counterpart, Gemini 1.5 Flash, was correct just 20% of the time. These results suggest that both models struggle to understand and process large amounts of text effectively.