multimodal

Gemini 2.0, Google’s newest flagship AI, can generate text, images, and speech

Google’s next major AI model has arrived to combat a slew of new offerings from OpenAI. On Wednesday, Google announced Gemini 2.0 Flash, which the company says can natively generate images and audio in addition to text. 2.0 Flash can also use third-party apps and services, allowing it to tap into Google Search, execute code, […]

Mistral releases Pixtral, its first multimodal model

French AI startup Mistral has released its first model that can process images as well as text. Called Pixtral 12B, the 12-billion-parameter model is roughly 24GB in size. (Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer parameters.) Available on GitHub as well as the AI and machine […]
