Google Rolls Out Gemini 3.1 Flash-Lite as Most Affordable Model in the Lineup
Google has introduced Gemini 3.1 Flash-Lite, its latest AI model in the Gemini 3 family, positioning it as the fastest and most cost-efficient option in the lineup. The new model is designed specifically for high-volume developer workloads, delivering strong performance at scale without driving up costs.
Gemini 3.1 Flash-Lite pricing and availability
Pricing for Gemini 3.1 Flash-Lite is set at $0.25 per 1M input tokens and $1.50 per 1M output tokens. Google positions it as a lower-cost alternative to its larger models, while still offering strong capability for tasks that demand fast responses and frequent use in production systems.
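At those rates, back-of-the-envelope cost math is straightforward. The sketch below uses the per-million-token prices quoted above; the request volume and token counts are illustrative assumptions, not figures from Google.

```python
# Rough cost estimate for Gemini 3.1 Flash-Lite at the listed rates:
# $0.25 per 1M input tokens, $1.50 per 1M output tokens.
INPUT_PRICE_PER_M = 0.25
OUTPUT_PRICE_PER_M = 1.50

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Hypothetical workload: 1M requests/day, ~800 input and ~200 output tokens each.
per_request = estimate_cost(800, 200)
daily = per_request * 1_000_000
print(f"${per_request:.6f} per request, ${daily:.2f} per day")
# → $0.000500 per request, $500.00 per day
```

Even at a million requests a day, a workload of this shape stays in the hundreds of dollars, which is the scale argument Google is making for the Lite tier.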

Gemini 3.1 Flash-Lite is now available in preview through the Gemini API in Google AI Studio for developers, and through Vertex AI for enterprise users. It is designed for large-scale deployment, where many requests are sent at once and response time and cost are both critical.
Gemini 3.1 Flash-Lite benchmarks and AI capabilities
Gemini 3.1 Flash-Lite is engineered to handle demanding, high-frequency workflows where low latency and cost control are crucial. According to Google, the model offers a 2.5x faster Time to First Answer Token and a 45% increase in output speed compared to Gemini 2.5 Flash, based on Artificial Analysis benchmarks.
This improved responsiveness makes it particularly suitable for real-time applications, such as chat interfaces, translation tools, and content moderation systems. Despite being positioned as a lighter, cost-efficient model, Gemini 3.1 Flash-Lite does not compromise heavily on quality. It has achieved:
- Elo score of 1432 on the Arena.ai Leaderboard
- 86.9% on GPQA Diamond
- 76.8% on MMMU Pro
Google claims the model outperforms others in its tier across reasoning and multimodal understanding benchmarks. In some cases, it even surpasses larger models from earlier generations, such as Gemini 2.5 Flash.
Gemini 3.1 Flash-Lite also ships with configurable thinking levels in AI Studio and Vertex AI. This feature lets developers control how much reasoning effort the model applies to a task. For more complex use cases - such as generating dashboards, simulations, or following multi-step instructions - the model can be set to apply deeper reasoning.
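As a rough sketch, a request that raises the reasoning effort might carry a fragment like the one below. The field names (`generationConfig`, `thinkingConfig`, `thinkingLevel`) follow the pattern of earlier Gemini API thinking controls and are an assumption here; the current Gemini API reference is the authority on the exact names and accepted values.

```json
{
  "contents": [
    { "parts": [{ "text": "Generate a dashboard layout for this sales data schema" }] }
  ],
  "generationConfig": {
    "thinkingConfig": { "thinkingLevel": "high" }
  }
}
```

For simple, latency-sensitive calls the level would be left low (or at its default), keeping the model in its fast, cheap sweet spot.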

