
DAY 0 Support: Gemini 3 Flash on LiteLLM

Sameer Kankute
SWE @ LiteLLM (LLM Translation)
Krrish Dholakia
CEO, LiteLLM
Ishaan Jaff
CTO, LiteLLM

LiteLLM now supports gemini-3-flash-preview and all the new API changes that ship with it.

note

If you only want cost tracking, no change to your current LiteLLM version is needed. If you want support for the new features introduced alongside it, such as thinking levels, you will need v1.80.8-stable.1 or above.

Deploy this version​

```bash
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-v1.80.8-stable.1
```

What's New​

1. New Thinking Levels: thinkingLevel with MINIMAL & MEDIUM​

Gemini 3 Flash introduces granular thinking control with thinkingLevel instead of thinkingBudget.

  • MINIMAL: Ultra-lightweight thinking for fast responses
  • MEDIUM: Balanced thinking for complex reasoning
  • HIGH: Maximum reasoning depth

LiteLLM automatically maps the OpenAI reasoning_effort parameter to Gemini's thinkingLevel, so you can use familiar reasoning_effort values (minimal, low, medium, high) without changing your code!
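For intuition, the translation can be thought of as a simple lookup. The sketch below is illustrative only (not LiteLLM's actual internals) and mirrors the mapping table at the end of this post:

```python
# Illustrative sketch of the reasoning_effort -> thinkingLevel translation.
# Not LiteLLM's actual internals; values follow the mapping table below.
REASONING_EFFORT_TO_THINKING_LEVEL = {
    "minimal": "minimal",
    "low": "low",
    "medium": "medium",
    "high": "high",
    "disable": "minimal",  # Gemini 3 cannot fully turn thinking off
    "none": "minimal",
}

def to_thinking_level(reasoning_effort: str) -> str:
    return REASONING_EFFORT_TO_THINKING_LEVEL[reasoning_effort]
```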

2. Thought Signatures​

Like gemini-3-pro, this model also includes thought signatures for tool calls. LiteLLM handles signature extraction and embedding internally. Learn more about thought signatures.

Edge Case Handling: If thought signatures are missing in the request, LiteLLM adds a dummy signature so the API call doesn't break.
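In practice this means a standard multi-turn tool-calling loop needs no signature bookkeeping. A minimal sketch, using a hypothetical get_weather tool and a hardcoded tool result:

```python
from litellm import completion

# Hypothetical tool for illustration
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
response = completion(
    model="gemini/gemini-3-flash-preview",
    messages=messages,
    tools=tools,
)

# Append the assistant turn as-is: LiteLLM carries the thought signature
# through for you, and inserts a dummy one if it is missing.
messages.append(response.choices[0].message.model_dump())
messages.append({
    "role": "tool",
    "tool_call_id": response.choices[0].message.tool_calls[0].id,
    "content": "18°C and sunny",  # hardcoded tool result for the sketch
})

final = completion(
    model="gemini/gemini-3-flash-preview",
    messages=messages,
    tools=tools,
)
print(final.choices[0].message.content)
```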


Supported Endpoints​

LiteLLM provides full end-to-end support for Gemini 3 Flash on:

  • ✅ /v1/chat/completions - OpenAI-compatible chat completions endpoint
  • ✅ /v1/responses - OpenAI Responses API endpoint (streaming and non-streaming)
  • ✅ /v1/messages - Anthropic-compatible messages endpoint
  • ✅ /v1/generateContent - Google Gemini API compatible endpoint

All endpoints support:
  • Streaming and non-streaming responses
  • Function calling with thought signatures
  • Multi-turn conversations
  • All Gemini 3-specific features
  • Conversion of provider-specific thinking-related params to thinkingLevel
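For example, through the proxy the model is reachable from any OpenAI-compatible client. A minimal sketch, assuming the proxy from the deploy step is running on localhost:4000, sk-1234 is a valid key, and gemini/gemini-3-flash-preview is configured as a model:

```python
from openai import OpenAI

# Point any OpenAI-compatible client at the LiteLLM proxy
# (assumes the proxy is running locally on port 4000)
client = OpenAI(api_key="sk-1234", base_url="http://localhost:4000")

response = client.chat.completions.create(
    model="gemini/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Hello!"}],
    reasoning_effort="medium",  # translated to thinkingLevel by LiteLLM
)
print(response.choices[0].message.content)
```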

Quick Start​

Basic Usage with MEDIUM thinking (NEW)

```python
from litellm import completion

# No code changes needed: LiteLLM maps the OpenAI reasoning_effort
# parameter to Gemini's thinkingLevel automatically
response = completion(
    model="gemini/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Solve this complex math problem: 25 * 4 + 10"}],
    reasoning_effort="medium",  # NEW: MEDIUM thinking level
)

print(response.choices[0].message.content)
```
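When the provider returns the reasoning text itself, LiteLLM may also surface it on the message as reasoning_content; this access pattern is an assumption and depends on your LiteLLM version:

```python
# reasoning_content is only present when the provider returns reasoning text
reasoning = getattr(response.choices[0].message, "reasoning_content", None)
if reasoning:
    print("Reasoning:", reasoning)
```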

Key Features​

✅ Thinking Levels: MINIMAL, LOW, MEDIUM, HIGH
✅ Thought Signatures: Track reasoning with unique identifiers
✅ Seamless Integration: Works with existing OpenAI-compatible clients
✅ Backward Compatible: Gemini 2.5 models continue using thinkingBudget
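Backward compatibility means the same reasoning_effort call works on Gemini 2.5, where LiteLLM translates it to a thinkingBudget instead of a thinkingLevel. A quick sketch:

```python
from litellm import completion

# Same OpenAI-style parameter, older model: LiteLLM translates
# reasoning_effort to Gemini 2.5's thinkingBudget under the hood
response = completion(
    model="gemini/gemini-2.5-flash",
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet."}],
    reasoning_effort="low",
)
print(response.choices[0].message.content)
```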


Installation​

```bash
pip install litellm --upgrade
```

```python
import litellm
from litellm import completion

response = completion(
    model="gemini/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Your question here"}],
    reasoning_effort="medium",  # Use MEDIUM thinking
)
print(response)
```
note

If using this model via vertex_ai, set the location to global, as this is the only supported location for now.
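For example, assuming your Vertex AI credentials and project are already configured:

```python
from litellm import completion

# Vertex AI route: "global" is currently the only supported location
# for this model, so pin vertex_location accordingly
response = completion(
    model="vertex_ai/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Hello from Vertex!"}],
    reasoning_effort="medium",
    vertex_location="global",
)
print(response.choices[0].message.content)
```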

reasoning_effort Mapping for Gemini 3+​

| reasoning_effort | thinking_level |
| --- | --- |
| minimal | minimal |
| low | low |
| medium | medium |
| high | high |
| disable | minimal |
| none | minimal |
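Note that disable and none do not turn thinking off entirely on Gemini 3+: per the table above, both fall back to minimal. For example:

```python
from litellm import completion

# "disable" and "none" both map to thinkingLevel "minimal" on Gemini 3+,
# since thinking cannot be fully turned off
response = completion(
    model="gemini/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Quick answer: 2 + 2?"}],
    reasoning_effort="disable",  # sent to Gemini as thinkingLevel=minimal
)
print(response.choices[0].message.content)
```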