
DAY 0 Support: Gemini 3 Flash on LiteLLM

Sameer Kankute
SWE @ LiteLLM (LLM Translation)
Krrish Dholakia
CEO, LiteLLM
Ishaan Jaff
CTO, LiteLLM

LiteLLM now supports gemini-3-flash-preview and all of the new API changes that ship with it.

Deploy this version

```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-v1.80.8-stable.1
```
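If you prefer running the proxy with a config file, the model can be added to `model_list`. A minimal sketch (assumes your Gemini key is in the `GEMINI_API_KEY` environment variable):

```yaml
model_list:
  - model_name: gemini-3-flash-preview
    litellm_params:
      model: gemini/gemini-3-flash-preview
      api_key: os.environ/GEMINI_API_KEY
```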

What's New

1. New Thinking Levels: thinkingLevel with MINIMAL & MEDIUM

Gemini 3 Flash introduces granular thinking control via thinkingLevel, replacing the thinkingBudget parameter used by earlier Gemini models.

  • MINIMAL: Ultra-lightweight thinking for fast responses
  • LOW: Light thinking for simple tasks
  • MEDIUM: Balanced thinking for complex reasoning
  • HIGH: Maximum reasoning depth

LiteLLM automatically maps the OpenAI reasoning_effort parameter to Gemini's thinkingLevel, so you can use familiar reasoning_effort values (minimal, low, medium, high) without changing your code.

2. Thought Signatures

Like gemini-3-pro, this model includes thought signatures for tool calls. LiteLLM handles signature extraction and embedding internally. Learn more about thought signatures.

Edge Case Handling: If thought signatures are missing from a request, LiteLLM inserts a dummy signature so the API call doesn't break.
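In practice this means a multi-turn tool-calling exchange needs no signature bookkeeping on the client side: you replay plain OpenAI-shaped messages and LiteLLM re-attaches the signatures (or a dummy one) internally. A sketch, where the get_weather tool and its arguments are purely illustrative:

```python
import json

# Replaying a tool-call turn in the familiar OpenAI message format.
# No thought-signature fields appear here -- LiteLLM manages them internally.
# The get_weather tool is a made-up example.
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": json.dumps({"city": "Paris"}),
                },
            }
        ],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "18°C, clear"},
]

# This history can be passed back as-is, e.g.
# completion(model="gemini/gemini-3-flash-preview", messages=messages, tools=[...])
print(messages[1]["tool_calls"][0]["function"]["name"])
```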


Supported Endpoints

LiteLLM provides full end-to-end support for Gemini 3 Flash on:

  • ✅ /v1/chat/completions - OpenAI-compatible chat completions endpoint
  • ✅ /v1/responses - OpenAI Responses API endpoint (streaming and non-streaming)
  • ✅ /v1/messages - Anthropic-compatible messages endpoint
  • ✅ /v1/generateContent - Google Gemini API-compatible endpoint

All endpoints support:

  • Streaming and non-streaming responses
  • Function calling with thought signatures
  • Multi-turn conversations
  • All Gemini 3-specific features
  • Conversion of provider-specific thinking parameters to thinkingLevel
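For example, a request to a running LiteLLM proxy's /v1/chat/completions endpoint uses the standard OpenAI request shape. The sketch below only builds and prints the JSON body; the localhost:4000 address matches the docker command above:

```python
import json

# OpenAI-shaped request body; LiteLLM translates reasoning_effort
# into Gemini's thinkingLevel before calling the provider.
payload = {
    "model": "gemini/gemini-3-flash-preview",
    "messages": [{"role": "user", "content": "Hello"}],
    "reasoning_effort": "medium",
}

# POST this to http://localhost:4000/v1/chat/completions with an
# "Authorization: Bearer <your-litellm-key>" header (curl, requests, etc.).
print(json.dumps(payload, indent=2))
```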

Quick Start

Basic Usage with MEDIUM thinking (NEW)

```python
from litellm import completion

# No code changes needed: LiteLLM maps the OpenAI reasoning_effort
# parameter to Gemini's thinkingLevel automatically.
response = completion(
    model="gemini/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Solve this complex math problem: 25 * 4 + 10"}],
    reasoning_effort="medium",  # NEW: MEDIUM thinking level
)

print(response.choices[0].message.content)
```

Key Features

✅ Thinking Levels: MINIMAL, LOW, MEDIUM, HIGH
✅ Thought Signatures: Track reasoning with unique identifiers
✅ Seamless Integration: Works with existing OpenAI-compatible clients
✅ Backward Compatible: Gemini 2.5 models continue using thinkingBudget


Installation

```shell
pip install litellm --upgrade
```

```python
from litellm import completion

response = completion(
    model="gemini/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Your question here"}],
    reasoning_effort="medium",  # Use MEDIUM thinking
)
print(response)
```

reasoning_effort Mapping for Gemini 3+

| reasoning_effort | thinking_level |
|------------------|----------------|
| minimal          | minimal        |
| low              | low            |
| medium           | medium         |
| high             | high           |
| disable          | minimal        |
| none             | minimal        |
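The mapping above can be sketched as a plain lookup; this is an illustration of the table, not LiteLLM's internal code:

```python
# Illustrative sketch of the reasoning_effort -> thinkingLevel mapping
# for Gemini 3+ models (mirrors the table above; not LiteLLM internals).
EFFORT_TO_THINKING_LEVEL = {
    "minimal": "minimal",
    "low": "low",
    "medium": "medium",
    "high": "high",
    "disable": "minimal",  # disabling thinking falls back to the lowest level
    "none": "minimal",
}

def to_thinking_level(reasoning_effort: str) -> str:
    """Map an OpenAI reasoning_effort value to a Gemini 3 thinkingLevel."""
    return EFFORT_TO_THINKING_LEVEL[reasoning_effort]

print(to_thinking_level("disable"))
```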