
DAY 0 Support: Gemini 3 Flash on LiteLLM

Sameer Kankute
SWE @ LiteLLM (LLM Translation)
Krrish Dholakia
CEO, LiteLLM
Ishaan Jaff
CTO, LiteLLM

LiteLLM now supports gemini-3-flash-preview and all of the new API changes that ship with it.

Deploy this version

```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-v1.80.8-stable.1
```
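If you prefer running the proxy with a config file, the model can be added to `model_list`. A minimal sketch (assumes your Gemini key is in the `GEMINI_API_KEY` environment variable):

```yaml
model_list:
  - model_name: gemini-3-flash-preview
    litellm_params:
      model: gemini/gemini-3-flash-preview
      api_key: os.environ/GEMINI_API_KEY
```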

What's New

1. New Thinking Levels: thinkingLevel with MINIMAL & MEDIUM

Gemini 3 Flash introduces granular thinking control via thinkingLevel, replacing the thinkingBudget parameter used by earlier Gemini models.

  • MINIMAL: Ultra-lightweight thinking for fast responses
  • LOW: Light thinking for simple tasks
  • MEDIUM: Balanced thinking for complex reasoning
  • HIGH: Maximum reasoning depth

LiteLLM automatically maps the OpenAI reasoning_effort parameter to Gemini's thinkingLevel, so you can use familiar reasoning_effort values (minimal, low, medium, high) without changing your code.

2. Thought Signatures

Like gemini-3-pro, this model includes thought signatures for tool calls. LiteLLM handles signature extraction and embedding internally. Learn more about thought signatures.

Edge Case Handling: If thought signatures are missing from a request, LiteLLM inserts a dummy signature so the API call doesn't break.
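In practice this means a multi-turn tool-calling exchange needs no signature bookkeeping on the client side: you replay plain OpenAI-shaped messages and LiteLLM re-attaches the signatures (or a dummy one) internally. A sketch, where the get_weather tool and its arguments are purely illustrative:

```python
import json

# Replaying a tool-call turn in the familiar OpenAI message format.
# No thought-signature fields appear here -- LiteLLM manages them internally.
# The get_weather tool is a made-up example.
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": json.dumps({"city": "Paris"}),
                },
            }
        ],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "18°C, clear"},
]

# This history can be passed back as-is, e.g.
# completion(model="gemini/gemini-3-flash-preview", messages=messages, tools=[...])
print(messages[1]["tool_calls"][0]["function"]["name"])
```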


Supported Endpoints

LiteLLM provides full end-to-end support for Gemini 3 Flash on:

  • ✅ /v1/chat/completions - OpenAI-compatible chat completions endpoint
  • ✅ /v1/responses - OpenAI Responses API endpoint (streaming and non-streaming)
  • ✅ /v1/messages - Anthropic-compatible messages endpoint
  • ✅ /v1/generateContent - Google Gemini API-compatible endpoint

All endpoints support:

  • Streaming and non-streaming responses
  • Function calling with thought signatures
  • Multi-turn conversations
  • All Gemini 3-specific features
  • Conversion of provider-specific thinking parameters to thinkingLevel
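For example, a request to a running LiteLLM proxy's /v1/chat/completions endpoint uses the standard OpenAI request shape. The sketch below only builds and prints the JSON body; the localhost:4000 address matches the docker command above:

```python
import json

# OpenAI-shaped request body; LiteLLM translates reasoning_effort
# into Gemini's thinkingLevel before calling the provider.
payload = {
    "model": "gemini/gemini-3-flash-preview",
    "messages": [{"role": "user", "content": "Hello"}],
    "reasoning_effort": "medium",
}

# POST this to http://localhost:4000/v1/chat/completions with an
# "Authorization: Bearer <your-litellm-key>" header (curl, requests, etc.).
print(json.dumps(payload, indent=2))
```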

Quick Start

Basic Usage with MEDIUM thinking (NEW)

```python
from litellm import completion

# No code changes needed: LiteLLM maps the OpenAI reasoning_effort
# parameter to Gemini's thinkingLevel automatically.
response = completion(
    model="gemini/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Solve this complex math problem: 25 * 4 + 10"}],
    reasoning_effort="medium",  # NEW: MEDIUM thinking level
)

print(response.choices[0].message.content)
```

Key Features

✅ Thinking Levels: MINIMAL, LOW, MEDIUM, HIGH
✅ Thought Signatures: Track reasoning with unique identifiers
✅ Seamless Integration: Works with existing OpenAI-compatible clients
✅ Backward Compatible: Gemini 2.5 models continue using thinkingBudget


Installation

```shell
pip install litellm --upgrade
```

```python
from litellm import completion

response = completion(
    model="gemini/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Your question here"}],
    reasoning_effort="medium",  # Use MEDIUM thinking
)
print(response)
```

reasoning_effort Mapping for Gemini 3+

| reasoning_effort | thinking_level |
|------------------|----------------|
| minimal          | minimal        |
| low              | low            |
| medium           | medium         |
| high             | high           |
| disable          | minimal        |
| none             | minimal        |
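The mapping above can be sketched as a plain lookup; this is an illustration of the table, not LiteLLM's internal code:

```python
# Illustrative sketch of the reasoning_effort -> thinkingLevel mapping
# for Gemini 3+ models (mirrors the table above; not LiteLLM internals).
EFFORT_TO_THINKING_LEVEL = {
    "minimal": "minimal",
    "low": "low",
    "medium": "medium",
    "high": "high",
    "disable": "minimal",  # disabling thinking falls back to the lowest level
    "none": "minimal",
}

def to_thinking_level(reasoning_effort: str) -> str:
    """Map an OpenAI reasoning_effort value to a Gemini 3 thinkingLevel."""
    return EFFORT_TO_THINKING_LEVEL[reasoning_effort]

print(to_thinking_level("disable"))
```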