Toy-97 commented Sep 26, 2025

Relevant issues

This PR updates the DeepInfra model data:

  • Model availability: new models added
  • Model deprecation: models removed
  • Model parameters: pricing and metadata updated
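The Added / Removed / Modified breakdown under Changes has the shape of a field-level diff between the old and new DeepInfra entries. The PR does not say how its list was produced, so what follows is only a minimal sketch, assuming both snapshots are plain dicts keyed by model name (as in LiteLLM's model_prices_and_context_window.json). `diff_model_data` and `format_changes` are hypothetical helpers, not LiteLLM APIs. Only the DeepSeek-V3-0324 values come from this PR; the added and removed entries are left empty because their fields are not listed here.

```python
from typing import Any, Dict, List, Tuple

# Model name -> {field name: value}, e.g. {"input_cost_per_token": 2.5e-07}.
ModelData = Dict[str, Dict[str, Any]]
FieldChanges = Dict[str, Tuple[Any, Any]]


def diff_model_data(
    old: ModelData, new: ModelData
) -> Tuple[List[str], List[str], Dict[str, FieldChanges]]:
    """Hypothetical helper: return (added, removed, modified) model names.

    `modified` maps each model present in both snapshots to {field: (old, new)}.
    A field missing on one side is reported as None, matching the "→ None"
    notation used in the Changes list.
    """
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified: Dict[str, FieldChanges] = {}
    for name in sorted(set(old) & set(new)):
        before, after = old[name], new[name]
        changes = {
            field: (before.get(field), after.get(field))
            for field in sorted(set(before) | set(after))
            if before.get(field) != after.get(field)
        }
        if changes:
            modified[name] = changes
    return added, removed, modified


def format_changes(modified: Dict[str, FieldChanges]) -> str:
    """Render modified models in the "field: old → new" style used by this PR."""
    lines = []
    for name, changes in modified.items():
        lines.append(f"{name}:")
        lines.extend(f"  • {field}: {a} → {b}" for field, (a, b) in changes.items())
    return "\n".join(lines)


# Illustrative snapshots; only the DeepSeek-V3-0324 values are from this PR.
old = {
    "deepinfra/zai-org/GLM-4.5-Air": {},
    "deepinfra/deepseek-ai/DeepSeek-V3-0324": {
        "input_cost_per_token": 2.8e-07,
        "cache_read_input_token_cost": 2.24e-07,
    },
}
new = {
    "deepinfra/deepseek-ai/DeepSeek-V3.1-Terminus": {},
    "deepinfra/deepseek-ai/DeepSeek-V3-0324": {"input_cost_per_token": 2.5e-07},
}

added, removed, modified = diff_model_data(old, new)
print(added)    # ['deepinfra/deepseek-ai/DeepSeek-V3.1-Terminus']
print(removed)  # ['deepinfra/zai-org/GLM-4.5-Air']
print(format_changes(modified))
```

A field that disappears (here cache_read_input_token_cost) shows up as "2.24e-07 → None", the same way the DeepSeek-V3-0324 entry reports it.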

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory. Adding at least 1 test is a hard requirement (see details)
  • I have added a screenshot of my new test passing locally
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem.

Type

📖 Documentation

Changes

Added Models:
deepinfra/deepseek-ai/DeepSeek-V3.1-Terminus

Removed Models:
deepinfra/zai-org/GLM-4.5-Air

Modified Models:
deepinfra/NousResearch/Hermes-3-Llama-3.1-70B:

  • input_cost_per_token: 1.2e-07 → 3e-07

deepinfra/Qwen/Qwen3-32B:

  • output_cost_per_token: 3e-07 → 2.8e-07

deepinfra/Qwen/Qwen3-Next-80B-A3B-Instruct:

  • max_tokens: 4096 → 262144
  • max_output_tokens: 4096 → 262144
  • max_input_tokens: 4096 → 262144

deepinfra/Qwen/Qwen3-Next-80B-A3B-Thinking:

  • max_tokens: 4096 → 262144
  • max_output_tokens: 4096 → 262144
  • max_input_tokens: 4096 → 262144

deepinfra/Qwen/Qwen3-235B-A22B-Instruct-2507:

  • input_cost_per_token: 1.3e-07 → 9e-08

deepinfra/meta-llama/Meta-Llama-3.1-70B-Instruct:

  • input_cost_per_token: 2.3e-07 → 4e-07

deepinfra/google/gemini-2.5-flash:

  • output_cost_per_token: 1.75e-06 → 2.5e-06
  • input_cost_per_token: 2.1e-07 → 3e-07

deepinfra/meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo:

  • output_cost_per_token: 2e-08 → 3e-08
  • input_cost_per_token: 1.5e-08 → 2e-08

deepinfra/meta-llama/Llama-3.2-3B-Instruct:

  • output_cost_per_token: 2.4e-08 → 2e-08
  • input_cost_per_token: 1.2e-08 → 2e-08

deepinfra/Sao10K/L3-8B-Lunaris-v1-Turbo:

  • input_cost_per_token: 2e-08 → 4e-08

deepinfra/openai/gpt-oss-120b:

  • input_cost_per_token: 9e-08 → 5e-08

deepinfra/google/gemini-2.5-pro:

  • output_cost_per_token: 7e-06 → 1e-05
  • input_cost_per_token: 8.75e-07 → 1.25e-06

deepinfra/NousResearch/Hermes-3-Llama-3.1-405B:

  • output_cost_per_token: 8e-07 → 1e-06
  • input_cost_per_token: 7e-07 → 1e-06

deepinfra/Qwen/Qwen3-235B-A22B:

  • output_cost_per_token: 6e-07 → 5.4e-07
  • input_cost_per_token: 1.3e-07 → 1.8e-07

deepinfra/nvidia/Llama-3.1-Nemotron-70B-Instruct:

  • output_cost_per_token: 3e-07 → 6e-07
  • input_cost_per_token: 1.2e-07 → 6e-07

deepinfra/meta-llama/Llama-3.3-70B-Instruct-Turbo:

  • output_cost_per_token: 1.2e-07 → 3.9e-07
  • input_cost_per_token: 3.8e-08 → 1.3e-07

deepinfra/deepseek-ai/DeepSeek-V3-0324:

  • input_cost_per_token: 2.8e-07 → 2.5e-07
  • cache_read_input_token_cost: 2.24e-07 → None

deepinfra/mistralai/Mistral-Small-3.2-24B-Instruct-2506:

  • output_cost_per_token: 1e-07 → 2e-07
  • input_cost_per_token: 5e-08 → 7.5e-08

deepinfra/Qwen/Qwen3-235B-A22B-Thinking-2507:

  • output_cost_per_token: 6e-07 → 2.9e-06
  • input_cost_per_token: 1.3e-07 → 3e-07

deepinfra/zai-org/GLM-4.5:

  • output_cost_per_token: 2e-06 → 1.6e-06
  • input_cost_per_token: 5.5e-07 → 4e-07

deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1:

  • output_cost_per_token: 2.4e-07 → 4e-07
  • input_cost_per_token: 8e-08 → 4e-07

deepinfra/openai/gpt-oss-20b:

  • output_cost_per_token: 1.6e-07 → 1.5e-07

deepinfra/google/gemma-3-27b-it:

  • output_cost_per_token: 1.7e-07 → 1.6e-07
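Per-token prices such as 1.3e-07 are easier to compare as dollars per million tokens or as the cost of a concrete request. A minimal sketch, assuming a request is billed linearly as prompt_tokens × input_cost_per_token + completion_tokens × output_cost_per_token (cache pricing ignored), and assuming max_input_tokens and max_output_tokens are independent limits. `Pricing`, `per_million`, `request_cost`, and `fits_limits` are hypothetical names, not LiteLLM APIs; the numbers are the before/after values listed above for Llama-3.3-70B-Instruct-Turbo and Qwen3-Next-80B-A3B-Instruct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical container for the two per-token price fields."""
    input_cost_per_token: float
    output_cost_per_token: float


def per_million(cost_per_token: float) -> float:
    """Convert a per-token price to dollars per 1M tokens, rounding off float noise."""
    return round(cost_per_token * 1_000_000, 6)


def request_cost(p: Pricing, prompt_tokens: int, completion_tokens: int) -> float:
    """Assumed billing model: both token kinds priced linearly, no caching."""
    return (
        prompt_tokens * p.input_cost_per_token
        + completion_tokens * p.output_cost_per_token
    )


def fits_limits(
    prompt_tokens: int, completion_tokens: int, max_input_tokens: int, max_output_tokens: int
) -> bool:
    """Assumes input and output limits are checked independently (not a shared window)."""
    return prompt_tokens <= max_input_tokens and completion_tokens <= max_output_tokens


# deepinfra/meta-llama/Llama-3.3-70B-Instruct-Turbo, before and after this PR.
llama_old = Pricing(input_cost_per_token=3.8e-08, output_cost_per_token=1.2e-07)
llama_new = Pricing(input_cost_per_token=1.3e-07, output_cost_per_token=3.9e-07)

print(per_million(llama_new.input_cost_per_token))      # 0.13
print(per_million(llama_new.output_cost_per_token))     # 0.39
print(f"{request_cost(llama_old, 10_000, 2_000):.5f}")  # 0.00062
print(f"{request_cost(llama_new, 10_000, 2_000):.5f}")  # 0.00208

# deepinfra/Qwen/Qwen3-Next-80B-A3B-Instruct limits: 4096 before, 262144 after.
print(fits_limits(8_192, 1_024, 4_096, 4_096))      # False
print(fits_limits(8_192, 1_024, 262_144, 262_144))  # True
```

Under these assumptions, the same 10,000-token prompt with a 2,000-token completion on Llama-3.3-70B-Instruct-Turbo goes from about $0.00062 to $0.00208 (roughly 3.4×), while an 8,192-token prompt to Qwen3-Next-80B-A3B-Instruct is rejected under the old 4096-token limits and accepted under the new ones. If the provider enforces one shared context window instead, the check would bound prompt_tokens + completion_tokens rather than each side separately.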


vercel bot commented Sep 26, 2025

The latest updates on your projects.

Project | Deployment | Preview | Updated (UTC)
litellm | Error | Error | Sep 26, 2025 0:13am

krrishdholakia merged commit c83a3ac into BerriAI:main on Oct 4, 2025
4 of 7 checks passed