Conversation

Collaborator

@AlexsanderHamir AlexsanderHamir commented Sep 25, 2025

Title

Pre-load Users

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory (adding at least 1 test is a hard requirement - see details)
  • I have added a screenshot of my new test passing locally
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem

Type

🆕 New Feature

Changes

If a user already has a database and starts or restarts the proxy server, they can now choose to load the most recent users into memory.
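To make the idea concrete, here is a minimal sketch of the preload pattern this PR describes: at startup, fetch the N most recently created users and seed an in-memory cache so the first wave of concurrent requests does not all hit the database. The names (`InMemoryCache`, `find_recent_users`, `preload_recent_users`) are hypothetical stand-ins, not LiteLLM's actual API.

```python
import asyncio


class InMemoryCache:
    """Tiny stand-in for the proxy's in-memory user cache."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)


async def preload_recent_users(db, cache, limit=100):
    """Load up to `limit` of the most recently created users into the cache.

    `db.find_recent_users` is a hypothetical helper standing in for the real
    Prisma query ordered by creation time, descending.
    """
    users = await db.find_recent_users(limit=limit)
    for user in users:
        cache.set(user["user_id"], user)
    return len(users)
```

After this runs, lookups for recently created users are served from memory instead of triggering a database round trip.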

Observations

This feature has room to be expanded upon, e.g. loading the most active users instead of the most recent ones.

Performance Improvements

  • When this configuration is enabled, it prevents the initial latency spikes seen in load balancing tests, where all concurrent users would otherwise hit the database directly.
  • Latency was observed to be much more stable, and it recovers quickly from spikes.

With Cache Warmup

[screenshot: latency with cache warmup]

Without Cache Warmup

[screenshot: latency without cache warmup]


vercel bot commented Sep 25, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| ------- | ---------- | ------- | -------- | ------------- |
| litellm | Error | Error | | Sep 26, 2025 6:25pm |

Contributor

@ishaan-jaff ishaan-jaff left a comment


can you add details on perf improvement this PR showed on testing @AlexsanderHamir ?

```diff
 response = await self.db.litellm_endusertable.find_many(
-    where={"budget_id": {"in": budget_id_list}}
+    where={"budget_id": {"in": budget_id_list}},
+    order={"litellm_budget_table": {"created_at": "desc"}},
```
Contributor


why change this ?

Collaborator Author


It was a mistake, thank you for catching it.

```python
)

### PRELOAD USERS INTO CACHE ###
ProxyStartupEvent._start_user_preload_background_task(
```
Contributor


can this be an asyncio.create_task, so it does not block startup
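A minimal sketch of what the reviewer is suggesting, with hypothetical function names: `asyncio.create_task` schedules the preload coroutine on the running event loop and returns immediately, so the rest of startup proceeds without waiting for the database fetch.

```python
import asyncio


async def _preload_users():
    # Stand-in for the real DB fetch + cache fill.
    await asyncio.sleep(0)
    return "done"


async def startup():
    # create_task schedules the coroutine on the running loop and returns
    # immediately; startup continues while the preload runs in the background.
    preload_task = asyncio.create_task(_preload_users())
    # ... remaining startup work runs concurrently with the preload ...
    return preload_task
```

One caveat with this pattern: a reference to the task should be kept (as `startup` does here) so it is not garbage-collected before it finishes.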

Contributor

@ishaan-jaff ishaan-jaff left a comment


reviewed


```python
### PRELOAD USERS INTO CACHE ###
if prisma_client is not None and general_settings is not None:
    preload_limit = general_settings.get("preload_users_limit", 0)
```
Contributor


@AlexsanderHamir we want this to run by default. the current code requires the user to opt into this by setting it on general_settings
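A sketch of the default-on behavior being requested, with hypothetical names: the preload limit falls back to a constant, so the feature runs even when nothing is set in `general_settings`, while an explicit setting can still override it.

```python
DEFAULT_CACHE_WARMUP_USERS = 100  # hypothetical constant name


def resolve_preload_limit(general_settings=None):
    # Default-on: the constant applies even when general_settings is absent
    # or silent; an explicit "preload_users_limit" entry still wins.
    if general_settings and "preload_users_limit" in general_settings:
        return general_settings["preload_users_limit"]
    return DEFAULT_CACHE_WARMUP_USERS
```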

Contributor

@ishaan-jaff ishaan-jaff left a comment


reviewed

```python
    description="[DEPRECATED] Use 'user_header_mappings' instead. When set, the header value is treated as the end user id unless overridden by user_header_mappings.",
)
user_header_mappings: Optional[List[UserHeaderMapping]] = None
preload_users_limit: Optional[int] = Field(
```
Contributor


we don't need this

```python
MAX_SIZE_PER_ITEM_IN_MEMORY_CACHE_IN_KB = int(
    os.getenv("MAX_SIZE_PER_ITEM_IN_MEMORY_CACHE_IN_KB", 1024)
)  # 1MB = 1024KB
_DEFAULT_CACHE_WARMUP_USERS = 100
```
Contributor


call it DEFAULT_CACHE_WARMUP_USERS and allow it to be overrideable using env vars
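The requested change can be sketched like this, mirroring the `MAX_SIZE_PER_ITEM_IN_MEMORY_CACHE_IN_KB` pattern shown in the snippet above (the environment variable name is an assumption following that convention):

```python
import os

# Env-overridable default: drop the leading underscore from the constant name
# and allow DEFAULT_CACHE_WARMUP_USERS to be set via an environment variable,
# falling back to 100 when unset.
DEFAULT_CACHE_WARMUP_USERS = int(os.getenv("DEFAULT_CACHE_WARMUP_USERS", 100))
```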

```python
if prisma_client is not None:
    default_preload_limit = _DEFAULT_CACHE_WARMUP_USERS
    preload_limit = (
        general_settings.get("preload_users_limit", default_preload_limit)
```
Contributor


no need for general settings just use the constant
