fix(web): batch findMany queries fixes #360 #361
Conversation
Walkthrough

This change introduces a configurable batch size for database queries to address memory issues with large datasets, specifically as a workaround for a known Prisma bug (#13864).
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant WebServer
    participant RepoBatchQueries
    participant Database
    Client->>WebServer: Request repositories (large dataset)
    WebServer->>RepoBatchQueries: batchedFindReposByIds/names(...)
    loop For each batch
        RepoBatchQueries->>Database: Query batch (size = dbQueryBatchSize)
        Database-->>RepoBatchQueries: Return batch results
    end
    RepoBatchQueries-->>WebServer: Combined results
    WebServer-->>Client: Aggregated repository data
```
Actionable comments posted: 1
🧹 Nitpick comments (4)
packages/web/src/auth.ts (1)

144-144: Consider simplifying the handler signature by removing the unused `account` parameter.

The underscore-prefixed `_account` correctly suppresses unused-variable linting. For clarity, you could destructure only `{ user }` since `account` isn't used in this handler.

docs/docs/configuration/environment-variables.mdx (1)
23-23: Clarify passive sentence in description.

Consider adding a subject: "It can also be configured via the `dbQueryBatchSize` setting in the configuration file." This forms a complete sentence.

packages/web/src/lib/repoBatchQueries.ts (2)
60-87: Consistent implementation with good API design.

This function properly mirrors `batchedFindReposByIds` with the same secure and efficient batching approach. The consistency between the two functions provides a clean, predictable API. Both functions could potentially be refactored to reduce duplication by extracting a generic batching function, but the current approach is clear and maintainable.
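The generic batching function hinted at above could be sketched roughly as follows. This is a hypothetical refactor, not code from the PR; the helper name, signature, and the injected `fetchBatch` callback are illustrative assumptions:

```typescript
// Hypothetical generic helper; the PR's actual functions are
// batchedFindReposByIds / batchedFindReposByNames.
async function batchedQuery<K, R>(
    keys: K[],
    batchSize: number,
    fetchBatch: (batch: K[]) => Promise<R[]>,
): Promise<R[]> {
    const results: R[] = [];
    // Sequential execution keeps peak memory bounded by one batch.
    for (let i = 0; i < keys.length; i += batchSize) {
        const batch = keys.slice(i, i + batchSize);
        results.push(...(await fetchBatch(batch)));
    }
    return results;
}
```

Each concrete function would then only supply its own `where` clause inside `fetchBatch`, keeping the chunking logic in one place.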
36-47: Consider error handling and execution strategy enhancements.

While the current sequential execution is safe for memory management, consider these potential enhancements for future iterations:

- Error resilience: Currently, if any batch fails, the entire operation fails. Consider whether partial results or retry logic would be beneficial.
- Include support: The functions don't support Prisma's `include` option, limiting reusability compared to the batched approach in `actions.ts`.
- Parallel execution option: For deployments with adequate memory, a configurable parallel execution mode could improve performance.

The current implementation is solid and appropriate for the immediate need.
Also applies to: 72-83
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)

- docs/docs/configuration/environment-variables.mdx (1 hunks)
- docs/snippets/schemas/v3/index.schema.mdx (2 hunks)
- packages/backend/src/constants.ts (1 hunks)
- packages/schemas/src/v3/index.schema.ts (2 hunks)
- packages/schemas/src/v3/index.type.ts (1 hunks)
- packages/web/src/actions.ts (1 hunks)
- packages/web/src/auth.ts (1 hunks)
- packages/web/src/env.mjs (1 hunks)
- packages/web/src/features/search/searchApi.ts (2 hunks)
- packages/web/src/lib/repoBatchQueries.ts (1 hunks)
- schemas/v3/index.json (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/docs/configuration/environment-variables.mdx
[style] ~23-~23: To form a complete sentence, be sure to include a subject.
Context: ...github.com/prisma/prisma/issues/13864). Can also be configured via the `dbQueryBatc...
(MISSING_IT_THERE)
🔇 Additional comments (13)
packages/backend/src/constants.ts (1)

19-19: Confirm default batch size aligns with schema constraints.

The default `500` falls within the 100–10000 range defined in the JSON schema. Looks good.

packages/web/src/env.mjs (1)
22-24: Validate environment variable default.

The new `DB_QUERY_BATCH_SIZE` is correctly added with a default of `500`. Ensure this is destructured in `experimental__runtimeEnv` if client access is required in the future.

schemas/v3/index.json (1)
69-76: JSON schema addition looks correct.

The `dbQueryBatchSize` property is defined with the proper type, description, and constraints (100–10000, default 500). Good job.

packages/web/src/features/search/searchApi.ts (2)
15-15: LGTM: Import statement correctly brings in the batched query functions.

The import properly references the new batched query utilities that implement the pagination solution for the Prisma memory issue.
202-214: Excellent implementation of batched queries to resolve memory exhaustion.

The code properly addresses the core issue by:

- Type-safe separation: Correctly filters `repoIdentifiers` into `numericIds` and `stringNames` arrays
- Conditional querying: Only executes database queries when the respective arrays contain data, avoiding unnecessary operations
- Memory efficiency: Replaces unbounded `findMany()` calls with batched operations that prevent the "Failed to convert rust String into napi string" error
- Data consistency: Maintains the same `Map<string | number, Repo>` structure, ensuring downstream code continues to work unchanged

This implementation directly addresses Prisma issue #13864 while maintaining functional compatibility.
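The type-safe separation described above can be sketched as a small standalone function. This is a simplified illustration using the identifier names from the review; the actual code in `searchApi.ts` may be shaped differently:

```typescript
// Split mixed repo identifiers into numeric IDs and string names,
// mirroring the partitioning the review describes.
function partitionRepoIdentifiers(repoIdentifiers: Array<number | string>) {
    const numericIds: number[] = [];
    const stringNames: string[] = [];
    for (const identifier of repoIdentifiers) {
        if (typeof identifier === "number") {
            numericIds.push(identifier);
        } else {
            stringNames.push(identifier);
        }
    }
    return { numericIds, stringNames };
}
```

Each resulting array can then be fed to its respective batched query only when non-empty, matching the conditional-querying point above.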
docs/snippets/schemas/v3/index.schema.mdx (2)
72-77: Well-designed configuration property with appropriate constraints.

The `dbQueryBatchSize` property is properly designed with:

- Sensible bounds: Minimum of 100 prevents inefficient micro-batches, maximum of 10000 prevents oversized batches
- Reasonable default: 500 provides a good balance between memory usage and query efficiency
- Clear documentation: Explicitly references Prisma issue #13864 and explains the purpose
- Proper validation: Uses JSON schema constraints to enforce valid values

The configuration will allow administrators to tune batch sizes based on their specific database and memory constraints.
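Given the constraints listed above, the schema property presumably looks something like the following sketch (not the verbatim file contents; the description text is taken from the generated type's doc comment elsewhere in this PR):

```json
{
  "dbQueryBatchSize": {
    "type": "number",
    "description": "The batch size for database queries to prevent memory issues with large datasets. This is a workaround for Prisma issue #13864. Defaults to 500.",
    "minimum": 100,
    "maximum": 10000,
    "default": 500
  }
}
```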
193-198: Consistent schema duplication maintains structural integrity.

The property is correctly duplicated in both the `definitions.Settings` section and the main `properties.settings` section, which is the expected pattern for this schema structure. This ensures the configuration is available in both contexts where the Settings type is referenced.

packages/schemas/src/v3/index.schema.ts (2)
71-77: Consistent schema definition with proper TypeScript typing.

The `dbQueryBatchSize` property definition matches the documentation schema, ensuring consistency between:

- Runtime validation: The schema will properly validate configuration values
- Type safety: TypeScript will enforce the correct property type
- Documentation: Generated docs will accurately reflect the implementation

The constraint values (min: 100, max: 10000, default: 500) are identical to the documentation, maintaining consistency across the schema system.
192-198: Proper schema structure with consistent property duplication.

The property is correctly defined in both schema locations, maintaining the established pattern for configuration properties. This ensures the `dbQueryBatchSize` setting will be available wherever the Settings schema is used throughout the application.

packages/web/src/actions.ts (2)
665-705: LGTM! Well-implemented batched query solution.

The refactoring successfully addresses the Prisma memory issue while preserving all existing functionality. The implementation correctly:

- Maintains the same filtering logic via `whereClause`
- Uses standard skip/take pagination for batching
- Preserves the `include` relationships for `connections`
- Accumulates results properly across batches

The approach is memory-efficient and maintains backward compatibility.
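The skip/take pagination pattern described above can be illustrated with a self-contained sketch. The `findMany` parameter stands in for the Prisma client call (which is not available here), so the example is a hedged approximation of the pattern rather than the PR's actual code:

```typescript
// Hedged sketch of skip/take pagination; `findMany` is injected as a
// stand-in for prisma.repo.findMany so the example runs on its own.
type Repo = { id: number };

async function getAllReposPaginated(
    findMany: (args: { skip: number; take: number }) => Promise<Repo[]>,
    batchSize: number,
): Promise<Repo[]> {
    const repos: Repo[] = [];
    for (let skip = 0; ; skip += batchSize) {
        const page = await findMany({ skip, take: batchSize });
        repos.push(...page);
        // A page shorter than the batch size means we've reached the end.
        if (page.length < batchSize) break;
    }
    return repos;
}
```

Because each iteration fetches at most `batchSize` rows, peak memory for the query result is bounded regardless of table size.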
685-686: Good use of configurable batch size.

Using the environment variable `DB_QUERY_BATCH_SIZE` makes the batch size configurable as described in the PR objectives, providing flexibility for different deployment sizes.

packages/web/src/lib/repoBatchQueries.ts (2)

1-16: Excellent documentation and module setup.

The comprehensive documentation clearly explains the purpose, references the specific Prisma issue, and documents configuration options. The imports and constant setup are appropriate.
24-51: Solid implementation with good security practices.

The function correctly implements batching with proper edge case handling (empty arrays) and includes essential `orgId` filtering for security. The sequential execution approach is appropriate for memory management, trading some performance for stability. The batching logic using `Math.ceil` and `slice` is standard and correct.
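The `Math.ceil`/`slice` arithmetic referenced above works out as in this standalone sketch (the function name is illustrative, not taken from the PR):

```typescript
// Split an array into batches using Math.ceil and slice, mirroring
// the batching arithmetic the review describes.
function chunk<T>(items: T[], batchSize: number): T[][] {
    const batches: T[][] = [];
    const batchCount = Math.ceil(items.length / batchSize);
    for (let i = 0; i < batchCount; i++) {
        batches.push(items.slice(i * batchSize, (i + 1) * batchSize));
    }
    return batches;
}
```

Note that `slice` clamps past the end of the array, so the final batch simply contains the remainder, and an empty input yields zero batches.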
```ts
/**
 * The batch size for database queries to prevent memory issues with large datasets. This is a workaround for Prisma issue #13864. Defaults to 500.
 */
dbQueryBatchSize?: number;
```
Avoid manual edits in auto-generated file.

This interface is marked as auto-generated; manual modifications will be overwritten. Confirm the schema generator or codegen configuration was updated to include `dbQueryBatchSize` so that this change persists.
🤖 Prompt for AI Agents
In packages/schemas/src/v3/index.type.ts around lines 86 to 89, the
dbQueryBatchSize property was manually added to an auto-generated file, which
will be overwritten. To fix this, revert the manual change here and instead
update the schema generator or codegen configuration to include dbQueryBatchSize
so that it is generated automatically and persists across regenerations.
Fix: Implement pagination for unbounded database queries to prevent memory issues
Problem

In large installations with thousands of repositories, Sourcebot becomes unresponsive during large queries due to unbounded `prisma.repo.findMany()` calls, which fail with "Failed to convert rust String into napi string". This is caused by a known Prisma issue (#13864) where loading large datasets into memory exhausts available resources.
NOTE: While this fix addresses the issue with the large fetch, so that Sourcebot can at least continue to operate (which is far better), the web container still suffers from other limitations. For example, consuming the results of the now-working `findMany` can exhaust the heap, so other places that use this query's results are either over-selecting or also not batching/paginating/doing cursor-driven work. There is a broader architectural issue that may need to be fixed in the UI, but I would consider that a separate issue. I was able to work around it with `NODE_OPTIONS="--max-old-space-size=16000"`, though this will likely scale roughly linearly with dataset size, so it should be optimized. I am keeping this in draft because now that this is fixed, there are things further up, well beyond just the heap, that are also causing freezes; I'm going to fix those to get things to a usable state and will then move the PR out of draft.
Solution

This PR implements pagination for all repository database queries to prevent memory exhaustion. The implementation includes:

1. Batched Query Functions

Adds `packages/web/src/lib/repoBatchQueries.ts` with specialized batch query functions:

- `batchedFindReposByIds()` - Fetches repos by IDs in configurable batches
- `batchedFindReposByNames()` - Fetches repos by names in configurable batches
- `batchedGetAllRepos()` - Fetches all repos with pagination support

2. Configurable Batch Size

The batch size is now configurable through two methods:

Environment Variable:

```
DB_QUERY_BATCH_SIZE=1000 # Default: 500, Range: 100-10000
```
Configuration File (`config.json`): via the `dbQueryBatchSize` setting.

3. Updated Queries

- Replaced unbounded `findMany()` calls in `searchApi.ts` with batched queries
- Updated `getRepos()` in `actions.ts` to use pagination with progress tracking

4. Documentation

- Added `DB_QUERY_BATCH_SIZE` to the environment variables documentation
- Documented the `dbQueryBatchSize` setting
settingTesting
Breaking Changes
None. The changes are backward compatible with a sensible default value.
Notes
Fixes #360
Summary by CodeRabbit

- New configuration setting (`dbQueryBatchSize`) with a default value of 500, adjustable between 100 and 10,000.