feat(sagemaker): update serverless endpoint concurrency limits to match AWS specs
- Update maxConcurrency validation range from 1-200 to 1-1000
- Update provisionedConcurrency validation range from 1-200 to 1-1000
- Fix memory size documentation from 3008MB to 3072MB in requirements
- Add comprehensive test coverage for upper bound validation
- Update TypeScript definitions and JSDoc comments
This aligns the implementation with AWS SageMaker serverless endpoint specifications and RFC 431 requirements for L2 constructs.
Amazon SageMaker Serverless Inference is a purpose-built inference option that makes it easy for you to deploy and scale ML models. Serverless endpoints automatically launch compute resources and scale them in and out depending on traffic, eliminating the need to choose instance types or manage scaling policies.
To create a serverless endpoint configuration, use the `serverlessProductionVariant` property:
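The exact L2 construct API may differ by CDK version; as a sketch of the shape this change describes, assuming hypothetical prop names that mirror the CloudFormation `ServerlessConfig` properties:

```typescript
// Hypothetical props shape for the proposed serverless variant; the
// interface name and fields are illustrative, not the published API.
interface ServerlessProductionVariantProps {
  variantName: string;
  maxConcurrency: number;          // 1-1000 per the updated limits
  memorySizeInMB: number;          // 1024, 2048, 3072, 4096, 5120, or 6144
  provisionedConcurrency?: number; // optional pre-warmed capacity
}

const variant: ServerlessProductionVariantProps = {
  variantName: "AllTraffic",
  maxConcurrency: 50,
  memorySizeInMB: 2048,
  provisionedConcurrency: 10,
};
```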
Serverless inference is ideal for workloads with intermittent or unpredictable traffic patterns. You can configure:
- `maxConcurrency`: Maximum concurrent invocations (1-1000)
- `memorySizeInMB`: Memory allocation in 1GB increments (1024, 2048, 3072, 4096, 5120, or 6144 MB)
- `provisionedConcurrency`: Optional pre-warmed capacity to reduce cold starts
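A minimal sketch of the validation the commit describes, using the ranges listed above (the function name is hypothetical and not part of the published construct):

```typescript
// Valid memory sizes for serverless endpoints, in 1GB increments.
const VALID_MEMORY_SIZES = [1024, 2048, 3072, 4096, 5120, 6144];

// Returns a list of validation errors; an empty list means the
// configuration is acceptable under the updated 1-1000 limits.
function validateServerlessConfig(
  maxConcurrency: number,
  memorySizeInMB: number,
  provisionedConcurrency?: number,
): string[] {
  const errors: string[] = [];
  if (maxConcurrency < 1 || maxConcurrency > 1000) {
    errors.push(`maxConcurrency must be between 1 and 1000, got ${maxConcurrency}`);
  }
  if (!VALID_MEMORY_SIZES.includes(memorySizeInMB)) {
    errors.push(`memorySizeInMB must be one of ${VALID_MEMORY_SIZES.join(", ")}, got ${memorySizeInMB}`);
  }
  if (provisionedConcurrency !== undefined) {
    if (provisionedConcurrency < 1 || provisionedConcurrency > 1000) {
      errors.push(`provisionedConcurrency must be between 1 and 1000, got ${provisionedConcurrency}`);
    } else if (provisionedConcurrency > maxConcurrency) {
      errors.push(`provisionedConcurrency cannot exceed maxConcurrency`);
    }
  }
  return errors;
}
```

For example, `validateServerlessConfig(50, 3008)` reports an error because 3008 MB is not a valid size (the documented value is 3072 MB), while `validateServerlessConfig(50, 3072)` passes.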
**Note**: Provisioned concurrency incurs charges even when the endpoint is not processing requests. Use it only when you need to minimize cold start latency.
You cannot mix serverless and instance-based variants in the same endpoint configuration.