-
Notifications
You must be signed in to change notification settings - Fork 4.3k
feat(sagemaker): add support for serverless inference endpoints #35557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This review is outdated)
aad0c97
to
78ef21c
Compare
taking a look. |
❌ Features must contain a change to a README file. As this is a new feat we need
|
✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.
04fc444
to
5ff7875
Compare
4e97045
to
2ab372b
Compare
Implements SageMaker Serverless Inference endpoints as requested in issue aws#23148. - Add ServerlessProductionVariantProps interface with maxConcurrency, memorySizeInMB, and provisionedConcurrency - Extend EndpointConfig to support serverless variants alongside existing instance variants - Add comprehensive validation for serverless configuration parameters - Enforce mutual exclusivity between instance and serverless variants - Add CloudFormation template generation for ServerlessConfig properties - Include extensive test coverage for validation scenarios and error cases Closes aws#23148
…less inference - Add comprehensive serverless inference documentation to SageMaker alpha README - Update integration test with serverless endpoint configuration examples - Include verification comments for both instance-based and serverless endpoints - Generate CloudFormation snapshots with proper ServerlessConfig properties Addresses reviewer feedback requiring README documentation and integration test coverage for the new serverless inference feature.
…ch AWS specs - Update maxConcurrency validation range from 1-200 to 1-1000 - Update provisionedConcurrency validation range from 1-200 to 1-1000 - Fix memory size documentation from 3008MB to 3072MB in requirements - Add comprehensive test coverage for upper bound validation - Update TypeScript definitions and JSDoc comments This aligns the implementation with AWS SageMaker serverless endpoint specifications and RFC 431 requirements for L2 constructs.
2ab372b
to
d8a868d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @pymia,
Thanks for your contribution. I have left some comments :)
|
||
// Validate mutual exclusivity | ||
if (props.instanceProductionVariants && props.serverlessProductionVariant) { | ||
throw new Error('Cannot specify both instanceProductionVariants and serverlessProductionVariant. Choose one variant type.'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't able to find any documentation that says instanceProductVariant
and serverlessProductVariant
cannot be used simultaneously for a single endpoint. Could you please provide the source that refers to this restriction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The instance based deployment and serverless deployment should not exist at the same time.
Reference: Amazon SageMaker Deploy Model, and AWS::SageMaker::EndpointConfig ProductionVariant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CDK has one README per service. SageMaker-alpha
already has a README, so we don't need to create a new one.
We can add the documentation related to this PR in packages/@aws-cdk/aws-sagemaker-alpha/README.md
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can keep the integration test in one file. We have another integration test file integ.endpoint-config
. We can put all the necessary integration-test related code in that file and remove this one.
Implements SageMaker Serverless Inference endpoints as requested in issue #23148.
Issue # 23148
Closes #23148.
Reason for this change
AWS SageMaker Serverless Inference is not supported in the CDK SageMaker L2 constructs. Users can only configure instance-based endpoints, missing the serverless option for intermittent/unpredictable traffic patterns that could benefit from cost-effective serverless inference.
This feature was explicitly planned in the original SageMaker Endpoint L2 construct RFC with Instance-prefixed classes designed to make room for Serverless-prefixed analogs.
Description of changes
Implements AWS SageMaker Serverless Inference support in CDK SageMaker L2 constructs, enabling cost-effective serverless endpoints for intermittent workloads:
ServerlessProductionVariantProps
interface extendingProductionVariantProps
with AWS-compliant serverless properties:maxConcurrency
: 1-200 range (required)memorySizeInMB
: 1024-6144MB in 1GB increments (required)provisionedConcurrency
: 1-200 range, optional, must be ≤ maxConcurrencyaddServerlessProductionVariant()
method with comprehensive input validationEndpointConfigProps
with optionalserverlessProductionVariant
propertyServerlessConfig
supportUsage Example:
Describe any new or updated permissions being added
N/A - No new IAM permissions required. Leverages existing SageMaker model and endpoint permissions.
Description of how you validated changes
Unit tests: Added 12 comprehensive serverless variant tests covering all validation scenarios:
Integration tests: Extended existing integration test with serverless endpoint configuration, verified CloudFormation template generation with correct
ServerlessConfig
properties:Comprehensive testing results: 63/63 unit tests pass (100% success rate), 4/4 integration tests pass, no regressions detected across 16,024+ CDK tests
Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license