Skip to content

Conversation

pymia
Copy link

@pymia pymia commented Sep 23, 2025

Implements SageMaker Serverless Inference endpoints as requested in issue #23148.

  • Add ServerlessProductionVariantProps interface with maxConcurrency, memorySizeInMB, and provisionedConcurrency
  • Extend EndpointConfig to support serverless variants alongside existing instance variants
  • Add comprehensive validation for serverless configuration parameters
  • Enforce mutual exclusivity between instance and serverless variants
  • Add CloudFormation template generation for ServerlessConfig properties
  • Include extensive test coverage for validation scenarios and error cases

Issue # 23148

Closes #23148.

Reason for this change

AWS SageMaker Serverless Inference is not supported in the CDK SageMaker L2 constructs. Users can only configure instance-based endpoints, missing the serverless option for intermittent/unpredictable traffic patterns that could benefit from cost-effective serverless inference.

This feature was explicitly planned in the original SageMaker Endpoint L2 construct RFC with Instance-prefixed classes designed to make room for Serverless-prefixed analogs.

Description of changes

Implements AWS SageMaker Serverless Inference support in CDK SageMaker L2 constructs, enabling cost-effective serverless endpoints for intermittent workloads:

  • New ServerlessProductionVariantProps interface extending ProductionVariantProps with AWS-compliant serverless properties:
    • maxConcurrency: 1-200 range (required)
    • memorySizeInMB: 1024-6144MB in 1GB increments (required)
    • provisionedConcurrency: 1-200 range, optional, must be ≤ maxConcurrency
  • New addServerlessProductionVariant() method with comprehensive input validation
  • Extended EndpointConfigProps with optional serverlessProductionVariant property
  • Mutual exclusivity enforcement between instance and serverless variants per AWS constraints
  • Single serverless variant limit per endpoint configuration (AWS limitation)
  • Comprehensive synthesis-time validation with clear, actionable error messages
  • CloudFormation integration leveraging existing L1 construct ServerlessConfig support

Usage Example:

import * as sagemaker from '@aws-cdk/aws-sagemaker-alpha';

declare const model: sagemaker.IModel;

// Create serverless endpoint configuration
const endpointConfig = new sagemaker.EndpointConfig(this, 'ServerlessEndpointConfig', {
  serverlessProductionVariant: {
    model: model,
    variantName: 'serverlessVariant',
    maxConcurrency: 10,
    memorySizeInMB: 2048,
    provisionedConcurrency: 5, // optional
  },
});

Describe any new or updated permissions being added

N/A - No new IAM permissions required. Leverages existing SageMaker model and endpoint permissions.

Description of how you validated changes

  • Unit tests: Added 12 comprehensive serverless variant tests covering all validation scenarios:

    • Memory size validation (1024-6144MB in 1GB increments)
    • Concurrency range validation (1-200 for both max and provisioned)
    • Mutual exclusivity enforcement between instance and serverless variants
    • Single serverless variant limit per AWS constraints
    • Cross-environment model compatibility validation
    • Error condition testing with clear error messages
    • CloudFormation template generation verification
  • Integration tests: Extended existing integration test with serverless endpoint configuration, verified CloudFormation template generation with correct ServerlessConfig properties:

    ServerlessEndpointConfig:
      Type: AWS::SageMaker::EndpointConfig
      Properties:
        ProductionVariants:
          - ServerlessConfig:
              MaxConcurrency: 10
              MemorySizeInMB: 2048
              ProvisionedConcurrency: 5
            VariantName: serverlessVariant
  • Comprehensive testing results: 63/63 unit tests pass (100% success rate), 4/4 integration tests pass, no regressions detected across 16,024+ CDK tests

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@github-actions github-actions bot added effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. p1 beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK labels Sep 23, 2025
@aws-cdk-automation aws-cdk-automation requested a review from a team September 23, 2025 08:29
Copy link
Collaborator

@aws-cdk-automation aws-cdk-automation left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This review is outdated)

@pymia pymia force-pushed the feature/sagemaker-serverless-variants-23148 branch 2 times, most recently from aad0c97 to 78ef21c Compare September 23, 2025 13:31
@pahud pahud marked this pull request as draft September 23, 2025 14:23
@pahud pahud self-assigned this Sep 23, 2025
@pahud
Copy link
Contributor

pahud commented Sep 23, 2025

taking a look.

@pahud
Copy link
Contributor

pahud commented Sep 23, 2025

❌ Features must contain a change to a README file.
❌ Features must contain a change to an integration test file and the resulting snapshot.

As this is a new feat we need

  1. update README with very focusd and minimal description.
  2. add new intet test or refresh existing relevant integ tests and update snapshots

@aws-cdk-automation aws-cdk-automation dismissed their stale review September 23, 2025 15:05

✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.

@pymia pymia marked this pull request as ready for review September 23, 2025 15:37
@pahud pahud marked this pull request as draft September 23, 2025 15:37
@pymia pymia force-pushed the feature/sagemaker-serverless-variants-23148 branch from 04fc444 to 5ff7875 Compare September 24, 2025 15:31
@pymia pymia force-pushed the feature/sagemaker-serverless-variants-23148 branch 3 times, most recently from 4e97045 to 2ab372b Compare September 25, 2025 14:19
Implements SageMaker Serverless Inference endpoints as requested in issue aws#23148.

- Add ServerlessProductionVariantProps interface with maxConcurrency, memorySizeInMB, and provisionedConcurrency
- Extend EndpointConfig to support serverless variants alongside existing instance variants
- Add comprehensive validation for serverless configuration parameters
- Enforce mutual exclusivity between instance and serverless variants
- Add CloudFormation template generation for ServerlessConfig properties
- Include extensive test coverage for validation scenarios and error cases

Closes aws#23148
…less inference

- Add comprehensive serverless inference documentation to SageMaker alpha README
- Update integration test with serverless endpoint configuration examples
- Include verification comments for both instance-based and serverless endpoints
- Generate CloudFormation snapshots with proper ServerlessConfig properties

Addresses reviewer feedback requiring README documentation and integration test coverage for the new serverless inference feature.
…ch AWS specs

- Update maxConcurrency validation range from 1-200 to 1-1000

- Update provisionedConcurrency validation range from 1-200 to 1-1000

- Fix memory size documentation from 3008MB to 3072MB in requirements

- Add comprehensive test coverage for upper bound validation

- Update TypeScript definitions and JSDoc comments

This aligns the implementation with AWS SageMaker serverless endpoint specifications and RFC 431 requirements for L2 constructs.
@pymia pymia force-pushed the feature/sagemaker-serverless-variants-23148 branch from 2ab372b to d8a868d Compare September 29, 2025 14:58
@pahud pahud removed their assignment Sep 29, 2025
@pahud pahud marked this pull request as ready for review September 29, 2025 16:29
@abidhasan-aws abidhasan-aws self-requested a review September 30, 2025 12:52
@abidhasan-aws abidhasan-aws self-assigned this Sep 30, 2025
@abidhasan-aws abidhasan-aws removed their request for review September 30, 2025 13:40
@abidhasan-aws abidhasan-aws removed their assignment Sep 30, 2025
@abidhasan-aws abidhasan-aws self-requested a review September 30, 2025 14:49
@abidhasan-aws abidhasan-aws self-assigned this Sep 30, 2025
Copy link
Contributor

@abidhasan-aws abidhasan-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @pymia,
Thanks for your contribution. I have left some comments :)


// Validate mutual exclusivity
if (props.instanceProductionVariants && props.serverlessProductionVariant) {
throw new Error('Cannot specify both instanceProductionVariants and serverlessProductionVariant. Choose one variant type.');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to find any documentation that says instanceProductVariant and serverlessProductVariant cannot be used simultaneously for a single endpoint. Could you please provide the source that refers to this restriction?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The instance based deployment and serverless deployment should not exist at the same time.
Reference: Amazon SageMaker Deploy Model, and AWS::SageMaker::EndpointConfig ProductionVariant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CDK has one README per service. SageMaker-alpha already has a README, so we don't need to create a new one.
We can add the documentation related to this PR in packages/@aws-cdk/aws-sagemaker-alpha/README.md.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep the integration test in one file. We have another integration test file integ.endpoint-config. We can put all the necessary integration-test related code in that file and remove this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. p1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sagemaker: Support serverless variants for endpoints
4 participants