# Model Deployment - GPT-OSS

OpenAI has announced the release of [two open weight models](https://openai.com/index/introducing-gpt-oss/), gpt-oss-120b and gpt-oss-20b, their first open-weight language models since GPT-2. According to OpenAI, their performance is on par with or exceeds that of OpenAI's internal models, and both models perform strongly on tool use, few-shot function calling, CoT reasoning, and HealthBench.

Here are the new OpenAI open weight models:

* gpt-oss-120b — designed for production, general-purpose and high-reasoning use cases. The model has 117B parameters with 5.1B active parameters
* gpt-oss-20b — designed for lower latency and local or specialized use cases. The model has 21B parameters with 3.6B active parameters
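The "active parameters" figures reflect the models' mixture-of-experts design: only a fraction of the total weights participates in each forward pass, which is why the 20b model can run with low latency despite its size. A quick calculation of that fraction, using the figures above:

```python
# Parameter counts (in billions) for the two GPT-OSS models, from the list above.
models = {
    "gpt-oss-120b": {"total_b": 117, "active_b": 5.1},
    "gpt-oss-20b": {"total_b": 21, "active_b": 3.6},
}

for name, p in models.items():
    frac = p["active_b"] / p["total_b"]
    print(f"{name}: {frac:.1%} of parameters active per token")
```

So roughly 4% of gpt-oss-120b's weights and 17% of gpt-oss-20b's weights are active per token.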

Both models are now available in OCI Data Science AI Quick Actions. The models are cached in the service and ready to be deployed and fine-tuned, so customers do not need to bring model artifacts from external sites. With AI Quick Actions, customers can use the service-managed container with the latest vLLM version that supports both models, eliminating the need to build or bring your own container.

![Deploy Model](web_assets/openai_modelcard.png)

![GPT-OSS-20b Model card](web_assets/model-deploy-gptoss.png)


## Deploying an LLM

After picking a model from the model explorer, if the "Deploy Model" button is enabled, you can use this
form to quickly deploy the model:

![Deploy Model](web_assets/model-deploy-gptoss-2.png)

## Setting an Environment Variable
Multiple shapes support deployment of these models, but if you are using a shape other than H100 or H200, you must pass an extra environment variable when deploying the model. To do so:
* Expand the advanced section at the bottom of the form
* Go to Custom Environment Variables
* Enter `VLLM_ATTENTION_BACKEND` as the key
* Enter `TRITON_ATTN_VLLM_V1` as the value
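If you configure the deployment programmatically rather than through the form, the same setting corresponds to a key/value pair like the following. Only the key and value come from this guide; the surrounding field name (`environmentVariables`) is an assumption based on the OCI model deployment environment configuration and may differ depending on the interface you use:

```json
{
  "environmentVariables": {
    "VLLM_ATTENTION_BACKEND": "TRITON_ATTN_VLLM_V1"
  }
}
```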

![Set Environment Variable](web_assets/model-deploy-gptoss-env-var.png)

You can now deploy the model on shapes other than H100 or H200.
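Once the deployment is active, you can send it a request from Python. This is a minimal sketch, assuming the AI Quick Actions vLLM container accepts an OpenAI-style chat payload at the deployment's `/predict` endpoint; the endpoint URL and model id below are placeholders, and the auth setup assumes a default OCI config file:

```python
def build_chat_payload(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completions payload accepted by vLLM."""
    return {
        # Model id served by the container -- an assumption; check your deployment details.
        "model": "openai/gpt-oss-20b",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

if __name__ == "__main__":
    # Third-party dependencies: `pip install oci requests`
    import oci
    import requests

    # Placeholder endpoint; copy the real invoke URL from your deployment's details page.
    endpoint = "https://modeldeployment.<region>.oci.customer-oci.com/<deployment-ocid>/predict"

    # Sign the request with your OCI credentials (default ~/.oci/config assumed).
    config = oci.config.from_file()
    signer = oci.signer.Signer(
        tenancy=config["tenancy"],
        user=config["user"],
        fingerprint=config["fingerprint"],
        private_key_file_location=config["key_file"],
    )
    resp = requests.post(endpoint, json=build_chat_payload("Hello!"), auth=signer)
    print(resp.json())
```

The request-signing pattern (an `oci.signer.Signer` passed as `auth` to `requests`) is the standard way to invoke OCI model deployment endpoints; only the payload shape depends on the serving container.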

To learn more about model deployments, refer to the [Model Deployment Tips](model-deployment-tips.md) page.