Description
I'm interested in using FasterTransformer to accelerate LLM deployment on CoreWeave, and by following this guide I've successfully deployed an inference service on a single GPU.
After looking further into FasterTransformer, I'd like to run inference across multiple GPUs. Could another guide be provided that covers multi-GPU deployment? A rough sketch of my current understanding is below.
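For context, my rough understanding (which may be wrong, hence this request) is that multi-GPU inference in FasterTransformer is driven by tensor and/or pipeline parallelism, so the pod would need to request more than one GPU and the model configuration would need a matching parallel size (e.g. something like tensor_para_size). The snippet below is only a minimal sanity check I've been using while experimenting, not anything from the existing guide; the TENSOR_PARA_SIZE environment variable is my own convention.

```python
# Minimal sanity check (my own sketch, not from the CoreWeave guide):
# verify that the serving pod actually sees the number of GPUs I intend
# to use for tensor parallelism before touching the model config.
import os
import torch

# TENSOR_PARA_SIZE is a hypothetical env var I set myself; the actual
# FasterTransformer setting lives in the model configuration.
expected_gpus = int(os.environ.get("TENSOR_PARA_SIZE", "2"))
visible_gpus = torch.cuda.device_count()

print(f"GPUs visible in pod: {visible_gpus}, expected: {expected_gpus}")
if visible_gpus < expected_gpus:
    raise SystemExit("Not enough GPUs visible for the requested tensor parallel size.")
```

If a multi-GPU guide could confirm how the GPU count, weight conversion, and parallelism settings are meant to line up, that would be very helpful.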