Description
I'm interested in using FasterTransformer to accelerate LLM deployment on CoreWeave, and by following this guide I've successfully deployed an inference service on a single GPU.
After looking further into FasterTransformer, I'd like to run inference across multiple GPUs. Could another guide be provided that covers multi-GPU deployment? A rough sketch of my current understanding is below.
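For context, my rough understanding (which may be wrong, hence this request) is that multi-GPU inference in FasterTransformer is driven by tensor and/or pipeline parallelism, so the pod would need to request more than one GPU and the model configuration would need a matching parallel size (e.g. something like tensor_para_size). The snippet below is only a minimal sanity check I've been using while experimenting, not anything from the existing guide; the TENSOR_PARA_SIZE environment variable is my own convention.

```python
# Minimal sanity check (my own sketch, not from the CoreWeave guide):
# verify that the serving pod actually sees the number of GPUs I intend
# to use for tensor parallelism before touching the model config.
import os
import torch

# TENSOR_PARA_SIZE is a hypothetical env var I set myself; the actual
# FasterTransformer setting lives in the model configuration.
expected_gpus = int(os.environ.get("TENSOR_PARA_SIZE", "2"))
visible_gpus = torch.cuda.device_count()

print(f"GPUs visible in pod: {visible_gpus}, expected: {expected_gpus}")
if visible_gpus < expected_gpus:
    raise SystemExit("Not enough GPUs visible for the requested tensor parallel size.")
```

If a multi-GPU guide could confirm how the GPU count, weight conversion, and parallelism settings are meant to line up, that would be very helpful.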