Description
System information
- TensorFlow.js version (you are using): 4.20
- Are you willing to contribute it (Yes/No): Maybe :)
Describe the feature and the current behavior/state.
- We are looking into using the WebGPU backend for inference and see a decent improvement (~5-10%) over WebGL for our models, but it is much lower than we expected.
- One potential way to speed up inference would be to use the fp16 data type instead of fp32 for tensors. The WebGL backend already supports fp16, which we use. WebGPU also supports fp16, at least in Chrome on desktop (https://chromestatus.com/feature/5180552617656320)
- Ideally, we would like to use the F32_F16 precision mode as defined in TFLite to get the best trade-off between precision loss and performance.
Will this change the current api? How?
- An environment flag to set the precision (similar to the WebGL backend) would be ideal for ease of integration.
Who will benefit with this feature?
- All consumers of WebGPU backend.