Convert qdq conv pattern with bias to QLinearConv #439
base: feature/onnx-to-tosa
Conversation
// Case 1: Bias is already int32 -----------------------
if (biasQType.getElementType().isInteger(32)) {
  biasInt32Val = biasQ;
How do you ensure in this case that the bias is quantized with scale = x_scale * w_scale and zero_point = 0?
AFAIK the bias formula is bias(int) = bias(float)/(x_scale*w_scale).
So if the bias is already int32, we can assume it was quantized that way and pass the value through without any recomputation.
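Spelling that convention out (my notation, worth double-checking against the spec text): QLinearConv expects the int32 bias to be pre-quantized against the product of the input and weight scales, with a zero point of 0:

$$
b_{\text{int32}}[i] = \operatorname{round}\!\left(\frac{b_{\text{float}}[i]}{x_{\text{scale}} \cdot w_{\text{scale}}}\right), \qquad \text{bias zero point} = 0
$$

So passing an existing int32 bias through unchanged is only correct if the producer of the QDQ graph followed that convention, which is exactly the assumption being questioned here.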
static bool extractScalarFloatFromConst(mlir::Value v, float &out) {
  auto def = v.getDefiningOp<ONNXConstantOp>();
  if (!def)
    return false;

  mlir::Attribute raw;
  if (def.getValue().has_value())
    raw = *def.getValue();
  else
    raw = def.getValueAttr();

  if (auto elts = raw.dyn_cast<mlir::ElementsAttr>()) {
    for (auto apf : elts.getValues<llvm::APFloat>()) {
      out = apf.convertToFloat();
      return true;
    }
    return false;
  }

  return false;
}
This seems quite similar to getScalarValue in OpHelper.hpp.
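If getScalarValue in OpHelper.hpp does expose an overload taking the constant op (its exact signature is an assumption here, I have not checked it against this PR's base), the helper above could shrink to roughly:

```cpp
// Hypothetical rewrite on top of getScalarValue; signature assumed, not verified.
static bool extractScalarFloatFromConst(mlir::Value v, float &out) {
  auto constOp = v.getDefiningOp<ONNXConstantOp>();
  if (!constOp)
    return false;
  out = static_cast<float>(getScalarValue<double>(constOp));
  return true;
}
```

Note the original helper silently takes the first element of a multi-element constant; a getScalarValue-based version would presumably require a true scalar, which is arguably the safer behavior for a scale operand anyway.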
Value biasFloatValue = qBiasOp.getX();
auto biasFloatDefOp = biasFloatValue.getDefiningOp();
if (!biasFloatDefOp)
  return failure();

auto constBiasOp = dyn_cast<ONNXConstantOp>(biasFloatDefOp);
if (!constBiasOp)
  return failure();

// Try to get the ElementsAttr
auto denseBiasF = extractDenseFloatFromConst(constBiasOp, biasFloatValue);

if (!denseBiasF)
  return failure();

float xScaleS = 0.0f;
if (!extractScalarFloatFromConst(xScale, xScaleS))
  return failure();

float wScaleS = 0.0f;
if (!extractScalarFloatFromConst(wScale, wScaleS))
  return failure();

auto biasI32Attrs =
    createBiasI32Attrs(denseBiasF, xScaleS, wScaleS, rewriter);
if (biasI32Attrs.empty()) {
  return failure();
}
I do not think this calculation for the new bias is correct.
You are ignoring the existing scale/zp of the bias q/dq.
AFAIK bias is not an independently quantized tensor. The int32 bias is computed as bias(int) = bias(float)/(x_scale*w_scale). We use the scale of input and weight to quantize the bias.
But I will check on this.
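For reference, if the bias Q/DQ pair did carry its own scale b_scale and zero point b_zp, the general form the rewrite would need (a sketch of the general case, not what the current code computes) is:

$$
b_{\text{int32}}[i] = \operatorname{round}\!\left(\frac{\left(b_{q}[i] - b_{zp}\right)\cdot b_{\text{scale}}}{x_{\text{scale}} \cdot w_{\text{scale}}}\right)
$$

This collapses to the float-bias formula only when b_scale = x_scale * w_scale and b_zp = 0, i.e. exactly the convention the pattern is assuming.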
- Second review
  return failure();
}

Value qInput = dqInputOp.getX();
I don't see any checks that the attributes in e.g. https://onnx.ai/onnx/operators/onnx__QuantizeLinear.html#attributes are appropriate for QLinearConv.
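To make the concern concrete, here is a hedged sketch of one such guard (the helper name and call site are assumptions, not code from this PR). QLinearConv only accepts scalar x_scale/x_zero_point and y_scale/y_zero_point, so per-axis or blocked quantization on the activations should make the match fail:

```cpp
// Hypothetical guard: only per-tensor quantization parameters are allowed
// on the activation/output side of QLinearConv.
static bool isPerTensorQuantParam(mlir::Value scaleOrZeroPoint) {
  auto type = mlir::dyn_cast<mlir::ShapedType>(scaleOrZeroPoint.getType());
  if (!type || !type.hasStaticShape())
    return false;
  // A rank-0 tensor or a single-element 1-D tensor both describe
  // per-tensor quantization.
  return type.getNumElements() == 1;
}
```

The axis and (in newer opsets) block_size attributes on the surrounding QuantizeLinear/DequantizeLinear ops would need equivalent checks before the rewrite fires.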
@@ -0,0 +1,327 @@
//===- ConvToQLinearConvPass.cpp ---------------------------------*- C++ |
There is already a place for recompositions: see RecomposeQLinearMatMulFromQuantizeLinearPattern.
I see the option handling of RecomposeONNXToONNXPass needs to be extended to control which recompositions are run; see getRecomposeONNXToONNXPatterns.
Note it doesn't make sense to discuss assumptions in rewrite algorithms: if your intent is to say the rewrite will not be performed when the weights are non-constant, then that is what you should say. Please avoid leaving easy-to-support cases unsupported, though: such restrictions tend to cause confusion later (why didn't the rewrite trigger ... debug cycle ... oh, why didn't the original author support that?) and make upstreaming more difficult.
The conv pattern has three input DQ nodes (input, weight, bias).
- If the bias is int8, compute the int32 quantized bias (see the sketch below).
- Create a new int32 ONNX constant with the computed bias.
- Create an onnx.QLinearConv node.
Assumption: the bias is a Constant.
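A minimal sketch of the bias quantization step referenced above (quantizeBiasToI32 is a hypothetical stand-in for the diff's createBiasI32Attrs; this body is illustrative only, not the PR's actual implementation):

```cpp
#include <cmath>

#include "llvm/ADT/APInt.h"
#include "llvm/ADT/SmallVector.h"
#include "mlir/IR/BuiltinAttributes.h"

// Illustrative only: quantize a float bias into the int32 values QLinearConv
// expects, i.e. round(bias_float / (x_scale * w_scale)) with zero point 0.
static llvm::SmallVector<llvm::APInt, 8> quantizeBiasToI32(
    mlir::DenseElementsAttr biasFloat, float xScale, float wScale) {
  llvm::SmallVector<llvm::APInt, 8> result;
  const double combinedScale = static_cast<double>(xScale) * wScale;
  if (combinedScale == 0.0)
    return result; // Empty result signals failure to the caller.
  for (llvm::APFloat apf : biasFloat.getValues<llvm::APFloat>()) {
    double q = std::round(apf.convertToDouble() / combinedScale);
    result.push_back(
        llvm::APInt(/*numBits=*/32, static_cast<int64_t>(q), /*isSigned=*/true));
  }
  return result;
}
```

The resulting values would then back a DenseElementsAttr on a new int32 ONNXConstantOp wired into the QLinearConv's B operand.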