
Conversation

@tuhinp-amd commented Sep 18, 2025

  • The conv pattern has 3 input DQ nodes (input, weight, bias).
  • If the bias is int8, compute the int32 quantized bias.
  • Create a new int32 ONNX constant with the computed bias.
  • Create an onnx.QLinearConv node.
  • Assumption: the bias is a Constant.


// Case 1: Bias is already int32 -----------------------
if (biasQType.getElementType().isInteger(32)) {
  biasInt32Val = biasQ;


How do you ensure in this case that the bias is quantized with scale = x_scale * w_scale and zero_point = 0?
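A minimal sketch of such a guard (a hypothetical standalone helper; the name and parameters are assumptions, not the PR's code):

#include <cmath>
#include <cstdint>

// Sketch: accept an already-int32 bias only if its dequantization parameters
// match the QLinearConv convention: scale == x_scale * w_scale and
// zero_point == 0 (within float tolerance for the scale).
static bool biasMatchesQLinearConvConvention(float biasScale,
    int64_t biasZeroPoint, float xScale, float wScale) {
  const float expected = xScale * wScale;
  return biasZeroPoint == 0 &&
         std::fabs(biasScale - expected) <= 1e-6f * std::fabs(expected);
}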

@tuhinp-amd (Author) replied:

AFAIK the bias formula is bias_int = bias_float / (x_scale * w_scale).
So if the bias is already int32, we can assume it was quantized that way and pass the value through without any recomputation.
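For reference, a minimal standalone sketch of that computation (illustrative only, not the PR's code; the helper name is an assumption):

#include <cmath>
#include <cstdint>
#include <vector>

// Sketch: bias_int32[i] = round(bias_float[i] / (x_scale * w_scale)),
// with zero_point fixed to 0 as QLinearConv expects for its bias input.
static std::vector<int32_t> quantizeBias(
    const std::vector<float> &biasFloat, float xScale, float wScale) {
  std::vector<int32_t> biasInt32;
  biasInt32.reserve(biasFloat.size());
  const float scale = xScale * wScale;
  for (float b : biasFloat)
    biasInt32.push_back(static_cast<int32_t>(std::lround(b / scale)));
  return biasInt32;
}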

Comment on lines +25 to +45
static bool extractScalarFloatFromConst(mlir::Value v, float &out) {
  auto def = v.getDefiningOp<ONNXConstantOp>();
  if (!def)
    return false;

  mlir::Attribute raw;
  if (def.getValue().has_value())
    raw = *def.getValue();
  else
    raw = def.getValueAttr();

  if (auto elts = raw.dyn_cast<mlir::ElementsAttr>()) {
    // Return the first element; callers expect a scalar constant.
    for (auto apf : elts.getValues<llvm::APFloat>()) {
      out = apf.convertToFloat();
      return true;
    }
    return false;
  }

  return false;
}


This seems quite similar to getScalarValue in OpHelper.hpp.

Comment on lines 174 to 200
Value biasFloatValue = qBiasOp.getX();
auto biasFloatDefOp = biasFloatValue.getDefiningOp();
if (!biasFloatDefOp)
  return failure();

auto constBiasOp = dyn_cast<ONNXConstantOp>(biasFloatDefOp);
if (!constBiasOp)
  return failure();

// Try to get the ElementsAttr
auto denseBiasF = extractDenseFloatFromConst(constBiasOp, biasFloatValue);
if (!denseBiasF)
  return failure();

float xScaleS = 0.0f;
if (!extractScalarFloatFromConst(xScale, xScaleS))
  return failure();

float wScaleS = 0.0f;
if (!extractScalarFloatFromConst(wScale, wScaleS))
  return failure();

auto biasI32Attrs =
    createBiasI32Attrs(denseBiasF, xScaleS, wScaleS, rewriter);
if (biasI32Attrs.empty())
  return failure();


I do not think this calculation for the new bias is correct: you are ignoring the existing scale/zp of the bias q/dq.

@tuhinp-amd (Author) replied:

AFAIK bias is not an independently quantized tensor: the int32 bias is computed as bias_int = bias_float / (x_scale * w_scale), i.e. we use the scales of the input and weight to quantize the bias.
But I will check on this.
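If the int8 bias does carry its own scale/zero_point from its q/dq pair, one way to account for them, as the review comment asks (a sketch under that assumption, not the PR's code), is to dequantize with the bias's own parameters and then requantize with x_scale * w_scale:

#include <cmath>
#include <cstdint>

// Sketch: fold an int8 bias with its own (scale, zero_point) into the
// int32, zero_point = 0, scale = x_scale * w_scale form QLinearConv expects.
static int32_t requantizeBias(int8_t qBias, float biasScale,
    int8_t biasZeroPoint, float xScale, float wScale) {
  // Dequantize with the bias's own q/dq parameters ...
  const float biasF =
      (static_cast<int32_t>(qBias) - biasZeroPoint) * biasScale;
  // ... then requantize with the QLinearConv bias convention.
  return static_cast<int32_t>(std::lround(biasF / (xScale * wScale)));
}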

@jorickert left a comment:

  • Second review

@tuhinp-amd requested review from ehsan-toosi and ljfitz and removed the request for ehsan-toosi, September 22, 2025 16:38
  return failure();
}

Value qInput = dqInputOp.getX();
A collaborator commented:

I don't see any checks that the attributes (e.g. https://onnx.ai/onnx/operators/onnx__QuantizeLinear.html#attributes) are appropriate for QLinearConv.
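A sketch of the kind of guard being asked for, with the attribute values passed in as scalars (the helper and its parameters are assumptions; the attribute names follow the ONNX QuantizeLinear spec):

#include <cstdint>

// Sketch: QLinearConv only models per-tensor (and per-output-channel weight)
// quantization, so reject Q/DQ nodes that use non-default attributes such as
// blocked quantization (block_size != 0) or non-saturating float8 casts
// (saturate == 0).
static bool hasQLinearConvCompatibleAttrs(int64_t blockSize, int64_t saturate) {
  return blockSize == 0 && saturate == 1;
}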

@@ -0,0 +1,327 @@
//===- ConvToQLinearConvPass.cpp ---------------------------------*- C++ -*-===//
A collaborator commented:

There is already a place for recompositions: see RecomposeQLinearMatMulFromQuantizeLinearPattern.

The option handling of RecomposeONNXToONNXPass needs to be extended to control which recompositions are run; see getRecomposeONNXToONNXPatterns.
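A rough sketch of what that extension could look like (everything except getRecomposeONNXToONNXPatterns and RecomposeQLinearMatMulFromQuantizeLinearPattern is an assumed name, not existing onnx-mlir code):

// Hypothetical: let callers toggle individual recompositions via an option.
void getRecomposeONNXToONNXPatterns(
    mlir::RewritePatternSet &patterns, bool enableQLinearConv) {
  mlir::MLIRContext *context = patterns.getContext();
  patterns.insert<RecomposeQLinearMatMulFromQuantizeLinearPattern>(context);
  // New, option-gated recomposition from this PR (class name assumed).
  if (enableQLinearConv)
    patterns.insert<RecomposeQLinearConvPattern>(context);
}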

@ljfitz (Collaborator) commented Sep 22, 2025

Assumption: The bias is a Constant

Note that it doesn't make sense to discuss assumptions in rewrite algorithms: if your intent is to say the rewrite will not be performed when the bias is non-constant, then that is what you should say. Please avoid leaving easy-to-support cases unsupported, though: such restrictions tend to cause confusion later (why didn't the rewrite trigger ... debug cycle ... oh, why didn't the original author support that?) and make upstreaming more difficult.
