Skip to content

Conversation

AlexisCompain
Copy link

@AlexisCompain AlexisCompain commented Apr 22, 2025

Tthis fixes the wconversion warning

QR-Code-generator/cpp/qrcodegen.cpp:313:79: warning: conversion from ‘int’ to ‘__gnu_cxx::__alloc_traits<std::allocator<unsigned char>, unsigned char>::value_type’ {aka ‘unsigned char’} may change value [-Wconversion]
  313 |                 dataCodewords.at(i >> 3) |= (bb.at(i) ? 1 : 0) << (7 - (i & 7));

Note: With this warning, the "-Werror" compiler option, can't be enabled

@AlexisCompain AlexisCompain force-pushed the feature/fix-conversion branch from e19268f to 7df1796 Compare April 22, 2025 09:33
@nayuki
Copy link
Owner

nayuki commented Apr 22, 2025

Oh boy, another consequence of C/C++ language stupidity. (I complained a lot in https://www.nayuki.io/page/summary-of-c-cpp-integer-rules )

While your patch addresses the most visible aspect of the problem - which is the compiler warning - I do not believe there is an adequate solution to the underlying issue.

Let's just simplify that bit of code to:

vector<uint8_t> dataCodewords;
int val = condition ? 1 : 0
dataCodewords[i] |= val;

Conceptually speaking, the |= operator works like this:

dataCodewords[i] = dataCodewords[i] | val;

The patch you wrote essentially does this:

dataCodewords[i] = dataCodewords[i] | static_cast<uint8_t>(val);

But the problem is that uint8_t is necessarily smaller than int, so both sides of the expression dataCodewords[i] | static_cast<uint8_t>(val) gets promoted back to int and the result is int, and then the result gets cast down again to uint8_t to store into dataCodewords[i].

So while your patch fixes the very visible compiler warning, it doesn't do anything about the second, implicit, silent conversion to uint8_t. Let me stress again that the calculation dataCodewords[i] |= static_cast<uint8_t>(...); is performed at int width, not uint8_t width! Because of this, I do not feel that it is an improvement on the existing code.

And yes, I am bitter about C/C++'s integer conversion rules and how they apply implicitly and how they're correct 99.9% of the time but have subtle edge cases 0.1% of the time where things blow up in your face (e.g. uint16_t * uint16_t --> int32_t, not uint16_t as you would expect, so signed overflow is possible).

@AlexisCompain
Copy link
Author

AlexisCompain commented Apr 24, 2025

Thanks for this complete answer and yes cpp conversion rules are complex. Even if you are correct for the expanded/standalone part of the code, I'm not sure that this applies for the full line.

So I did a try with compiler explorer code example the compiler effectively make a difference and properly use the correct assembly code instruction with the proposed cast. And hence reflect this by showing the compiler warning only without the cast.

See here.
image

What would be your suggestion here to fix this compiler warning ? Should we cast the result of the Or operation (before assignation) ?

@nayuki
Copy link
Owner

nayuki commented Apr 24, 2025

I usually read x86 assembly code, and this would be one of the few times I read ARM assembly.

I see the point that you're making, where the first C++ statement turns into a cluster of instructions that uses sign extension from 8 bits to whatever the register width is (presumably 64 bits), and the second statement turns into a cluster that uses & 0xFF masking to perform zero extension.

But I think the assembly output you showed is a red herring. I'm pretty sure it's compiled at -O0 optimization level, because I know that good compiled code has way fewer instructions to accomplish simple tasks like this. There is a good chance that at higher optimization, both lines of C++ code would generate the same machine code, especially as the compiler knows that both sides of the bitwise-OR operation fit in uint8_t and that the upper bits don't matter in the calculation or final storage.

The reasoning that I gave in my previous message is based on the C++ abstract machine, which is upstream of the machine-dependent code generation process. I am reasoning about how the arithmetic would work on all possible machines.

I probably have compiled the codebase with -Wconversion in past years using older versions of GCC and Clang, and maybe back then they didn't warn me about integer type conversions in this expression. I'm willing to believe that the set of warnings changed over time. But still, based on my reasoning with the C++ abstract machine, I believe that your fix is not complete, and a future compiler could complain that the |= still implicitly coerces int to uint8_t.

So I think the only truly correct fix is the rather ugly expansion:
dataCodewords.at(i >> 3) = static_cast<uint8_t>(dataCodewords.at(i >> 3) | (bb.at(i) ? 1 : 0) << (7 - (i & 7)));

@nayuki
Copy link
Owner

nayuki commented Apr 24, 2025

Okay, I did a bit more investigation and it confirms what I suspected. Pasting my qrcodegen.hpp and qrcodegen.cpp into the Godbolt Compiler Explorer, I tried a few things. I am using the flags -std=c++11 -Wall -Wextra -Wsuggest-override -Wconversion.

When compiling with essentially any version of Clang (from 5.0.0 up to the current 20.1.0), I get no warnings whatsoever. (Guess I optimized my workflow for Clang, eh?)

When compiling with GCC, from version 4.7.1 (first supporting C++11) through to 9.5, I get 3 conversion warnings:

godbolt

From GCC 10.1 to the current 14.2, I lose the bottom two warnings and only still have the top warning - which is the one you reported. So GCC has gotten smarter over time at analyzing value ranges and how operations affect ranges, and it suppresses unnecessary warnings even though implicit conversions are still happening behind our backs. In the past, GCC was unable to prove that uint8_t& ^= uint8_t; is safe, but now it knows better. But it still doesn't know that (cond ? 1 : 0) << (0 ... 7) fits in uint8_t, so it pessimistically thinks it really requires an int (at least signed 16 bits).

I am almost 100% sure that your cast is technically incorrect because like I said, the | operation will re-promote both sides to int. But your change looks correct because after GCC knows that both sides are uint8_t, it knows that promoting both to int and then performing bitwise | will still yield a result that fits in uint8_t, so the compiler shuts up because the operation is safe. But make no mistake; there is still an implicit casting from int to uint8_t; it just got shifted to a different place in the calculation, and you did nothing to address that.

Another interesting tidbit is that Intel ICC 2021.10.0 even complains about a particular conversion when assigning to a scalar variable:

image

I think the moral of the story here is that there are tons of implicit conversions happening in the codebase. I was aware of this in the back of my mind as I was writing the code, and I could humanly prove that the value ranges were correct and there would be no loss of information (or unwanted changes in values). But getting the compiler to prove these assertions is a different story, and as we see, different compilers have different capabilities at analyzing and proving things.

In the end, adding casts just says "I know what I'm doing; shut up compiler", but does not change the result of the calculations - because I already designed them to be correct in the first place.

@nayuki
Copy link
Owner

nayuki commented Apr 24, 2025

Let me emphasize one more time that when performing an operation x OP y or x OP= y, even if x and y have the same type (such as uint8_t in this conversation thread), the calculation might be performed using a different promoted type, and there might be a loss of information from a subsequent implicit conversion.

Consider this very simple program:

#include <stdint.h>
int main() {
    uint16_t x = 0xFFFF;
    uint16_t y = 0xFFFF;
    x *= y;
}

This code produces no warnings on GCC or Clang, even with the flags -Wall -Wextra -Wpedantic -Wconversion. It does not warn about anything dangerous happening at the line x *= y; in particular.

On most platforms, short is 16 bits and int is 32 bits. So uint16_t is actually mapped to unsigned short. Whenever we perform arithmetic involving unsigned short, it gets promoted to signed int because the 32-bit signed integer type can losslessly hold every 16-bit unsigned value (I don't agree with this rule; it should be promoted to a 32-bit unsigned integer, but I can't control the C/C++ standards).

Now, (int32_t)0xFFFF * (int32_t)0xFFFF overflows the int32_t type, and this is correctly caught at run time by UndefinedBehaviorSanitizer (UBSan, -fsanitize=undefined).

Even if we tweak the values to avoid overflow, say with x = 0x1000; y = 0x1000;, this produces the result (int32_t)0x01000000, and it gets silently and implicitly cast down to (uint16_t)0x0000 at assignment. The flag -Wconversion has done nothing to save us from this risky behavior.

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants