-
Notifications
You must be signed in to change notification settings - Fork 1.7k
AVRO-4135: [C] full type name for json encoded unions #3373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Like java use the full type name when encoding the types for unions. This to prevent ambiguities when decoding this json encoded field back to avro using for example the java library.
Can you show an example of the ambiguous JSON data and the Avro schema with which was encoded? https://avro.apache.org/docs/1.12.0/specification/#json-encoding stipulates:
That specifically says "name" rather than "fullname". If the JSON encoders now should output the fullname of the type as the property name, then the specification should be changed. This likewise says "name" rather than "fullname":
Moreover, the schema resolution part of the specification says that schemas match if:
Suppose the specification were changed to allow multiple identically named record schemas as members of the same union, as long as they are in different namespaces. Then, if such a union schema is the reader's schema, then both record schemas can match the writer's schema, in which case the first matching schema must be used in schema resolution — even if the other record schema has the same namespace as in the writer's schema. That seems an undesirable result to me. |
@KalleOlaviNiemitalo the ambiguity you describe is contained in AVRO-2287. What we are seeing is that the C library goes for the name interpretation and the java library goes for the fullname description. To reproduce this with an example, I used the schema at the bottom of this message. To encode:
On the decoding side: C does not have a implementation for decoding the json encoding. However when you try with java to to read
Which shows the implementation differences for both languages. And java is clearly expecting a fully qualified path for unions. This patch proposes to line up the behavior for C with that of Java. Would you prefer the other way around? Are you willing to accept a patch on the Java side which allows reading the C encoding? Example schema:
Java code use to generate exception:
|
@KalleOlaviNiemitalo meanwhile I checked all the other language bindings on how they interpret the spec. All the language bindings which have a JSON encoder seem to fully qualify their types. It is only the C binding which deviates here. So this patch lines up this behavior with the other language bindings.
[1]: possible to configure a non standard encoding where the union value is not wrapped in an object. This variant is not decodable. Again, if you think decoders should, like the js decoder, be more lenient I do not mind to workout a patch for the Java bindings. |
Support unqualified type references for unions when decoding them from json. AVRO-2287 makes it unclear if type reference for a JSON encoded union needs to be qualified or not. Today all encoders use the fully qualified types except the C JSON encoder which uses the unqualified type. In this patch we make the java JSON Decoder more lenient and let it fallback to unqualified types names when no qualified type name matches. Which matches the behavior currently implemented in the Javascript Json decoder. This patch is an alternative for apache#3373 where it is proposed to update the C library instead.
@KalleOlaviNiemitalo in #3374 I propose to add support for non qualified type resolution in the java |
What is the purpose of the change
Like java use the full type name when encoding the types for unions.
This to prevent ambiguities when decoding this json encoded field back to avro using for example the java library.
Without this patch you cannot use the Java
JsonDecoder
to decode union with records having anamespace
entry generated by the avro C library.Verifying this change
This change added tests and can be verified by running the
test_avro_4135
Documentation