[Perf] Implement copy() and use while loop in serializers #279
Conversation
Force-pushed from ddf5534 to 7718ab5
Commit: Add checks on type information; add Scalatest matchers `haveTypeInfo` and `beSerializable`
Force-pushed from 7718ab5 to bf0d998
Looks interesting. Enjoy your holidays, you deserve them! :-) I think we need to think about bringing in some micro-benchmark tests to assess all performance improvements on the spot (in CI, or at least locally). I have some examples to share soon.
While I am still reviewing this PR, I want to share this Apache flink-benchmarks fork with my addition for Scala 3: https://github.com/novakov-alexey/flink-benchmarks?tab=readme-ov-file#diff-with-main-repository Below are the results after running all existing benchmarks and then my Scala 3 addition in this repository: [benchmark chart: the horizontal bar shows the number of operations per millisecond]
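Before wiring up JMH as in flink-benchmarks, a quick local sanity check can be hand-rolled. This is a minimal sketch only (the `Event` payload, `writeEvent`, and `opsPerMs` are illustrative names, not part of flink-scala-api), and its numbers are not trustworthy the way JMH's are, since it only does a single crude warm-up pass:

```scala
import java.io.{ByteArrayOutputStream, DataOutputStream}

object SerializerBench {
  // Illustrative payload; a real benchmark would go through a Flink TypeSerializer.
  final case class Event(id: Long, name: String, value: Double)

  def writeEvent(e: Event, out: DataOutputStream): Unit = {
    out.writeLong(e.id); out.writeUTF(e.name); out.writeDouble(e.value)
  }

  /** Returns serializations per millisecond over `iters` iterations. */
  def opsPerMs(iters: Int): Double = {
    val e   = Event(42L, "benchmark", 3.14)
    val buf = new ByteArrayOutputStream(64)
    val out = new DataOutputStream(buf)
    // Crude warm-up pass so the JIT compiles the hot path before measuring.
    var i = 0
    while (i < iters) { buf.reset(); writeEvent(e, out); i += 1 }
    val start = System.nanoTime()
    i = 0
    while (i < iters) { buf.reset(); writeEvent(e, out); i += 1 }
    val elapsedMs = (System.nanoTime() - start) / 1e6
    iters / elapsedMs
  }

  def main(args: Array[String]): Unit =
    println(f"~${opsPerMs(1000000)}%.0f ops/ms")
}
```

For CI-grade results, JMH (with forking, multiple warm-up and measurement iterations) remains the right tool; the sketch above is only for spotting order-of-magnitude regressions locally.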
Resolved review threads (outdated) on modules/flink-common-api/src/test/scala/org/apache/flinkx/api/TestUtils.scala
Nice! As you may have noticed, I've been dealing with performance issues for several weeks. I'm in the process of migrating our most critical and most optimized application from Flink's old Scala API to flink-scala-api, and thus from Kryo to Flinkx serializers.
In any case, perhaps we should add a benchmark comparing the performance of custom Flinkx serializers with custom Kryo serializers.
I think something "custom" will always be faster than something generic like the Flinkx serializers. Feel free to bring more performance improvements. In general, I think we should always stay faster than the standard Flink Row and Tuple serializers, or at least be on the same level. My latest local benchmarks show that we are at that level, and it is great to have reached this state.
Resolved review thread (outdated) on modules/flink-common-api/src/main/scala/org/apache/flinkx/api/typeinfo/CaseClassTypeInfo.scala
Thank you for the outstanding effort!
Hi,
Here is a follow-up to the previous PRs that optimized `CaseClassSerializer` copy and foreach. I was waiting for the Flink 1 & 2 codebase reunification to submit it:

- Implement `TypeSerializer#copy(DataInputView, DataOutputView)` when it is more optimized than the simple generic deserialize/reserialize.
- `TestUtils`: add checks on type information.
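To make the two optimizations concrete, here is a self-contained sketch of both ideas: a binary-to-binary `copy(in, out)` that skips materializing the record, and while loops instead of `foreach` over field serializers. All types here (`FieldSerializer`, `RowSerializer`) are illustrative stand-ins using plain `java.io` streams, not the actual flink-scala-api or Flink classes:

```scala
import java.io.{DataInputStream, DataOutputStream}

// Stand-in for Flink's TypeSerializer over DataInputView/DataOutputView (illustrative only).
trait FieldSerializer[T] {
  def serialize(t: T, out: DataOutputStream): Unit
  def deserialize(in: DataInputStream): T
  // Default copy: full deserialize + reserialize. Subclasses override it
  // with a cheaper binary move when the wire format allows it.
  def copy(in: DataInputStream, out: DataOutputStream): Unit = serialize(deserialize(in), out)
}

object IntFieldSerializer extends FieldSerializer[Int] {
  def serialize(t: Int, out: DataOutputStream): Unit = out.writeInt(t)
  def deserialize(in: DataInputStream): Int = in.readInt()
  // Optimized copy: move 4 bytes without boxing an Int.
  override def copy(in: DataInputStream, out: DataOutputStream): Unit = out.writeInt(in.readInt())
}

// Serializer for a product of fields; while loops avoid the closure
// allocation and megamorphic dispatch of foreach on the hot path.
final class RowSerializer(fields: Array[FieldSerializer[Any]]) extends FieldSerializer[Array[Any]] {
  def serialize(t: Array[Any], out: DataOutputStream): Unit = {
    var i = 0
    while (i < fields.length) { fields(i).serialize(t(i), out); i += 1 }
  }
  def deserialize(in: DataInputStream): Array[Any] = {
    val arr = new Array[Any](fields.length)
    var i = 0
    while (i < fields.length) { arr(i) = fields(i).deserialize(in); i += 1 }
    arr
  }
  override def copy(in: DataInputStream, out: DataOutputStream): Unit = {
    var i = 0
    while (i < fields.length) { fields(i).copy(in, out); i += 1 }
  }
}
```

The payoff of the overridden `copy` is that a record passing through an operator that only forwards it never has to be deserialized into objects at all; each field serializer moves its bytes directly from input to output.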
I think the checks on type information need to be discussed. On the one hand, they were useful to validate our custom type information and even identified bugs. On the other hand, some checks are tricky to implement correctly, most notably those on the arity and the total number of fields.
The idea is to check the type information against the `Class` it is supposed to describe, using reflection. But checking the total number of fields requires complex recursion on each field. Even the definition of "total fields" given in the javadoc is hard to understand.
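For illustration, a naive version of that recursion could look like the sketch below. Everything here (`TypeInfoChecks`, `totalFields`, the base cases, the sample case classes) is hypothetical, and the sketch deliberately ignores traits, generics, collections, and cycles, which is exactly where the real check gets hard:

```scala
import java.lang.reflect.Modifier

object TypeInfoChecks {
  // Naive recursive count of "total fields": leaf types count as 1,
  // products count as the sum of their fields. Treating a field-less
  // class as 1 is an arbitrary choice, illustrating the ambiguity of
  // the "total fields" definition for traits.
  def totalFields(cls: Class[_]): Int =
    if (cls.isPrimitive || cls == classOf[String]) 1
    else {
      val fields = cls.getDeclaredFields.filterNot(f => Modifier.isStatic(f.getModifiers))
      if (fields.isEmpty) 1
      else fields.map(f => totalFields(f.getType)).sum
    }
}

final case class Point(x: Int, y: Int)        // 2 leaf fields
final case class Segment(a: Point, b: Point)  // 2 + 2 = 4 leaf fields
```

Even this toy version has to pick a convention for field-less types, and it would loop forever on a recursive case class; a production check would need cycle detection and special cases for arrays, options, and collections.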
For now, I had to remove the checks on arity and total fields because they are not correct for some case classes and sealed traits:

- `CoproductTypeInformation` always sets an arity of 1, but my check found an arity of 0 because traits don't have fields.
- `TupleTypeInfoBase`, the parent of `CaseClassTypeInfo`, doesn't handle the case stated in the javadoc where "The total number of fields must be at least 1."

What are your thoughts on these checks? Do you think it is worth fixing all the edge cases to make them functional?