Sped up parsing by ~0..40%-ish #149


Open · wants to merge 4 commits into main
Conversation

toughengineer

Disclaimer:

this is a quick and dirty experiment; you will not find rigorous measurements, statistics and whatnot here, but the figures suggest there is something one may want to investigate further.

tl;dr:

the first few push_back() calls into a vector without a prior reserve() are relatively expensive; these changes offset that cost by temporarily storing the first few parsed JSON array elements in an array.

The gist

The type ArrayElements crudely models what some people call a "static vector", i.e. a vector-like container with capacity fixed at compile time and storage allocated inside the instance itself.

A proper "static vector" would allow skipping construction and destruction of "unused" elements, so I believe it would gain a little more speed by accommodating more elements.

With the current code, 4 elements seems to be the sweet spot where the costs of allocation, construction/destruction of the array elements, and moving them balance out.

Crude measurements

I crudely measured the performance difference against the current main branch.
I used the latest MSVC and clang-cl (the one that ships with MSVS) because that's what I had immediately available.
I built the tao-json-perf-parse_file.exe executable (in RelWithDebInfo mode, but that shouldn't make a difference) and ran it with the JSON files from the tests directory.

The figures are the timings "per iteration" from the benchmark output in milliseconds.

|                   | MSVC base | changed | diff | clang-cl base | changed | diff |
| ----------------- | --------- | ------- | ---- | ------------- | ------- | ---- |
| canada.json       | 43        | 39      | -9%  | 35            | 31      | -11% |
| citm_catalog.json | 23        | 21      | -9%  | 21            | 18      | -14% |
| twitter.json      | 8.5       | 8       | -6%  | 8             | 7       | -13% |
| blns.json         | 0.2       | 0.2     | 0%   | 0.18          | 0.16    | -11% |

The measurements were quite consistent across a few runs.
The parsed JSON files were on a relatively fast SSD, so file reading speed should not have had a large influence.
I couldn't use the clang (without "-cl") driver because of warnings that were treated as errors:

```
>------ Build started: Project: CMakeLists, Configuration: RelWithDebInfo ------
  [1/2] Building CXX object src/perf/json/CMakeFiles/tao-json-perf-parse_file.dir/parse_file.cpp.obj
  FAILED: src/perf/json/CMakeFiles/tao-json-perf-parse_file.dir/parse_file.cpp.obj 
  C:\PROGRA~1\MIB055~1\2022\COMMUN~1\VC\Tools\Llvm\x64\bin\CLANG_~1.EXE  -ID:/dev/taojson/include -ID:/dev/taojson/external/PEGTL/include --target=amd64-pc-windows-msvc -fdiagnostics-absolute-paths -O2 -DNDEBUG -g -Xclang -gcodeview -std=c++17 -D_DLL -D_MT -Xclang --dependent-lib=msvcrt -pedantic -Wall -Wextra -Wshadow -Werror -MD -MT src/perf/json/CMakeFiles/tao-json-perf-parse_file.dir/parse_file.cpp.obj -MF src\perf\json\CMakeFiles\tao-json-perf-parse_file.dir\parse_file.cpp.obj.d -o src/perf/json/CMakeFiles/tao-json-perf-parse_file.dir/parse_file.cpp.obj -c D:/dev/taojson/src/perf/json/parse_file.cpp
  In file included from D:/dev/taojson/src/perf/json/parse_file.cpp:4:
  In file included from D:/dev/taojson/include\tao/json.hpp:11:
  In file included from D:/dev/taojson/include\tao/json/from_file.hpp:10:
  In file included from D:/dev/taojson/include\tao/json/events/from_file.hpp:9:
  In file included from D:/dev/taojson/include\tao/json/events/../internal/action.hpp:16:
  In file included from D:/dev/taojson/include\tao/json/internal/number_state.hpp:13:
D:\dev\taojson\include\tao\json\external\double.hpp(89,9): error G748BFC68: extension used [-Werror,-Wlanguage-extension-token]
  typedef __int64 int64_t;
          ^
D:\dev\taojson\include\tao\json\external\double.hpp(90,18): error G748BFC68: extension used [-Werror,-Wlanguage-extension-token]
  typedef unsigned __int64 uint64_t;
                   ^

  2 errors generated.

  ninja: build stopped: subcommand failed.

Build failed.
```

To be at least a little bit conservative I rounded the base measurements down and the changed measurements up, so the reported improvements are, if anything, underestimated.

Conclusion

All in all, I would say this idea of a speedup is worth investigating. It seems not to hurt where it is not used, e.g. when the JSON does not contain (many) arrays, while on the other hand even this crude implementation gives a noticeable speedup.

I do not plan to develop a fully fledged implementation of this idea in this library.
Please feel free to use this code as is or as a starting point.

@ColinH ColinH self-assigned this Dec 1, 2023

toughengineer commented Aug 4, 2025

OK, I managed to speed up parsing by ~30% compared to base, and then by ~40% compared to base.

The measurement methodology is exactly the same as in the opening message.

I used the latest MSVC with the clang-cl that comes with it, and re-measured the base, i.e. the current main branch without my changes.

Here are the results (in milliseconds).

**MSVC**

|                   | base | previous change | diff | new change | diff | even faster | diff |
| ----------------- | ---- | --------------- | ---- | ---------- | ---- | ----------- | ---- |
| canada.json       | 53   | 52              | -2%  | 45         | -15% | 45          | -15% |
| citm_catalog.json | 26   | 24              | -8%  | 18         | -31% | 17          | -35% |
| twitter.json      | 9.4  | 8.6             | -9%  | 6.9        | -27% | 5.7         | -39% |
| blns.json         | 0.21 | 0.21            | 0%   | 0.2        | -5%  | 0.21        | 0%   |

**clang-cl**

|                   | base | previous change | diff | new change | diff | even faster | diff |
| ----------------- | ---- | --------------- | ---- | ---------- | ---- | ----------- | ---- |
| canada.json       | 44   | 41              | -7%  | 36         | -18% | 35          | -20% |
| citm_catalog.json | 23   | 20              | -13% | 16         | -30% | 14          | -39% |
| twitter.json      | 8.4  | 7.1             | -15% | 6          | -29% | 4.7         | -44% |
| blns.json         | 0.18 | 0.17            | -6%  | 0.17       | -6%  | 0.17        | -6%  |

"previous change" refers to sped up parsing by ~0..10%.
"new change" refers to sped up parsing by ~0..30%.
"even faster" refers to sped up parsing by ~0..40%.
All differences (in "diff" columns) are with respect to base measurements (in "base" columns).

In the "new change" I replaced the array with a bare-minimum implementation of an in-place vector (InPlaceVector). This allows constructing (and later destructing) elements only when necessary, which in turn allows buffering more elements to offset the overhead of the vector's reallocations. My measurements show that the least overhead (maximum performance) is achieved between sizes 32 and 64, where 42 lands nicely around the middle.

In the "even faster" version, additionally, the starting elements are assigned at the very end, when all the remaining elements are already in place, which somewhat offsets the overhead of copying non-trivial values during reallocations.

@toughengineer toughengineer changed the title Sped up parsing by ~0..10%-ish Sped up parsing by ~0..40%-ish Aug 4, 2025