Sped up parsing by ~0..40%-ish #149


Open · wants to merge 4 commits into main
Conversation

toughengineer

Disclaimer:

this is a quick and dirty experiment; you will not find rigorous measurements, statistics and whatnot here, but the figures suggest there is something one may want to investigate further.

tl;dr:

the first few push_back() calls into a vector without a prior reserve() are relatively expensive; these changes offset that cost by temporarily storing the first few parsed JSON array elements in an array.

The gist

The type ArrayElements crudely models what some people call a "static vector", i.e. a vector-like container with capacity fixed at compile time and storage allocated inside the instance itself.

A proper "static vector" would allow skipping construction and destruction of "unused" elements, so I believe it would gain a little more speed by accommodating more elements.

With the current code, 4 elements seems to be the sweet spot where the costs of allocation, construction/destruction of the array elements, and moving them balance out.

Crude measurements

I crudely measured the performance difference against the current main branch.
I used the latest MSVC and clang-cl (the one that ships with MSVS) because that's what I had immediately available.
I built the tao-json-perf-parse_file.exe executable (in RelWithDebInfo mode, but that shouldn't make a difference) and ran it with the JSON files from the tests directory.

The figures are the timings "per iteration" from the benchmark output in milliseconds.

|                   | MSVC base | changed | diff | clang-cl base | changed | diff |
| ----------------- | --------- | ------- | ---- | ------------- | ------- | ---- |
| canada.json       | 43        | 39      | -9%  | 35            | 31      | -11% |
| citm_catalog.json | 23        | 21      | -9%  | 21            | 18      | -14% |
| twitter.json      | 8.5       | 8       | -6%  | 8             | 7       | -13% |
| blns.json         | 0.2       | 0.2     | 0%   | 0.18          | 0.16    | -11% |

The measurements were quite consistent across a few runs.
The parsed JSON files were on a relatively fast SSD, so file reading speed should not have had a large influence.
I couldn't use the clang (without "-cl") driver because of warnings that were treated as errors:

```
>------ Build started: Project: CMakeLists, Configuration: RelWithDebInfo ------
  [1/2] Building CXX object src/perf/json/CMakeFiles/tao-json-perf-parse_file.dir/parse_file.cpp.obj
  FAILED: src/perf/json/CMakeFiles/tao-json-perf-parse_file.dir/parse_file.cpp.obj 
  C:\PROGRA~1\MIB055~1\2022\COMMUN~1\VC\Tools\Llvm\x64\bin\CLANG_~1.EXE  -ID:/dev/taojson/include -ID:/dev/taojson/external/PEGTL/include --target=amd64-pc-windows-msvc -fdiagnostics-absolute-paths -O2 -DNDEBUG -g -Xclang -gcodeview -std=c++17 -D_DLL -D_MT -Xclang --dependent-lib=msvcrt -pedantic -Wall -Wextra -Wshadow -Werror -MD -MT src/perf/json/CMakeFiles/tao-json-perf-parse_file.dir/parse_file.cpp.obj -MF src\perf\json\CMakeFiles\tao-json-perf-parse_file.dir\parse_file.cpp.obj.d -o src/perf/json/CMakeFiles/tao-json-perf-parse_file.dir/parse_file.cpp.obj -c D:/dev/taojson/src/perf/json/parse_file.cpp
  In file included from D:/dev/taojson/src/perf/json/parse_file.cpp:4:
  In file included from D:/dev/taojson/include\tao/json.hpp:11:
  In file included from D:/dev/taojson/include\tao/json/from_file.hpp:10:
  In file included from D:/dev/taojson/include\tao/json/events/from_file.hpp:9:
  In file included from D:/dev/taojson/include\tao/json/events/../internal/action.hpp:16:
  In file included from D:/dev/taojson/include\tao/json/internal/number_state.hpp:13:
D:\dev\taojson\include\tao\json\external\double.hpp(89,9): error G748BFC68: extension used [-Werror,-Wlanguage-extension-token]
  typedef __int64 int64_t;
          ^
D:\dev\taojson\include\tao\json\external\double.hpp(90,18): error G748BFC68: extension used [-Werror,-Wlanguage-extension-token]
  typedef unsigned __int64 uint64_t;
                   ^

  2 errors generated.

  ninja: build stopped: subcommand failed.

Build failed.
```

To be at least a little bit conservative I rounded the base measurements down and the changed measurements up, so the reported improvements are, if anything, underestimated.

Conclusion

All in all, I would say this idea of a speedup is worth investigating. It seems not to hurt where it is not used, e.g. when the JSON does not contain (many) arrays, while on the other hand even this crude implementation gives a noticeable speedup.

I do not plan to develop a fully fledged implementation of this idea in this library.
Please feel free to use this code as is or as a starting point.

@ColinH ColinH self-assigned this Dec 1, 2023

toughengineer commented Aug 4, 2025

OK, I managed to speed up parsing by ~30% compared to base, and then by ~40% compared to base.

The measurement methodology is exactly the same as in the opening message.

I used the latest MSVC with the clang-cl that comes with it, and re-measured the base, i.e. the current main branch without my changes.

Here are the results (in milliseconds).

**MSVC**

|                   | base | previous change | diff | new change | diff | even faster | diff |
| ----------------- | ---- | --------------- | ---- | ---------- | ---- | ----------- | ---- |
| canada.json       | 53   | 52              | -2%  | 45         | -15% | 45          | -15% |
| citm_catalog.json | 26   | 24              | -8%  | 18         | -31% | 17          | -35% |
| twitter.json      | 9.4  | 8.6             | -9%  | 6.9        | -27% | 5.7         | -39% |
| blns.json         | 0.21 | 0.21            | 0%   | 0.2        | -5%  | 0.21        | 0%   |

**clang-cl**

|                   | base | previous change | diff | new change | diff | even faster | diff |
| ----------------- | ---- | --------------- | ---- | ---------- | ---- | ----------- | ---- |
| canada.json       | 44   | 41              | -7%  | 36         | -18% | 35          | -20% |
| citm_catalog.json | 23   | 20              | -13% | 16         | -30% | 14          | -39% |
| twitter.json      | 8.4  | 7.1             | -15% | 6          | -29% | 4.7         | -44% |
| blns.json         | 0.18 | 0.17            | -6%  | 0.17       | -6%  | 0.17        | -6%  |

"previous change" refers to sped up parsing by ~0..10%.
"new change" refers to sped up parsing by ~0..30%.
"even faster" refers to sped up parsing by ~0..40%.
All differences (in "diff" columns) are with respect to base measurements (in "base" columns).

In the "new change" I replaced the array with a bare-minimum implementation of an in-place vector (InPlaceVector). This allows constructing (and later destructing) elements only when necessary, which in turn allows buffering more elements to offset the overhead of the vector's reallocations. My measurements show that the least overhead (maximum performance) is achieved between sizes 32 and 64, where 42 lands nicely around the middle.

In the "even faster" version, additionally, the starting elements are assigned at the very end, when all the remaining elements are already in place, which somewhat offsets the overhead of copying non-trivial values during reallocations.

@toughengineer toughengineer changed the title Sped up parsing by ~0..10%-ish Sped up parsing by ~0..40%-ish Aug 4, 2025