ext/bcmath: Performance improvement `bcsqrt()` #18771

SakiTakamachi · 2025-06-05T04:58:11Z

Test

Approximately 80 million calculations were performed using values of various magnitudes and scales, and all results matched those from the previous implementation.

Benchmark

Performance improvements are particularly noticeable in the following cases:

- When dealing with small values (due to the use of a fast path)
- When handling decimal numbers less than 1 with many leading zeros (as leading zeros are removed before computation)
- When the scale of the input number (num) is much larger than the required result scale (since unnecessary fractional digits are now ignored during computation)
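
To illustrate the leading-zero case, here is a minimal sketch that is not taken from the PR. It relies on the identity sqrt(x) = sqrt(x * 10^(2k)) * 10^(-k), so zeros directly after the decimal point can be shifted out in pairs before the root is computed. The helper names `count_leading_frac_zeros` and `even_shift` are hypothetical, and the input is assumed to be a plain decimal string.

```c
#include <stddef.h>
#include <string.h>

/* Count the zeros directly after the decimal point,
 * e.g. "0.000045689101" -> 4. Returns 0 if there is no fractional part.
 * Hypothetical helper, not part of libbcmath. */
static size_t count_leading_frac_zeros(const char *num)
{
	const char *dot = strchr(num, '.');
	if (dot == NULL) {
		return 0;
	}
	size_t zeros = 0;
	for (const char *p = dot + 1; *p == '0'; p++) {
		zeros++;
	}
	return zeros;
}

/* Largest even number of digits that can be shifted out:
 * sqrt(x) = sqrt(x * 10^(2k)) * 10^(-k), so zeros are dropped in pairs. */
static size_t even_shift(size_t leading_zeros)
{
	return leading_zeros & ~(size_t)1;
}
```

For the benchmark input '0.000045689101' this gives a shift of 4 digits, so the root would be taken of 0.45689101 and the result scaled back by 2 digits.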

For large values that do not fall into the above categories, most of the execution time is spent on the iterative process of the Newton-Raphson method, especially on division operations. As a result, while there may be some minor gains from reducing memory allocations and lowering the cost of converting BCD to BC_VECTOR, there is no significant improvement in overall performance.
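
As a rough illustration of why division dominates, here is a minimal sketch of the Newton-Raphson iteration on a fixed-width integer. It is not the bcmath code, which operates on arbitrary-precision digit vectors; the function name `isqrt_newton` is hypothetical. Each iteration needs exactly one division, and for multi-thousand-digit operands that division is far more expensive than the addition and halving around it.

```c
#include <stdint.h>

/* Integer Newton-Raphson: returns floor(sqrt(n)).
 * Illustrative only; hypothetical helper, not part of libbcmath. */
static uint64_t isqrt_newton(uint64_t n)
{
	if (n < 2) {
		return n;
	}
	/* Any starting guess >= sqrt(n) works; n / 2 + 1 is at least
	 * sqrt(n) for every n >= 2. */
	uint64_t x = n / 2 + 1;
	for (;;) {
		/* One division per iteration: this is the costly step. */
		uint64_t next = (x + n / x) / 2;
		if (next >= x) {
			/* The sequence stopped decreasing, so x is the floor root. */
			return x;
		}
		x = next;
	}
}
```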

Small size value (fast path)

Code:

for ($i = 0; $i < 1000000; $i++) {
    bcsqrt(20, 6);
}

Result:

Benchmark 1: /php-dev2/sapi/cli/php /mount/bc/sqrt/0.php
  Time (mean ± σ):      65.9 ms ±   0.8 ms    [User: 60.4 ms, System: 3.2 ms]
  Range (min … max):    64.1 ms …  67.3 ms    46 runs
 
Benchmark 2: /master/sapi/cli/php /mount/bc/sqrt/0.php
  Time (mean ± σ):     746.1 ms ±   4.4 ms    [User: 740.2 ms, System: 3.5 ms]
  Range (min … max):   741.5 ms … 756.3 ms    10 runs
 
Summary
  '/php-dev2/sapi/cli/php /mount/bc/sqrt/0.php' ran
   11.32 ± 0.15 times faster than '/master/sapi/cli/php /mount/bc/sqrt/0.php'

Middle size value (< 1) (standard path)

Code:

for ($i = 0; $i < 500000; $i++) {
    bcsqrt('0.000045689101', 20);
}

Result:

Benchmark 1: /php-dev2/sapi/cli/php /mount/bc/sqrt/1.php
  Time (mean ± σ):     108.0 ms ±   0.3 ms    [User: 102.5 ms, System: 3.2 ms]
  Range (min … max):   107.5 ms … 108.8 ms    27 runs
 
Benchmark 2: /master/sapi/cli/php /mount/bc/sqrt/1.php
  Time (mean ± σ):      1.082 s ±  0.002 s    [User: 1.076 s, System: 0.003 s]
  Range (min … max):    1.078 s …  1.085 s    10 runs
 
Summary
  '/php-dev2/sapi/cli/php /mount/bc/sqrt/1.php' ran
   10.01 ± 0.04 times faster than '/master/sapi/cli/php /mount/bc/sqrt/1.php'

Middle size value 1 (fast path)

The new logic ignores fractional digits that the requested scale does not need, so this case takes the fast path.

Code:

for ($i = 0; $i < 700000; $i++) {
    bcsqrt('15151324141414.412312232141241', 0);
}

Result:

Benchmark 1: /php-dev2/sapi/cli/php /mount/bc/sqrt/2.php
  Time (mean ± σ):      55.0 ms ±   2.4 ms    [User: 49.6 ms, System: 3.1 ms]
  Range (min … max):    52.6 ms …  71.1 ms    53 runs

Benchmark 2: /master/sapi/cli/php /mount/bc/sqrt/2.php
  Time (mean ± σ):      1.099 s ±  0.003 s    [User: 1.092 s, System: 0.004 s]
  Range (min … max):    1.093 s …  1.102 s    10 runs
 
Summary
  '/php-dev2/sapi/cli/php /mount/bc/sqrt/2.php' ran
   19.98 ± 0.89 times faster than '/master/sapi/cli/php /mount/bc/sqrt/2.php'

Middle size value 2 (standard path)

Code:

for ($i = 0; $i < 500000; $i++) {
    bcsqrt('15151324141414.412312232141241', 30);
}

Result:

Benchmark 1: /php-dev2/sapi/cli/php /mount/bc/sqrt/3.php
  Time (mean ± σ):     204.1 ms ±   0.8 ms    [User: 198.5 ms, System: 3.3 ms]
  Range (min … max):   202.7 ms … 205.9 ms    14 runs
 
Benchmark 2: /master/sapi/cli/php /mount/bc/sqrt/3.php
  Time (mean ± σ):      1.315 s ±  0.034 s    [User: 1.308 s, System: 0.003 s]
  Range (min … max):    1.297 s …  1.411 s    10 runs

Summary
  '/php-dev2/sapi/cli/php /mount/bc/sqrt/3.php' ran
    6.44 ± 0.17 times faster than '/master/sapi/cli/php /mount/bc/sqrt/3.php'

Big size value (standard path)

Code:

for ($i = 0; $i < 100; $i++) {
    bcsqrt('15151324141414.412312232141241', 3000);
}

Result:

Benchmark 1: /php-dev2/sapi/cli/php /mount/bc/sqrt/4.php
  Time (mean ± σ):     195.7 ms ±   1.8 ms    [User: 189.7 ms, System: 3.4 ms]
  Range (min … max):   194.2 ms … 201.8 ms    15 runs

Benchmark 2: /master/sapi/cli/php /mount/bc/sqrt/4.php
  Time (mean ± σ):     280.0 ms ±   4.5 ms    [User: 274.3 ms, System: 3.2 ms]
  Range (min … max):   276.2 ms … 289.2 ms    10 runs
 
Summary
  '/php-dev2/sapi/cli/php /mount/bc/sqrt/4.php' ran
    1.43 ± 0.03 times faster than '/master/sapi/cli/php /mount/bc/sqrt/4.php'

SakiTakamachi · 2025-06-11T09:04:08Z

The code is ready for review.

SakiTakamachi · 2025-06-11T13:59:08Z

It looks like nearzero.c is no longer in use, so it can probably be removed.

SakiTakamachi · 2025-06-11T14:00:28Z

done

SakiTakamachi · 2025-06-12T01:29:51Z

bc_add_ex, bc_int2num, and bc_raise_bc_exponent are no longer used, so I'll remove them.

SakiTakamachi · 2025-06-12T01:33:05Z

done

nielsdos

First two commits are fine, but simultaneously refactored and optimized code is too hard to follow. The commits need to be split between refactoring and actual optimization, and a high level picture must be explained in the commit descriptions.

nielsdos · 2025-06-27T21:40:40Z

ext/bcmath/libbcmath/src/sqrt.c

 	/* Initial checks. */
-	if (bc_is_neg(local_num)) {
+	if (bc_is_neg(*num)) {


One thing I wonder is whether this was the right move.
On the one hand, keeping *num in a local variable may save a pointer load; on the other hand, it may get spilled in between calls anyway.

SakiTakamachi · 2025-06-27T22:20:03Z

@nielsdos

Got it. I’ll split the commits starting from the third one.
Is it okay if I rebase the commits from the third one onward?

nielsdos · 2025-06-27T22:30:31Z

Sure

SakiTakamachi · 2025-07-10T11:51:18Z

@nielsdos

The original second commit was completely unnecessary, so I deleted it.
As a result, everything except the initial commit has been split through rebase.

If it seems like further splitting is needed, feel free to let me know!

nielsdos · 2025-07-23T13:18:50Z

ext/bcmath/libbcmath/src/sqrt.c

@@ -100,6 +165,10 @@ bool bc_sqrt(bc_num *num, size_t scale)
 		/* Cannot take the square root of a negative number */
 		return false;
 	}
+
+	size_t num_calc_scale = (scale + 1) * 2;


Are there cases where this can overflow? The maximum value of scale is, as far as I know, ZEND_LONG_MAX. So theoretically it could overflow?
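
One possible way to guard this, shown as a sketch rather than a proposed patch: check the bound before evaluating `(scale + 1) * 2`. The helper name `calc_scale_checked` is hypothetical. If scale can really reach ZEND_LONG_MAX on a platform where size_t has the same width as zend_long, the unguarded expression would wrap around.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores (scale + 1) * 2 in *out and returns
 * false instead of wrapping around when the result does not fit
 * in size_t. */
static bool calc_scale_checked(size_t scale, size_t *out)
{
	/* (scale + 1) * 2 <= SIZE_MAX  <=>  scale <= SIZE_MAX / 2 - 1 */
	if (scale > SIZE_MAX / 2 - 1) {
		return false;
	}
	*out = (scale + 1) * 2;
	return true;
}
```

A caller could treat a `false` return the same way as any other "scale too large" condition, before any buffer sizes are derived from the value.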

nielsdos · 2025-07-23T13:19:56Z

ext/bcmath/libbcmath/src/sqrt.c

+	size_t leading_zeros = 0;
+	size_t num_calc_full_len = (*num)->n_len + num_calc_scale;
+	size_t num_use_full_len = (*num)->n_len + num_use_scale;
+	if (num_cmp_one == BCMATH_RIGHT_GREATER) {


Isn't this comparison always true? The code checks early if the number is negative, 0, or 1.

nielsdos · 2025-07-23T13:22:28Z

ext/bcmath/libbcmath/src/sqrt.c

@@ -115,7 +184,26 @@ bool bc_sqrt(bc_num *num, size_t scale)
 		return true;
 	}

-	bc_standard_sqrt(num, scale, num_cmp_one);
+	/* Initialize the variables. */


This needs a better comment.
Something along the lines of "Compute the actual length of the number by disregarding the number of leading zeros."

nielsdos · 2025-07-23T13:24:32Z

ext/bcmath/libbcmath/src/sqrt.c


 /* Take the square root NUM and return it in NUM with SCALE digits
   after the decimal place. */

+static inline BC_VECTOR bc_sqrt_get_pow_10(size_t exponent)


On 32-bit systems, this is only overflow-safe if exponent<=9.
Is this guaranteed?
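
For reference, a sketch of a guarded lookup, assuming BC_VECTOR is a 32-bit unsigned type on such builds. The bound follows from 10^9 < 2^32 <= 10^10. The names `pow10_u32` and `pow10_u32_checked` are hypothetical and do not appear in the PR.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Every power of ten that fits in 32 bits: 10^0 through 10^9. */
static const uint32_t pow10_u32[] = {
	1u, 10u, 100u, 1000u, 10000u, 100000u,
	1000000u, 10000000u, 100000000u, 1000000000u,
};

/* Hypothetical guarded variant: returns false instead of overflowing
 * when the exponent is outside the table. */
static bool pow10_u32_checked(size_t exponent, uint32_t *out)
{
	if (exponent >= sizeof(pow10_u32) / sizeof(pow10_u32[0])) {
		return false;
	}
	*out = pow10_u32[exponent];
	return true;
}
```

If the callers already guarantee exponent <= 9, an assertion at the top of the existing function would document that invariant at no release-build cost.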

nielsdos · 2025-07-23T13:27:42Z

ext/bcmath/libbcmath/src/sqrt.c

+	char *rptr = ret->n_value;
+	char *rend = rptr + ret_len + scale - 1;
+
+	guess_vector /= BASE; /* Since the scale of guess_vector is scale + 1, reduce the scale by 1. */


Didn't we have a common function to do this loop?

nielsdos · 2025-07-23T13:29:28Z

ext/bcmath/libbcmath/src/sqrt.c

@@ -101,61 +102,132 @@ static inline void bc_fast_sqrt(bc_num *num, size_t scale, size_t num_calc_full_
 	*num = ret;
 }

-static inline void bc_standard_sqrt(bc_num *num, size_t scale, bcmath_compare_result num_cmp_one)
+static inline void bc_standard_sqrt(bc_num *num, size_t scale, size_t num_calc_full_len, size_t num_use_full_len, size_t leading_zeros)


I really don't think we should open-code the actual algorithm with inline operations. Rather we should keep the algorithm using the standard calls to other bcmath functions.
It may be worth changing how some of these functions work to avoid unnecessary allocation and free operations and that may speed up the process already a bit.

github-actions bot added the Extension: bcmath label Jun 5, 2025

SakiTakamachi force-pushed the bcmath/optimize_sqrt branch 7 times, most recently from bbfe6d5 to 9cf6b03 Compare June 6, 2025 13:05

SakiTakamachi changed the title ~~[WIP] ext/bcmath: optimized sqrt()~~ [WIP] ext/bcmath: optimized bcsqrt() Jun 7, 2025

SakiTakamachi force-pushed the bcmath/optimize_sqrt branch 21 times, most recently from bd573f4 to b60f757 Compare June 9, 2025 01:55

SakiTakamachi force-pushed the bcmath/optimize_sqrt branch 3 times, most recently from 193a1a5 to 9c4af5c Compare June 11, 2025 06:40

local_num is not needed so it has been removed.

57b91d6

SakiTakamachi force-pushed the bcmath/optimize_sqrt branch 2 times, most recently from 70613a0 to d7d1d80 Compare June 11, 2025 07:55

SakiTakamachi marked this pull request as ready for review June 11, 2025 09:04

github-actions bot added the Category: Build System label Jun 11, 2025

SakiTakamachi force-pushed the bcmath/optimize_sqrt branch from 9da96c2 to aef9c17 Compare June 11, 2025 14:00

nielsdos requested changes Jun 27, 2025

SakiTakamachi added 10 commits July 10, 2025 20:27

move old logic to standard path

e1eccaa

move bc_standard_sqrt to the top

df858ae

create fast path

c1303df

Efficiently adjust the scale used for comparison with 0 and 1

b04b189

Optimized standard path calculation

b7cf057

Efficiently obtain the initial value for the standard path.

1cfa420

Omit low precision digits in calculations

4940980

Added test cases

dbf83da

removed nearzero.c

c40b249

Removed bc_add_ex, bc_int2num, and bc_raise_bc_exponent as they are no longer used.

8c63551

SakiTakamachi force-pushed the bcmath/optimize_sqrt branch from 7555b2f to 8c63551 Compare July 10, 2025 11:46

nielsdos requested changes Jul 23, 2025
