ext/bcmath: Performance improvement bcsqrt()
#18771
base: master
The code is ready for review.
It looks like |
done |
No longer using |
done |
The first two commits are fine, but code that is refactored and optimized at the same time is too hard to follow. The commits need to be split between refactoring and actual optimization, and the high-level picture must be explained in the commit descriptions.
/* Initial checks. */
if (bc_is_neg(local_num)) {
if (bc_is_neg(*num)) {
One thing I wonder is whether this was the right move.
On the one hand, keeping *num in a local variable may save a pointer load; on the other hand, it may get spilled in between calls anyway.
Got it. I’ll split the commits starting from the third one.
Sure
… are no longer used.
The original second commit was completely unnecessary, so I deleted it. If it seems like further splitting is needed, feel free to let me know!
@@ -100,6 +165,10 @@ bool bc_sqrt(bc_num *num, size_t scale)
/* Cannot take the square root of a negative number */
return false;
}

size_t num_calc_scale = (scale + 1) * 2;
Are there cases where this can overflow? The maximum value of scale is, as far as I know, ZEND_LONG_MAX. So theoretically it could overflow?
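To make the concern concrete: if `size_t` and `zend_long` have the same width, a scale of `ZEND_LONG_MAX` makes `(scale + 1) * 2` wrap around to 0. A minimal sketch of a guarded computation follows; `calc_scale_checked` is a hypothetical helper for illustration, not an existing bcmath function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of bcmath): computes (scale + 1) * 2
 * and reports failure instead of silently wrapping around. */
static bool calc_scale_checked(size_t scale, size_t *out)
{
	/* (scale + 1) * 2 <= SIZE_MAX  is equivalent to
	 * scale <= SIZE_MAX / 2 - 1, which cannot itself overflow. */
	if (scale > SIZE_MAX / 2 - 1) {
		return false;
	}
	*out = (scale + 1) * 2;
	return true;
}
```

Whether such a check is needed here depends on whether the caller already bounds `scale` well below this limit.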
size_t leading_zeros = 0;
size_t num_calc_full_len = (*num)->n_len + num_calc_scale;
size_t num_use_full_len = (*num)->n_len + num_use_scale;
if (num_cmp_one == BCMATH_RIGHT_GREATER) {
Isn't this comparison always true? The code checks early if the number is negative, 0, or 1.
@@ -115,7 +184,26 @@ bool bc_sqrt(bc_num *num, size_t scale)
return true;
}

bc_standard_sqrt(num, scale, num_cmp_one);
/* Initialize the variables. */
This needs a better comment.
Something along the lines of "Compute the actual length of the number by disregarding the number of leading zeros."
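As an illustration of what that comment would describe, here is a small sketch of the idea. It assumes the digits are stored one decimal digit (0 to 9) per byte, most significant first; both function names are hypothetical and do not come from the patch.

```c
#include <stddef.h>

/* Hypothetical helper: counts the zero digits at the front of a
 * digit array (one decimal digit per byte, most significant first). */
static size_t count_leading_zeros(const char *digits, size_t len)
{
	size_t zeros = 0;
	while (zeros < len && digits[zeros] == 0) {
		zeros++;
	}
	return zeros;
}

/* Compute the actual length of the number by disregarding the
 * number of leading zeros. */
static size_t effective_len(const char *digits, size_t len)
{
	return len - count_leading_zeros(digits, len);
}
```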

/* Take the square root NUM and return it in NUM with SCALE digits
after the decimal place. */

static inline BC_VECTOR bc_sqrt_get_pow_10(size_t exponent)
On 32-bit systems, this is only overflow-safe if exponent<=9.
Is this guaranteed?
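For reference, 10^9 = 1,000,000,000 still fits in an unsigned 32-bit value, while 10^10 does not. A sketch of a bounds-checked lookup follows; it uses `uint32_t` as a stand-in for a 32-bit `BC_VECTOR` (an assumption), and `pow10_u32_checked` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a power-of-ten helper on a 32-bit vector
 * type: returns false when 10^exponent does not fit in uint32_t. */
static bool pow10_u32_checked(size_t exponent, uint32_t *out)
{
	static const uint32_t table[] = {
		1u, 10u, 100u, 1000u, 10000u, 100000u,
		1000000u, 10000000u, 100000000u, 1000000000u
	};

	/* The table covers exponents 0 through 9 only. */
	if (exponent >= sizeof(table) / sizeof(table[0])) {
		return false;
	}
	*out = table[exponent];
	return true;
}
```

If every call site can be shown to pass an exponent of at most 9, an assertion would serve the same purpose at no cost in release builds.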
char *rptr = ret->n_value;
char *rend = rptr + ret_len + scale - 1;

guess_vector /= BASE; /* Since the scale of guess_vector is scale + 1, reduce the scale by 1. */
Didn't we have a common function to do this loop?
@@ -101,61 +102,132 @@ static inline void bc_fast_sqrt(bc_num *num, size_t scale, size_t num_calc_full_
*num = ret;
}

static inline void bc_standard_sqrt(bc_num *num, size_t scale, bcmath_compare_result num_cmp_one)
static inline void bc_standard_sqrt(bc_num *num, size_t scale, size_t num_calc_full_len, size_t num_use_full_len, size_t leading_zeros)
I really don't think we should open-code the actual algorithm with inline operations. Rather we should keep the algorithm using the standard calls to other bcmath functions.
It may be worth changing how some of these functions work to avoid unnecessary allocation and free operations; that alone may already speed up the process a bit.
Test
Approximately 80 million calculations were performed using values of various magnitudes and scales, and all results matched those from the previous implementation.
Benchmark
Performance improvements are particularly noticeable in the following cases:
For large values that do not fall into the above categories, most of the execution time is spent on the iterative process of the Newton-Raphson method, especially on division operations. As a result, while there may be some minor gains from reducing memory allocations and lowering the cost of converting BCD to BC_VECTOR, there is no significant improvement in overall performance.
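To show why division dominates, here is a sketch of the Newton-Raphson iteration for an integer square root. It uses `uint64_t` as a stand-in for an arbitrary-precision number, so it is not the bcmath implementation; `isqrt_newton` is a hypothetical name. Each pass through the loop performs one full division `n / x`, which is the expensive step when `n` is a large multi-word value.

```c
#include <stdint.h>

/* Illustrative only: floor(sqrt(n)) by Newton-Raphson on machine
 * integers. bcmath operates on arbitrary-precision values instead. */
static uint64_t isqrt_newton(uint64_t n)
{
	if (n < 2) {
		return n;
	}

	/* Initial guess that is >= sqrt(n) and cannot overflow. */
	uint64_t x = n / 2 + 1;
	uint64_t y = (x + n / x) / 2;

	/* The guesses decrease until they reach floor(sqrt(n)). */
	while (y < x) {
		x = y;
		y = (x + n / x) / 2; /* one division per iteration */
	}
	return x;
}
```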
Small size value (fast path)
Code:
Result:
Middle size value (< 1) (standard path)
Code:
Result:
Middle size value 1 (fast path)
The new logic ignores unnecessary scales in the calculation, so this is a fast path.
Code:
Result:
Middle size value 2 (standard path)
Code:
Result:
Big size value (standard path)
Code:
Result: