BUG: Fix bug with tolerances #13342


Closed · wants to merge 1 commit into from

Conversation

larsoner (Member)

@Genuster see this failure:

https://app.circleci.com/pipelines/github/mne-tools/mne-bids-pipeline/5106/workflows/2b623eb6-112a-4f57-a71c-6962e53b76c3/jobs/82110

Locally I can reproduce it by running pytest mne_bids_pipeline/ -k ERN (you might want to add a --download if you want to try).

This shouldn't fail the check, but it does. The values are generally around 1e11, so the atol here becomes meaningless. An rtol of 1e-7 or 1e-6 is generally safer because our data are often stored in float32, so we end up operating near float32 tolerances (especially after a save-load round trip of a covariance, for example).

Not 100% sure what to do about atol. A safer value for atol is zero, since in general we don't know the scale of the data. Really it should probably auto-scale based on the absolute max of the data -- the current value of 1e-11 will pass for data that are tiny: if the values are all on a 1e-13 scale, then any matrix will pass the test, which is bad. Here I've made atol=None mean "use 1e-7 times the largest absolute value in the data". The same argument holds for eval_tol, which I've changed correspondingly.
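A minimal sketch of how that atol=None behavior could look (the helper name here is hypothetical, not actual MNE code): scaling atol by the data's absolute max makes the check behave the same for matrices around 1e11 as for tiny ones.

```python
import numpy as np

def check_symmetric(cov, atol=None, rtol=1e-7):
    # Hypothetical helper, not MNE's actual API: when atol is None,
    # scale it by the largest absolute value in the matrix so tiny
    # matrices don't trivially pass and huge ones don't trivially fail.
    if atol is None:
        atol = 1e-7 * np.max(np.abs(cov))
    return np.allclose(cov, cov.T, rtol=rtol, atol=atol)

rng = np.random.default_rng(0)
a = rng.standard_normal((5, 5))
cov = a @ a.T * 1e11  # values around 1e11, like the failing data
asym = cov + np.triu(np.full((5, 5), 1e6), 1)  # inject a real asymmetry
```

With a fixed atol like 1e-11 the check is meaningless at this scale, while a matrix on a 1e-13 scale would always pass; scaling atol by the data avoids both failure modes.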

WDYT? If you agree with these changes, can you take over? You can cherry-pick my commit or start over and open another PR with any additional changes. Or you can review and I can make changes etc.

Genuster (Contributor) left a comment


I can replicate it locally too. Both cov - cov.T and np.max(cov - cov.T) = 0.5 don't look pretty, but given np.max(cov) ~ 2e17 and np.min(cov) ~ -4e12, your atol fix makes sense to me. In general I think this kind of stuff should be fixed/verified at the level of covariance estimation, but I won't dare go down there myself.

is_pos_semidef = np.all(scipy.linalg.eigvalsh(cov) >= -eval_tol)
evals = scipy.linalg.eigvalsh(cov)
if eval_tol is None:
    eval_tol = 1e-7 * np.max(np.abs(evals))
Genuster (Contributor)

eval_tol shouldn't be relative; I used it here to work around weird cov estimators that return very small, near-zero negative eigenvalues that should have been zeros. Making it relative like that will pass big negative evals as zeros when maxabs is bigger than 1e7. I'd suggest returning it to 1e-15 or another very-close-to-zero value. It's the symmetry check that breaks this BIDS test.
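This concern can be sketched numerically (the numbers below are made up for illustration):

```python
import numpy as np

# With a relative eval_tol, a genuinely negative eigenvalue slips
# through as "zero" once the largest eigenvalue is huge.
evals = np.array([-1e6, 1.0, 1e15])
rel_tol = 1e-7 * np.max(np.abs(evals))       # 1e8, larger than |-1e6|
passes_relative = np.all(evals >= -rel_tol)  # -1e6 counts as "zero"
passes_absolute = np.all(evals >= -1e-15)    # absolute tol rejects it
```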

Genuster (Contributor), Jul 22, 2025

As of now, this eval_tol for positive semi-definiteness isn't even used anywhere, as it breaks CSP tests. Here, for example, are the eigenvalues of some covariance from CSP's own tests:
array([-2.26552620e-11, -8.03291570e-12,  1.91338985e-11,  3.35130268e+02,
        3.71512488e+02,  4.06907570e+02,  5.12625554e+02,  5.31962633e+02,
        5.73633394e+02,  6.16912893e+02,  6.50660956e+02,  6.90394012e+02,
        7.58521774e+02,  7.89443189e+02,  8.61412270e+02,  9.75834757e+02,
        1.09661655e+03,  1.12261246e+03,  1.20060347e+03,  1.52876843e+03,
        1.53530751e+03,  1.56971614e+03,  1.90257161e+03,  2.11024084e+03,
        2.19654123e+03,  2.46030208e+03,  2.74133710e+03,  2.88006187e+03,
        3.15945830e+03,  3.73536293e+03,  4.00254104e+03,  4.33736444e+03,
        4.95140195e+03,  6.73984573e+03,  7.38063829e+03,  7.99412226e+03,
        9.55654794e+03,  1.21489569e+04,  1.34064310e+04,  1.46991932e+04,
        2.54027572e+04,  2.90281782e+04,  3.60708863e+04,  5.87350846e+04,
        6.38125549e+04,  6.83752532e+04,  9.33762844e+04,  1.15836350e+05,
        1.53455748e+05,  5.43566324e+05,  8.57232749e+05])

Not sure whether we can pass the first three as zeros...

larsoner (Member, Author)

> eval_tol shouldn't be relative,

I think we have to; IIRC "near zero" is relative to the largest singular value in these computations. To me those first three all appear to be zero-like / in the numerical noise for the computation. Given that the magnitude of the data is ~1e3, I'd expect anything with an absolute value less than ~1e-5 to be zero-like. Just to take a simple concrete example, by setting:

+            is_cov = _is_cov_symm_pos_semidef(cov, check_pos_semidef=True)

We see:

mne/decoding/tests/test_csp.py:327: in test_regularized_csp
    X = csp.fit_transform(epochs_data, epochs.events[:, -1])
...
mne/decoding/base.py:135: in fit
    self._validate_covariances(covs)
mne/decoding/base.py:263: in _validate_covariances
    raise ValueError(
E   ValueError: One of covariances is not symmetric (or positive semidefinite), check your cov_callable

(Pdb) p np.linalg.eigvalsh(covs[0])
array([-4.63053370e-12,  7.02204761e-12,  4.77606729e-11,  3.35130268e+02,
        3.71512488e+02,  4.06907570e+02,  5.12625554e+02,  5.31962633e+02,
        5.73633394e+02,  6.16912893e+02,  6.50660956e+02,  6.90394012e+02,
        7.58521774e+02,  7.89443189e+02,  8.61412270e+02,  9.75834757e+02,
        1.09661655e+03,  1.12261246e+03,  1.20060347e+03,  1.52876843e+03,
        1.53530751e+03,  1.56971614e+03,  1.90257161e+03,  2.11024084e+03,
        2.19654123e+03,  2.46030208e+03,  2.74133710e+03,  2.88006187e+03,
        3.15945830e+03,  3.73536293e+03,  4.00254104e+03,  4.33736444e+03,
        4.95140195e+03,  6.73984573e+03,  7.38063829e+03,  7.99412226e+03,
        9.55654794e+03,  1.21489569e+04,  1.34064310e+04,  1.46991932e+04,
        2.54027572e+04,  2.90281782e+04,  3.60708863e+04,  5.87350846e+04,
        6.38125549e+04,  6.83752532e+04,  9.33762844e+04,  1.15836350e+05,
        1.53455748e+05,  5.43566324e+05,  8.57232749e+05])
(Pdb) p np.linalg.eigvalsh(covs[0] * 1e10)
array([-2.88491603e-01,  4.17431701e-02,  2.01562850e-01,  3.35130268e+12,
        3.71512488e+12,  4.06907570e+12,  5.12625554e+12,  5.31962633e+12,
        5.73633394e+12,  6.16912893e+12,  6.50660956e+12,  6.90394012e+12,
        7.58521774e+12,  7.89443189e+12,  8.61412270e+12,  9.75834757e+12,
        1.09661655e+13,  1.12261246e+13,  1.20060347e+13,  1.52876843e+13,
        1.53530751e+13,  1.56971614e+13,  1.90257161e+13,  2.11024084e+13,
        2.19654123e+13,  2.46030208e+13,  2.74133710e+13,  2.88006187e+13,
        3.15945830e+13,  3.73536293e+13,  4.00254104e+13,  4.33736444e+13,
        4.95140195e+13,  6.73984573e+13,  7.38063829e+13,  7.99412226e+13,
        9.55654794e+13,  1.21489569e+14,  1.34064310e+14,  1.46991932e+14,
        2.54027572e+14,  2.90281782e+14,  3.60708863e+14,  5.87350846e+14,
        6.38125549e+14,  6.83752532e+14,  9.33762844e+14,  1.15836350e+15,
        1.53455748e+15,  5.43566324e+15,  8.57232749e+15])

i.e., if you scale the matrix, everything scales up. But in reality these data should have three zero eigenvalues because they come from data with:

(Pdb) p epochs.info
...
 projs: PCA-v1: on, PCA-v2: on, PCA-v3: on
 sfreq: 120.1 Hz
>
(Pdb) p mne.compute_rank(epochs)
{'mag': 48}
(Pdb) len(epochs.ch_names)
51

i.e., data with 51 channels and 3 projectors, so the data should have rank 48, and all computations should be "zero-like eigenvalue" aware to compute things properly. Does that make sense?
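The rank argument can be reproduced with a toy example, using a random rank-48 projector in place of the three SSP projectors (purely illustrative, not the actual pipeline data):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((51, 1000))       # 51 "channels"
u, _, _ = np.linalg.svd(rng.standard_normal((51, 51)))
proj = u[:, :48] @ u[:, :48].T               # rank-48 projector (~3 projs removed)
cov = np.cov(proj @ data)                    # covariance of projected data
evals = np.linalg.eigvalsh(cov)              # ascending order
# the first 3 eigenvalues are zero-like numerical noise, and they scale
# with the matrix exactly as in the pdb session above
evals_big = np.linalg.eigvalsh(cov * 1e10)
```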

Genuster (Contributor)

Makes a lot of sense, thanks for explaining!
I'm pretty much ignorant about covariance estimation theory and its numerical heuristics (seeing negative evals was already a culture shock), but what you're saying about this eval_tol is convincing.

But in the is_pos_cov_def check, is it also reasonable to expect that all the small positive evals will be within a factor of 1e7 of the largest eval? E.g., if the largest one is 1e15 and there are any evals smaller than 1e8, the covariance will not be considered positive definite.

larsoner (Member, Author)

> But in is_pos_cov_def check, is it also reasonable to expect that all the small positive evals will be no further than 1e7 from the largest eval? E.g., if the largest one is 1e15, and there are any evals smaller than 1e8, the covariance will not be considered positive definite.

So for the case where the largest is 1e15, we can say:

  • Anything greater than 1e8 is a non-zero positive eigenvalue
  • Anything between 1e8 and -1e8 is a "zero" eigenvalue (i.e., if such values exist, the matrix is not positive definite anymore, just at best semidefinite)
  • Anything less than -1e8 is non-zero negative (i.e., if such values exist then the matrix is not positive semidefinite)
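The three-way classification can be sketched like this (the helper name is made up for illustration):

```python
import numpy as np

def classify_evals(evals, rel_tol=1e-7):
    # Illustrative helper: split eigenvalues into non-zero positive,
    # "zero" (inside the relative noise band), and non-zero negative.
    tol = rel_tol * np.max(np.abs(evals))
    pos = evals > tol
    zero = np.abs(evals) <= tol
    neg = evals < -tol
    return pos, zero, neg

# with the largest eigenvalue at 1e15, the "zero" band is +/-1e8
evals = np.array([-1e9, -1.0, 5.0, 1e15])
pos, zero, neg = classify_evals(evals)
# -1e9 is truly negative, -1.0 and 5.0 are "zero", 1e15 is positive
```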

If you play around with some examples of high-dim (e.g., 52x52) matrices of different scale factors -- like the one I messed with above -- and different rank-reduced-ness, you should see that this works reasonably well.

Incidentally, this is also why, when working with covariances, we pre-scale (or handle entirely independently) channels of different units -- MEG on the scale of ~1e-13 is in the numerical noise compared to EEG data on the scale of ~1e-6.
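A toy version of that scaling point (channel scales are typical orders of magnitude; everything else is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
meg = rng.standard_normal((10, 500)) * 1e-13  # MEG-scale channels
eeg = rng.standard_normal((10, 500)) * 1e-6   # EEG-scale channels
cov = np.cov(np.vstack([meg, eeg]))           # joint covariance, no pre-scaling
evals = np.linalg.eigvalsh(cov)               # ascending order
# the 10 MEG eigenvalues (~1e-26) sit far below 1e-7 times the largest
# EEG eigenvalue (~1e-12): without pre-scaling, MEG looks like rank 0
```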

Genuster (Contributor)

> Anything between 1e8 and -1e8 are "zero" eigenvalues

😭😭

Ok! Then it all looks convincing to me!

larsoner (Member, Author)

@Genuster do you want to take over and try adding the proper checks? I started to, but I actually got a failure, for example in SPoC, where the cov matrix being checked was clearly not even positive semidefinite (eigenvalues went just as far negative as they did positive!). So I think it might be better for you to try adapting the code to check whenever it can...

Genuster (Contributor)

Ah, I see, will open another PR today/tomorrow then!

is_pos_def = np.all(scipy.linalg.eigvalsh(cov) > eval_tol)
evals = scipy.linalg.eigvalsh(cov)
if eval_tol is None:
    eval_tol = 1e-7 * np.max(np.abs(evals))
Genuster (Contributor), Jul 22, 2025

Same here; I would put it back to 1e-15 to ensure that the matrix is positive definite and not just passing because of numerical imprecision in the zero eigenvalues, without making the tolerance too large when maxabs is very big.
