
Conversation

stefanozampini

This allows the creation of BDDC preconditioners.

For now, there's a demo on mixed Poisson. To run it, you need the getLocalSubMatrix fixes from https://gitlab.com/petsc/petsc/-/merge_requests/8668.

@garth-wells
Member

Super!

Just today we have a branch that runs CI tests with Spack-provided dependencies. Using

petsc@git.stefanozampini/fix-dolfinx-release

in our CI setup, we could test it :).

It might be better to add this to the existing mixed Poisson demo. We're trying to avoid too many demos that solve very similar problems in favour of demos that show different ways to solve the same problem (in the same demo). The mixed Poisson demo could demonstrate a range of different solvers.

Comment on lines +635 to +639
auto adjlist = bc.get().function_space()->dofmap()->index_map->index_to_dest_ranks();
const auto off = adjlist.offsets();
std::vector<std::int32_t> number_of_sharing_ranks(off.size());
for (size_t i = 0; i < off.size() - 1; ++i)
number_of_sharing_ranks[i] = off[i+1] - off[i] + 1;
Author
@stefanozampini Aug 28, 2025

Ideally, this setup code should be moved to IndexMap.
I was thinking of implementing it lazily, with a mutable class member, but I see you don't use mutable anywhere in dolfinx. Would that be something you would accept?

Member
@jorgensd Aug 28, 2025

Hi Stefano,
I know I suggested this function on Discourse. However, now that I see the use case, what I think we should do is:

  1. Create an la::Vector on the index map.
  2. Fill the vector with 1 (including ghosts).
  3. Scatter reverse with add.
  4. Scatter forward.

That would give you an array with the number of sharing processes per dof (a minimal sketch follows below).
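A minimal C++ sketch of that counting vector, assuming a dolfinx::la::Vector interface along these lines (constructor from an IndexMap and block size, mutable_array, scatter_rev with a binary operation, scatter_fwd); treat the exact signatures as assumptions rather than a verbatim API:

#include <algorithm>
#include <functional>
#include <memory>

#include <dolfinx/common/IndexMap.h>
#include <dolfinx/la/Vector.h>

// Build a vector that holds, for every owned and ghost dof, the number of
// processes sharing it: fill with 1, accumulate on owners, push back to ghosts.
dolfinx::la::Vector<double> sharing_count(
    std::shared_ptr<const dolfinx::common::IndexMap> map, int bs)
{
  dolfinx::la::Vector<double> w(map, bs);
  std::ranges::fill(w.mutable_array(), 1.0); // step 2: 1 on owned and ghost entries
  w.scatter_rev(std::plus<double>());        // step 3: owners accumulate ghost contributions
  w.scatter_fwd();                           // step 4: send the totals back to the ghosts
  return w;
}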

Member

This could probably be a vector we compute once inside the DirichletBC class. What do you think, @garth-wells?

Author
@stefanozampini Aug 29, 2025

@jorgensd Can you paste some code here to do it? I will wait for @garth-wells' confirmation and then move the setup of the counting vector into the IndexMap or DirichletBC class, your call.

Member

I'm without a computer until Sunday. @IgorBaratta could you do me a solid and make a simple example for Stefano? :)

Member

We need a way to divide the 1 inserted on the diagonal by the number of processes that share the dof, as MATIS cannot use insert and has to add the values. I.e. if a DirichletBC dof is shared between N processes, we assign 1/N on each process.

I suggest we use a vector, stored in the DirichletBC, to keep track of this, as illustrated by the snippet above (a sketch of the scaled insertion follows below).
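To make the 1/N idea concrete, here is a hypothetical helper; it is not the dolfinx API, and bc_dofs, sharing_count and the generic set_fn(rows, cols, values) callback are illustrative names only:

#include <cstdint>
#include <span>

// Add diagonal/N on the diagonal for each Dirichlet dof, where N is the
// number of processes sharing that dof. With ADD_VALUES semantics the
// contributions from the N processes then sum to exactly `diagonal`.
template <typename T, typename SetFn>
void set_diagonal_unassembled(SetFn&& set_fn,
                              std::span<const std::int32_t> bc_dofs,
                              std::span<const T> sharing_count, T diagonal)
{
  for (std::int32_t dof : bc_dofs)
  {
    T value = diagonal / sharing_count[dof];
    set_fn(std::span<const std::int32_t>(&dof, 1),
           std::span<const std::int32_t>(&dof, 1),
           std::span<const T>(&value, 1));
  }
}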

Author

@jorgensd summarized it properly. If you want to keep the same workflow for assembled and unassembled matrices, the counting vector is the only way to achieve it. I think this information belongs in IndexMap rather than DirichletBC, but I'm fine either way.

Member

@stefanozampini @garth-wells I've made an attempt at this: https://github.com/FEniCS/dolfinx/tree/dokken/suggested-matis
I'll have a go at installing PETSc main locally and running this, to see if I can get the demo to run :)

Member

A likely good approach is consistent handling of the LHS and RHS, i.e. use accumulate rather than set for the bc on the RHS. We did this in legacy DOLFIN. I'd need to look closely at the code to see how this might work with the latest design.

Member
@jorgensd Sep 1, 2025

I think it should be quite straightforward with the approach suggested in my branch above, as it lets the user choose the insertion mode of the diagonal (which can trivially be extended to set_bc), since the DirichletBC itself holds the count information.

@stefanozampini
Author

It might be better to add this to the existing mixed Poisson demo. We're trying to avoid too many demos that solve very similar problems in favour of demos that show different ways to solve the same problem (in the same demo). The mixed Poisson demo could demonstrate a range of different solvers.

That test uses a different Pmat and has extra code to set up the preconditioner. Wouldn't it be too messy to add another preconditioner in the same script?

@jorgensd
Member

It might be better to add this to the existing mixed Poisson demo. We're trying to avoid too many demos that solve very similar problems in favour of demos that show different ways to solve the same problem (in the same demo). The mixed Poisson demo could demonstrate a range of different solvers.

That test uses a different Pmat and has extra code to set up the preconditioner. Wouldn't it be too messy to add another preconditioner in the same script?

See for instance the Stokes demo, which illustrates how we have solved it for four different configurations:
https://docs.fenicsproject.org/dolfinx/main/python/demos/demo_stokes.html#implementation

@garth-wells
Member

It might be better to add this to the existing mixed Poisson demo. We're trying to avoid too many demos that solve very similar problems in favour of demos that show different ways to solve the same problem (in the same demo). The mixed Poisson demo could demonstrate a range of different solvers.

That test uses a different Pmat and has extra code to set up the preconditioner. Wouldn't it be too messy to add another preconditioner in the same script?

Put the different solvers in different functions. They can share common data, e.g. the mesh.

A reason we want 'more' in demos is to structure demos with more functions. We see over and over that users start from a simple, flat 20-line demo script and turn it into a monstrous, multi-thousand-line flat script. With more functions in demos we're educating people on how to structure a program :).

template <dolfinx::scalar T, std::floating_point U>
void set_diagonal(
    auto set_fn, const FunctionSpace<U>& V,
    const std::vector<std::reference_wrapper<const DirichletBC<T, U>>>& bcs,
    T diagonal = 1.0, bool unassembled = false)
Member

This looks problematic: quite a few types can be cast to T and bool, so it is very easy for a user to make an error.

Probably better to make unassembled an enum. Easier to read too.
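For what it's worth, one possible shape of that suggestion; the enum name and its values are purely illustrative, not an agreed API:

// Illustrative only: an enum makes the intent explicit at the call site and
// cannot be confused with the scalar `diagonal` argument.
enum class MatrixLayout
{
  assembled,  // e.g. MATAIJ: bc diagonal values can be inserted directly
  unassembled // e.g. MATIS: values are summed across sharing processes
};

// A call would then read, for example:
//   set_diagonal(set_fn, V, bcs, diagonal, MatrixLayout::unassembled);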

Author
@stefanozampini Aug 30, 2025

Having an enum for a single function seems overkill to me.
If there's a way to extract the Mat back from set_fn, we could check inside the function whether it is a MATIS, and we wouldn't need the extra optional argument. @garth-wells what do you think?

Member

In the branch I made, I use an insertion mode with add or insert to distinguish between the two, which resonates well with the scatter_fwd/scatter_reverse modes we have in the Python interface for vectors.

Author
@stefanozampini Sep 1, 2025

I looked at your branch. I don't think it is correct for MATAIJ: set_diagonal (correctly) uses set_fn = dolfinx::la::petsc::Matrix::set_fn(A, INSERT_VALUES), and the values won't be accumulated into MATAIJ. I think it is best to use an optional boolean that clearly indicates we want to insert into an unassembled (MATIS) matrix. All these problems appear because the interface for set_diagonal only takes set_fn and not the matrix directly.

Author

Alternatively, use the add_values formalism everywhere in matrix assembly (after all, this is FEM :-)). The optimization you get by not communicating the values to be inserted into the diagonal is minimal, @garth-wells?

Member

I looked at your branch. I don't think it is correct for MATAIJ: set_diagonal (correctly) uses set_fn = dolfinx::la::petsc::Matrix::set_fn(A, INSERT_VALUES), and the values won't be accumulated into MATAIJ. I think it is best to use an optional boolean that clearly indicates we want to insert into an unassembled (MATIS) matrix. All these problems appear because the interface for set_diagonal only takes set_fn and not the matrix directly.

Right, but if we modify the input to set_fn to use ADD_VALUES, it would resolve the issue, right?
I.e.

        PetscBool flg;
        PetscObjectTypeCompare((PetscObject)A, MATIS, &flg);
        dolfinx::la::VectorInsertMode mode
            = flg ? dolfinx::la::VectorInsertMode::add
                  : dolfinx::la::VectorInsertMode::insert;
        InsertMode petsc_mode = flg ? ADD_VALUES : INSERT_VALUES;
        dolfinx::fem::set_diagonal(
            dolfinx::la::petsc::Matrix::set_fn(A, petsc_mode), V, _bcs,
            diagonal, mode);
      },

Member
@jorgensd Sep 1, 2025

Thinking more about it, I kind of agree that we should only have one approach, preferably the add mode, as assuming that two independent input arguments are set consistently by the user is likely to lead to a lot of confusion.

Having only a single approach, accumulating data from all processes as 1/num_shared_procs, seems sensible.

The reason for DOLFINx not interfacing directly with the PETSc matrix is that it allows us to have a single code path for multiple matrix types, like our own built-in matrices, and potentially Trilinos, without having to change the set_diagonal and dolfinx::fem::assemble_* functions.
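For readers following along, a rough sketch of what such a backend-agnostic insertion callback looks like; this mirrors the idea of the set_fn argument rather than quoting the actual dolfinx declaration, so treat the exact signature as an assumption:

#include <cstdint>
#include <functional>
#include <span>

// Assembly routines and set_diagonal only see a callback of this shape, so the
// same code path can target PETSc, the built-in matrices, or another backend.
template <typename T>
using MatSetFn = std::function<int(std::span<const std::int32_t> rows,
                                   std::span<const std::int32_t> cols,
                                   std::span<const T> values)>;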

Author

I'm fine with any solution you think is reasonable. Note that if you switch back to add values, you won't need to flush assembly anymore in the fem callbacks. And you can still use the optimization of adding only to diagonal entries in AIJ format (since the value on the diagonal before the call is 0.0, add or insert gives the same result).

Right, but if we modify the input to set_fn to use ADD_VALUES, it would resolve the issue, right?
I.e.

If you want equivalent behaviour, it should be the opposite way, i.e. petsc_mode = !flg ? ADD_VALUES : INSERT_VALUES. This is because MATIS inserts directly into the local matrix. Matrix-vector products in MATIS are done via R^T A R, where R restricts from global to local, A does the local multiplications, and R^T sums up the local contributions.
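In equation form, paraphrasing the explanation above with R_i the restriction of a global vector to subdomain i and A_i the local (unassembled) operator:

$$ A x \;=\; \sum_i R_i^{T} A_i R_i \, x , $$

so whatever is written into the local diagonal of every subdomain that shares a dof is effectively summed, which is why a shared Dirichlet dof gets 1/N per process.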

Member

Right, I got that the wrong way around.

@garth-wells added the enhancement (New feature or request) and demo (New demo or demo enhancements/improvements) labels on Aug 29, 2025
@stefanozampini
Author

It might be better to add this to the existing mixed Poisson demo. We're trying to avoid too many demos that solve very similar problems in favour of demos that show different ways to solve the same problem (in the same demo). The mixed Poisson demo could demonstrate a range of different solvers.

That test uses a different Pmat and has extra code to set up the preconditioner. Wouldn't it be too messy to add another preconditioner in the same script?

Put the different solvers in different functions. Can pass in common data, e.g. mesh.

Done

@jorgensd
Member

jorgensd commented Sep 1, 2025

@stefanozampini I get a segfault when running the demo_mixed-poisson.py with superlu_dist.
It runs fine with mumps, both in serial and parallel; potentially a downstream issue, I guess.

I installed the release PETSc branch and your branch using spack.

@stefanozampini
Author

@stefanozampini I get a segfault when running the demo_mixed-poisson.py with superlu_dist. It runs fine with mumps, both in serial and parallel; potentially a downstream issue, I guess.

If you run sequentially, BDDC is a direct solver because there are no subdomains. In that case, the coarse solver is empty and superlu_dist segfaults because it does not support matrices of size 0x0.

I installed the release PETSc branch and your branch using spack.

The changes were merged into main yesterday, so there is no need to track my branch.

@jorgensd
Member

jorgensd commented Sep 1, 2025

@stefanozampini I get a segfault when running the demo_mixed-poisson.py with superlu_dist. It runs fine with mumps, both in serial and parallel; potentially a downstream issue, I guess.

If you run sequentially, BDDC is a direct solver because there are no subdomains. In that case, the coarse solver is empty and superlu_dist segfaults because it does not support matrices of size 0x0.

I installed the release PETSc branch and your branch using spack.

The changes were merged into main yesterday, so there is no need to track my branch.

Even running with PETSc main branch (using spack) and your branch of DOLFINx, I get:

 File "/home/dokken/Documents/src/dolfinx/python/demo/demo_mixed-poisson.py", line 418, in <module>
    problem.solve()
  File "/home/dokken/Documents/src/spack/var/spack/environments/test_matis/.spack-env/view/lib/python3.12/site-packages/dolfinx/fem/petsc.py", line 987, in solve
    self.solver.solve(self.b, self.x)
  File "petsc4py/PETSc/KSP.pyx", line 1782, in petsc4py.PETSc.KSP.solve
petsc4py.PETSc.Error: error code 81
[3] KSPSolve() at /tmp/dokken/spack-stage/spack-stage-petsc-main-bygphg6hok5rdbzsbmlhkt3p6alpuhyp/spack-src/src/ksp/ksp/interface/itfunc.c:1091
[3] KSPSolve_Private() at /tmp/dokken/spack-stage/spack-stage-petsc-main-bygphg6hok5rdbzsbmlhkt3p6alpuhyp/spack-src/src/ksp/ksp/interface/itfunc.c:840
[3] KSPSetUp() at /tmp/dokken/spack-stage/spack-stage-petsc-main-bygphg6hok5rdbzsbmlhkt3p6alpuhyp/spack-src/src/ksp/ksp/interface/itfunc.c:429
[3] PCSetUp() at /tmp/dokken/spack-stage/spack-stage-petsc-main-bygphg6hok5rdbzsbmlhkt3p6alpuhyp/spack-src/src/ksp/pc/interface/precon.c:1120
[3] PCSetUp_BDDC() at /tmp/dokken/spack-stage/spack-stage-petsc-main-bygphg6hok5rdbzsbmlhkt3p6alpuhyp/spack-src/src/ksp/pc/impls/bddc/bddc.c:1672
[3] PCBDDCSetUpSolvers() at /tmp/dokken/spack-stage/spack-stage-petsc-main-bygphg6hok5rdbzsbmlhkt3p6alpuhyp/spack-src/src/ksp/pc/impls/bddc/bddcprivate.c:3852
[3] PCBDDCSetUpCorrection() at /tmp/dokken/spack-stage/spack-stage-petsc-main-bygphg6hok5rdbzsbmlhkt3p6alpuhyp/spack-src/src/ksp/pc/impls/bddc/bddcprivate.c:4560
[3] MatCholeskyFactor() at /tmp/dokken/spack-stage/spack-stage-petsc-main-bygphg6hok5rdbzsbmlhkt3p6alpuhyp/spack-src/src/mat/interface/matrix.c:3372
[3] MatCholeskyFactor_SeqDense() at /tmp/dokken/spack-stage/spack-stage-petsc-main-bygphg6hok5rdbzsbmlhkt3p6alpuhyp/spack-src/src/mat/impls/dense/seq/dense.c:882
[3] Zero pivot in Cholesky factorization: https://petsc.org/release/faq/#zeropivot
[3] Bad factorization: zero pivot in row 0

when using superlu_dist, as opposed to mumps as the factorization backend in parallel.
It seems to only be a problem when one specifies the local_solver_type to be superlu_dist.
Using

    coarse_solver = "lu"
    coarse_solver_type = "superlu_dist"
    local_solver_type = "mumps"

or

    coarse_solver = "lu"
    coarse_solver_type = "mumps"
    local_solver_type = "mumps"

is fine.

@stefanozampini
Author

It seems to only be a problem when one specifies the local_solver_type to be superlu_dist.

I think in that case one of the processes has zero entries, and superlu_dist fails.

@jorgensd
Member

jorgensd commented Sep 1, 2025


I think in that case one of the processes has zero entries, and superlu_dist fails.

This happens even if I increase the mesh to 500x500 and use only 2 processes, where it is unlikely that one of the coarse levels would end up with zero entries? Is it because we do not have a (1, 1) diagonal block in the system?

@stefanozampini
Author


This happens even if I increase the mesh to 500x500 and use only 2 processes, where it is unlikely that one of the coarse levels would end up with zero entries? Is it because we do not have a (1, 1) diagonal block in the system?

The issue you posted was with a local problem. Run with the option -pc_bddc_check_level 1 and post the output.

@jorgensd
Member

jorgensd commented Sep 1, 2025

-pc_bddc_check_level 1

Solving mixed poisson problem with monolithic solver.
Error benign extension 0.00000000000000e+00
Error scaling extension 0.00000000000000e+00
Error scaling restriction 0.00000000000000e+00
--------------------------------------------------
Subdomain 0000 infinity error for Dirichlet solve (demo_mixed_poisson_pc_bddc_dirichlet_) =  inf 
Subdomain 0001 infinity error for Dirichlet solve (demo_mixed_poisson_pc_bddc_dirichlet_) =  inf 
--------------------------------------------------
Local BDDC graph for subdomain 0000 (seq 0)
Number of vertices 625767
Number of local subdomains 1
Custom minimal size 1
Topological two dim? TRUE (set TRUE)
Total number of connected components 1
  cc 0 (size 514, fid 0, neighs: 0 1)
Local BDDC graph for subdomain 0001 (seq 0)
Number of vertices 625747
Number of local subdomains 1
Custom minimal size 1
Topological two dim? TRUE (set TRUE)
Total number of connected components 1
  cc 0 (size 514, fid 0, neighs: 0 1)
--------------------------------------------------------------
Subdomain 0000 got 00 local candidate vertices (1)
Subdomain 0000 got 01 local candidate edges    (1)
Subdomain 0000 got 00 local candidate faces    (1)
--------------------------------------------------------------
Subdomain 0001 got 00 local candidate vertices (1)
Subdomain 0001 got 01 local candidate edges    (1)
Subdomain 0001 got 00 local candidate faces    (1)
--------------------------------------------------
Subdomain 0000 local dimensions
local_size = 625767, dirichlet_size = 625253, boundary_size = 514
r_size = 625767, v_size = 0, constraints = 1, local_primal_size = 1
Subdomain 0001 local dimensions
local_size = 625747, dirichlet_size = 625233, boundary_size = 514
r_size = 625747, v_size = 0, constraints = 1, local_primal_size = 1
--------------------------------------------------
Subdomain 0000 infinity error for Neumann solve (demo_mixed_poisson_pc_bddc_neumann_) =  inf
Subdomain 0001 infinity error for Neumann solve (demo_mixed_poisson_pc_bddc_neumann_) =  inf

@stefanozampini
Author

That is odd, I don't know what you are running. This is what I get running the demo in my branch with 2 processes:

$ mpiexec -n 2 python demo_mixed-poisson.py 
....
Solving mixed poisson problem with monolithic solver.
  iteration:    0, residual: 1.997e+01
  iteration:    1, residual: 8.376e-01
  iteration:    2, residual: 6.651e-03
  iteration:    3, residual: 1.814e-04
  iteration:    4, residual: 3.579e-06
  iteration:    5, residual: 5.895e-08
KSP Object: (demo_mixed_poisson_) 2 MPI processes
  type: gmres
    restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=10000, initial guess is zero
  tolerances: relative=1e-08, absolute=1e-50, divergence=10000.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: (demo_mixed_poisson_) 2 MPI processes
  type: bddc
    Use verbose output: 0
    Use user-defined CSR: 0
    Use local mat graph: 0
    Connectivity graph topological dimension: 2
    Corner selection: 0 (selected 0)
    Use vertices: 1 (vertex size 1)
    Use edges: 1
    Use faces: 1
    Use true near null space: 0
    Use QR for single constraints on cc: 0
    Use change of basis on local edge nodes: 0
    Use change of basis on local face nodes: 0
    User defined change of basis matrix: 0
    Has change of basis matrix: 0
    Eliminate dirichlet boundary dofs: 0
    Switch on static condensation ops around the interface preconditioner: 0
    Use exact dirichlet trick: 0
    Interface extension: DIRICHLET
    Multilevel max levels: 0
    Multilevel coarsening ratio: 8
    Use estimated eigs for coarse problem: 0
    Use deluxe scaling: 0
    Use deluxe zerorows: 0
    Use deluxe singlemat: 0
    Rebuild interface graph for Schur principal minors: 0
    Number of dofs' layers for the computation of principal minors: -1
    Use user CSR graph to compute successive layers: 0
    Adaptive constraint selection threshold (active 0, userdefined 0): 0.
    Min constraints / connected component: 0
    Max constraints / connected component: 0
    Invert exact Schur complement for adaptive selection: 0
    Symmetric computation of primal basis functions: 1
    Num. Procs. to map coarse adjacency list: 0
    Coarse eqs per proc (significant at the coarsest level): 1
    Detect disconnected: 1 (filter 0)
    Benign subspace trick: 1 (change explicit 1)
    Benign subspace trick is active: 0
    Algebraic computation of no-net-flux: 1
  ********************************** STATISTICS AT LEVEL 0 **********************************
    Global dofs sizes: all 46272 interface 96 coarse 1
    Coarsening ratios: all/coarse 46272 interface/coarse 96
    Active processes : 2
    Total subdomains : 2
    Dofs type        :	MIN	MAX	MEAN
    Interior  dofs   :	23084	23092	23088
    Interface dofs   :	96	96	96
    Primal    dofs   :	1	1	1
    Local     dofs   :	23180	23188	23184
    Local     subs   :	1	1
  --- Interior solver (rank 0)
    KSP Object: (demo_mixed_poisson_pc_bddc_dirichlet_) 1 MPI process
      type: preonly
      maximum iterations=10000, initial guess is zero
      tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (demo_mixed_poisson_pc_bddc_dirichlet_) 1 MPI process
      type: cholesky
        out-of-place factorization
        tolerance for zero pivot 2.22045e-14
        matrix ordering: external
        factor fill ratio given 0., needed 0.
          Factored matrix follows:
            Mat Object: (demo_mixed_poisson_pc_bddc_dirichlet_) 1 MPI process
              type: mumps
              rows=23084, cols=23084
              package used to perform factorization: mumps
              total: nonzeros=529736, allocated nonzeros=529736
                MUMPS run parameters:
                  Use -demo_mixed_poisson_pc_bddc_dirichlet_ksp_view ::ascii_info_detail to display information for all processes
                  RINFOG(1) (global estimated flops for the elimination after analysis): 1.73252e+07
                  RINFOG(2) (global estimated flops for the assembly after factorization): 257024.
                  RINFOG(3) (global estimated flops for the elimination after factorization): 1.96096e+07
                  (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
                  INFOG(3) (estimated real workspace for factors on all processors after analysis): 708752
                  INFOG(4) (estimated integer workspace for factors on all processors after analysis): 114624
                  INFOG(5) (estimated maximum front size in the complete tree): 125
                  INFOG(6) (number of nodes in the complete tree): 1462
                  INFOG(7) (ordering option effectively used after analysis): 2
                  INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
                  INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 748486
                  INFOG(10) (total integer space store the matrix factors after factorization): 117220
                  INFOG(11) (order of largest frontal matrix after factorization): 125
                  INFOG(12) (number of off-diagonal pivots): 9216
                  INFOG(13) (number of delayed pivots after factorization): 1298
                  INFOG(14) (number of memory compress after factorization): 0
                  INFOG(15) (number of steps of iterative refinement after solution): 0
                  INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 10
                  INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 10
                  INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 10
                  INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 10
                  INFOG(20) (estimated number of entries in the factors): 529736
                  INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 9
                  INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 9
                  INFOG(23) (after analysis: value of ICNTL(6) effectively used): 5
                  INFOG(24) (after analysis: value of ICNTL(12) effectively used): 3
                  INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
                  INFOG(28) (after factorization: number of null pivots encountered): 0
                  INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 567071
                  INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 9, 9
                  INFOG(32) (after analysis: type of analysis done): 1
                  INFOG(33) (value used for ICNTL(8)): -2
                  INFOG(34) (exponent of the determinant if determinant is requested): 0
                  INFOG(35) (after factorization: number of entries taking into account BLR factor compression - sum over all processors): 567071
                  INFOG(36) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - value on the most memory consuming processor): 0
                  INFOG(37) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - sum over all processors): 0
                  INFOG(38) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - value on the most memory consuming processor): 0
                  INFOG(39) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - sum over all processors): 0
      linear system matrix = precond matrix:
      Mat Object: (demo_mixed_poisson_pc_bddc_dirichlet_) 1 MPI process
        type: seqaij
        rows=23084, cols=23084
        total: nonzeros=123884, allocated nonzeros=123884
        total number of mallocs used during MatSetValues calls=0
          not using I-node routines
  --- Correction solver (rank 0)
    KSP Object: (demo_mixed_poisson_pc_bddc_neumann_) 1 MPI process
      type: preonly
      maximum iterations=10000, initial guess is zero
      tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (demo_mixed_poisson_pc_bddc_neumann_) 1 MPI process
      type: cholesky
        out-of-place factorization
        tolerance for zero pivot 2.22045e-14
        matrix ordering: external
        factor fill ratio given 0., needed 0.
          Factored matrix follows:
            Mat Object: (demo_mixed_poisson_pc_bddc_neumann_) 1 MPI process
              type: mumps
              rows=23180, cols=23180
              package used to perform factorization: mumps
              total: nonzeros=521359, allocated nonzeros=521359
                MUMPS run parameters:
                  Use -demo_mixed_poisson_pc_bddc_neumann_ksp_view ::ascii_info_detail to display information for all processes
                  RINFOG(1) (global estimated flops for the elimination after analysis): 1.63329e+07
                  RINFOG(2) (global estimated flops for the assembly after factorization): 248685.
                  RINFOG(3) (global estimated flops for the elimination after factorization): 1.81966e+07
                  (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
                  INFOG(3) (estimated real workspace for factors on all processors after analysis): 698339
                  INFOG(4) (estimated integer workspace for factors on all processors after analysis): 116004
                  INFOG(5) (estimated maximum front size in the complete tree): 133
                  INFOG(6) (number of nodes in the complete tree): 1494
                  INFOG(7) (ordering option effectively used after analysis): 2
                  INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
                  INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 731773
                  INFOG(10) (total integer space store the matrix factors after factorization): 118396
                  INFOG(11) (order of largest frontal matrix after factorization): 133
                  INFOG(12) (number of off-diagonal pivots): 9216
                  INFOG(13) (number of delayed pivots after factorization): 1196
                  INFOG(14) (number of memory compress after factorization): 0
                  INFOG(15) (number of steps of iterative refinement after solution): 0
                  INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 10
                  INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 10
                  INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 10
                  INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 10
                  INFOG(20) (estimated number of entries in the factors): 521359
                  INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 9
                  INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 9
                  INFOG(23) (after analysis: value of ICNTL(6) effectively used): 5
                  INFOG(24) (after analysis: value of ICNTL(12) effectively used): 3
                  INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
                  INFOG(28) (after factorization: number of null pivots encountered): 0
                  INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 553885
                  INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 9, 9
                  INFOG(32) (after analysis: type of analysis done): 1
                  INFOG(33) (value used for ICNTL(8)): -2
                  INFOG(34) (exponent of the determinant if determinant is requested): 0
                  INFOG(35) (after factorization: number of entries taking into account BLR factor compression - sum over all processors): 553885
                  INFOG(36) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - value on the most memory consuming processor): 0
                  INFOG(37) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - sum over all processors): 0
                  INFOG(38) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - value on the most memory consuming processor): 0
                  INFOG(39) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - sum over all processors): 0
      linear system matrix = precond matrix:
      Mat Object: (demo_mixed_poisson_pc_bddc_neumann_) 1 MPI process
        type: seqaij
        rows=23180, cols=23180
        total: nonzeros=124556, allocated nonzeros=124556
        total number of mallocs used during MatSetValues calls=0
          not using I-node routines
  --- Coarse solver
    KSP Object: (demo_mixed_poisson_pc_bddc_coarse_) 1 MPI process
      type: preonly
      maximum iterations=1, initial guess is zero
      tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (demo_mixed_poisson_pc_bddc_coarse_) 1 MPI process
      type: cholesky
        out-of-place factorization
        tolerance for zero pivot 2.22045e-14
        matrix ordering: external
        factor fill ratio given 0., needed 0.
          Factored matrix follows:
            Mat Object: (demo_mixed_poisson_pc_bddc_coarse_) 1 MPI process
              type: mumps
              rows=1, cols=1
              package used to perform factorization: mumps
              total: nonzeros=1, allocated nonzeros=1
                MUMPS run parameters:
                  Use -demo_mixed_poisson_pc_bddc_coarse_ksp_view ::ascii_info_detail to display information for all processes
                  RINFOG(1) (global estimated flops for the elimination after analysis): 0.
                  RINFOG(2) (global estimated flops for the assembly after factorization): 0.
                  RINFOG(3) (global estimated flops for the elimination after factorization): 0.
                  (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
                  INFOG(3) (estimated real workspace for factors on all processors after analysis): 1
                  INFOG(4) (estimated integer workspace for factors on all processors after analysis): 22
                  INFOG(5) (estimated maximum front size in the complete tree): 1
                  INFOG(6) (number of nodes in the complete tree): 1
                  INFOG(7) (ordering option effectively used after analysis): 2
                  INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
                  INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 1
                  INFOG(10) (total integer space store the matrix factors after factorization): 22
                  INFOG(11) (order of largest frontal matrix after factorization): 1
                  INFOG(12) (number of off-diagonal pivots): 0
                  INFOG(13) (number of delayed pivots after factorization): 0
                  INFOG(14) (number of memory compress after factorization): 0
                  INFOG(15) (number of steps of iterative refinement after solution): 0
                  INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 0
                  INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 0
                  INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 0
                  INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 0
                  INFOG(20) (estimated number of entries in the factors): 1
                  INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 0
                  INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 0
                  INFOG(23) (after analysis: value of ICNTL(6) effectively used): 0
                  INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
                  INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
                  INFOG(28) (after factorization: number of null pivots encountered): 0
                  INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 1
                  INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 0, 0
                  INFOG(32) (after analysis: type of analysis done): 1
                  INFOG(33) (value used for ICNTL(8)): 0
                  INFOG(34) (exponent of the determinant if determinant is requested): 0
                  INFOG(35) (after factorization: number of entries taking into account BLR factor compression - sum over all processors): 1
                  INFOG(36) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - value on the most memory consuming processor): 0
                  INFOG(37) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - sum over all processors): 0
                  INFOG(38) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - value on the most memory consuming processor): 0
                  INFOG(39) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - sum over all processors): 0
      linear system matrix = precond matrix:
      Mat Object: (demo_mixed_poisson_pc_bddc_coarse_) 1 MPI process
        type: seqaij
        rows=1, cols=1
        total: nonzeros=1, allocated nonzeros=1
        total number of mallocs used during MatSetValues calls=0
          not using I-node routines
  linear system matrix = precond matrix:
  Mat Object: (demo_mixed_poisson_A_) 2 MPI processes
    type: is
    rows=46272, cols=46272
    total: nonzeros=249120, allocated nonzeros=249696
    total number of mallocs used during MatSetValues calls=0
      has attached near null space

@jorgensd
Member

jorgensd commented Sep 1, 2025

@stefanozampini
I would get something similar if I used mumps for the factorization of the matrices.

If I instead use superlu_dist as the backend, I get the issue I mentioned above. In the logs you provided for the parallel run, you use mumps for the factorization.

@stefanozampini
Author

@stefanozampini I would get something similar if I used mumps for the factorization of the matrices.

If I instead use superlu_dist as the backend, I get the issue I mentioned above. In the logs you provided for the parallel run, you use mumps for the factorization.

This is a problem in superlu_dist. I have added the failing test to our CI, and hopefully it will be fixed in superlu_dist: https://gitlab.com/petsc/petsc/-/merge_requests/8681
