
Conversation

stefanozampini

This allows the creation of BDDC preconditioners.

For now, there's a demo on mixed Poisson. To run it, you need the getLocalSubMatrix fixes from https://gitlab.com/petsc/petsc/-/merge_requests/8668.

@garth-wells
Member

Super!

Just today we have a branch that runs CI tests with Spack-provided dependencies. Using

petsc@git.stefanozampini/fix-dolfinx-release

in our CI setup, we could test it :).

It might be better to add this to the existing mixed Poisson demo. We're trying to avoid too many demos that solve very similar problems in favour of demos that show different ways to solve the same problem (in the same demo). The mixed Poisson demo could demonstrate a range of different solvers.

Comment on lines +635 to +639
auto adjlist = bc.get().function_space()->dofmap()->index_map->index_to_dest_ranks();
const auto off = adjlist.offsets();
std::vector<std::int32_t> number_of_sharing_ranks(off.size());
for (size_t i = 0; i < off.size() - 1; ++i)
number_of_sharing_ranks[i] = off[i+1] - off[i] + 1;
Author
@stefanozampini Aug 28, 2025

Ideally, this setup code should be moved to IndexMap.
I was thinking of implementing it lazily, with a mutable class member, but I see you don't use mutable anywhere in dolfinx. Would that be something you would accept?

Member
@jorgensd Aug 28, 2025

Hi Stefano,
I know I suggested this function on Discourse. However, now that I see the use case, what I think we should do is:

  1. Create an la::Vector on the index map.
  2. Fill the vector with 1 (including ghosts).
  3. Scatter reverse with add.
  4. Scatter forward.

That would give you an array with the number of sharing processes per dof (a minimal sketch follows below).
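A minimal C++ sketch of that counting vector, assuming a dolfinx::la::Vector interface along these lines (constructor from an IndexMap and block size, mutable_array, scatter_rev with a binary operation, scatter_fwd); treat the exact signatures as assumptions rather than a verbatim API:

#include <algorithm>
#include <functional>
#include <memory>

#include <dolfinx/common/IndexMap.h>
#include <dolfinx/la/Vector.h>

// Build a vector that holds, for every owned and ghost dof, the number of
// processes sharing it: fill with 1, accumulate on owners, push back to ghosts.
dolfinx::la::Vector<double> sharing_count(
    std::shared_ptr<const dolfinx::common::IndexMap> map, int bs)
{
  dolfinx::la::Vector<double> w(map, bs);
  std::ranges::fill(w.mutable_array(), 1.0); // step 2: 1 on owned and ghost entries
  w.scatter_rev(std::plus<double>());        // step 3: owners accumulate ghost contributions
  w.scatter_fwd();                           // step 4: send the totals back to the ghosts
  return w;
}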

Member

This could probably be a vector we compute once inside the DirichletBC class. What do you think, @garth-wells?

Author
@stefanozampini Aug 29, 2025

@jorgensd Can you paste some code here to do it? I will wait for @garth-wells' confirmation and then move the setup of the counting vector into the IndexMap or DirichletBC class, your call.

Member

I'm without a computer until Sunday. @IgorBaratta could you do me a solid and make a simple example for Stefano? :)

Member

We need a way to divide the 1 inserted on the diagonal by the number of processes that share the dof, as MATIS cannot use insert and has to add the values. I.e. if a DirichletBC dof is shared between N processes, we assign 1/N on each process.

I suggest we use a vector, stored in the DirichletBC, to keep track of this, as illustrated by the snippet above (a sketch of the scaled insertion follows below).
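To make the 1/N idea concrete, here is a hypothetical helper; it is not the dolfinx API, and bc_dofs, sharing_count and the generic set_fn(rows, cols, values) callback are illustrative names only:

#include <cstdint>
#include <span>

// Add diagonal/N on the diagonal for each Dirichlet dof, where N is the
// number of processes sharing that dof. With ADD_VALUES semantics the
// contributions from the N processes then sum to exactly `diagonal`.
template <typename T, typename SetFn>
void set_diagonal_unassembled(SetFn&& set_fn,
                              std::span<const std::int32_t> bc_dofs,
                              std::span<const T> sharing_count, T diagonal)
{
  for (std::int32_t dof : bc_dofs)
  {
    T value = diagonal / sharing_count[dof];
    set_fn(std::span<const std::int32_t>(&dof, 1),
           std::span<const std::int32_t>(&dof, 1),
           std::span<const T>(&value, 1));
  }
}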

Author

@jorgensd summarized it properly. If you want to keep the same workflow for assembled and unassembled matrices, the counting vector is the only way to achieve it. I think this information belongs in IndexMap rather than DirichletBC, but I'm fine either way.

Member

@stefanozampini @garth-wells I've made an attempt at this: https://github.com/FEniCS/dolfinx/tree/dokken/suggested-matis
I'll have a go at installing PETSc main locally and running this, to see if I can get the demo to run :)

Member

A likely good approach is consistent handling of the LHS and RHS, i.e. use accumulate rather than set for the bc on the RHS. We did this in legacy DOLFIN. I'd need to look closely at the code to see how this might work with the latest design.

Member
@jorgensd Sep 1, 2025

I think it should be quite straightforward with the approach suggested in my branch above, as it lets the user choose the insertion mode of the diagonal (which can trivially be extended to set_bc), since the DirichletBC itself holds the count information.

@stefanozampini
Author

It might be better to add this to the existing mixed Poisson demo. We're trying to avoid too many demos that solve very similar problems in favour of demos that show different ways to solve the same problem (in the same demo). The mixed Poisson demo could demonstrate a range of different solvers.

That test uses a different Pmat and has extra code to set up the preconditioner. Wouldn't it be too messy to add another preconditioner in the same script?

@jorgensd
Member

It might be better to add this to the existing mixed Poisson demo. We're trying to avoid too many demos that solve very similar problems in favour of demos that show different ways to solve the same problem (in the same demo). The mixed Poisson demo could demonstrate a range of different solvers.

That test uses a different Pmat and has extra code to set up the preconditioner. Wouldn't it be too messy to add another preconditioner in the same script?

See for instance the Stokes demo, which illustrates how we have solved it for four different configurations:
https://docs.fenicsproject.org/dolfinx/main/python/demos/demo_stokes.html#implementation

@garth-wells
Member

It might be better to add this to the existing mixed Poisson demo. We're trying to avoid too many demos that solve very similar problems in favour of demos that show different ways to solve the same problem (in the same demo). The mixed Poisson demo could demonstrate a range of different solvers.

That test uses a different Pmat and has extra code to set up the preconditioner. Wouldn't it be too messy to add another preconditioner in the same script?

Put the different solvers in different functions. They can share common data, e.g. the mesh.

A reason we want 'more' in demos is to structure demos with more functions. We see over and over that users start from a simple, flat 20-line demo script and turn it into a monstrous, multi-thousand-line flat script. With more functions in demos we're educating people on how to structure a program :).

template <dolfinx::scalar T, std::floating_point U>
void set_diagonal(
    auto set_fn, const FunctionSpace<U>& V,
    const std::vector<std::reference_wrapper<const DirichletBC<T, U>>>& bcs,
    T diagonal = 1.0, bool unassembled = false)
Member

This looks problematic: quite a few types can be cast to T and bool, so it is very easy for a user to make an error.

Probably better to make unassembled an enum. Easier to read too.
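For what it's worth, one possible shape of that suggestion; the enum name and its values are purely illustrative, not an agreed API:

// Illustrative only: an enum makes the intent explicit at the call site and
// cannot be confused with the scalar `diagonal` argument.
enum class MatrixLayout
{
  assembled,  // e.g. MATAIJ: bc diagonal values can be inserted directly
  unassembled // e.g. MATIS: values are summed across sharing processes
};

// A call would then read, for example:
//   set_diagonal(set_fn, V, bcs, diagonal, MatrixLayout::unassembled);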

Author
@stefanozampini Aug 30, 2025

Having an enum for a single function seems overkill to me.
If there's a way to extract the Mat back from set_fn, we could check inside the function whether it is a MATIS, and we wouldn't need the extra optional argument. @garth-wells what do you think?

Member

In the branch I made, I use an insertion mode with add or insert to distinguish between the two, which resonates well with the scatter_fwd/scatter_reverse modes we have in the Python interface for vectors.

Author
@stefanozampini Sep 1, 2025

I looked at your branch. I don't think it is correct for MATAIJ: set_diagonal (correctly) uses set_fn = dolfinx::la::petsc::Matrix::set_fn(A, INSERT_VALUES), and the values won't be accumulated into MATAIJ. I think it is best to use an optional boolean that clearly indicates we want to insert into an unassembled (MATIS) matrix. All these problems appear because the interface for set_diagonal only takes set_fn and not the matrix directly.

Author

Alternatively, use the add_values formalism everywhere in matrix assembly (after all, this is FEM :-)). The optimization you get by not communicating the values to be inserted into the diagonal is minimal, @garth-wells?

Member

I looked at your branch. I don't think it is correct for MATAIJ: set_diagonal (correctly) uses set_fn = dolfinx::la::petsc::Matrix::set_fn(A, INSERT_VALUES), and the values won't be accumulated into MATAIJ. I think it is best to use an optional boolean that clearly indicates we want to insert into an unassembled (MATIS) matrix. All these problems appear because the interface for set_diagonal only takes set_fn and not the matrix directly.

Right, but if we modify the input to set_fn to use ADD_VALUES, it would resolve the issue, right?
I.e.

        PetscBool flg;
        PetscObjectTypeCompare((PetscObject)A, MATIS, &flg);
        dolfinx::la::VectorInsertMode mode
            = flg ? dolfinx::la::VectorInsertMode::add
                  : dolfinx::la::VectorInsertMode::insert;
        InsertMode petsc_mode = flg ? ADD_VALUES : INSERT_VALUES;
        dolfinx::fem::set_diagonal(
            dolfinx::la::petsc::Matrix::set_fn(A, petsc_mode), V, _bcs,
            diagonal, mode);
      },

Member
@jorgensd Sep 1, 2025

Thinking more about it, I kind of agree that we should only have one approach, preferably the add mode, as assuming that two independent input arguments are set consistently by the user is likely to lead to a lot of confusion.

Having only a single approach, accumulating data from all processes as 1/num_shared_procs, seems sensible.

The reason for DOLFINx not interfacing directly with the PETSc matrix is that it allows us to have a single code path for multiple matrix types, like our own built-in matrices, and potentially Trilinos, without having to change the set_diagonal and dolfinx::fem::assemble_* functions.
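For readers following along, a rough sketch of what such a backend-agnostic insertion callback looks like; this mirrors the idea of the set_fn argument rather than quoting the actual dolfinx declaration, so treat the exact signature as an assumption:

#include <cstdint>
#include <functional>
#include <span>

// Assembly routines and set_diagonal only see a callback of this shape, so the
// same code path can target PETSc, the built-in matrices, or another backend.
template <typename T>
using MatSetFn = std::function<int(std::span<const std::int32_t> rows,
                                   std::span<const std::int32_t> cols,
                                   std::span<const T> values)>;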

Author

I'm fine with any solution you think is reasonable. Note that if you switch back to add values, you won't need to flush assembly anymore in the fem callbacks. And you can still use the optimization of adding only to diagonal entries in AIJ format (since the value on the diagonal before the call is 0.0, add or insert gives the same result).

Right, but if we modify the input to set_fn to use ADD_VALUES, it would resolve the issue, right?
I.e.

If you want equivalent behaviour, it should be the opposite way, i.e. petsc_mode = !flg ? ADD_VALUES : INSERT_VALUES. This is because MATIS inserts directly into the local matrix. Matrix-vector products in MATIS are done via R^T A R, where R restricts from global to local, A does the local multiplications, and R^T sums up the local contributions.
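In equation form, paraphrasing the explanation above with R_i the restriction of a global vector to subdomain i and A_i the local (unassembled) operator:

$$ A x \;=\; \sum_i R_i^{T} A_i R_i \, x , $$

so whatever is written into the local diagonal of every subdomain that shares a dof is effectively summed, which is why a shared Dirichlet dof gets 1/N per process.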

Member

Right, I got that the wrong way around.

@garth-wells added the enhancement (New feature or request) and demo (New demo or demo enhancements/improvements) labels on Aug 29, 2025
@stefanozampini
Author

It might be better to add this to the existing mixed Poisson demo. We're trying to avoid too many demos that solve very similar problems in favour of demos that show different ways to solve the same problem (in the same demo). The mixed Poisson demo could demonstrate a range of different solvers.

That test uses a different Pmat and has extra code to set up the preconditioner. Wouldn't it be too messy to add another preconditioner in the same script?

Put the different solvers in different functions. Can pass in common data, e.g. mesh.

Done

@jorgensd
Member

jorgensd commented Sep 1, 2025

@stefanozampini I get a segfault when running the demo_mixed-poisson.py with superlu_dist.
It runs fine with mumps, both in serial and parallel; potentially a downstream issue, I guess.

I installed the release PETSc branch and your branch using spack.

@stefanozampini
Author

@stefanozampini I get a segfault when running the demo_mixed-poisson.py with superlu_dist. It runs fine with mumps, both in serial and parallel; potentially a downstream issue, I guess.

If you run sequentially, BDDC is a direct solver because there are no subdomains. In that case, the coarse solver is empty and superlu_dist segfaults because it does not support matrices of size 0x0.

I installed the release PETSc branch and your branch using spack.

The changes were merged into main yesterday, so there is no need to track my branch.

@jorgensd
Member

jorgensd commented Sep 1, 2025

@stefanozampini I get a segfault when running the demo_mixed-poisson.py with superlu_dist. It runs fine with mumps, both in serial and parallel; potentially a downstream issue, I guess.

If you run sequentially, BDDC is a direct solver because there are no subdomains. In that case, the coarse solver is empty and superlu_dist segfaults because it does not support matrices of size 0x0.

I installed the release PETSc branch and your branch using spack.

The changes were merged into main yesterday, so there is no need to track my branch.

Even running with PETSc main branch (using spack) and your branch of DOLFINx, I get:

 File "/home/dokken/Documents/src/dolfinx/python/demo/demo_mixed-poisson.py", line 418, in <module>
    problem.solve()
  File "/home/dokken/Documents/src/spack/var/spack/environments/test_matis/.spack-env/view/lib/python3.12/site-packages/dolfinx/fem/petsc.py", line 987, in solve
    self.solver.solve(self.b, self.x)
  File "petsc4py/PETSc/KSP.pyx", line 1782, in petsc4py.PETSc.KSP.solve
petsc4py.PETSc.Error: error code 81
[3] KSPSolve() at /tmp/dokken/spack-stage/spack-stage-petsc-main-bygphg6hok5rdbzsbmlhkt3p6alpuhyp/spack-src/src/ksp/ksp/interface/itfunc.c:1091
[3] KSPSolve_Private() at /tmp/dokken/spack-stage/spack-stage-petsc-main-bygphg6hok5rdbzsbmlhkt3p6alpuhyp/spack-src/src/ksp/ksp/interface/itfunc.c:840
[3] KSPSetUp() at /tmp/dokken/spack-stage/spack-stage-petsc-main-bygphg6hok5rdbzsbmlhkt3p6alpuhyp/spack-src/src/ksp/ksp/interface/itfunc.c:429
[3] PCSetUp() at /tmp/dokken/spack-stage/spack-stage-petsc-main-bygphg6hok5rdbzsbmlhkt3p6alpuhyp/spack-src/src/ksp/pc/interface/precon.c:1120
[3] PCSetUp_BDDC() at /tmp/dokken/spack-stage/spack-stage-petsc-main-bygphg6hok5rdbzsbmlhkt3p6alpuhyp/spack-src/src/ksp/pc/impls/bddc/bddc.c:1672
[3] PCBDDCSetUpSolvers() at /tmp/dokken/spack-stage/spack-stage-petsc-main-bygphg6hok5rdbzsbmlhkt3p6alpuhyp/spack-src/src/ksp/pc/impls/bddc/bddcprivate.c:3852
[3] PCBDDCSetUpCorrection() at /tmp/dokken/spack-stage/spack-stage-petsc-main-bygphg6hok5rdbzsbmlhkt3p6alpuhyp/spack-src/src/ksp/pc/impls/bddc/bddcprivate.c:4560
[3] MatCholeskyFactor() at /tmp/dokken/spack-stage/spack-stage-petsc-main-bygphg6hok5rdbzsbmlhkt3p6alpuhyp/spack-src/src/mat/interface/matrix.c:3372
[3] MatCholeskyFactor_SeqDense() at /tmp/dokken/spack-stage/spack-stage-petsc-main-bygphg6hok5rdbzsbmlhkt3p6alpuhyp/spack-src/src/mat/impls/dense/seq/dense.c:882
[3] Zero pivot in Cholesky factorization: https://petsc.org/release/faq/#zeropivot
[3] Bad factorization: zero pivot in row 0

when using superlu_dist, as opposed to mumps as the factorization backend in parallel.
It seems to only be a problem when one specifies the local_solver_type to be superlu_dist.
Using

    coarse_solver = "lu"
    coarse_solver_type = "superlu_dist"
    local_solver_type = "mumps"

or

    coarse_solver = "lu"
    coarse_solver_type = "mumps"
    local_solver_type = "mumps"

is fine.

@stefanozampini
Author

It seems to only be a problem when one specifies the local_solver_type to be superlu_dist.

I think in that case one of the processes has zero entries, and superlu_dist fails.

@jorgensd
Member

jorgensd commented Sep 1, 2025


I think in that case one of the processes has zero entries, and superlu_dist fails.

This happens even if I increase the mesh to 500x500 and use only 2 processes, where it is unlikely that one of the coarse levels would end up with zero entries? Is it because we do not have a (1, 1) diagonal block in the system?

@stefanozampini
Author


This happens even if I increase the mesh to 500x500 and use only 2 processes, where it is unlikely that one of the coarse levels would end up with zero entries? Is it because we do not have a (1, 1) diagonal block in the system?

The issue you posted was with a local problem. Run with the option -pc_bddc_check_level 1 and post the output.

@jorgensd
Member

jorgensd commented Sep 1, 2025

-pc_bddc_check_level 1

Solving mixed poisson problem with monolithic solver.
Error benign extension 0.00000000000000e+00
Error scaling extension 0.00000000000000e+00
Error scaling restriction 0.00000000000000e+00
--------------------------------------------------
Subdomain 0000 infinity error for Dirichlet solve (demo_mixed_poisson_pc_bddc_dirichlet_) =  inf 
Subdomain 0001 infinity error for Dirichlet solve (demo_mixed_poisson_pc_bddc_dirichlet_) =  inf 
--------------------------------------------------
Local BDDC graph for subdomain 0000 (seq 0)
Number of vertices 625767
Number of local subdomains 1
Custom minimal size 1
Topological two dim? TRUE (set TRUE)
Total number of connected components 1
  cc 0 (size 514, fid 0, neighs: 0 1)
Local BDDC graph for subdomain 0001 (seq 0)
Number of vertices 625747
Number of local subdomains 1
Custom minimal size 1
Topological two dim? TRUE (set TRUE)
Total number of connected components 1
  cc 0 (size 514, fid 0, neighs: 0 1)
--------------------------------------------------------------
Subdomain 0000 got 00 local candidate vertices (1)
Subdomain 0000 got 01 local candidate edges    (1)
Subdomain 0000 got 00 local candidate faces    (1)
--------------------------------------------------------------
Subdomain 0001 got 00 local candidate vertices (1)
Subdomain 0001 got 01 local candidate edges    (1)
Subdomain 0001 got 00 local candidate faces    (1)
--------------------------------------------------
Subdomain 0000 local dimensions
local_size = 625767, dirichlet_size = 625253, boundary_size = 514
r_size = 625767, v_size = 0, constraints = 1, local_primal_size = 1
Subdomain 0001 local dimensions
local_size = 625747, dirichlet_size = 625233, boundary_size = 514
r_size = 625747, v_size = 0, constraints = 1, local_primal_size = 1
--------------------------------------------------
Subdomain 0000 infinity error for Neumann solve (demo_mixed_poisson_pc_bddc_neumann_) =  inf
Subdomain 0001 infinity error for Neumann solve (demo_mixed_poisson_pc_bddc_neumann_) =  inf

@stefanozampini
Author

That is odd, I don't know what you are running. This is what I get running the demo in my branch with 2 processes:

$ mpiexec -n 2 python demo_mixed-poisson.py 
....
Solving mixed poisson problem with monolithic solver.
  iteration:    0, residual: 1.997e+01
  iteration:    1, residual: 8.376e-01
  iteration:    2, residual: 6.651e-03
  iteration:    3, residual: 1.814e-04
  iteration:    4, residual: 3.579e-06
  iteration:    5, residual: 5.895e-08
KSP Object: (demo_mixed_poisson_) 2 MPI processes
  type: gmres
    restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=10000, initial guess is zero
  tolerances: relative=1e-08, absolute=1e-50, divergence=10000.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: (demo_mixed_poisson_) 2 MPI processes
  type: bddc
    Use verbose output: 0
    Use user-defined CSR: 0
    Use local mat graph: 0
    Connectivity graph topological dimension: 2
    Corner selection: 0 (selected 0)
    Use vertices: 1 (vertex size 1)
    Use edges: 1
    Use faces: 1
    Use true near null space: 0
    Use QR for single constraints on cc: 0
    Use change of basis on local edge nodes: 0
    Use change of basis on local face nodes: 0
    User defined change of basis matrix: 0
    Has change of basis matrix: 0
    Eliminate dirichlet boundary dofs: 0
    Switch on static condensation ops around the interface preconditioner: 0
    Use exact dirichlet trick: 0
    Interface extension: DIRICHLET
    Multilevel max levels: 0
    Multilevel coarsening ratio: 8
    Use estimated eigs for coarse problem: 0
    Use deluxe scaling: 0
    Use deluxe zerorows: 0
    Use deluxe singlemat: 0
    Rebuild interface graph for Schur principal minors: 0
    Number of dofs' layers for the computation of principal minors: -1
    Use user CSR graph to compute successive layers: 0
    Adaptive constraint selection threshold (active 0, userdefined 0): 0.
    Min constraints / connected component: 0
    Max constraints / connected component: 0
    Invert exact Schur complement for adaptive selection: 0
    Symmetric computation of primal basis functions: 1
    Num. Procs. to map coarse adjacency list: 0
    Coarse eqs per proc (significant at the coarsest level): 1
    Detect disconnected: 1 (filter 0)
    Benign subspace trick: 1 (change explicit 1)
    Benign subspace trick is active: 0
    Algebraic computation of no-net-flux: 1
  ********************************** STATISTICS AT LEVEL 0 **********************************
    Global dofs sizes: all 46272 interface 96 coarse 1
    Coarsening ratios: all/coarse 46272 interface/coarse 96
    Active processes : 2
    Total subdomains : 2
    Dofs type        :	MIN	MAX	MEAN
    Interior  dofs   :	23084	23092	23088
    Interface dofs   :	96	96	96
    Primal    dofs   :	1	1	1
    Local     dofs   :	23180	23188	23184
    Local     subs   :	1	1
  --- Interior solver (rank 0)
    KSP Object: (demo_mixed_poisson_pc_bddc_dirichlet_) 1 MPI process
      type: preonly
      maximum iterations=10000, initial guess is zero
      tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (demo_mixed_poisson_pc_bddc_dirichlet_) 1 MPI process
      type: cholesky
        out-of-place factorization
        tolerance for zero pivot 2.22045e-14
        matrix ordering: external
        factor fill ratio given 0., needed 0.
          Factored matrix follows:
            Mat Object: (demo_mixed_poisson_pc_bddc_dirichlet_) 1 MPI process
              type: mumps
              rows=23084, cols=23084
              package used to perform factorization: mumps
              total: nonzeros=529736, allocated nonzeros=529736
                MUMPS run parameters:
                  Use -demo_mixed_poisson_pc_bddc_dirichlet_ksp_view ::ascii_info_detail to display information for all processes
                  RINFOG(1) (global estimated flops for the elimination after analysis): 1.73252e+07
                  RINFOG(2) (global estimated flops for the assembly after factorization): 257024.
                  RINFOG(3) (global estimated flops for the elimination after factorization): 1.96096e+07
                  (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
                  INFOG(3) (estimated real workspace for factors on all processors after analysis): 708752
                  INFOG(4) (estimated integer workspace for factors on all processors after analysis): 114624
                  INFOG(5) (estimated maximum front size in the complete tree): 125
                  INFOG(6) (number of nodes in the complete tree): 1462
                  INFOG(7) (ordering option effectively used after analysis): 2
                  INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
                  INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 748486
                  INFOG(10) (total integer space store the matrix factors after factorization): 117220
                  INFOG(11) (order of largest frontal matrix after factorization): 125
                  INFOG(12) (number of off-diagonal pivots): 9216
                  INFOG(13) (number of delayed pivots after factorization): 1298
                  INFOG(14) (number of memory compress after factorization): 0
                  INFOG(15) (number of steps of iterative refinement after solution): 0
                  INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 10
                  INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 10
                  INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 10
                  INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 10
                  INFOG(20) (estimated number of entries in the factors): 529736
                  INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 9
                  INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 9
                  INFOG(23) (after analysis: value of ICNTL(6) effectively used): 5
                  INFOG(24) (after analysis: value of ICNTL(12) effectively used): 3
                  INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
                  INFOG(28) (after factorization: number of null pivots encountered): 0
                  INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 567071
                  INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 9, 9
                  INFOG(32) (after analysis: type of analysis done): 1
                  INFOG(33) (value used for ICNTL(8)): -2
                  INFOG(34) (exponent of the determinant if determinant is requested): 0
                  INFOG(35) (after factorization: number of entries taking into account BLR factor compression - sum over all processors): 567071
                  INFOG(36) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - value on the most memory consuming processor): 0
                  INFOG(37) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - sum over all processors): 0
                  INFOG(38) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - value on the most memory consuming processor): 0
                  INFOG(39) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - sum over all processors): 0
      linear system matrix = precond matrix:
      Mat Object: (demo_mixed_poisson_pc_bddc_dirichlet_) 1 MPI process
        type: seqaij
        rows=23084, cols=23084
        total: nonzeros=123884, allocated nonzeros=123884
        total number of mallocs used during MatSetValues calls=0
          not using I-node routines
  --- Correction solver (rank 0)
    KSP Object: (demo_mixed_poisson_pc_bddc_neumann_) 1 MPI process
      type: preonly
      maximum iterations=10000, initial guess is zero
      tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (demo_mixed_poisson_pc_bddc_neumann_) 1 MPI process
      type: cholesky
        out-of-place factorization
        tolerance for zero pivot 2.22045e-14
        matrix ordering: external
        factor fill ratio given 0., needed 0.
          Factored matrix follows:
            Mat Object: (demo_mixed_poisson_pc_bddc_neumann_) 1 MPI process
              type: mumps
              rows=23180, cols=23180
              package used to perform factorization: mumps
              total: nonzeros=521359, allocated nonzeros=521359
                MUMPS run parameters:
                  Use -demo_mixed_poisson_pc_bddc_neumann_ksp_view ::ascii_info_detail to display information for all processes
                  RINFOG(1) (global estimated flops for the elimination after analysis): 1.63329e+07
                  RINFOG(2) (global estimated flops for the assembly after factorization): 248685.
                  RINFOG(3) (global estimated flops for the elimination after factorization): 1.81966e+07
                  (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
                  INFOG(3) (estimated real workspace for factors on all processors after analysis): 698339
                  INFOG(4) (estimated integer workspace for factors on all processors after analysis): 116004
                  INFOG(5) (estimated maximum front size in the complete tree): 133
                  INFOG(6) (number of nodes in the complete tree): 1494
                  INFOG(7) (ordering option effectively used after analysis): 2
                  INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
                  INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 731773
                  INFOG(10) (total integer space store the matrix factors after factorization): 118396
                  INFOG(11) (order of largest frontal matrix after factorization): 133
                  INFOG(12) (number of off-diagonal pivots): 9216
                  INFOG(13) (number of delayed pivots after factorization): 1196
                  INFOG(14) (number of memory compress after factorization): 0
                  INFOG(15) (number of steps of iterative refinement after solution): 0
                  INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 10
                  INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 10
                  INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 10
                  INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 10
                  INFOG(20) (estimated number of entries in the factors): 521359
                  INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 9
                  INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 9
                  INFOG(23) (after analysis: value of ICNTL(6) effectively used): 5
                  INFOG(24) (after analysis: value of ICNTL(12) effectively used): 3
                  INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
                  INFOG(28) (after factorization: number of null pivots encountered): 0
                  INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 553885
                  INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 9, 9
                  INFOG(32) (after analysis: type of analysis done): 1
                  INFOG(33) (value used for ICNTL(8)): -2
                  INFOG(34) (exponent of the determinant if determinant is requested): 0
                  INFOG(35) (after factorization: number of entries taking into account BLR factor compression - sum over all processors): 553885
                  INFOG(36) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - value on the most memory consuming processor): 0
                  INFOG(37) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - sum over all processors): 0
                  INFOG(38) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - value on the most memory consuming processor): 0
                  INFOG(39) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - sum over all processors): 0
      linear system matrix = precond matrix:
      Mat Object: (demo_mixed_poisson_pc_bddc_neumann_) 1 MPI process
        type: seqaij
        rows=23180, cols=23180
        total: nonzeros=124556, allocated nonzeros=124556
        total number of mallocs used during MatSetValues calls=0
          not using I-node routines
  --- Coarse solver
    KSP Object: (demo_mixed_poisson_pc_bddc_coarse_) 1 MPI process
      type: preonly
      maximum iterations=1, initial guess is zero
      tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (demo_mixed_poisson_pc_bddc_coarse_) 1 MPI process
      type: cholesky
        out-of-place factorization
        tolerance for zero pivot 2.22045e-14
        matrix ordering: external
        factor fill ratio given 0., needed 0.
          Factored matrix follows:
            Mat Object: (demo_mixed_poisson_pc_bddc_coarse_) 1 MPI process
              type: mumps
              rows=1, cols=1
              package used to perform factorization: mumps
              total: nonzeros=1, allocated nonzeros=1
                MUMPS run parameters:
                  Use -demo_mixed_poisson_pc_bddc_coarse_ksp_view ::ascii_info_detail to display information for all processes
                  RINFOG(1) (global estimated flops for the elimination after analysis): 0.
                  RINFOG(2) (global estimated flops for the assembly after factorization): 0.
                  RINFOG(3) (global estimated flops for the elimination after factorization): 0.
                  (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
                  INFOG(3) (estimated real workspace for factors on all processors after analysis): 1
                  INFOG(4) (estimated integer workspace for factors on all processors after analysis): 22
                  INFOG(5) (estimated maximum front size in the complete tree): 1
                  INFOG(6) (number of nodes in the complete tree): 1
                  INFOG(7) (ordering option effectively used after analysis): 2
                  INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
                  INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 1
                  INFOG(10) (total integer space store the matrix factors after factorization): 22
                  INFOG(11) (order of largest frontal matrix after factorization): 1
                  INFOG(12) (number of off-diagonal pivots): 0
                  INFOG(13) (number of delayed pivots after factorization): 0
                  INFOG(14) (number of memory compress after factorization): 0
                  INFOG(15) (number of steps of iterative refinement after solution): 0
                  INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 0
                  INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 0
                  INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 0
                  INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 0
                  INFOG(20) (estimated number of entries in the factors): 1
                  INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 0
                  INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 0
                  INFOG(23) (after analysis: value of ICNTL(6) effectively used): 0
                  INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
                  INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
                  INFOG(28) (after factorization: number of null pivots encountered): 0
                  INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 1
                  INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 0, 0
                  INFOG(32) (after analysis: type of analysis done): 1
                  INFOG(33) (value used for ICNTL(8)): 0
                  INFOG(34) (exponent of the determinant if determinant is requested): 0
                  INFOG(35) (after factorization: number of entries taking into account BLR factor compression - sum over all processors): 1
                  INFOG(36) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - value on the most memory consuming processor): 0
                  INFOG(37) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - sum over all processors): 0
                  INFOG(38) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - value on the most memory consuming processor): 0
                  INFOG(39) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - sum over all processors): 0
      linear system matrix = precond matrix:
      Mat Object: (demo_mixed_poisson_pc_bddc_coarse_) 1 MPI process
        type: seqaij
        rows=1, cols=1
        total: nonzeros=1, allocated nonzeros=1
        total number of mallocs used during MatSetValues calls=0
          not using I-node routines
  linear system matrix = precond matrix:
  Mat Object: (demo_mixed_poisson_A_) 2 MPI processes
    type: is
    rows=46272, cols=46272
    total: nonzeros=249120, allocated nonzeros=249696
    total number of mallocs used during MatSetValues calls=0
      has attached near null space

@jorgensd
Member

jorgensd commented Sep 1, 2025

@stefanozampini
I would get something similar if I used mumps for the factorization of the matrices.

If I instead use superlu_dist as the backend, I get the issue I mentioned above. In the logs you provided for the parallel run, you use mumps for the factorization.

@stefanozampini
Author

@stefanozampini I would get something similar if I used mumps for the factorization of the matrices.

If I instead use superlu_dist as the backend, I get the issue I mentioned above. In the logs you provided for the parallel run, you use mumps for the factorization.

This is a problem in superlu_dist. I have added the failing test to our CI, and hopefully it will be fixed in superlu_dist: https://gitlab.com/petsc/petsc/-/merge_requests/8681
