
Changes to support DoMINO Design Sensitivities work + DoMINO model code fixes #973


Open · wants to merge 46 commits into base: main

Conversation

@peterdsharpe (Collaborator) commented Jun 13, 2025

PhysicsNeMo Pull Request

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.

Dependencies

@peterdsharpe peterdsharpe self-assigned this Jun 13, 2025
```python
newlist.extend(x)
return newlist
```

@peterdsharpe (Collaborator, Author) commented Jun 13, 2025:
Review note: this PR deletes merge(); this function is (a) not used anywhere, (b) can and should be replaced simply by the built-in sum(lists, start=[]), and (c) as written will always raise an error, since newlist is a tuple and hence has no .extend() method.
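
For reference, a minimal sketch of the suggested built-in replacement (the list values here are hypothetical):

```python
# Flatten a list of lists with the built-in sum(); the empty-list
# start value makes list concatenation the accumulation step.
lists = [[1, 2], [3], [4, 5]]
flat = sum(lists, start=[])
assert flat == [1, 2, 3, 4, 5]
```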

Commits (messages truncated in the GitHub view):

  • …Torch documentation in GeoProcessor, GeometryRep, NNBasisFunctions, ParameterModel, and DoMINO classes.
  • … and performance; updates type hints and formatting, and modifies input handling for mesh data.
  • …rdinates; updates DoMINOInference to improve memory management and adds detailed docstrings for clarity.
  • …g random sampling, and adding detailed docstrings for initialization and item retrieval methods.
  • …n; updates DoMINOInference to utilize the new smoothing function and modifies sensitivity calculations accordingly. Enhances type hints and formatting for clarity.
  • …amline input handling. Updates input file processing and enhances results storage for mesh data.
  • …e/physicsnemo into psharpe/domino-sensitivities
  • …MINOInference, including sensitivity analysis and output to text files.
  • …cking.py to save output data in a dedicated gradient_checking_results directory. Adds new raw and smooth drag gradient data files.
  • …ivities (e.g., drag adjoint) with respect to underlying input geometry in CHANGELOG.md.
@peterdsharpe peterdsharpe marked this pull request as ready for review June 19, 2025 19:10
```diff
@@ -18,7 +18,7 @@
 Important utilities for data processing and training, testing DoMINO.
 """

 import os
```

@peterdsharpe peterdsharpe added the 3 - Ready for Review Ready for review by team label Jun 19, 2025
Commit (message truncated): …ge, features, and configuration for aerodynamic design optimization.
@peterdsharpe (Collaborator, Author) commented:

/blossom-ci

@peterdsharpe (Collaborator, Author) commented Jun 20, 2025:

Review note: this introduces changes to the activation functions of the DoMINO model, and hence, existing DoMINO checkpoints will need to be retrained.

Last remaining to-do to pass CI: update the cached DoMINO checkpoint at ./test/models/data/domino_output.pth so that it can be loaded and validated at test-time. Going to hold off on this until review comments come back, in case other changes need to be made before generating this.

Comment on lines +41 to +43:

```yaml
output_dir: /user/aws_data_all/
input_dir: /data/drivaer_aws/drivaer_data_full/
cached_dir: /user/cached/drivaer_aws/drivaer_data_full/
```
Collaborator commented:
We already declare both project_dir and output in this config. Is it possible to reuse those with ${output}? I don't want to leave a config that requires many manual changes; it'd be nice to format all these paths as sensible relative paths.
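
For reference, Hydra configs are backed by OmegaConf, whose ${...} interpolation lets derived paths reuse a declared root; a minimal sketch with hypothetical keys:

```python
from omegaconf import OmegaConf

# Hypothetical keys illustrating ${...} interpolation: derived paths
# reuse a single declared root instead of repeating absolute paths.
cfg = OmegaConf.create(
    {
        "output": "/user",
        "output_dir": "${output}/aws_data_all/",
        "cached_dir": "${output}/cached/drivaer_aws/",
    }
)
print(cfg.output_dir)  # /user/aws_data_all/
```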

@peterdsharpe (Collaborator, Author) replied:

This is a good consideration! (Actually, I hadn't realized that you could use ${}-variables in YAML, so I learned something new today!)

Let me investigate what I can do here and get back to you on this one.

@coreyjadams (Collaborator) left a review:

I think this is a great example use of DoMINO and a really cool extension to its utility. As written, I think it's great, but it will be easier to maintain and extend with a few changes. I know you have another PR open on the utils cleanup, so I won't focus there too much. My main thoughts are:

  • The mesh utility functions should live somewhere they can get unit tested and used more broadly.
  • The config files have a lot of repetition (from the original configs) that could be cleaned up and consolidated
  • relu -> gelu is a good idea, but we should make the change more cautiously so as not to break the model for existing users.
  • The second half of the README is mostly just a list of bullets. Can we elaborate on those where they are useful? And a few are repetitive, perhaps consolidate?
  • The datapipe uses scipy KDTree which is good, but we can also use cuml and do it on the GPUs.

Collaborator commented:

We did a decent amount of work in the last release to accelerate the data pipeline for DoMINO. I see a number of pieces here using CPU-only methods. If we have time to do it before merge, we should isolate the pieces from the training datapipe and reuse them here as CPU/GPU-agnostic code.

Really, I'm staring at the KDTree. That's a CPU op but cuml has a faster implementation on GPU.
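
For illustration, the CPU query with scipy and the cuML alternative look roughly like this (array names are hypothetical, not the datapipe's actual code):

```python
import numpy as np
from scipy.spatial import KDTree

points = np.random.rand(10_000, 3).astype(np.float32)
queries = np.random.rand(100, 3).astype(np.float32)

# CPU path: scipy KDTree, as used here today.
tree = KDTree(points)
dists, idx = tree.query(queries, k=8)

# GPU alternative: cuML nearest neighbors on device arrays.
# from cuml.neighbors import NearestNeighbors
# nn = NearestNeighbors(n_neighbors=8).fit(points)
# dists, idx = nn.kneighbors(queries)
```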

Collaborator commented:

I checked your math, might have a decimal error on line 62 column 17?

Just kidding. But - are these meant to be for unit testing and math checks? Might be better placed in the test area if so?

Collaborator commented:

Also - the way it's implemented here should be pretty easy to do with cupy, too. Worth doing?

Collaborator commented:

This file appears to have some good stuff. I don't want it to get buried and reimplemented later, and anyway it looks like it could use some tests, too.

Could this get reorganized into, say, physicsnemo.utils.mesh or similar?

@peterdsharpe (Collaborator, Author) replied Jun 25, 2025:

Sure! It's actually based on some utilities that @ktangsali and I were working on in PhysicsNeMo-CFD, which I had included here since I needed our neighbor-finding algorithm to implement the Laplacian smoothing kernel, and I couldn't depend on end-users of this specific example having PhysicsNeMo-CFD installed.

Perhaps one alternative strategy (rather than including these utilities here) would be to have this example list PN-CFD in its requirements.txt - but I'd really like to keep the dependency direction one-way: PhysicsNeMo-CFD depends on PhysicsNeMo, but not vice versa. It gets super messy when you have a bidirectional dependency and need to enforce mutual version pins. So, I don't love that alternative.

I do think this would be a good thing to upstream into something like physicsnemo.utils.mesh, since it's just generally useful and not CFD-specific - but I'd like to run it by @ktangsali first.
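
For context, once neighbors are known, the Laplacian smoothing kernel itself is short; a minimal numpy sketch under assumed data layouts (hypothetical names, not the PR's actual implementation):

```python
import numpy as np

def laplacian_smooth(points, neighbors, lam=0.5, iterations=10):
    # points:    (N, 3) array of vertex coordinates
    # neighbors: list where neighbors[i] is an index array of vertex i's neighbors
    pts = points.copy()
    for _ in range(iterations):
        # Blend each vertex toward the mean of its neighbors by a factor lam.
        means = np.stack([pts[nbr].mean(axis=0) for nbr in neighbors])
        pts = (1.0 - lam) * pts + lam * means
    return pts
```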

Collaborator commented:

I think wholesale switching to gelu makes sense for your design sensitivities work, but it's going to be a bumpy shift for everyone who has a pretrained or fine-tuned model. Can I propose, instead, that the activation be configurable at the model level? Default to relu (for backward compatibility) and set it to gelu in your configs. We can target a pivot from relu -> gelu after the next trained model is released, presumably with gelu?
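
A minimal sketch of that pattern (the class and argument names are hypothetical, not DoMINO's actual signature):

```python
import torch.nn as nn

ACTIVATIONS = {"relu": nn.ReLU, "gelu": nn.GELU}

class Block(nn.Module):
    # Hypothetical block with a configurable activation; defaulting to
    # "relu" keeps existing checkpoints behaving as before, while new
    # configs can opt into "gelu".
    def __init__(self, in_dim: int, out_dim: int, activation: str = "relu"):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64),
            ACTIVATIONS[activation](),
            nn.Linear(64, out_dim),
        )

    def forward(self, x):
        return self.net(x)
```

Since neither activation holds parameters, the state_dict is identical either way; only the forward behavior changes.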

@peterdsharpe (Collaborator, Author) replied Jun 25, 2025:

This is a valid concern!

I agree that there may be growing pains here, and I actually think it's symptomatic of a much larger problem that goes beyond this instance. Namely, the fundamental issue is that saved checkpoints currently do not encode much info about the model architecture, so we're loading fixed checkpoints (via saved files) into a moving-target architecture (via the repository).

  • On the GeLU/ReLU issue: I agree that making activations configurable for DoMINO is a workable fix for this instance of the issue. I'm open to doing this, though I think it's a bit of a band-aid fix to the larger true issue.
  • For the larger issue, it's an open question whether/how we tackle it:
    • One possible fix would be if, when the user saves a model checkpoint, a hash of the PhysicsNeMo repo is also saved (most conveniently, the latest Git commit hash). This has some implementation issues (what if the user isn't using Git?), but something conceptually similar would be nice. This way, when the checkpoint is loaded, we can throw a warning if the architecture differs from when it was saved. (PyTorch will already throw this for some model changes, but not for all.) A rough sketch of this idea follows below this list.
    • I think something like ONNX is not a great option here, since some of our models end up using a lot of bespoke code that doesn't fit nicely into ONNX primitives (e.g., anything with Warp).
    • We could just leave this as-is and expect users to pin commit hashes / versions for serious deployments. In an ideal world this should be happening anyway, but I'm not sure how reliable this solution is "in the wild".
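
A rough sketch of the commit-hash idea (helper names hypothetical; this is not existing PhysicsNeMo behavior):

```python
import subprocess
import torch

def save_checkpoint(model, path):
    # Record the current Git commit alongside the weights, if available.
    try:
        commit = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip()
    except (OSError, subprocess.CalledProcessError):
        commit = None  # e.g., the user isn't running from a Git checkout
    torch.save({"state_dict": model.state_dict(), "code_commit": commit}, path)

def load_checkpoint(model, path, current_commit):
    ckpt = torch.load(path)
    if ckpt.get("code_commit") not in (None, current_commit):
        print("Warning: checkpoint was saved from a different code revision; "
              "the architecture may have changed since then.")
    model.load_state_dict(ckpt["state_dict"])
```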

More long-term: when it comes to new model architectures, I would recommend defaulting to $C^\infty$-continuous modeling where possible - at least for problems where the underlying PDE is expected to be smooth. This shows up most notably in activation functions, but also in more subtle places like KNN kernels (depending on how the results are then handled). Reworking how we handle KNN-based layers to be higher-order continuous is a challenge, but for the low-hanging fruit (like activations), I think encouraging continuity would be a good default policy.

  • This is because many applications benefit from this continuity property: for example, @ktangsali's current work on adding PDE-residual-based physics loss to DoMINO also fundamentally requires meaningful spatial gradients, which any architecture that passes spatial information through ReLU cannot provide via AD (this is explicitly true for derivatives of order 2+, but even first-derivative accuracy suffers from the loss of smoothness). A small demonstration follows below.
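
To make the ReLU limitation concrete, a small illustrative PyTorch check: the second derivative of ReLU vanishes almost everywhere, while GELU's does not.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([0.7], requires_grad=True)
for act in (F.relu, F.gelu):
    y = act(x)
    (g,) = torch.autograd.grad(y, x, create_graph=True)
    g2 = torch.autograd.grad(g, x, allow_unused=True)[0]
    print(act.__name__, g.item(), 0.0 if g2 is None else g2.item())
# relu: the first derivative is a step function, so AD reports zero
# curvature everywhere it is defined.
# gelu: both derivatives are smooth and generally nonzero.
```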

@peterdsharpe (Collaborator, Author) added:

My personal preference would be as follows:

  • Make GeLU the default for now, since higher-order continuity is required for multiple downstream applications and we expect the true solution of the RANS PDE to exhibit at least $C^2$-continuity
  • Not add activation as an exposed configuration option, trusting that users should be pinning their PhysicsNeMo repo versions when using a fixed saved checkpoint.
  • Choose whether/how to address the fundamental issue of separation-of-checkpoints-and-architectures later.

However, I'm definitely open to being convinced - can other reviewers weigh in here? @ktangsali @RishikeshRanade
