-
Notifications
You must be signed in to change notification settings - Fork 377
Changes to support DoMINO Design Sensitivities work + DoMINO model code fixes #973
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes to support DoMINO Design Sensitivities work + DoMINO model code fixes #973
Conversation
…nctions to use pathlib for improved readability and functionality. Updates type hints to support both str and Path types.
…eplaced simply by the built-in sum(lists), and (c) as-written will always create an error, since `newlist` is a tuple and hence does not have a .extend() method.
newlist.extend(x) | ||
return newlist | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review note: this PR deletes merge()
; this function is (a) not used anywhere, (b) can and should be replaced simply by the built-in sum(lists, start=[])
, and (c) as-written will always create an error, since newlist
is a tuple and hence does not have a .extend()
method.
…Torch documentation in GeoProcessor, GeometryRep, NNBasisFunctions, ParameterModel, and DoMINO classes.
… and performance; updates type hints and formatting, and modifies input handling for mesh data.
…rdinates; updates DoMINOInference to improve memory management and adds detailed docstrings for clarity.
…g random sampling, and adding detailed docstrings for initialization and item retrieval methods.
…n; updates DoMINOInference to utilize the new smoothing function and modifies sensitivity calculations accordingly. Enhances type hints and formatting for clarity.
… version control.
…erdsharpe/physicsnemo into psharpe/domino-sensitivities
…amline input handling. Updates input file processing and enhances results storage for mesh data.
…e/physicsnemo into psharpe/domino-sensitivities
…ng script for gradient checking.
…treamlines input file path construction.
…MINOInference, including sensitivity analysis and output to text files.
…cking.py to save output data in a dedicated gradient_checking_results directory. Adds new raw and smooth drag gradient data files.
…ivities (e.g., drag adjoint) with respect to underlying input geometry in CHANGELOG.md.
@@ -18,7 +18,7 @@ | |||
Important utilities for data processing and training, testing DoMINO. | |||
""" | |||
|
|||
import os |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for future work: in general, pathlib
is preferred over os.path
, since as far back as the Python 3.4 days:
…ge, features, and configuration for aerodynamic design optimization.
/blossom-ci |
Review note: this introduces changes to the activation functions of the DoMINO model, and hence, existing DoMINO checkpoints will need to be retrained. Last remaining to-do to pass CI: update the cached DoMINO checkpoint at |
output_dir: /user/aws_data_all/ | ||
input_dir: /data/drivaer_aws/drivaer_data_full/ | ||
cached_dir: /user/cached/drivaer_aws/drivaer_data_full/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already declare both project_dir
and output
in this config. Is it possible to reuse those with ${output}
? I don't want to leave a config that requires many manual changes, it'd be nice to format all these paths as sensible relative paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good consideration! (Actually, I hadn't realized that you could use ${}
-variables in YAML, so I learned something new today!)
Let me investigate what I can do here and get back to you on this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a great example use of DoMINO and a really cool extension to it's utility. As written, I think it's great, but it will be easier to maintain and extend with a few changes. I know you have another PR open on the utils cleanup, so I won't focus there too much. My main thoughts are:
- The mesh utility functions should live somewhere they can get unit tested and used more broadly.
- The config files have a lot of repetition (from the original configs) that could be cleaned up and consolidated
gelu
->relu
is a good idea but we should do it more cautiously so as not to break the model for existing users.- The second half of the README is mostly just a list of bullets. Can we elaborate on those where they are useful? And a few are repetitive, perhaps consolidate?
- The datapipe uses scipy KDTree which is good, but we can also use cuml and do it on the GPUs.
examples/cfd/external_aerodynamics/domino_sensitivity/conf/config.yaml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We did a decent amount of work in the last release to accelerate the data pipeline for DoMINO. I see a number of pieces here using CPU-only methods. If we have time to do it before merge, we should isolate the pieces from the training datapipe to reuse them here on CPU/GPU agnostic codes.
Really, I'm staring at the KDTree. That's a CPU op but cuml
has a faster implementation on GPU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked your math, might have a decimal error on line 62 column 17?
Just kidding. But - are these meant to be for unit testing and math checks? Might be better placed in the test area if so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also - the way it's implemented here should be pretty easy to do with cupy, too. Worth doing?
examples/cfd/external_aerodynamics/domino_sensitivity/conf/cached.yaml
Outdated
Show resolved
Hide resolved
examples/cfd/external_aerodynamics/domino_sensitivity/utilities/mesh_postprocessing.py
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file appears to have some good stuff. I don't want it to get buried and reimplemented later, and anyways it looks like it could use some tests implemented too.
Could this get reorganized into, say, physicsnemo.utils.mesh
or similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! It's actually based on some utilities that @ktangsali and I were working on in PhysicsNeMo-CFD, which I had included here since I needed our neighbor-finding algorithm to implement the Laplacian smoothing kernel, and I couldn't depend on end-users of this specific example having PhysicsNeMo-CFD installed.
Perhaps one alternative strategy (rather than including these utilities here) would be to have this example list PN-CFD in its requirements.txt
- but I'd really like to keep the dependency direction one-way: PhysicsNeMo-CFD depends on PhysicsNeMo, but not vice versa. It gets super messy when you have a bidirectional dependency and need to enforce mutual version pins. So, I don't love that alternative.
I do think this would be a good thing to upstream into something like physicsnemo.utils.mesh
, since it's just generally useful and not CFD-specific - but I'd like to run it by @ktangsali first.
physicsnemo/models/domino/model.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think wholesale switching to gelu
makes sense for your design sensitivities work but it's going to be a bumpy shift for everyone who has a pretrained or fine tuned model. Can I propose, instead, the activation is configurable at the model level? Default to relu (for backward compatibility) and set it to gelu in your configs. We can target a pivot from relu -> gelu after the next trained model is released, presumably with gelu
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a valid concern!
I agree that there may be growing pains here, and I actually think it's symptomatic of a much larger problem that goes beyond this instance. Namely, the fundamental issue is that saved checkpoints currently do not encode much info about the model architecture, so we're loading fixed checkpoints (via saved files) into a moving-target architecture (via the repository).
- On the GeLU/ReLU issue: I agree that making activations configurable for DoMINO is a workable fix for this instance of the issue. I'm open to doing this, though I think it's a bit of a band-aid fix to the larger true issue.
- For the larger issue, it's an open question whether/how we tackle it:
- One possible fix would be if, when the user saves a model checkpoint, a hash of the PhysicsNeMo repo is also saved (most convenient could be the latest Git commit hash). This has some implementation issues (what if the user isn't using Git?), but something conceptually similar would be nice. This way, when the checkpoint is loaded, we can throw a warning if the architecture is different than when it was saved. (PyTorch will already throw this for some model changes, but not for all changes.)
- I think something like ONNX is not a great option here, since some of our models end up using a lot of bespoke code that doesn't fit nicely into ONNX primitives (e.g., anything with Warp)
- We could just leave this as-is and expect users to pin commit hashes / versions for serious deployments. In an ideal world this should be happening anyway, but I'm not sure how reliable this solution is "in the wild".
More long-term: when it comes to new model architectures, I would recommend defaulting to
- This is because many applications benefit from this continuity property: for example, @ktangsali 's current work on adding PDE-residual-based physics loss to DoMINO also fundamentally requires meaningful spatial gradients, which any architecture that passes spatial information through ReLU cannot provide via AD (this is explicitly true for derivatives of order 2+, but even first-derivative accuracy will suffer via loss of smoothness).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My personal preference would be as follows:
- Make GeLU the default for now, since higher-order continuity is required for multiple downstream applications and we expect the true solution of the RANS PDE to exhibit at least
$C^2$ -continuity - Not add activation as an exposed configuration option, trusting that users should be pinning their PhysicsNeMo repo versions when using a fixed saved checkpoint.
- Choose whether/how to address the fundamental issue of separation-of-checkpoints-and-architectures later.
However, I'm definitely open to being convinced - can other reviewers weigh in here? @ktangsali @RishikeshRanade
…INO), as well as eliminating relative paths
PhysicsNeMo Pull Request
Description
Checklist
Dependencies