
Conversation

coreyjadams
Collaborator

** NOT FOR RELEASE **

This is an overhaul of the FSDP tutorial. Let's bring it in after the release goes out.

PhysicsNeMo Pull Request

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.

Dependencies

ktangsali and others added 28 commits May 21, 2025 18:58
…buted applications (NVIDIA#906)

* Wrap DeviceMesh in quotes for typing hint, to protect older torch versions from compatibility issues (NVIDIA#905)
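
A minimal sketch of the quoted ("forward reference") annotation pattern this refers to; the function name and import guard are illustrative, not the actual physicsnemo code:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by static type checkers, so older torch builds that lack
    # torch.distributed.device_mesh can still import this module.
    from torch.distributed.device_mesh import DeviceMesh


def build_sharded_model(mesh: "DeviceMesh") -> None:
    # The quoted hint is never resolved at runtime.
    ...
```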

* Bumps torch version to >=2.4.0 to minimize support surface for distributed applications.

* Adds changelog note

* Merge SongUNetPosLtEmb with SongUNetPosEmb and add support for batch>1 (NVIDIA#901)

* multi-GPU training support for CorrDiff optimization

* enable mixed precision for val
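
For context, a generic mixed-precision validation loop of the kind this enables; `model`, `val_loader`, and the dtype/device choices are placeholders, not the example's actual code:

```python
import torch


def validate(model: torch.nn.Module, val_loader, device: str = "cuda") -> None:
    # Autocast runs matmuls/convolutions in reduced precision during evaluation;
    # no GradScaler is needed because there is no backward pass.
    model.eval()
    with torch.no_grad(), torch.autocast(device_type=device, dtype=torch.float16):
        for batch in val_loader:
            _ = model(batch.to(device))
```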

* clean codebase for opt

* add amp_mode aware model architecture

* add None checking for params

* revise datatype casting schema

* Add test cases for corrdiff optimizations

Signed-off-by: Neal Pan <nuochengp@nvidia.com>

* revised from_checkpoint, update tests and CHANGELOG

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* Lint and format code properly

Signed-off-by: Neal Pan <nuochengp@nvidia.com>

* add multi-gpu optimization

* rebase changes and update tests and configs

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* merge ResidualLoss and refactored layer and Unet init based on PR review

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* Update layers.py with robust apex import
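
The "robust import" refers to tolerating a missing Apex install. A generic sketch of that pattern; the exact Apex module path used in layers.py is an assumption here:

```python
import torch.nn as nn

try:
    from apex.contrib.group_norm import GroupNorm as ApexGroupNorm
    APEX_AVAILABLE = True
except ImportError:
    ApexGroupNorm = None
    APEX_AVAILABLE = False


def make_group_norm(num_groups: int, num_channels: int) -> nn.Module:
    # Use the fused Apex kernel when available, otherwise fall back to PyTorch.
    if APEX_AVAILABLE:
        return ApexGroupNorm(num_groups, num_channels)
    return nn.GroupNorm(num_groups, num_channels)
```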

* address incompatibility between dynamo and patching, retain same optimization perf with torch.compile

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* update tests

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* update changelog

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* initialize global_index directly on device

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>
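
In other words, allocate the index tensor on the GPU directly instead of building it on the host and copying it over; a small illustrative snippet (tensor name and size are placeholders):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
n = 1024

# Before: allocated on CPU, then copied to the device.
# global_index = torch.arange(n).to(device)

# After: created on the device in one step, avoiding the host round-trip.
global_index = torch.arange(n, device=device)
```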

* formatting

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* fix loss arguments in train.py

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* merge SongUNetPosEmbd with SongUNetPosLtEmbd using index slicing (recompile issue persists)

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* fix small errors in songunet

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* revise positional_embedding_indexing to avoid recompile/graph break, with faster bw compared to the old version

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* update changelog

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* add back SongUNetPosLtEmbd class for better ckp loading

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* add forward in SongUNetPosLtEmbd and update train.py

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* update test for lt model

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* update comments for embedding_selector test for lt model

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* update doctest

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* Added tiny detail in corrdiff readme

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* minor update to arguments and docstring

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

---------

Signed-off-by: Neal Pan <nuochengp@nvidia.com>
Signed-off-by: jialusui1102 <jialusui1102@gmail.com>
Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
Co-authored-by: Alicia Sui <asui@cw-pdx-cs-001-vscode-01.cm.cluster>
Co-authored-by: Neal Pan <nuochengp@nvidia.com>
Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com>
Co-authored-by: Charlelie Laurent <claurent@nvidia.com>

* Update CHANGELOG.md

Fix lint error

---------

Signed-off-by: Neal Pan <nuochengp@nvidia.com>
Signed-off-by: jialusui1102 <jialusui1102@gmail.com>
Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
Co-authored-by: Corey adams <coreyjadams@gmail.com>
Co-authored-by: Jialu (Alicia) Sui <125910753+jialusui1102@users.noreply.github.com>
Co-authored-by: Alicia Sui <asui@cw-pdx-cs-001-vscode-01.cm.cluster>
Co-authored-by: Neal Pan <nuochengp@nvidia.com>
Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com>
Co-authored-by: Charlelie Laurent <claurent@nvidia.com>
* fixing model.py to make compatible with NIM

* adding freq buffer to ParameterModel

* formatting

---------

Co-authored-by: Rishi Ranade <rranade@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com>
* Make sure that gpu processing and output settings are configurable. Set sensible defaults in the example config
* make dali optional

* update Changelog
* update to make it compatible with Windows

* update darcy fno to minimize the dependencies to make it very light-weight and hello-worldy

* use pathlib

* lint

* updates to checkpoint loading
* updating readme

* Adding prerequisites section

* fixing ci issues

* linting

---------

Co-authored-by: Kaustubh Tangsali <71059996+ktangsali@users.noreply.github.com>
Co-authored-by: Kaustubh Tangsali <ktangsali@nvidia.com>
Fix broken ShardTensor link.
… samples (NVIDIA#949)

* add requirements.txt for bloodflow and deforming plate

* move diffusion example (NVIDIA#930)

* move diffusion example

* update broken links

* add requirements for flow reconstruction
* Add datapipes docs.

* Fix class names.
… curation steps (NVIDIA#953)

Co-authored-by: Kaustubh Tangsali <71059996+ktangsali@users.noreply.github.com>
* update logging, launch, utils api docs with added descriptions and examples

* update introductory tutorial for typos and added clarity
* Adding first half of torch compile tutorial.

* fixes to formatting and syntax

* Add second half of torch.compile tutorial.

* Clean up organization of performance docs.

* Minor clean up on perf table teasers

* remove all but IO section

* Fix typos in torch compile tutorial
* add tutorial on physics informing

* add geometry stuff

* fix typos

* add some opening text to index.rst

* add summary

* typos

* address feedback

* address feedback

* add Ram's changes

---------

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>
@coreyjadams coreyjadams changed the base branch from 1.1.0-rc to main August 1, 2025 12:27
* update lr_decay_rate to be configurable

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* update lr_decay_rate comment

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

---------

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>
CharlelieLrt and others added 27 commits August 1, 2025 08:19
* Massive refactor on domino utils.py to improve code quality

* Adds missing tensorboard requirement

* Fixes missing cuml requirement

* Begins process of fixing inference_on_stl.py

* Fixes outdated type definition

* black formatting pass

* Fixes import order

* black formatting

* Reshape accepts a shape, not a splatted iterable

* Fixes lost array axis

* Enhances docstrings in utils.py with examples and improved clarity; removes outdated examples.

* Enhances area_weighted_shuffle_array function by adding area_factor parameter for adjustable sampling bias; updates docstring with detailed explanation and examples.

* Updates docstrings in utils.py for accuracy and clarity; modifies examples in calculate_center_of_mass, standardize, nd_interpolator, pad, and pad_inp functions; adjusts k-nearest neighbors parameter in nd_interpolator for flexibility; corrects boolean checks in pad and pad_inp examples.

* black format

* Add test suite for domino utils module

This commit introduces a new test file `test_domino_utils.py` that includes comprehensive unit tests for various functions in the domino utils module. Each test verifies the functionality of the corresponding utility function using examples from the documentation, ensuring correctness and reliability.

* Refactor array_type function to handle CuPy import gracefully and optimize area_weighted_shuffle_array for consistent array handling. Remove redundant test for array_type.
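
A sketch of the "graceful CuPy import" pattern described here, not the exact physicsnemo code:

```python
import numpy as np

try:
    import cupy as cp
except ImportError:
    cp = None  # CuPy is optional; fall back to NumPy-only behaviour


def array_type(arr):
    """Return the array module (cupy or numpy) matching the input array."""
    if cp is not None and isinstance(arr, cp.ndarray):
        return cp
    return np
```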

* Import PyVista conditionally in extract_surface_triangles function to avoid unnecessary dependency loading.

* black formatting

* Remove unused import
…de fixes (NVIDIA#973)

* clarifies I/O in domino train.py

* Gives paths in config.yaml user-agnostic pathnames

* Switches from relu -> gelu to allow smooth gradients

* Adds initial commit for design sensitivities study

* Corrects outdated type hint

* Refactors parameters in signed_distance_field calls for clarity

* Refactors directory handling in create_directory and get_filenames functions to use pathlib for improved readability and functionality. Updates type hints to support both str and Path types.
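
A hedged sketch of what pathlib-based versions of these helpers might look like; the signatures and behaviour are assumptions based on the commit message:

```python
from pathlib import Path
from typing import List, Union


def create_directory(path: Union[str, Path]) -> Path:
    # Accept either str or Path, create parent directories, ignore existing ones.
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    return directory


def get_filenames(directory: Union[str, Path], pattern: str = "*") -> List[str]:
    # Sorted names of files in the directory matching the glob pattern.
    return sorted(p.name for p in Path(directory).glob(pattern) if p.is_file())
```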

* Deletes merge(); this function is (a) not used anywhere, (b) can be replaced simply by the built-in sum(lists), and (c) as-written will always raise an error, since `newlist` is a tuple and hence does not have a .extend() method.
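
Points (b) and (c) in a nutshell, as a small illustration:

```python
# (b) the built-in sum() already concatenates lists when given a list start value
lists = [[1, 2], [3], [4, 5]]
flat = sum(lists, [])
assert flat == [1, 2, 3, 4, 5]

# (c) tuples have no .extend(), so the removed implementation could never succeed
newlist = ()
try:
    newlist.extend([1, 2])
except AttributeError as err:
    print(err)  # 'tuple' object has no attribute 'extend'
```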

* black formatting

* Code quality improvements

* Replaces 'axis' with 'dim' in torch.cat calls for consistency with the PyTorch documentation in GeoProcessor, GeometryRep, NNBasisFunctions, ParameterModel, and DoMINO classes.
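
A quick illustration of the keyword change (`axis` is accepted as a NumPy-compatibility alias, but `dim` is the documented keyword):

```python
import torch

a = torch.zeros(2, 3)
b = torch.ones(2, 3)

x = torch.cat([a, b], dim=0)  # shape (4, 3)
y = torch.cat([a, b], dim=1)  # shape (2, 6)
```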

* Adds initial changes for DoMINO sensitivity

* Refactors DesignDatapipe and DoMINOInference for improved readability and performance; updates type hints and formatting, and modifies input handling for mesh data.

* Refactors DesignDatapipe to directly use STL centers for geometry coordinates; updates DoMINOInference to improve memory management and adds detailed docstrings for clarity.

* Enhances DesignDatapipe by updating bounding box type hints, improving random sampling, and adding detailed docstrings for initialization and item retrieval methods.

* Implements Laplacian smoothing for mesh data in a new utility function; updates DoMINOInference to utilize the new smoothing function and modifies sensitivity calculations accordingly. Enhances type hints and formatting for clarity.
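
A reference-style sketch of vertex Laplacian smoothing; the function name, arguments, and neighbor representation here are illustrative (the actual utility is presumably the accelerated version that motivates the numba requirement below):

```python
import numpy as np


def laplacian_smooth(vertices: np.ndarray, neighbors: list, iterations: int = 10,
                     lam: float = 0.5) -> np.ndarray:
    # Each vertex is pulled toward the mean of its neighbors by a factor `lam`
    # per iteration, which damps high-frequency noise in the surface mesh.
    smoothed = vertices.astype(float).copy()
    for _ in range(iterations):
        means = np.stack([smoothed[idx].mean(axis=0) for idx in neighbors])
        smoothed = smoothed + lam * (means - smoothed)
    return smoothed
```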

* Adds numba to requirements for improved performance in sensitivity analysis

* Adds sbatch_logs/ to .gitignore to exclude SLURM batch log files from version control.

* Adds compute-optimized mesh_postprocessing utilities

* Working `main.py` with abstracted postprocessing step

* formatting

* Refactors main.py to remove duplicate STL combining function and streamline input handling. Updates input file processing and enhances results storage for mesh data.

* Commits configuration files for sensitivity studies

* Adds requirements.txt

* Adds raw and smooth drag gradient data files, and implements a plotting script for gradient checking.

* Refactors import statements in main.py for consistency and clarity. Streamlines input file path construction.

* Creates main_gradient_checking.py for drag gradient checking using DoMINOInference, including sensitivity analysis and output to text files.

* Updates file paths in main_gradient_checking.py and plot_gradient_checking.py to save output data in a dedicated gradient_checking_results directory. Adds new raw and smooth drag gradient data files.

* Adds a new aerodynamics example using DoMINO to compute design sensitivities (e.g., drag adjoint) with respect to underlying input geometry in CHANGELOG.md.

* Add README.md for DoMINO sensitivity analysis pipeline, detailing usage, features, and configuration for aerodynamic design optimization.

* black formatting fixes

* Add SPDX license headers to plot_gradient_checking.py

* Fixes markdownlint

* Removes unused import

* Updates license year

* Fixes license year

* Removes unused main block sections

* Removes erroneous uv.lock commit

* Removes some optimization language

* Remove unnecessary cached yaml

* Refactors to not require separate config (instead pulling it from DoMINO), as well as eliminating relative paths

* Add warning for loading model without checkpoint in DoMINOInference

* Add verbose option to DoMINOInference for memory usage logging

* Refactor imports in design_datapipe.py for clarity and efficiency; remove unused imports and reorganize necessary ones.

* Refactor DesignDatapipe to use NearestNeighbors from cuML for neighbor finding; update input handling in DoMINOInference for improved tensor management and type consistency.

* Enhance DesignDatapipe to accept a device parameter for tensor management; update tensor creation in DoMINOInference for improved efficiency and consistency.

* Readme cleanup

* Replace GELU activation with a configurable activation function in GeoProcessor.

* formatting

* remove duplicate section

* Makes activations configurable

* formatting

* add license
* Add PyG version of VortexShedding example and VortexSheddingDataset

* Replace Union type hints with an alias. Add MeshNodeBlock tests.
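
For illustration, the kind of alias this refers to; the concrete member types below are assumptions:

```python
from typing import Union

import numpy as np
import torch

# One alias instead of repeating the same Union hint throughout the module.
TensorOrArray = Union[torch.Tensor, np.ndarray]


def to_numpy(x: TensorOrArray) -> np.ndarray:
    return x.detach().cpu().numpy() if isinstance(x, torch.Tensor) else np.asarray(x)
```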

* Add distributed sampler to the example. Add MeshEdgeBlock test. Fix DGL inference script.

* Fix VortexShedding PyG inference script

* Add MGN DGL2PYG tests.

* Update inference notebooks

* Make linter happy.

* Fix test.

* Update req.txt. Clean up TODO

* Address review feedback.

* Update README

* Add proper epoch loss reporting

* Address review feedback.

* Require DGL or PyG only when necessary
* Add correctness test for deterministic sampler

* lint

* drop np dep
…#1012)

* Removed unnecessary check in args overriding

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Replaced exception with warning in argument overriding

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

---------

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
* Use e2grid healpixpad when possible

* Drop unused imports

* changelog

* formatting
* address vdr comments

* fix lint

* fix lint

---------

Co-authored-by: root <root@eos0014.eos.clusters.nvidia.com>
* Migrate Vortex Shedding Reduced Mesh example to PyG

* Update CHANGELOG
…lobal parameters input (NVIDIA#903)

* changes based on updated main branch

* update to model.py and end to end testing

* changes to sharded parts of the code

* Update README

* Update inference_on_stl.py to comply with new method

* minor refactor

* update

* Tested training

* remove hardcoded stuff from inference_on_stl.py

* Removed comments from model.py

* Remove air_density and stream_velocity from domino_sharded

* Remove comments from domino_datapipe

* Removed names and make paths generic

* make encode_parameters false

* Update and remove comments

* Update README

* Update README, remove redundant text

* Update model.py to remove air_density and stream_velocity

* Update inference_on_stl.py to be consistent with main

* Update README.md to be compliant with main

* Update tests

* changes based on CI

* small cleaning config.yaml

* Update changelog

* fixing doctest issue

---------

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>
Co-authored-by: Rishikesh Ranade <dr.rranade@gmail.com>
Co-authored-by: RishikeshRanade <65631160+RishikeshRanade@users.noreply.github.com>
…A#1000)

* make dimensions consistent for checkpointing

* add use_reentrant=False to checkpoint in songunet for torch.compile support
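
A minimal example of the non-reentrant activation checkpointing this enables; the checkpointed function here is a stand-in:

```python
import torch
from torch.utils.checkpoint import checkpoint


def block(x: torch.Tensor) -> torch.Tensor:
    return torch.nn.functional.silu(x) * 2.0


x = torch.randn(4, 8, requires_grad=True)
# use_reentrant=False selects the non-reentrant path, which is the one compatible
# with torch.compile and the recommended setting in recent PyTorch releases.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```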

* removed use_patch_grad_acc from loss_valid_kwargs in corrdiff train.py script as the regression loss does not support it

* set graph static for corrdiff training to enable checkpoint

* change the checkpoint reference dimension from x to y as it is the same dimension used to name the layers

* correct positional embedding in song unet

* correct embedding for gridtype==test and N_grid_channels==2

* Change single dimension shape with geometric mean to use checkpointing

* reformatted

---------

Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com>
…g qkv and added inference optimization and fixes (NVIDIA#954)

* restructured attention into separate class and fixed errors in reshaping qkv

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* update CHANGELOG

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* revert earlier changes in train.py

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* add multiple inference optimization for CorrDiff

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* minor update

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* add attention ckp conversion and restructure use_fp16 logic

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* update unit tests for fp16

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* Minor formatting to the Attention docstring

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Removed private attribute _use_fp16 initialization in UNet and EDMPrecondSuperResolution

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Made overlap_count a private argument in patching and the method _get_overlap_count a private method

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added non-regression test for GridPatching2D and get_overlap_count method

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added API doc for use_fp16 method in UNet wrapper

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added docs for overlap_count argument in image_fuse

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Removed utils subdirectory in tests

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Fixed some pytest package confusion in utils testing

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* restructure get_overlap_count() as a static method and update related unit tests

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* Minor formatting in docstring for get_overlap_count

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Minor detail in docstring for image_fuse

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Changed expected path for non-regression reference data used in test_patching

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* only do attn ckp conversion for UNet based models

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* add comment to move attn ckp conversion to classes later

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* Consistently set stochastic sampler precision to float32

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* Moved attention module conversion to UNetBlock load_state_dict method

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Minor renaming in UNetBlock

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Simplified warning logic for attention module's keys mapping

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Updated corrdiff train and generate recipes with overridable args

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added validation to make sure amp_mode is disabled when torch.autocast is disabled

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Implemented automated channels_last layout in SongUNet when using use_apex_gn

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
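
Illustrative pattern for the automated layout switch: both parameters and activations are converted to channels-last memory format, which the fused Apex GroupNorm kernels favor. The model, shapes, and CUDA device below are placeholders:

```python
import torch

model = torch.nn.Conv2d(16, 16, kernel_size=3, padding=1).to("cuda")
model = model.to(memory_format=torch.channels_last)

x = torch.randn(8, 16, 64, 64, device="cuda").to(memory_format=torch.channels_last)
out = model(x)  # output stays in channels_last layout
```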

* Fix CI: added attribute use_apex_gn to SongUNet

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Refactored amp_mode and profile_mode properties for SongUNets and their wrappers

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added two distinct shape-specific apply_wrapper in stochastic sampler

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Updated tests to be compatible with the modified amp_mode API

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Fix pytorch deprecation warning for is_autocast_enabled

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Implemented property factory for amp_mode and profile_mode in model wrappers + added them to StormCastUNet to pass CI tests

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Updated CI tests for diffusion models

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* resolve conflicts between cpu and apex and update related CI

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* resolve recompile errors for stochastic sampler in CICD

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>

* Updated CHANGELOG.md

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Updated CHANGELOG.md

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Updated CHANGELOG.md

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Some comments in SongUNets

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Updated docs with amp_mode and profile_mode APIs

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

---------

Signed-off-by: jialusui1102 <jialusui1102@gmail.com>
Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
Co-authored-by: Charlelie Laurent <claurent@nvidia.com>
Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com>
* fixed grid effect

* added data filter

* added data filter

* updated comment

---------

Co-authored-by: Oliver Hennigh <ohennigh@login-eos01.eos.clusters.nvidia.com>
…IA#982)

* Fix regression output shape

* Only use act if fused_act is True

* Avoid dtype change of buffer/param and fix softmax dtype

* Added unit tests for song unet models with learnable positional embedding, lead time aware, with compile, apex_gn, etc...

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Updated tests for SongUNetPosLtEmbd with AMP, Apex GN and compile

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Renamed variable in SongUNetPosEmbd

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Revert bug introduced in SongUNetPosEmbd positional_embedding_selector

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Reverted test script to its original state

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Fixed some new CI tests

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added missing parameter in new tests

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added dtype casting in SongUNetPosEmbd forward

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Fixed number of channels in new tests

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added random seed in new tests

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added more missing random seeds to new tests

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Removed some random seeds added by mistake in new tests

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

---------

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
Co-authored-by: Julius Berner <jberner@nvidia.com>
Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com>
Co-authored-by: Charlelie Laurent <claurent@nvidia.com>
* first commit

* add README.md

* add README.md

* add README.md

* revise for 2nd round code review

* revise for 2nd round code review

* CHANGELOG update for TopoDiff

* code review for merge

* code review

* add command to run the model

* add command to run the model

* add command to run the model

* add command to run the model

* avoid floating material in generation

* avoid floating material in generation

* topodiff merge

* topodiff merge

* topodiff merge

* topodiff merge

* Topodiff merge

* Topodiff merge

* Topodiff merge

* Topodiff merge

* formatting

* formatting, name change

* fix bugs, cleanup

* fix pydantic

---------

Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com>
Co-authored-by: root <root@eos0543.eos.clusters.nvidia.com>
Co-authored-by: root <root@eos0307.eos.clusters.nvidia.com>
Co-authored-by: root <root@eos0175.eos.clusters.nvidia.com>
* adding moe

* address review comments, update readme

* Small bug fix for preprocessor

* address review comments

---------

Co-authored-by: root <root@eos0287.eos.clusters.nvidia.com>
Co-authored-by: root <root@eos0247.eos.clusters.nvidia.com>
* fixed grid effect

* uv fix

* blaa

* removed nemo build

* added unmanaged

---------

Co-authored-by: Oliver Hennigh <ohennigh@login-eos01.eos.clusters.nvidia.com>
* Refactor signed_distance_field function in sdf.py for improved clarity and performance. Update parameter types to use np.ndarray and cp.ndarray, enhance docstring with detailed descriptions and examples, and streamline array conversion logic.

* Optimize memory allocation in signed_distance_field function by using wp.empty instead of wp.zeros. Update array dimensions for kernel launch and streamline return logic.

* Enhance docstring in signed_distance_field function to clarify parameters and return types, including GPU acceleration details and usage of sign winding number method. Remove unnecessary blank line.

* Enhance docstring in signed_distance_field function to provide clearer explanation of the 'include_hit_points' parameter, specifying its role in defining the SDF.

* formatting

* Fix formatting inconsistencies in docstring of signed_distance_field function in sdf.py.

* Adds fix for back-compatibility with input_points arrays with incorrect shape
* Added experimental tEDMPrecondSuperRes

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Some refactors in diffusion ResidualLoss to accommodate t-EDM subclass

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added experimental t-EDM loss

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added warning message when importing from physicsnemo.experimental

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Some fixes in docstrings

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added student-t distribution in StackedRandomGenerator

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
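
For context, sampling heavy-tailed latents from a Student-t distribution looks roughly like this; the degrees-of-freedom value is a placeholder, and large `df` recovers the Gaussian case:

```python
import torch

df = 10.0  # degrees of freedom; smaller values give heavier tails
dist = torch.distributions.StudentT(df=df)
latents = dist.sample((4, 3, 64, 64))  # same shape a torch.randn call would give
```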

* Added t-student option in corrdiff diffusion_step

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added t-student distribution option in corrdiff generate.py

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Updated warning message for student-t distribution

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Corrected wrong import in experimental diffusion metrics

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added t-student distribution option in CorrDiff train.py

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Minor string modification

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Some minor renaming and reformatting

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Updated CHANGELOG.md

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added another safety check to CorrDiff generate.py

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added tests for t-EDM models, metrics and utils

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Moved t-EDM tests to existing directories

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Some fixes in t-edm tests

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Fixed missing device in diffusion_step

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Added a few missing docstrings for StackedRandomGenerator

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Changed default value of P_mean to 0 in t-EDM loss

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Made P_mean and P_std configurable in CorrDiff train.py and generate.py

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Updated CHANGELOG.md to document configurable P_mean and P_std

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* A few fixes in CorrDiff

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

---------

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
…1035)

* Bumps ruff from 0.0.290 to 0.12.5. Removes black, which is superseded by ruff-format.

* Refactor ruff configuration in pyproject.toml to use non-deprecated settings

* Migrates pre-commit settings to repo-wide settings

* Replaces black with ruff-format in Makefile and updates linting commands to use ruff-check.

* Adds Ruff note to Changelog

* Update CONTRIBUTING.md to reflect changes in CI checks, replacing black with ruff for formatting and linting instructions.

* Avoids acronyms

* Adds docs about Ruff

* Markdownlint fixes

* Implements Ruff safe fixes

* Adds hand-written fixes for lint errors

* Refactors _check_checkpoint to remove duplicate code

* Addresses Ruff lint issues with tarfile.extractall(), with appropriate modifications for back-compatibility with Python < 3.12.
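
A sketch of the compatibility pattern implied here: Python 3.12 documents the `filter` argument to `TarFile.extractall()`, which addresses the lint warning about unsafe extraction, while older interpreters fall back to the plain call. The function name and paths are illustrative:

```python
import sys
import tarfile


def safe_extract(archive_path: str, dest: str) -> None:
    with tarfile.open(archive_path) as tar:
        if sys.version_info >= (3, 12):
            # The "data" filter rejects absolute paths, parent traversal, etc.
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
```
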
* add patching support for deterministic sampler

* code cleanup and unit test update

* use patching wrapper and fix pytest functions

* change utils.generative to utils.diffusion

* set default to torch.float64

* do compilation in deterministic sampler

* update

* Identified and fixed critical bug in stochastic_sampler and deterministic_sampler

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Format CHANGELOG.md

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

* Implements wrapper selector to fix compile issues in tests

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>

---------

Signed-off-by: Charlelie Laurent <claurent@nvidia.com>
Co-authored-by: root <root@cw-dfw-h100-004-251-012.cm.cluster>
Co-authored-by: Charlelie Laurent <84199758+CharlelieLrt@users.noreply.github.com>
Co-authored-by: root <root@cw-dfw-h100-004-211-033.cm.cluster>
Co-authored-by: root <root@cw-dfw-h100-004-270-026.cm.cluster>
Co-authored-by: Charlelie Laurent <claurent@nvidia.com>
* resolving merge conflicts with main

* fixing bugs

* fixing CI errors

* fixing merge conflicts in config

* modifying Changelog

* Update config.yaml

* cpu processing in area_weighted_sampling

* fixing naming issue in domino_datapipe.py

* Update physicsnemo/models/domino/model.py

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>

* Update physicsnemo/models/domino/model.py

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>

* Update physicsnemo/models/domino/model.py

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>

* Update physicsnemo/models/domino/model.py

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>

* Update physicsnemo/models/domino/model.py

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>

* Update examples/cfd/external_aerodynamics/domino/src/conf/config.yaml

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>

* Update physicsnemo/models/domino/model.py

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>

* Update examples/cfd/external_aerodynamics/domino/src/train.py

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>

* fixing PR comments

* addressing PR comments

* fixing CI issues

* fixing pytest issues in utils

---------

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>
* Add generic neighbor finding function that is suitable to use in FigConvNet, DoMINO, and mesh graph data pipes.

* Fix an illegal device access when using multiple GPUs.

* Performance tuning of neighbor query

* Add warp-enabled radius search.

Also add testing.
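
For reference, the semantics of a fixed-radius neighbor search can be written as a brute-force PyTorch sketch; the Warp version is the optimized equivalent, and the name, padding convention, and signature below are assumptions:

```python
import torch


def radius_search_bruteforce(points: torch.Tensor, queries: torch.Tensor,
                             radius: float, max_neighbors: int) -> torch.Tensor:
    # For every query point, return up to `max_neighbors` indices of points
    # within `radius`; unused slots stay at the pad value 0.
    dist = torch.cdist(queries, points)          # (num_queries, num_points)
    hits = dist <= radius
    out = torch.zeros(queries.shape[0], max_neighbors, dtype=torch.long)
    for q in range(queries.shape[0]):
        idx = hits[q].nonzero(as_tuple=False).flatten()[:max_neighbors]
        out[q, : idx.numel()] = idx
    return out
```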

* Update neighbor search tools to ensure we use 0 as the null index instead of -1

* Switch domino to use the new radius search function instead of ball query.

This is functionally the same, though shows a performance enhancement.

* Remove neighborlist function.  Replaced with radius_search.

* Using typing for annotations for CI

* Update examples/minimal/neighbor_list/warp_neighbor_list.py

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>

* Address nits and minor comments from PR review.

* Relocate radius search code.

* Remove old folders; goes with previous commit.

* Update test import.

* The CI container does not accept list[int] as an acceptable type for PyTorch.

* Make sure radius search is exported as a function, not a module.

* Fixing formatting, since the linter appears to have changed ....

* Remove cuda opcheck test temporarily

---------

Co-authored-by: Peter Sharpe <peterdsharpe@gmail.com>
@coreyjadams
Collaborator Author

Moved to unified docs; closing.

@coreyjadams coreyjadams deleted the fsdp-tutorial-update branch August 13, 2025 18:36