
Releases: Devsh-Graphics-Programming/Nabla

v0.7.1-alpha1

18 Aug 14:18
69c0561
Pre-release

What's Changed

New Features

  • GitHub CI now tests Examples which are not added with EXCLUDE_FROM_ALL to the meta-example project
  • IGeometry, IPolygonGeometry classes
  • Polygon Geometry can be used with Asset Converter from day one
  • CGeometryCreator making basic geometries like cubes, cones, disks, etc.
  • IGeometryLoader base class (Mesh loaders are back!)
  • PLY Geometry Loader
  • Mitsuba Serialized Geometry Loader
  • CAD Example can now do Digital Terrain Models and display isolines and height shading on triangular and grid meshes
  • Precompile your Shaders to SPIR-V using NSC (with all our STL headers) and also as a CMake command!
  • Precompile Shader Permutations with different JSON generated Device Capability Traits (we also have C++ autogen utilities that let you resolve the keys to them depending on device capabilities)
  • Normal and Quaternion quantization caches done with our HLSL types
  • CGeometryManipulator stub, more functionality from deprecated MeshManipulator to come
  • finished the AABB.hlsl shape
  • added transform, union and intersect functions which can be specialized for shapes in the HLSL library
  • Almost all SPIR-V intrinsics for Raytracing Pipeline exposed (Except Ray Terminators and ReportHit) so you don't need to rely on HLSL intrinsics getting translated/codegenned properly
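
As a rough illustration of what the union and intersect functions from the list above compute for an axis-aligned bounding box, here is a minimal host-side C++ sketch. The `Box3` type and the `unionOf`, `intersectOf` and `isEmpty` names are hypothetical and chosen for this example; they are not the signatures found in AABB.hlsl.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Hypothetical stand-in type for illustration; not the layout of Nabla's AABB.hlsl shape.
struct Box3
{
    std::array<float, 3> lo; // minimum corner
    std::array<float, 3> hi; // maximum corner
};

// A box is empty when its minimum exceeds its maximum on any axis.
inline bool isEmpty(const Box3& b)
{
    for (int i = 0; i < 3; ++i)
        if (b.lo[i] > b.hi[i])
            return true;
    return false;
}

// Smallest box that encloses both inputs (assumes neither input is empty).
inline Box3 unionOf(const Box3& a, const Box3& b)
{
    assert(!isEmpty(a) && !isEmpty(b));
    Box3 r{};
    for (int i = 0; i < 3; ++i)
    {
        r.lo[i] = std::min(a.lo[i], b.lo[i]);
        r.hi[i] = std::max(a.hi[i], b.hi[i]);
    }
    return r;
}

// Region shared by both inputs; the result is empty if they do not overlap.
inline Box3 intersectOf(const Box3& a, const Box3& b)
{
    Box3 r{};
    for (int i = 0; i < 3; ++i)
    {
        r.lo[i] = std::max(a.lo[i], b.lo[i]);
        r.hi[i] = std::min(a.hi[i], b.hi[i]);
    }
    return r;
}
```

Representing "no overlap" as an inverted box keeps `intersectOf` branch-free, which is one common choice for shader-friendly code; callers then test the result with `isEmpty`.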

Removals

  • IRenderpassIndependentPipeline
  • IMesh and IMeshBuffer
  • MeshPackers V1 and V2, we encourage programmable pulling from BDA now

Improvements

  • made the refctd_memory_resource::allocate and deallocate virtual so they can be overridden (the std::pmr didn't make sense here)
  • created an adoption_memory_resource
  • most of algorithm.hlsl also compiles as C++ Host code
  • Builtin resource and PCH improvements
  • LRU Cache and Doubly Linked List Container improvements for resizability etc.
  • Replaced Parallel Hashmap with Greg's Template Library as the submodule
  • skip duplicate validation for SPIR-V optimizer
  • Updated DXC
  • Boost-Wave Shader preprocessor is now its own Translation Unit and can always be compiled with optimizations (so even Debug builds don't take minutes preprocessing the input to DXC)
  • target SPIR-V version is now a shader preprocessor option (because of __SPIR_V_MAJOR__ and friends)
  • fast_affine.hlsl for doing mathematical abominations like multiplying 3x4 matrices with 3x1 vectors as if they're padded 4x4 and 4x1
  • cofactor and fast inverse HLSL utilities (useful for nice fast normal matrix calc)
  • IUtilities has a create factory which can fail if it can't allocate the amount of HOST_VISIBLE memory you requested
  • split MonoAssetManagerAndBuiltinResourceApplication into two classes
  • allPreviousStages and allLaterStages sync utility functions
  • emulated_vector now has length_helper specialization, and inversesqrt for emulated_float64
  • constexpr findLSB variant
  • 3x3 matrix from quaternion HLSL utility
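
The padded multiplication that the fast_affine.hlsl entry above alludes to can be sketched on the host in C++ as follows. This is only an illustration of the arithmetic: the function names are hypothetical, and the choice of padding the vector with `w = 1` for points and `w = 0` for directions is an assumption, not something the release notes specify.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<float, 3>;
using Mat3x4 = std::array<std::array<float, 4>, 3>; // 3 rows, 4 columns

// Multiplies as if m had an extra row (0,0,0,1) and v were padded to (x,y,z,1).
// The fourth output component would always be 1, so it is never computed.
inline Vec3 mulAffinePoint(const Mat3x4& m, const Vec3& v)
{
    Vec3 r{};
    for (int row = 0; row < 3; ++row)
        r[row] = m[row][0] * v[0] + m[row][1] * v[1] + m[row][2] * v[2] + m[row][3];
    return r;
}

// Same idea with v padded to (x,y,z,0): the translation column drops out.
inline Vec3 mulAffineDirection(const Mat3x4& m, const Vec3& v)
{
    Vec3 r{};
    for (int row = 0; row < 3; ++row)
        r[row] = m[row][0] * v[0] + m[row][1] * v[1] + m[row][2] * v[2];
    return r;
}
```

Compared with promoting both operands to full 4x4 and 4x1 form, this skips the multiplications by the implied zeros and ones.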

Bugfixes

  • DeferredFreeFunctor actually tested and works now
  • AccelerationStructure::validBuildFlags infinite recursive call
  • Shader compiler adopt_memory typo
  • keep boost-wave compile options consistent and encapsulate/don't leak it

Full Changelog: v0.7.0-alpha1...v0.7.1-alpha1

Multi-Entry Point SPIR-V Shaders - Removal of IGPUShader and ICPUShader

18 Jun 19:42

What's Changed

There's now only a single IShader.

Shader Stage and Specialization Info are now provided directly as Pipeline Creation Parameters.

This means that each shader's SPIR-V gets its capabilities and extensions trimmed based on the entry points used by a single pipeline.

Aggressive dead code elimination SPIR-V optimization is necessary for this to function.

Full Changelog: v0.6.2-alpha2...v0.7.0-alpha1

Bugfix: Missing `tuple.hlsl` from embed

18 Jun 12:41
Pre-release

Default build of examples 23 and 29 didn't work

Workgroup2 Reductions and Scans

18 Jun 09:29
Pre-release

What's Changed

Workgroup Scans

nbl::hlsl::workgroup2 reduce + scan by @keptsecret in #876

Highly performant: the subgroup-emulated variant (a Kogge-Stone adder made of subgroupShuffleUp) is up to 200% faster than the native one (subgroupInclusiveAdd) on Nvidia RTX GPUs.
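
For readers unfamiliar with the Kogge-Stone pattern, the following C++ snippet simulates it on the host, with each vector element standing in for one subgroup invocation. It is a sketch of the general technique only, not the nbl::hlsl::workgroup2 code; the function name is hypothetical, and the "read from lane i - delta" step plays the role that subgroupShuffleUp plays in a shader.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive prefix sum in ceil(log2(n)) steps.
// In step k every lane i >= delta (delta = 2^k) adds the value held by lane i - delta.
inline std::vector<int> inclusiveScanKoggeStone(std::vector<int> lanes)
{
    const std::size_t n = lanes.size();
    for (std::size_t delta = 1; delta < n; delta <<= 1)
    {
        // Snapshot models lockstep execution: all lanes read before any lane writes.
        const std::vector<int> prev = lanes;
        for (std::size_t i = delta; i < n; ++i)
            lanes[i] = prev[i] + prev[i - delta];
    }
    return lanes;
}
```

For eight lanes holding 1 through 8, the three steps (delta of 1, 2, then 4) produce 1, 3, 6, 10, 15, 21, 28, 36.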

Blogpost incoming.

Full Changelog: v0.6.1-alpha1...v0.6.2-alpha1

Godbolt NSC Docker Image, YML Workflows for MSVC, and Image Asset Converter fix

03 Jun 23:16

Fixed stack overflow by not resetting empty overflow callbacks on the transfer SIntendedSubmitInfo when uploading images in CAssetConverter

Full Changelog: v0.6.0-alpha1...v0.6.1-alpha1

Asset Converter Automated TLAS and BLAS builds & compactions, Clang MSVC builds

26 May 08:12

Asset Converter now covers 100% of asset types (except for renderpasses and framebuffers) at its inception, last feature outstanding is the RT Pipeline coming in #871

See examples of usage with and without compaction and ReBAR fast-path:

What's Changed

Full Changelog: v0.5.9-alpha2...v0.6.0-alpha1

Fix of v0.5.9-alpha1

12 May 14:10
7896216
Pre-release

What's Changed

Full Changelog: v0.5.9-alpha1...v0.5.9-alpha2

New Optimized Subgroup Arithmetic Utilities

12 May 11:31
50fd2e2

Our emulated Subgroup Scans are 2x faster than Nvidia's implementation of KHR_shader_subgroup_arithmetic

Certain SPIR-V and GLSL intrinsics got fixed.

Acceleration Structure API refactor and Asset Converter ReBAR support

06 May 13:32
4793d18

Cleaned up the inheritance hierarchy in TLAS and BLAS classes.

BLAS is now a proper IPreHashed. TLAS has static utility methods that convert a Polymorphic Instance to a non-polymorphic one, with the type embedded in the lower bits of the Aligned Pointer, for builds where the Instance input buffer is a span of pointers.
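
Embedding a type in the lower bits of an aligned pointer is a general technique that can be sketched as below. The `Instance` struct, the `InstanceType` enumerators and the helper names are hypothetical and exist only for this example; they do not reproduce Nabla's TLAS utilities.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical instance kinds for illustration only.
enum class InstanceType : std::uintptr_t { Static = 0, MatrixMotion = 1, SRTMotion = 2 };

// A 16-byte aligned object always has an address whose low 4 bits are zero.
struct alignas(16) Instance { float payload[4]; };

constexpr std::uintptr_t TagMask = alignof(Instance) - 1;

// Stores the type in the bits that alignment guarantees to be unused.
inline std::uintptr_t packTagged(const Instance* ptr, InstanceType type)
{
    const auto bits = reinterpret_cast<std::uintptr_t>(ptr);
    assert((bits & TagMask) == 0);                         // pointer must be aligned
    assert(static_cast<std::uintptr_t>(type) <= TagMask);  // tag must fit in the free bits
    return bits | static_cast<std::uintptr_t>(type);
}

// Clears the tag bits to recover the original address.
inline const Instance* unpackPointer(std::uintptr_t tagged)
{
    return reinterpret_cast<const Instance*>(tagged & ~TagMask);
}

// Reads back only the tag bits.
inline InstanceType unpackType(std::uintptr_t tagged)
{
    return static_cast<InstanceType>(tagged & TagMask);
}
```

The benefit is that a span of such values carries both the address and the kind of each element without a separate type array.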

Added a demote_promote_writer_readers_lock which is well thought out and actually works.

Asset Converter can now be overridden to assume ReBAR support and create Buffers over DEVICE_LOCAL and HOST_VISIBLE memory, sidestepping the use of a transfer queue to upload the buffer data.

Queue Families declared when creating a Buffer or Image with concurrent sharing mode can now be queried from the object.

Fixed a number of bugs in Descriptor Lifetime Tracking and general inheritance patterns.

Fixed a bug in IUtilities: overflow submits and intended submits that signal the timeline semaphores on which staging buffer deferred memory deallocations are latched would not get patched to include the COPY_BIT in the signal stage mask.

Minor additions of fma to HLSL/C++ tgmath library.

TLASes can track BLASes they reference after a build + Submission Callbacks & Custom Lifetime Tracking

01 Apr 10:25
4e43183

Lifetime Tracking for any IReferenceCounted via a Command Buffer

Absolutely awesome feature, warranting a patch version bump.

You can now call bool IGPUCommandBuffer::recordReferences(const std::span<const IReferenceCounted*> refs); which will make the Command Buffer hold onto all the refs until it is reset.

This makes it easy to do things such as make the Command Buffer hold onto the IGPUBuffers whose Buffer Device Addresses you use in the shaders.
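
The ownership behaviour described above can be modelled with a small self-contained mock. This sketch deliberately avoids Nabla's classes: `MockCommandBuffer` and `MockBuffer` are hypothetical, and `std::shared_ptr` stands in for Nabla's intrusive reference counting, so only the "hold until reset" idea carries over.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a GPU buffer.
struct MockBuffer { int id; };

class MockCommandBuffer
{
public:
    // Takes shared ownership of every referenced object.
    bool recordReferences(const std::vector<std::shared_ptr<const void>>& refs)
    {
        m_tracked.insert(m_tracked.end(), refs.begin(), refs.end());
        return true;
    }

    // Dropping the tracked list releases the ownership taken above.
    void reset() { m_tracked.clear(); }

private:
    std::vector<std::shared_ptr<const void>> m_tracked;
};
```

In this model, a caller can release its own handle right after recording; the buffer stays alive until `reset()` runs, which mirrors the case of buffers that shaders only reach through their device addresses.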

TLASes can track BLASes they use after a build

The builds are versioned because a Queue's MultiTimelineEventHandler doesn't immediately know that a submission has completed. The TLAS holds only one set of tracked BLASes at a time, and only the highest version can be set.

You don't suffer from the ABA signalling problem because callbacks from Host and Device builds hold onto the BLAS ranges they're about to set until the tracking info update time.
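
One way to picture the versioning rule is a tracker that rejects any update older than the one it already holds. The classes and method names below are hypothetical mocks written for this sketch, assuming that "highest" refers to a monotonically increasing build version; they are not the TLAS interface.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a bottom-level acceleration structure.
struct MockBLAS { int id; };

class MockTLAS
{
public:
    using BlasSet = std::vector<std::shared_ptr<MockBLAS>>;

    // Hands out an increasing version number for each recorded build.
    std::uint32_t registerBuild() { return ++m_lastIssued; }

    // Called once a build is known to have completed. Completions can be observed
    // late or out of order, so a stale version must not replace a newer one.
    bool setTrackedBLASes(std::uint32_t version, BlasSet blases)
    {
        if (version <= m_trackedVersion)
            return false;
        m_trackedVersion = version;
        m_tracked = std::move(blases); // the previous set is released here
        return true;
    }

    const BlasSet& tracked() const { return m_tracked; }

private:
    std::uint32_t m_lastIssued = 0;
    std::uint32_t m_trackedVersion = 0;
    BlasSet m_tracked;
};
```

Because the caller passes the BLAS set by value, the set stays owned by whoever is about to apply it until the update actually happens, which is the property the text credits for avoiding the ABA problem.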

Of course, in true Nabla style, everything is overridable and callable directly by you if so wished (e.g. middleware usage).

Submission Completion Callbacks

This was doable before with the TimelineEventHandler, but you were in charge of making it and doing your own polling. Now, if you want to be extra lazy, you can abuse the IQueue's built-in MultiTimelineEventHandler, which it uses for lifetime tracking of the CommandBuffer (and whatever it references).

Think of it as a counterpart to CUDA's cudaStreamAddCallback or cudaLaunchHostFunc.

If you put IReferenceCounted in your std::function<void()>'s internal state, this allows you to also perform custom lifetime tracking but for only a single submit (as opposed to a commandbuffer which may be reused).
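
The single-submit lifetime tracking described above follows from how captures work: whatever the callback captures lives exactly as long as the callback object. The mock below illustrates this with a hypothetical `MockQueue`, with `std::shared_ptr` standing in for a reference-counted Nabla object and an integer standing in for a timeline semaphore value; none of these names come from Nabla.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

class MockQueue
{
public:
    // Registers a callback to run once the timeline reaches signalValue.
    void submit(std::uint64_t signalValue, std::function<void()> onComplete)
    {
        m_pending.push_back({signalValue, std::move(onComplete)});
    }

    // Stand-in for polling the semaphore: runs every callback that is due,
    // then destroys it, which releases anything the callback captured.
    void poll(std::uint64_t completedValue)
    {
        std::vector<Pending> stillWaiting;
        for (auto& p : m_pending)
        {
            if (p.value <= completedValue)
            {
                if (p.callback)
                    p.callback();
            }
            else
                stillWaiting.push_back(std::move(p));
        }
        m_pending = std::move(stillWaiting);
    }

private:
    struct Pending
    {
        std::uint64_t value;
        std::function<void()> callback;
    };
    std::vector<Pending> m_pending;
};
```

A lambda such as `[resource]{ /* use *resource */ }` passed to `submit` keeps `resource` alive until the matching `poll` has run it, and no longer, which is the per-submit granularity the text contrasts with a reusable command buffer.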