Releases: Devsh-Graphics-Programming/Nabla
v0.7.1-alpha1
What's Changed
New Features
- GitHub CI now tests Examples which are not added with `EXCLUDE_FROM_ALL` to the meta-example project
- `IGeometry`, `IPolygonGeometry` classes
- Polygon Geometry can be used with Asset Converter from day one
- `CGeometryCreator` making basic geometries like cubes, cones, disks, etc.
- `IGeometryLoader` base class (Mesh loaders are back!)
- PLY Geometry Loader
- Mitsuba Serialized Geometry Loader
- CAD Example can now do Digital Terrain Models and display isolines and height shading on triangular and grid meshes
- Precompile your Shaders to SPIR-V using NSC (with all our STL headers) and also as a CMake command!
- Precompile Shader Permutations with different JSON generated Device Capability Traits (we also have C++ autogen utilities that let you resolve the keys to them depending on device capabilities)
- Normal and Quaternion quantization caches done with our HLSL types
- `CGeometryManipulator` stub, more functionality from the deprecated `MeshManipulator` to come
- finished the `AABB.hlsl` shape - added transform, union and intersect functions which can be specialized for shapes in the HLSL library (see the sketch after this list)
- Almost all SPIR-V intrinsics for Raytracing Pipeline exposed (Except Ray Terminators and ReportHit) so you don't need to rely on HLSL intrinsics getting translated/codegenned properly
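
A minimal C++ sketch of the three AABB operations mentioned in the geometry item above. The types and function names here are illustrative only, not the actual ones from Nabla's `AABB.hlsl`:

```cpp
#include <algorithm>
#include <array>

// illustrative stand-in for an axis-aligned bounding box (initialized "empty")
struct AABB
{
    std::array<float,3> minVx{ 1e30f, 1e30f, 1e30f};
    std::array<float,3> maxVx{-1e30f,-1e30f,-1e30f};
};

AABB unionOf(const AABB& a, const AABB& b)
{
    AABB r;
    for (int i=0; i<3; i++)
    {
        r.minVx[i] = std::min(a.minVx[i],b.minVx[i]);
        r.maxVx[i] = std::max(a.maxVx[i],b.maxVx[i]);
    }
    return r;
}

AABB intersectionOf(const AABB& a, const AABB& b)
{
    AABB r; // may end up inverted (min>max), which signals an empty intersection
    for (int i=0; i<3; i++)
    {
        r.minVx[i] = std::max(a.minVx[i],b.minVx[i]);
        r.maxVx[i] = std::min(a.maxVx[i],b.maxVx[i]);
    }
    return r;
}

// transform by a 3x4 affine matrix: accumulate per-axis min/max of each term (Arvo's method)
AABB transformed(const float m[3][4], const AABB& in)
{
    AABB r;
    for (int i=0; i<3; i++)
    {
        float lo = m[i][3], hi = m[i][3]; // start from the translation column
        for (int j=0; j<3; j++)
        {
            const float a = m[i][j]*in.minVx[j];
            const float b = m[i][j]*in.maxVx[j];
            lo += std::min(a,b);
            hi += std::max(a,b);
        }
        r.minVx[i] = lo;
        r.maxVx[i] = hi;
    }
    return r;
}
```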
Removals
- `IRenderpassIndependentPipeline`
- `IMesh` and `IMeshBuffer`
- `MeshPacker`s V1 and V2, we encourage programmable pulling from BDA now
Improvements
- made `refctd_memory_resource::allocate` and `deallocate` virtual so they can be overridden (the `std::pmr` didn't make sense here)
- created an `adoption_memory_resource`
- most of `algorithm.hlsl` also compiles as C++ Host code
- Builtin resource and PCH improvements
- LRU Cache and Doubly Linked List Container improvements for resizability etc.
- Replaced Parallel Hashmap with Greg's Template Library as the submodule
- skip duplicate validation for SPIR-V optimizer
- Updated DXC
- Boost-Wave Shader preprocessor is now its own Translation Unit and can always be compiled with optimizations (so even Debug builds don't take minutes preprocessing the input to DXC)
- target SPIR-V version is now a shader preprocessor option (because of `__SPIR_V_MAJOR__` and friends)
- `fast_affine.hlsl` for doing mathematical abominations like multiplying 3x4 matrices with 3x1 vectors as if they're padded 4x4 and 4x1
- cofactor and fast inverse HLSL utilities (useful for nice fast normal matrix calc, see the sketch after this list)
- `IUtilities` has a `create` factory which can fail if it can't allocate the amount of HOST_VISIBLE memory you requested
- split `MonoAssetManagerAndBuiltinResourceApplication` into two classes
- `allPreviousStages` and `allLaterStages` sync utility functions
- `emulated_vector` now has a `length_helper` specialization, and `inversesqrt` for `emulated_float64`
- constexpr `findLSB` variant
- 3x3 matrix from quaternion HLSL utility
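
A worked C++ sketch of the cofactor trick for normal matrices referenced above (illustration only, not the shipped HLSL utility): for a 3x3 linear part `M`, normals must be transformed by `transpose(inverse(M))`, which equals `cofactor(M)/det(M)`; the `1/det(M)` factor only scales the normal, so it can be dropped if you renormalize afterwards.

```cpp
#include <array>

using vec3 = std::array<float,3>;
using mat3 = std::array<vec3,3>; // row-major, rows of 3

static vec3 cross(const vec3& a, const vec3& b)
{
    return { a[1]*b[2]-a[2]*b[1], a[2]*b[0]-a[0]*b[2], a[0]*b[1]-a[1]*b[0] };
}

// the columns of the cofactor matrix are the cross products of M's columns,
// so no division is needed to get a (scaled) normal matrix
mat3 cofactor(const mat3& m)
{
    const vec3 c0 = { m[0][0], m[1][0], m[2][0] };
    const vec3 c1 = { m[0][1], m[1][1], m[2][1] };
    const vec3 c2 = { m[0][2], m[1][2], m[2][2] };
    const vec3 k0 = cross(c1,c2), k1 = cross(c2,c0), k2 = cross(c0,c1);
    // assemble rows from the computed columns
    return { vec3{k0[0],k1[0],k2[0]}, vec3{k0[1],k1[1],k2[1]}, vec3{k0[2],k1[2],k2[2]} };
}
```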
Bugfixes
- `DeferredFreeFunctor` actually tested and works now
- `AccelerationStructure::validBuildFlags` infinite recursive call
- Shader compiler `adopt_memory` typo
- keep boost-wave compile options consistent and encapsulate/don't leak it
Full Changelog: v0.7.0-alpha1...v0.7.1-alpha1
Multi-Entry Point SPIR-V Shaders - Removal of IGPUShader and ICPUShader
What's Changed
There's now only a single `IShader`.
Shader Stage and Specialization Info are now provided directly as Pipeline Creation Parameters.
This means that each shader's SPIR-V gets its capabilities and extensions trimmed based on the entry points used by a single pipeline.
Aggressive dead code elimination SPIR-V optimization is necessary for this to function.
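
A hypothetical sketch of what this looks like from the user's side; the struct and field names below are illustrative stand-ins, not Nabla's exact creation-parameter API:

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// everything below is a hypothetical illustration of the concept
struct IShader { std::vector<uint32_t> spirv; };                                  // stand-in for the single IShader class
struct ComputePipelineParams { const IShader* shader; std::string entryPoint; };  // stand-in creation params

int main()
{
    // one SPIR-V module containing both `histogramMain` and `resolveMain` entry points
    auto shader = std::make_shared<IShader>();
    ComputePipelineParams params[2] = {
        { shader.get(), "histogramMain" },
        { shader.get(), "resolveMain"   },
    };
    // at pipeline creation the module is trimmed per pipeline: capabilities and extensions
    // needed only by the unused entry point get stripped, which is why aggressive SPIR-V
    // dead code elimination has to run on the module first
    (void)params;
}
```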
Full Changelog: v0.6.2-alpha2...v0.7.0-alpha1
Bugfix: Missing `tuple.hlsl` from embed
Default builds of examples 23 and 29 didn't work.
Workgroup2 Reductions and Scans
What's Changed
Workgroup Scans
- `nbl::hlsl::workgroup2` reduce + scan by @keptsecret in #876

Highly performant: the subgroup-emulated variant (a Kogge-Stone adder made of `subgroupShuffleUp`) is up to 200% faster than native (`subgroupInclusiveAdd`) on Nvidia RTX GPUs.
Blogpost incoming.
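
For intuition, here is the shuffle-up pattern written as plain C++ (a sketch of the algorithm only, not the actual `nbl::hlsl::workgroup2` code): every round, lane `i` adds the value that sat `step` lanes below it, and `step` doubles each round, which on the GPU is exactly what `subgroupShuffleUp(value, step)` feeds into the add.

```cpp
#include <array>
#include <cstdint>

// Kogge-Stone style inclusive add-scan over one emulated "subgroup" of lanes
template<uint32_t SubgroupSize>
std::array<uint32_t,SubgroupSize> inclusiveAdd(std::array<uint32_t,SubgroupSize> value)
{
    for (uint32_t step=1; step<SubgroupSize; step<<=1)
    {
        const auto prev = value; // snapshot plays the role of the shuffle source
        for (uint32_t lane=step; lane<SubgroupSize; lane++)
            value[lane] += prev[lane-step]; // "subgroupShuffleUp(value,step)" then add
    }
    return value;
}
```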
Full Changelog: v0.6.1-alpha1...v0.6.2-alpha1
Godbolt NSC Docker Image, YML Workflows for MSVC, and Image Asset Converter fix
Fixed stack overflow by not resetting empty overflow callbacks on the transfer `SIntendedSubmitInfo` when uploading images in `CAssetConverter`.
Full Changelog: v0.6.0-alpha1...v0.6.1-alpha1
Asset Converter Automated TLAS and BLAS builds & compactions, Clang MSVC builds
Asset Converter now covers 100% of asset types (except for renderpasses and framebuffers) at its inception; the last outstanding feature is the RT Pipeline coming in #871
See examples of usage with and without compaction and ReBAR fast-path:
- https://github.com/Devsh-Graphics-Programming/Nabla-Examples-and-Tests/blob/e30938c2615dd5d3ab69cadca3ba11d1e03f8233/67_RayQueryGeometry/main.cpp
- https://github.com/Devsh-Graphics-Programming/Nabla-Examples-and-Tests/blob/e30938c2615dd5d3ab69cadca3ba11d1e03f8233/71_RayTracingPipeline/main.cpp
What's Changed
- minor improve to exclusive scan (less registers) by @keptsecret in #875
- Acceleration Structure Asset Conversion by @devshgraphicsprogramming in #872
- build: Add ClangCL profiles by @alichraghi in #791
- Working and Tested Asset Converter for Acceleration Structures by @devshgraphicsprogramming in #878
Full Changelog: v0.5.9-alpha2...v0.6.0-alpha1
Fix of v0.5.9-alpha1
What's Changed
- Update "Join our team" section of readme by @YasInvolved in #866
- Quick fix to subgroup arithmetic by @keptsecret in #874
Full Changelog: v0.5.9-alpha1...v0.5.9-alpha2
New Optimized Subgroup Arithmetic Utilities
Our emulated Subgroup Scans are 2x faster than Nvidia's implementation of `KHR_shader_subgroup_arithmetic`.
Certain SPIR-V and GLSL intrinsics got fixed.
Acceleration Structure API refactor and Asset Converter ReBAR support
Cleaned up the inheritance hierarchy in TLAS and BLAS classes.
Now BLAS is a proper `IPreHashed`, and TLAS has utility static methods to convert a Polymorphic Instance to a non-polymorphic one, where the type is embedded in the lower bits of the Aligned Pointer, when doing a build where the Instance input buffer is a span of pointers.
Added a `demote_promote_writer_readers_lock` which is well thought out and actually works.
Asset Converter can now be overridden to assume ReBAR support and create Buffers over DEVICE_LOCAL and HOST_VISIBLE memory, sidestepping the use of a transfer queue to upload the buffer data.
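
For reference, the ReBAR fast-path assumption expressed in raw Vulkan terms (an illustrative helper, not part of the Asset Converter API): the override only makes sense on devices that expose a memory type which is simultaneously DEVICE_LOCAL and HOST_VISIBLE.

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>

// returns the index of a ReBAR-style memory type compatible with `memoryTypeBits`, or -1
int32_t findReBARMemoryType(VkPhysicalDevice physDev, uint32_t memoryTypeBits)
{
    VkPhysicalDeviceMemoryProperties props;
    vkGetPhysicalDeviceMemoryProperties(physDev,&props);
    constexpr VkMemoryPropertyFlags wanted =
        VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT|VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT;
    for (uint32_t i=0; i<props.memoryTypeCount; i++)
        if ((memoryTypeBits&(1u<<i)) && (props.memoryTypes[i].propertyFlags&wanted)==wanted)
            return int32_t(i);
    return -1; // no such heap exposed, fall back to staging + transfer queue uploads
}
```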
Queue Families declared when creating a Buffer or Image with concurrent sharing mode can now be queried from the object.
Fixed a number of bugs in Descriptor Lifetime Tracking and general inheritance patterns.
Fixed a bug in `IUtilities` where overflow submits and intended submits signalling timeline semaphores, upon which staging buffer deferred memory deallocations were latched, would not get patched to include the `COPY_BIT` in the signal stage mask.
Minor additions of `fma` to the HLSL/C++ tgmath library.
TLASes can track BLASes they reference after a build + Submission Callbacks & Custom Lifetime Tracking
Lifetime Tracking for any IReferenceCounted via a Command Buffer
Absolutely awesome feature, warranting a patch version bump.
You can now call `bool IGPUCommandBuffer::recordReferences(const std::span<const IReferenceCounted*> refs);` which will make the Command Buffer hold onto all the `refs` until it's reset.
This makes it easy to do things such as making the Command Buffer hold onto the `IGPUBuffer`s whose Buffer Device Addresses you use in the shaders.
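
A short usage sketch (the function and buffer names are made up, the method is the one quoted above, and the relevant Nabla headers are assumed to be included):

```cpp
// the buffers' device addresses go into push constants, so nothing else keeps them
// alive - make the command buffer itself hold the references
void recordDispatchWithBDA(IGPUCommandBuffer* cmdbuf, IGPUBuffer* vertexPool, IGPUBuffer* instanceData)
{
    const IReferenceCounted* refs[] = { vertexPool, instanceData };
    cmdbuf->recordReferences(refs); // held until the command buffer is reset
    // ... bind pipeline, push the addresses, dispatch ...
}
```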
TLASes can track BLASes they use after a build
The builds are versioned because the Queue's `MultiTimelineEventHandler` doesn't immediately know that a submission has completed; the TLAS holds only one set of tracked BLASes at a time, and only the highest version can be set.
You don't suffer from the ABA signalling problem because callbacks from Host and Device builds hold onto the BLAS ranges they're about to set until the tracking info update time.
Of course, in true Nabla style, everything is overridable and callable directly by you if you so wish (e.g. middleware usage).
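
The "only the highest version wins" idea, stripped down to plain C++ for illustration (this is not the actual tracking code, just the reasoning): a build reserves a version number when it's recorded, and a completion callback that finished late for an older build simply gets ignored.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>
#include <vector>

// illustrative stand-in for the set of BLASes a TLAS tracks after a build
template<typename BLASPtr>
class VersionedTrackedSet
{
    public:
        // called when the build command gets recorded, before we know when it completes
        uint64_t reserveVersion() { return m_latestReserved.fetch_add(1)+1; }

        // called from the host/device build completion callback, which kept `blases` alive
        void update(const uint64_t version, std::vector<BLASPtr>&& blases)
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            if (version<=m_storedVersion)
                return; // a stale (older) build completed after a newer one, drop it
            m_storedVersion = version;
            m_tracked = std::move(blases);
        }

    private:
        std::mutex m_mutex;
        std::atomic<uint64_t> m_latestReserved{0};
        uint64_t m_storedVersion = 0;
        std::vector<BLASPtr> m_tracked;
};
```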
Submission Completion Callbacks
This was doable before with the `TimelineEventHandler`, but you were in charge of creating it and doing your own polling; now if you want to be extra lazy you can abuse the `IQueue`'s built-in `MultiTimelineEventHandler` which it uses for CommandBuffer (and whatever it references) lifetime tracking.
Think of it as a counterpart to CUDA's `cudaStreamAddCallback` or `cudaLaunchHostFunc`.
If you put `IReferenceCounted` objects in your `std::function<void()>`'s internal state, this also lets you perform custom lifetime tracking, but for only a single submit (as opposed to a command buffer which may be reused).
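
A sketch of the single-submit lifetime-tracking trick; `makeCompletionCallback` is a hypothetical helper name (the real mechanism is registering the functor with `IQueue`'s built-in `MultiTimelineEventHandler`), only the capture pattern is the point here.

```cpp
#include <functional>
#include <utility>

// the returned std::function owns a copy of `resource`, so whatever it references
// stays alive until the callback fires once the submit's timeline value is reached
template<typename SmartPtr>
std::function<void()> makeCompletionCallback(SmartPtr resource)
{
    return [res=std::move(resource)]()
    {
        static_cast<void>(res); // safe to unmap/recycle/free whatever `res` refers to here
    };
}
```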