fix: regression in non-fast scalar indexing support #760
base: master
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##           master     #760      +/-   ##
==========================================
+ Coverage   89.58%   90.07%   +0.49%
==========================================
  Files          11       12       +1
  Lines        1008     1038      +30
==========================================
+ Hits          903      935      +32
+ Misses        105      103       -2
ext/ForwardDiffGPUArraysCoreExt.jl
idxs = collect(
    Iterators.drop(ForwardDiff.structural_eachindex(result), offset)
)[1:chunksize]
result[idxs] .= partial_fn.(Ref(dual), 1:chunksize)
Does this not have an inference issue due to losing static information about size? I would think this needs to be `ntuple` unless it can prove things about size.
It would still be type-stable; it would just have dynamism in the function that would slow the broadcast down a bit.
Here `chunksize` is already an `Int`, so I don't think we would see any benefit from using an `ntuple`.
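For context, a minimal CPU-only sketch of the collect-based seeding pattern under discussion. `eachindex` stands in for `ForwardDiff.structural_eachindex`, and a plain value range stands in for `partial_fn.(Ref(dual), 1:chunksize)`; all names and values here are illustrative, not the actual ForwardDiff internals.

```julia
# Stand-in for the seeding snippet above.
result = zeros(6)
offset = 2
chunksize = 3

# `chunksize` is a runtime Int, so an ntuple would carry no extra static
# length information here; collecting and slicing the index window suffices,
# and the assignment stays a single broadcast (GPU-compatible).
idxs = collect(Iterators.drop(eachindex(result), offset))[1:chunksize]
result[idxs] .= 1.0 .* (1:chunksize)
```

With `offset = 2` the window is indices 3:5, so `result` ends up as `[0, 0, 1, 2, 3, 0]`.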
Noted in #759 (comment): GPU support is completely untested in ForwardDiff.jl, so this sets up the Buildkite pipeline. I set up the backend and took a few tests from #760 to seed it. The point of this isn't really to be a comprehensive set of GPU tests, but rather to give this repo the standard tooling the other repos have so that GPU support doesn't regress again.
In #472, the […] Has it been properly explored whether the existing functions can be written in an alternative way that would support both fast and non-fast scalar arrays with the same generic code (which would avoid any new extensions)?
Yes, on the master branch seeding is (again) performed without broadcasting. Depending on the structural array type, the set of indices is not readily available in an allocation-free broadcastable form (e.g. the set of upper-triangular indices for […]). If we want to avoid these allocations (and the broadcasting overhead) for non-GPU arrays, I don't immediately see how this issue could be solved by a generic implementation.

Possibly the amount of code duplication could be reduced by introducing a helper function or branch that, based on the type of the input array, switches between broadcasting and iterating (presumably defaulting to iteration?), but even in this case it would be necessary to add an extension that ensures that GPU arrays use broadcasting. Alternatively, we could default to using broadcasting (with the additional overhead of collecting the indices) and, as an additional optimization, only use iteration for a handful of selected base array types such as […].

What are your thoughts @KristofferC?
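A rough sketch of what the helper-function idea could look like. `use_broadcast_seeding` and `seed!` are hypothetical names, not ForwardDiff API; under this scheme a GPUArraysCore extension would simply add a method returning `true` for `AbstractGPUArray`.

```julia
# Trait deciding the seeding strategy; defaults to iteration, which is
# allocation-free for plain CPU arrays. An extension would opt GPU array
# types into the broadcasting path.
use_broadcast_seeding(::Type{<:AbstractArray}) = false

# Write `vals` into the index window (offset, offset + chunksize] of `result`.
# `eachindex` stands in for the structural index iterator discussed above.
function seed!(result::AbstractArray, vals, offset::Int, chunksize::Int)
    window = Iterators.take(Iterators.drop(eachindex(result), offset), chunksize)
    if use_broadcast_seeding(typeof(result))
        # Broadcasting path: collect allocates, but avoids scalar indexing.
        idxs = collect(window)
        result[idxs] .= vals
    else
        # Iteration path: no allocations, but scalar indexing per element.
        for (i, idx) in enumerate(window)
            result[idx] = vals[i]
        end
    end
    return result
end
```

For example, `seed!(zeros(5), 10.0:12.0, 1, 3)` fills indices 2 through 4, returning `[0.0, 10.0, 11.0, 12.0, 0.0]`. The trade-off mentioned above is visible in the two branches: only the broadcasting path allocates the collected index vector.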
fixes #759

`ForwardDiff.gradient` now supports GPU arrays.

cc @ChrisRackauckas @devmotion