-
Notifications
You must be signed in to change notification settings - Fork 788
Bisected regression runs genOffsetHash leading to wasteful loop when processing ofs-delta #1451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Commit bb5c35196ee55b82f6506d69257de0e59cd8c41d from the main branch of go-git causes ridiculous amounts of CPU and RAM usage with tree.Size() for large repos like asahi-linux.git. References: go-git/go-git#1451
The 'assign to a map a million times' behaviour could be seen in commits as old as 83649a1 when running Additionally, memory usage also explodes |
I believe I'm experiencing the same issue. In my case I'm cloning a repository to obtain a file at a known path on a given revision (could be any tag, branch, or sha). It seems the only way to do do this is using package main
import (
"log"
"github.com/go-git/go-billy/v5/memfs"
"github.com/go-git/go-git/v5"
"github.com/go-git/go-git/v5/storage/memory"
)
func main() {
cloneOpts := &git.CloneOptions{
URL: "https://github.com/red-hat-data-services/RHOAI-Build-Config.git",
NoCheckout: true,
}
_, err := git.Clone(memory.NewStorage(), memfs.New(), cloneOpts)
if err != nil {
log.Fatalf("clone error: %w", err)
}
} It's worth noting as well, though this may warrant opening a separate issue, that |
Not sure what you mean by a "full checkout"... it works with bare repos so that's definitely not required. Are you referring to shallow clones (which would produce a much smaller packfile)?
|
@runxiyu yes, sorry for the confusion. “Full Checkout” was intended to mean |
@aThorp96 maybe a sparse-checkout would make things slightly better on your case, as you don't need to go through the entire worktree. |
@runxiyu this does not seem to be the case with the code you provided. Are you taking about a different code snippet? |
This is the behavior that I observe on an aarch64 linux machine. The 0.1s was right before the mentioned commit, and the 1s is the mentioned commit |
@runxiyu I timed the specific ops in your code and tested with the latest released version (without the "bad commit") and against the latest v6. The difference in performance is neglegible:
As far as I can see, the changes pointed out do not have an impact on the execution path. That being said, you mentioned |
recording.webmI haven't investigated sha1cd yet (though I probably don't need it, as I am a server myself, and I handle pushes through git-receive-pack instead of go-git) |
@runxiyu thanks for sharing the video. This aligns with what I mentioned before as The last benchmarks results I have on arm64 are a few years old. Do you mind running this just to see whether you getting an even worse performance than expected? |
sha1cd benchmark:
|
recording.webmI think this pretty conclusively shows that this commit is the issue here
|
I noticed that go-git is using a ton of CPU and memory when doing simple operations like iterating through a tree on large repositories.
Test program
Test environment
This commit from Asahi Linux
go version go1.23.6 linux/arm64
M1 MacBook Air (2020)
Expected behavior
The entire process completes in about 0.1s.
Actual behvior
It takes about 1.3s.
In my scenario it's
.gitignore
that's being slow totree.Size
on.Profiling
It looks like the main problem is how genOffsetHash uses maps. I'm no expert when it comes to the Go runtime so I'll leave it here.
First bad commit
Call stack
Interestingly, genOffsetHash is only called occasionally, and it takes a long time whenever it is called.
The second-level loop at [1] seems to be called
idx.Count()
times, which is 10950545 in my case. Quite reasonably this slows things down.genOffsetHash
was only called for the file that slowed things down. It wasn't called at all before the problematic commit.Now, I think setting a map 10950545 times sounds problematic... but this seems to have always been the behavior here. I'm looking into what caused
genOffsetHash
to be called.This has previously returned
but after the problematic commit it returns
Previously
FindHash
is only called one time per file, but now it's being called at least two times for each file, and for.gitignore
it's called three times (with the last one returning a null hash).Notes
It's on main but not in any releases yet
The text was updated successfully, but these errors were encountered: