Skip to content

Conversation

@Zephyr271828
Copy link

Hi maintainers of Sheared Llama, thank you for your brilliant work! I'm using this repo for my own research and I modified some parts to make if work on my machine. I'm making a PR here to see if my modifications can benefit other researchers.

What does this PR do?

This PR tries to address 2 problems:

  1. flash-attn-2 compatibility. As mentioned here, flash-attn-2 is not supported by Sheared Llama at this moment. It turns out we only need to modify the package version and a function call to make it work.
  2. NaN loss handling: As mentioned in LanguageCrossEntropy logs nan when bash pruning.sh #34 , sometimes LanguageCrossEntropy becomes NaN becomes the number of samples in some domains is too small or even 0. Therefore, this PR checks if number of samples of 0 before doing the division, which solves the NaN issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant