-
Notifications
You must be signed in to change notification settings - Fork 243
Correct NM tag calculation #1536
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
return readBaseMatchesRefBaseWithAmbiguity(readBase, refBase); | ||
} | ||
case NMTagMode: | ||
// TODO Different treatment for bisulfite? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are bisulfite sequences relevant here?
- NM calculation had also been performed in calculateMdAndNmTags in one pass along with calculating the MD tag. The NM tag calculation was now changed to use the calculateSamNmTag method. - Added integration tests that catch off-spec behavior. - Implemented correct handling of the "=" base.
@yfarjoun I have a recollection of you having an opinion about this? |
me too....mostly it was due to being conservative and not liking change, I think....not really a good reason. Perhaps @tfenne has a more substantial opinion? |
Just for reference, the reason why this came up is that DRAGEN apparently produces reads with Ns in them, which is the only time that this becomes relevant. This would lead to ValidateSamFile reporting an error on DRAGEN BAMs. |
@yfarjoun My opinion is similar - I worry about change. But if the default behavior in tools remains the same, and there aren't backwards compatibility breaks, I don't object. FWIW a lot of UMI consensus tools generate reads with Ns in them (e.g. fgbio), so this does come up quite a bit in those situations. And there are variant callers that use NM to filter reads, so there are use cases where a chance in this behavior would have a significant change on downstream pipelines. |
Ok I see. The methods here are indeed used for tools like MergeBamAlignment, so this will result in downstream change (for reads with ambiguity codes in them). This change would make it more spec compliant, but I understand the hesitation about change. One other option is to keep it the way it was and be more liberal in SamFileValidator, so that we don't report errors for files that are spec-compliant. What do people think? |
Fixes #1535
[AaCcGgTt]
. This is compliant to the SAM spec.