Scalable Genotyping for Large Cohorts #162

@ytt01

I used Minimap2 to align ONT data and called structural variants (SVs) for an ONT cohort using Sniffles. These SVs are used as a reference panel to genotype short-read data. However, due to the large number of short-read samples, my server cannot handle joint genotyping of all samples simultaneously.
Graphtyper’s documentation suggests providing all BAMs together for optimal genotyping, but splitting samples into sub-cohorts seems necessary for computational feasibility.
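
For concreteness, the kind of split I have in mind can be sketched as below. This is only an illustration of the batching step, not a Graphtyper feature: the function name `split_into_batches`, the sample file names, and the batch size of 3 are all hypothetical, and real sub-cohorts might instead be defined by population.

```python
from typing import Iterator, List


def split_into_batches(bam_paths: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive sub-cohorts of at most `batch_size` BAM paths."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(bam_paths), batch_size):
        yield bam_paths[start:start + batch_size]


# Hypothetical sample names; in practice these would be read from a BAM list file.
bams = [f"sample_{i:03d}.bam" for i in range(7)]

# Every sample lands in exactly one batch; the last batch may be smaller.
batches = list(split_into_batches(bams, batch_size=3))
for index, batch in enumerate(batches):
    print(f"batch {index}: {len(batch)} samples")
```

Each batch would then be genotyped separately against the same SV reference panel, which is why the merging questions below matter.
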
My questions are:

1. Does Graphtyper support merging genotype results from separately processed sub-cohorts (e.g., population-specific batches) into a unified VCF?
2. Are there recommended tools or built-in functions in Graphtyper to merge sub-cohort VCFs while resolving potential conflicts (e.g., duplicate variants, inconsistent FORMAT/INFO fields)?
3. If merging is possible, what steps or precautions should be taken to ensure consistency (e.g., handling reference panels, avoiding batch effects)?

This workflow is critical for scaling to large cohorts. Any guidance would be greatly appreciated!
