
Potential issue if we officially support exome (interval file) data analysis #29

@oskarvid


I did some reading about exome data analysis with the tools that we use in Selma, and according to this discussion they would apparently produce suboptimal results: https://gatkforums.broadinstitute.org/gatk/discussion/6894/gatk-best-practices-for-exome-targeted-capture-small-region
The important points are the following quotes:

  • you should not use BQSR on [exome data]
  • You are probably better off doing hard filtering for a small target region [instead of using VQSR]
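To make "hard filtering" concrete: instead of training a VQSR model, each variant is kept or flagged based on fixed thresholds on its INFO annotations. Below is a minimal Python sketch of that idea for SNPs. The thresholds are GATK's generic SNP starting points as we understand them, and the filter names and the `hard_filter` helper are hypothetical; in a real pipeline this would be done with GATK's VariantFiltration tool rather than this code, and the thresholds would likely need tuning for a given exome or panel.

```python
"""Illustrative sketch of SNP hard filtering as an alternative to VQSR.

Assumption: thresholds follow GATK's generic SNP starting points as we
understand them; they are not tuned for any Selma dataset.
"""
from typing import Dict, List

# (annotation, comparison, threshold, filter name). A site fails a filter
# when the comparison is true. Filter names are hypothetical labels.
SNP_HARD_FILTERS = [
    ("QD", "<", 2.0, "QD2"),
    ("QUAL", "<", 30.0, "QUAL30"),
    ("SOR", ">", 3.0, "SOR3"),
    ("FS", ">", 60.0, "FS60"),
    ("MQ", "<", 40.0, "MQ40"),
    ("MQRankSum", "<", -12.5, "MQRankSum-12.5"),
    ("ReadPosRankSum", "<", -8.0, "ReadPosRankSum-8"),
]


def hard_filter(info: Dict[str, float]) -> List[str]:
    """Return the names of failed filters, or ["PASS"] if none failed.

    Annotations missing from `info` are skipped rather than failed.
    """
    failed = []
    for key, op, threshold, name in SNP_HARD_FILTERS:
        if key not in info:
            # e.g. RankSum annotations are typically absent at hom-var sites
            continue
        value = info[key]
        if (op == "<" and value < threshold) or (op == ">" and value > threshold):
            failed.append(name)
    return failed or ["PASS"]
```

For example, `hard_filter({"QD": 15.0, "FS": 1.2, "MQ": 60.0, "SOR": 0.7, "QUAL": 500.0})` returns `["PASS"]`, while `hard_filter({"QD": 1.1, "FS": 75.0, "MQ": 60.0})` returns `["QD2", "FS60"]`. Because the thresholds are fixed, this works the same regardless of how many variants the small target region yields, which is the point of preferring it over VQSR there.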

This discussion also has good information about why BQSR is not advised for datasets with fewer than 100 million bases.

We discussed running hap.py on exome (interval file) data analyses, but based on these points this may not be a good use of our time, given that we shouldn't run the BQSR, VQSR and ApplyVQSR tools on small datasets.
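If we did want the pipeline to decide automatically when a run counts as a "small dataset", one option is to measure how many bases the interval file covers. The sketch below is a hypothetical helper, not part of Selma: it assumes the intervals come as a BED file (0-based, half-open), merges overlapping intervals so no base is counted twice, and applies the 100 million base figure from the discussion to the target territory. Whether that figure is meant for target territory or for sequenced bases is an assumption here and would need checking.

```python
"""Hypothetical helper: decide whether a target region is "small"
by summing the territory of a BED interval file."""
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]

# Figure quoted in the forum discussion; applying it to target territory
# rather than to sequenced bases is an assumption.
SMALL_DATASET_BASES = 100_000_000


def read_bed(lines: Iterable[str]) -> List[Interval]:
    """Parse BED lines (0-based, half-open) into (chrom, start, end)."""
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def territory(intervals: Iterable[Interval]) -> int:
    """Total bases covered, merging overlaps so no base is counted twice."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))
    total = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start > cur_end:  # gap: close the current merged block
                total += cur_end - cur_start
                cur_start, cur_end = start, end
            else:  # overlap or adjacency: extend the current block
                cur_end = max(cur_end, end)
        total += cur_end - cur_start
    return total


def use_model_based_steps(intervals: Iterable[Interval]) -> bool:
    """True if BQSR/VQSR/ApplyVQSR would be run under this rule of thumb."""
    return territory(intervals) >= SMALL_DATASET_BASES
```

For example, the BED lines `chr1 100 200`, `chr1 150 300` and `chr2 0 50` cover 250 bases after merging, so `use_model_based_steps` returns `False` and the run would fall back to skipping BQSR and using hard filtering instead of VQSR.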

Metadata


Assignees

No one assigned

Labels

question (Further information is requested)
