Skip to content

Fixes to Consensus Calling and Updated Depth Calculation #5

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 42 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
42 commits
Select commit Hold shift + click to select a range
0e90722
Bug Fix - Truncated Consensus Sequences Due to incompatable variants
Sep 18, 2024
fe36e06
Remove debug statements
Sep 19, 2024
8e5a955
Fixed Bug caused by variable FIELD
Oct 17, 2024
9246d75
Fixed Typo
Jan 6, 2025
321c413
Updated viral consensus to use bcftools
Jan 6, 2025
aad7282
Fixed bug in viral consensus leading to no final vcf being produced
Jan 6, 2025
e90feb5
Updated how depth of coverage is calculated to ignore secondary/supp
Jan 6, 2025
f6d25c7
Merge branch 'master' into consensusTruncation
Jan 6, 2025
d3b56de
Updated formatting on new depth calculation format
Jan 6, 2025
3cfd4bd
Added threads to all subcommands
Feb 17, 2025
8d962be
--very-sensitive now affects --run-specific
Feb 17, 2025
2ca6570
Revert "--very-sensitive now affects --run-specific"
Feb 17, 2025
c3adc8b
Changed default top-size behaviour to use whole database
Mar 5, 2025
58a54ed
Changed default top-size behaviour to use whole database
Mar 5, 2025
e3f2bf1
Merge branch 'dynamicVirusTop' of https://github.com/zacherydickson/t…
Mar 5, 2025
7209ff7
Created a single script to handle breadth and depth calculations across
Apr 8, 2025
3cb2298
Add Exec Permissions to TRACES_overall
Apr 9, 2025
22b62c6
Fix bugs in TRACES_overall.sh
Apr 9, 2025
21cbbd0
Replaced use of TRACES_overall_X.sh with TRACES_overall.sh
Apr 9, 2025
51bcacf
Incorporate TRACES_overall changes from master branch
Apr 9, 2025
8bbaf96
Updated some dependency versions in INSTALL
Jun 12, 2025
f7b623e
Merge branch 'globalSensitive' into dynamicVirusTop
Jun 12, 2025
256bcf5
--very-sensitive now affects --run-specific
Feb 17, 2025
aed8124
Merge branch 'globalSensitive' into dynamicVirusTop
Jun 12, 2025
84bf307
GlobalSensitive and DynamicTop merged in; Added Versioning Files
Jun 12, 2025
1bd07e7
Inital branch commit with option setup
Jun 12, 2025
a6e6c49
Added TRACES_get_best_by_meta.sh
Jun 12, 2025
1c64887
Get Best By Meta Usage implemented
Jun 12, 2025
a1f17ac
Updates to Usage
Jun 12, 2025
399c6cd
Fixed the long lived Copping typo
Jun 12, 2025
6dd8049
Merge branch 'master' into altDB
Jun 12, 2025
6242b5f
Fixed fatal typo in awk delim specification
Jun 12, 2025
8f9ea72
Updates to viral name tracking
Jun 13, 2025
d59c6bc
Fixes to Version Notes
Jun 13, 2025
48dcf5f
Fixes to version notes
Jun 13, 2025
cfddf68
Removed src/viral_names
Jun 13, 2025
e7b0552
Fixed fatal type in path to internal_viral_names
Jun 16, 2025
99e8a2e
Incorporte V1.2.0dev changes - Merge branch 'altDB'
Jun 16, 2025
0078c4c
Updated version notes
Jun 16, 2025
66ed3c7
Made internal_virus_names path relative to executable path rather than
Jun 23, 2025
04588c2
Fixed Bug where RUN_SPECIFIC wasn't turned on by RUN_EXTREME
Jun 23, 2025
db72213
Fix version Notes
Jun 23, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
*.swp
tests
1 change: 1 addition & 0 deletions Version.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
1.2.1
14 changes: 14 additions & 0 deletions VersionNotes.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
Version 1.1.4
GlobalSensitive and DynamicTop merged in; Added Versioning Files
Version 1.2.0
Reorganized Usage Message into blocks for easier reading
Added infrastructure for altVDB and VDBmetadata options
Handled interactions between altVDB and internal database commands
Added General script for getting best result from top when virus group metadata is available
If Metadata for the database is provided, virus groups from the database are used
and mappings between accessions and viruses are used
System file for default supported viruses created
virus_names in src directory now contains the viruses used in the analysis
Version 1.2.1
Bug Fix - Run Extreme when specified alone now performs alignments

383 changes: 236 additions & 147 deletions src/TRACESPipe.sh

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion src/TRACES_coverage_table.sh
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
#!/bin/bash
#
declare -a VIRUSES=('B19' 'HV1' 'HV2' 'HV3' 'HV4' 'HV5' 'HV6' 'HV6A' 'HV6B' 'HV7' 'HV8' 'POLY1' 'POLY2' 'POLY3' 'POLY4' 'POLY5' 'POLY6' 'POLY7' 'POLY8' 'POLY9' 'POLY10' 'POLY11' 'POLY12' 'POLY13' 'POLY14' 'TTV' 'HBOV1' 'HBOVNOT1' 'HBV' 'HPV' 'VARV' 'SV40' 'CUTA' 'HERV');
readarray -t VIRUSES < viral_names.txt
#declare -a VIRUSES=('B19' 'HV1' 'HV2' 'HV3' 'HV4' 'HV5' 'HV6' 'HV6A' 'HV6B' 'HV7' 'HV8' 'POLY1' 'POLY2' 'POLY3' 'POLY4' 'POLY5' 'POLY6' 'POLY7' 'POLY8' 'POLY9' 'POLY10' 'POLY11' 'POLY12' 'POLY13' 'POLY14' 'TTV' 'HBOV1' 'HBOVNOT1' 'HBV' 'HPV' 'VARV' 'SV40' 'CUTA' 'HERV');
declare -a ORGANS=( $(cat ../meta_data/meta_info.txt | tr ':' '\t' | awk '{ print $1;}') );
#
printf "; ";
Expand Down
18 changes: 11 additions & 7 deletions src/TRACES_cy_consensus.sh
Original file line number Diff line number Diff line change
Expand Up @@ -17,21 +17,25 @@ awk '$4 < 1' cy-coverage-$Organ.bed > cy-zero-coverage-$Organ.bed
#
# CALLS W CONSENSUS
samtools faidx $Reference # -P 9.9e-1
bcftools mpileup -Ou -f $Reference $Alignments | bcftools call --ploidy 1 -P 9.9e-1 -mv -Oz -o calls-cy-$Organ.vcf.gz
bcftools index calls-cy-$Organ.vcf.gz
rawCalls="raw-cy-$Organ.vcf.gz"
bcftools mpileup -Ov -f $Reference $Alignments | bcftools call --ploidy 1 -P 9.9e-1 -mv -Oz -o "$rawCalls"
bcftools index "$rawCalls"
#
# normalize indels
bcftools norm -f $Reference calls-cy-$Organ.vcf.gz -Oz -o calls-cy-$Organ.norm.vcf.gz
bcftools norm -f $Reference "$rawCalls" -Oz -o calls-cy-$Organ.norm.vcf.gz
#
# filter adjacent indels within 5bp
bcftools filter --IndelGap 5 calls-cy-$Organ.norm.vcf.gz -Oz -o calls-cy-$Organ.norm.flt-indels.vcf.gz
# filter incompatible variants
finalVCF="calls-cy-$Organ.vcf.gz"
./TRACES_filter_incompatible_variants.sh calls-cy-$Organ.norm.flt-indels.vcf.gz >| $finalVCF;
#
# create bed file
zcat calls-cy-$Organ.norm.flt-indels.vcf.gz |vcf2bed --snvs > cy-calls-$Organ.bed
zcat "$finalVCF" | vcf2bed --snvs > cy-calls-$Organ.bed
#
# CONSENSUS
tabix -f calls-cy-$Organ.norm.flt-indels.vcf.gz
bcftools consensus -m cy-zero-coverage-$Organ.bed -f $Reference calls-cy-$Organ.norm.flt-indels.vcf.gz > cy-consensus-$Organ.fa
tabix -f "$finalVCF"
bcftools consensus -m cy-zero-coverage-$Organ.bed -f $Reference $finalVCF > cy-consensus-$Organ.fa
#
# Give new header name for the consensus sequence
tail -n +2 cy-consensus-$Organ.fa > CY_TMP_FILE_$Organ.xki
Expand All @@ -40,6 +44,6 @@ cat CY_TMP_FILE_$Organ.xki >> cy-consensus-$Organ.fa
rm -f CY_TMP_FILE_$Organ.xki;
#
#
rm -f calls-cy-$Organ.vcf.gz calls-cy-$Organ.vcf.gz.csi calls-cy-$Organ.norm.bcf calls-cy-$Organ.norm.flt-indels.bcf calls-cy-$Organ.norm.flt-indels.vcf.gz calls-cy-$Organ.norm.vcf.gz;
rm -f "$rawCalls" "$rawCalls.csi" calls-cy-$Organ.norm.bcf calls-cy-$Organ.norm.flt-indels.bcf calls-cy-$Organ.norm.flt-indels.vcf.gz calls-cy-$Organ.norm.vcf.gz "$finalVCF" ${finalVCF/.vcf.gz/.bcf}; "$finalVCF.tbi"
#
#
85 changes: 85 additions & 0 deletions src/TRACES_filter_incompatible_variants.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,85 @@
#!/bin/bash

#This script is intended to pass over a vcf file and remove conflicting variants
#The choice is made first by higher quality, then by VCF order
#Confilicting alleles are snps at the same position as an insertion
# and other variants which appear in the region covered by a deletion
#If there are multiple variants overlapping a position, the best one is retained.
#Variants are processed in an order such that some conflicts may be resolved
# Example two variant sites covered by a deletion, where the deletion is lower
# quality than either variant: the deletion is filtered and the two variants are
# retained. In this way the maximum number of variants are
#
#Written by Zachery Dickson, September 2024
# Contact: dicksoz@mcmaster.ca
# zachery.dickson@helski.fi

if [ "$#" -lt 1 ]; then
>&2 echo -e "Usage: $(basename $0) In.vcf[.gz] > Out.vcf.gz\n" \
"\tFinds conflicting variants, selects the maximum conflict free set\n" \
"\tAssumes variants are sorted by position";
exit 1;
fi

file=$1;shift;
if file $file | grep -q gzip; then
zcat $file;
else
cat $file;
fi |
awk -F '\\t' -v call="$0 $file" '
#j > i
function testOverlap(i,j) {
if(chrom[j] != chrom[i]){return 0;}
if(left[j] > right[i]){return 0;}
return 1;
}
BEGIN{OFS="\t"}
/^##/{print;next;}
/^#CHROM/{
print "##incompatFilterCommand="call;
print;
next
}
{
counter++;
fullLine[counter] = $0;
qual[counter] = $6;
chrom[counter] = $1;
left[counter] = $2;
len = length($4)
# if(length($5) > len){len = length($5)}
right[counter] = $2+len-1;
sortableLength[counter] = sprintf("%d%06d",len,counter);
}
END {
#Process Variants in order of length
n=asort(sortableLength);
for(i = n; i >= 1; i--){
#Iterate from longest to shortest
idx = substr(sortableLength[i],length(sortableLength[i])-5)+0
if(Filtered[idx]){continue;}
jdx = idx;
bestQual=qual[idx];
#Check all following entries as long as this variant is best,
# we have not checked all variants, and the next one overlaps
while(bestQual==qual[idx] && jdx < counter &&
testOverlap(idx,jdx+1)){
jdx++;
if(qual[jdx] > qual[idx]){
Filtered[idx] = 1;
bestQual = qual[jdx];
} else {
Filtered[jdx] = 1;
}
}
}
for(idx=1;idx<=counter; idx++){
if(!Filtered[idx]){
print fullLine[idx];
}
}
}
' |
bgzip;

18 changes: 18 additions & 0 deletions src/TRACES_get_best_by_meta.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
#!/bin/bash
ORGAN=$1; shift
Virus=$1; shift
MetaFile=$1; shift
#
awk -v Virus="$Virus" -F '\\t' '
BEGIN{OFS="\t";bestSim="-";bestAcc="-"}
(ARGIND == 1){if($2==Virus){InSet[$1]=1};next}
{
sub("NC_","NC-",$4);
split($4,a,"_");
acc = a[1];
sub("NC-","NC_",acc);
sim=$3;
}
(InSet[acc] && sim > bestSim+0){ bestSim=sim; bestAcc=acc;}
END {print bestSim,bestAcc}
' "$MetaFile" "top-$ORGAN.csv"
18 changes: 11 additions & 7 deletions src/TRACES_hybrid_consensus.sh
Original file line number Diff line number Diff line change
Expand Up @@ -20,22 +20,26 @@ awk '$4 < 1' $Label-coverage-$Organ.bed > $Label-zero-coverage-$Organ.bed
#
# CALLS
samtools faidx $Reference # -P 9.9e-1 # uses the new variations for consensus
bcftools mpileup -Ou -f $Reference $Alignments | bcftools call --ploidy 1 -M -mv -Oz -o $Label-$Organ-calls.vcf.gz
bcftools index $Label-$Organ-calls.vcf.gz
rawCalls="$Label-$Organ-raw.vcf.gz"
bcftools mpileup -Ov -f $Reference $Alignments | bcftools call --ploidy 1 -M -mv -Oz -o "$rawCalls"
bcftools index "$rawCalls"
#
# normalize indels
bcftools norm -f $Reference $Label-$Organ-calls.vcf.gz -Oz -o $Label-$Organ-calls.norm.vcf.gz
bcftools norm -f $Reference "$rawCalls" -Oz -o $Label-$Organ-calls.norm.vcf.gz
#
# filter adjacent indels within 5bp
#bcftools filter --IndelGap 5 $Label-$Organ-calls.norm.vcf.gz -Oz -o $Label-$Organ-calls.norm.flt-indels.vcf.gz
# filter incompatible variants
finalVCF="$Label-$Organ-calls.vcf.gz"
./TRACES_filter_incompatible_variants.sh $Label-$Organ-calls.norm.vcf.gz >| "$finalVCF";
#
# create bed file
#zcat $Label-$Organ-calls.norm.flt-indels.vcf.gz |vcf2bed --snvs > $Label-calls-$Organ.bed
zcat $Label-$Organ-calls.norm.vcf.gz |vcf2bed --snvs > $Label-calls-$Organ.bed
zcat "$finalVCF" | vcf2bed --snvs > $Label-calls-$Organ.bed
#
# CONSENSUS
tabix -f $Label-$Organ-calls.norm.vcf.gz
bcftools consensus -m $Label-zero-coverage-$Organ.bed -f $Reference $Label-$Organ-calls.norm.vcf.gz > $Label-consensus-$Organ.fa
tabix -f "$finalVCF"
bcftools consensus -m $Label-zero-coverage-$Organ.bed -f $Reference "$finalVCF" > $Label-consensus-$Organ.fa
#
# Give new header name for the consensus sequence
tail -n +2 $Label-consensus-$Organ.fa > $Label-$Organ-TMP_FILE.xki
Expand All @@ -44,6 +48,6 @@ cat $Label-$Organ-TMP_FILE.xki >> $Label-consensus-$Organ.fa
rm -f $Label-$Organ-TMP_FILE.xki;
#
#
rm -f $Label-$Organ-calls.vcf.gz $Label-$Organ-calls.vcf.gz.csi $Label-$Organ-calls.norm.bcf $Label-$Organ-calls.norm.vcf.gz.csi $Label-$Organ-calls.norm.vcf.gz;
rm -f "$rawCalls" "$rawCalls.csi" $Label-$Organ-calls.norm.bcf $Label-$Organ-calls.norm.vcf.gz.csi $Label-$Organ-calls.norm.vcf.gz "$finalVCF" "$finalVCF.tbi";
#
#
6 changes: 3 additions & 3 deletions src/TRACES_install.sh
Original file line number Diff line number Diff line change
Expand Up @@ -45,10 +45,10 @@ if [[ "$INSTALL_PIPELINE" -eq "1" ]];
conda install -c bioconda spades --yes
conda install -c bioconda igv --yes
conda install -c bioconda bowtie2 --yes
conda install -c bioconda samtools=1.9 --yes
conda install -c bioconda bcftools=1.9 --yes
conda install -c bioconda samtools=1.21 --yes
conda install -c bioconda bcftools=1.21 --yes
conda install -c bioconda bedops --yes
conda install -c bioconda bedtools --yes
conda install -c bioconda bedtools=2.31.1 --yes
conda install -c bioconda fastq-pair --yes
conda install -c bioconda entrez-direct --yes
conda install -c bioconda/label/cf201901 entrez-direct --yes
Expand Down
18 changes: 11 additions & 7 deletions src/TRACES_mt_consensus.sh
Original file line number Diff line number Diff line change
Expand Up @@ -22,21 +22,25 @@ awk '$4 < 1' $Label-coverage-$Organ.bed > $Label-zero-coverage-$Organ.bed # CHAN
#
# CALLS
samtools faidx $Reference # -P 9.9e-1
bcftools mpileup -Ou -f $Reference $Alignments | bcftools call --ploidy 1 -P 9.9e-1 -mv -Oz -o $Label-$Organ-calls.vcf.gz
bcftools index $Label-$Organ-calls.vcf.gz
rawCalls="$Label-$Organ-raw.vcf.gz"
bcftools mpileup -Ov -f $Reference $Alignments | bcftools call --ploidy 1 -P 9.9e-1 -mv -Oz -o "$rawCalls"
bcftools index "$rawCalls"
#
# normalize indels
bcftools norm -f $Reference $Label-$Organ-calls.vcf.gz -Oz -o $Label-$Organ-calls.norm.vcf.gz
bcftools norm -f $Reference "$rawCalls" -Oz -o $Label-$Organ-calls.norm.vcf.gz
#
# filter adjacent indels within 5bp
bcftools filter --IndelGap 5 $Label-$Organ-calls.norm.vcf.gz -Oz -o $Label-$Organ-calls.norm.flt-indels.vcf.gz
# filter incompatible variants
finalVCF="$Label-$Organ-calls.vcf.gz"
./TRACES_filter_incompatible_variants.sh $Label-$Organ-calls.norm.flt-indels.vcf.gz >| "$finalVCF";
#
# create bed file
zcat $Label-$Organ-calls.norm.flt-indels.vcf.gz |vcf2bed --snvs > $Label-calls-$Organ.bed
zcat "$finalVCF" |vcf2bed --snvs > $Label-calls-$Organ.bed
#
# CONSENSUS
tabix -f $Label-$Organ-calls.norm.flt-indels.vcf.gz
bcftools consensus -m $Label-zero-coverage-$Organ.bed -f $Reference $Label-$Organ-calls.norm.flt-indels.vcf.gz > $Label-consensus-$Organ.fa
tabix -f "$finalVCF"
bcftools consensus -m $Label-zero-coverage-$Organ.bed -f $Reference "$finalVCF" > $Label-consensus-$Organ.fa
#
# Give new header name for the consensus sequence
tail -n +2 $Label-consensus-$Organ.fa > $Label-TMP_FILE_$Organ.xki
Expand All @@ -45,6 +49,6 @@ cat $Label-TMP_FILE_$Organ.xki >> $Label-consensus-$Organ.fa
rm -f $Label-TMP_FILE_$Organ.xki;
#
#
rm -f $Label-$Organ-calls.vcf.gz $Label-$Organ-calls.vcf.gz.csi $Label-$Organ-calls.norm.bcf $Label-$Organ-calls.norm.flt-indels.bcf $Label-$Organ-calls.norm.flt-indels.vcf.gz $Label-$Organ-calls.norm.flt-indels.vcf.gz.csi $Label-$Organ-calls.norm.vcf.gz;
rm -f "$rawCalls" "$rawCalls.csi" $Label-$Organ-calls.norm.bcf $Label-$Organ-calls.norm.vcf.gz $Label-$Organ-calls.norm.flt-indels.bcf $Label-$Organ-calls.norm.flt-indels.vcf.gz $Label-$Organ-calls.norm.flt-indels.vcf.gz.csi "$finalVCF" "$finalVCF.tbi" "${finalVCF/.vcf.gz/.bcf}";
#
#
18 changes: 11 additions & 7 deletions src/TRACES_multiorgan_consensus.sh
Original file line number Diff line number Diff line change
Expand Up @@ -35,13 +35,17 @@ samtools index $FIELD-data_aligned_sorted.bam $FIELD-data_aligned_sorted.bam.bai
bedtools genomecov -ibam $FIELD-data_aligned_sorted.bam -bga > $FIELD-coverage.bed
awk '$4 < 1' $FIELD-coverage.bed > $FIELD-zero-coverage.bed
samtools faidx $REFERENCE
bcftools mpileup -Ou -f $REFERENCE $FIELD-data_aligned_sorted.bam \
| bcftools call --ploidy 1 -mv -Oz -o $FIELD-calls.vcf.gz
bcftools index $FIELD-calls.vcf.gz
bcftools norm -f $REFERENCE $FIELD-calls.vcf.gz -Oz -o $FIELD-calls.norm.vcf.gz
zcat $FIELD-calls.norm.vcf.gz |vcf2bed --snvs > $FIELD-calls.bed
tabix -f $FIELD-calls.norm.vcf.gz
bcftools consensus -f $REFERENCE $FIELD-calls.norm.vcf.gz > $FIELD-consensus.fa
rawCalls="$FIELD-raw.vcf.gz"
bcftools mpileup -Ov -f $REFERENCE $FIELD-data_aligned_sorted.bam \
| bcftools call --ploidy 1 -mv -Oz -o "$rawCalls"
bcftools index "$rawCalls"
bcftools norm -f $REFERENCE "$rawCalls" -Oz -o $FIELD-calls.norm.vcf.gz
# filter incompatible variants
finalVCF="$FIELD-calls.vcf.gz"
./TRACES_filter_incompatible_variants.sh $FIELD-calls.norm.vcf.gz >| "$finalVCF";
zcat "$finalVCF" | vcf2bed --snvs > $FIELD-calls.bed
tabix -f "$finalVCF"
bcftools consensus -f $REFERENCE "$finalVCF" > $FIELD-consensus.fa
#
# Give new header name for the consensus sequence
tail -n +2 $FIELD-consensus.fa > $FIELD-TMP2_FILE.xki
Expand Down
90 changes: 90 additions & 0 deletions src/TRACES_overall.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
#!/bin/bash
#
#USAGE: TRACES_overall.sh reftype pattern organ [maxdepth=1000]
# RefType is one of viral, specific, cy, mtdna and defines the type of reference we are calculating stats for
# Pattern defines the specific reference used
# for cy and mtdna this should be cy and mt respectively
# for specific it is the pattern used to find the reference in the DB
# for virus it is the virus Name
# Organ is the identifier library being processed
# MaxDepth is the maximum depth to consider at any given position
#
#
#Define Global Variables
declare -A ValidRefTypes;
for type in cy mtdna specific viral; do
ValidRefTypes[$type]=1
done
declare -A RequiredPattern;
RequiredPattern["cy"]="cy"
RequiredPattern["mtdna"]="mt"
DEFAULT_MAX_DEPTH=1000;
#Pull the Ref Type, validate it, and set the directories containing relevant bed and bam, and output files
REFTYPE="$1";shift;
if [[ -z "${ValidRefTypes[$REFTYPE]}" ]]; then
>&2 echo -e "[BUG] Attempt to call TRACES_overall.sh on an unrecognized type ($REFTYPE)\n" \
"\t Valid types are: " "${!ValidRefTypes[@]}" \
;
exit 1;
fi
BedDir="../output_data/TRACES_${REFTYPE}_bed"
BamDir="../output_data/TRACES_${REFTYPE}_alignments"
StatsDir="../output_data/TRACES_${REFTYPE}_statistics"
#Pull the pattern and validate it against the reftype
PATTERN="$1";shift;
if [[ -n "${RequiredPattern[$REFTYPE]}" && "$PATTERN" != "${RequiredPattern[$REFTYPE]}" ]]; then
rq="${RequiredPattern[$REFTYPE]}"
>&2 echo "[BUG WARNING] Call to TRACES_overall.sh $REFTYPE with pattern!='$rq'";
PATTERN=$rq
fi
#Set relevant pref- and suffixes
BedPrefix="$PATTERN" #post-delimiter is constant across all ref types
AlnBamPrefix="$PATTERN" #post-delimiter is constant across all ref types
AlnBamSuffix="-$PATTERN" #pre-delimiter included as the suffix isn't present for cy and mtdna
StatsPrefix="$PATTERN" #post-delimiter is constant across all ref types
#Handle cases where the -fixes aren't pattern
case $REFTYPE in
viral)
AlnBamPrefix="viral"
;;
specific)
AlnBamPrefix="viral"
;;
cy)
AlnBamSuffix=""
;;
mtdna)
AlnBamSuffix=""
;;
esac
#Pull the organ and then set the relevant bed, bam, and output files
ORGAN="$1"; shift
CovBedFile="$BedDir/$BedPrefix-coverage-$ORGAN.bed"
AlnBamFile="$BamDir/${AlnBamPrefix}_aligned_sorted-${ORGAN}${AlnBamSuffix}.bam"
DepthFile="$StatsDir/${StatsPrefix}-total-depth-coverage-${ORGAN}.txt"
ZeroBedFile="$BedDir/${BedPrefix}-zero-coverage-$ORGAN.bed"
BreadthFile="$StatsDir/${StatsPrefix}-total-horizontal-coverage-${ORGAN}.txt"
#Pull the Max Depth from the comand line options
MAX_DEPTH=$1; shift
if [ -z "$MAX_DEPTH" ]; then
MAX_DEPTH=$DEFAULT_MAX_DEPTH;
fi
#
#Get the total size of the reference sequence, and the sequence ID for the first contig
# Note: we assume all references are single contigs at this point
TOTAL_SIZE=$(tail -n 1 "$CovBedFile" | awk '{ print $3}');
acc=$(head -1 "$CovBedFile" | cut -f1);
bedEntry="$acc\t0\t$TOTAL_SIZE"
#Updated Depth calculation which ignores peaks, but more importantly ignores secondary/supplementary alignments
#Column 3 is the length of the genome (constructed in the bedEntry above
#Column 4 is the total bases covered
NORMALIZED_COVERAGE=$(samtools bedcov --max-depth "$MAX_DEPTH" <(echo -e "$bedEntry") "$AlnBamFile" | awk '{print $4/$3}')
printf "$PATTERN\t$ORGAN\t%0.4f\n" "$NORMALIZED_COVERAGE" > "$DepthFile"
#
ZERO_COVERAGE=$(awk '{sum += ($3-$2)} END {print sum}' "$ZeroBedFile");
if [ -z "$ZERO_COVERAGE" ]; then
ZERO_COVERAGE=0;
fi
NORMALIZED_ZERO_COVERAGE=$(echo "scale=4; (100-(($ZERO_COVERAGE / $TOTAL_SIZE)*100))" | bc -l);
printf "$PATTERN\t$ORGAN\t%0.4f\n" "$NORMALIZED_ZERO_COVERAGE" > "$BreadthFile"
#
Loading