-
Notifications
You must be signed in to change notification settings - Fork 6
Open
Description
Hey,
thanks for the great simulation tool! We simulated reads using this command:
conda activate SWAMPy
python simulate_metagenome.py \
--genomes_file ../example/viralref.fasta \
--temp_folder ../example/${sample}_temp \
--genome_abundances ../example/${sample}.tsv \
--primer_set a4 \
--output_folder ../eebg24_highq \
--output_filename_prefix ${sample} \
--n_reads 1000000 \
--seqSys MSv3 \
--read_length 250 \
--seed 10 \
--amplicon_distribution dirichlet_1 \
--amplicon_pseudocounts 200 \
--unique_insertion_rate 0.00002 \
--unique_deletion_rate 0.000115 \
--unique_substitution_rate 0.002485 \
--recurrent_insertion_rate 0.00002 \
--recurrent_deletion_rate 0 \
--recurrent_substitution_rate 0.003357 \
--max_insertion_length 14 \
--subs_VAF_alpha 0.36,0.27 \
--del_VAF_alpha 0.59,0.42 \
--ins_VAF_alpha 0.33,0.45 \
--r_subs_VAF_alpha 0.36,0.27 \
--r_del_VAF_alpha 0.59,0.42 \
--r_ins_VAF_alpha 0.33,0.45
but when we check the quality pattern with FASTQC we see such profiles for the paired-end reads:
Forward read

Reverse read

Usually, the read quality drops towards the 5' end and not the 3' end (see https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). So it looks flipped? Although the general error profile looks quite good otherwise!
Did we do something wrong in the simulation, or is this a bug?
(we see the same pattern for all reads we simulated)
Thanks!
Metadata
Metadata
Assignees
Labels
No labels