Simulating multiple random genomes with VarSim using real VCF input #279

Open
@sdws1983

Description

Hello,

I’m trying to use VarSim to simulate SNPs and SVs in a plant genome (e.g., Arabidopsis thaliana). I have several VCF files from real population data that include both SNPs and SVs, and I want to simulate dozens of random genomes using these VCFs.

However, when I use these VCF files as input to VarSim, the simulated genomes are exactly the same across runs, even though I pass a different --seed value each time. I expected some random sampling of variants when generating each simulated genome, and I don’t understand why that is not happening.

Here is the command I used:

varsim.py --id ath_${i} --seed $i --simulator_executable ~/software/varsim/opt/ART/art_bin_VanillaIceCream/art_illumina \
--reference ~/reference/ath/upload_vcf/Col-PEK.genome.fasta --sv_num_ins 5000 --sv_num_del 5000 --sv_num_dup 2500 --sv_num_inv 2500 \
--vc_num_snp 500000 --vc_num_ins 20000 --vc_num_del 20000 --vc_num_mnp 5000 --vc_num_complex 2500 \
--vc_min_length_lim 0 --vc_max_length_lim 49 --sv_min_length_lim 50 --sv_max_length_lim 1000000 \
--disable_sim --vc_prop_het 0.6 --sv_prop_het 0.6 --vcfs ~/reference/ath/upload_vcf/72accs.col_pek.* \
--disable_rand_vcf --disable_rand_dgv --out_dir out_sim_${i} --log_dir log_sim_${i} --work_dir work_sim_${i}
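
In case it helps with reproducing this, here is a minimal sketch of how "exactly the same" can be checked across the per-seed output directories by comparing checksums. The helper name `count_distinct` and the truth-VCF file name in the usage comment are assumptions for illustration, not confirmed VarSim output names; substitute whatever files your run actually writes.

```shell
#!/bin/sh
# Print how many distinct file contents exist among the given files.
# A result of 1 means every file is byte-identical; N means all N differ.
count_distinct() {
    # cksum prints "<crc> <size> <name>" per file; keep only crc and size,
    # collapse duplicates, and count the remaining lines.
    cksum "$@" | awk '{print $1, $2}' | sort -u | wc -l | tr -d ' '
}

# Hypothetical usage (the file name pattern is an assumption):
# count_distinct out_sim_*/ath_*.truth.vcf
```

If this prints 1 for outputs produced with different seeds, the runs are byte-identical and not just similar.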

Thanks!
