Hi,
I recently started using MMseqs2 to functionally annotate genes and was surprised by how much disk space it requires. I searched a concatenated .fna file against the Swiss-Prot database with:
nohup mmseqs easy-search /mnt/8T_2/zuo/gene_cluster_cohort/27_genes_cohort.fna /mnt/16T_2/mmseqs_db/swissprot alnResult.m8 tmp -e 0.01 --min-seq-id 0.3 --cov-mode 2 -c 0.8
And the process reported:
prefilter tmp/5432758783232164347/search_tmp/7264814417130636468/q_orfs_aa /mnt/16T_2/mmseqs_db/swissprot tmp/5432758783232164347/search_tmp/7264814417130636468/search/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -k 0 --target-search-mode 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 65535 --max-seqs 300 --split 0 --split-mode 2 --split-memory-limit 0 -c 0.8 --cov-mode 2 --comp-bias-corr 1 --comp-bias-corr-scale 1 --diag-score 1 --exact-kmer-matching 0 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 96 --compressed 0 -v 3 -s 5.7
Query database size: 797809035 type: Aminoacid
Estimated memory consumption: 4G
Target database size: 572970 type: Aminoacid
Index table k-mer threshold: 112 at k-mer size 6
Index table: counting k-mers
[=================================================================] 572.97K 2s 147ms
Index table: Masked residues: 0
Index table: fill
[=================================================================] 572.97K 1s 889ms
Index statistics
Entries: 197513212
DB size: 1618 MB
Avg k-mer size: 3.086144
Top 10 k-mers
GPGGTL 1851
GQSWTV 1705
WGMFAT 1637
PGVFEV 1637
VLWQFW 1622
AYIRPN 1586
RSPKGV 1584
TPHKWY 1559
KPWFAY 1551
ITLSPY 1540
Time for index table init: 0h 0m 5s 636ms
Hard disk might not have enough free space (717G left).The prefilter result might need up to 9T.
Process prefiltering step 1 of 1
Is there any way to reduce the disk space requirement? Up to 9T seems unrealistic for an input of only about 50G. Any suggestions would be greatly appreciated. /(T o T)/~~
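As a rough sanity check on the 9T figure, the estimate can be reproduced with simple arithmetic. The sketch below assumes that each prefilter hit costs about 21 bytes on disk and that a merge step doubles the requirement; both constants are assumptions on my part and do not appear in the log above. The query count (797809035, the translated ORFs in q_orfs_aa) and --max-seqs 300 are taken from the log.

```python
# Back-of-the-envelope estimate of prefilter disk usage.
# ASSUMPTIONS (not shown in the log): roughly 21 bytes per stored hit,
# and a factor of 2 because merging results temporarily doubles the space.
BYTES_PER_HIT = 21
MERGE_FACTOR = 2

# Taken from the log: "Query database size: 797809035" and "--max-seqs 300".
NUM_QUERIES = 797_809_035
MAX_SEQS = 300


def estimate_prefilter_bytes(num_queries: int, max_seqs: int) -> int:
    """Worst case: every query keeps max_seqs hits."""
    return MERGE_FACTOR * BYTES_PER_HIT * num_queries * max_seqs


def to_tib(num_bytes: int) -> float:
    """Convert bytes to tebibytes (1 TiB = 1024**4 bytes)."""
    return num_bytes / 1024**4


if __name__ == "__main__":
    # Show how the worst-case estimate scales with --max-seqs.
    for max_seqs in (MAX_SEQS, 100, 50):
        needed = estimate_prefilter_bytes(NUM_QUERIES, max_seqs)
        print(f"--max-seqs {max_seqs}: up to {to_tib(needed):.2f} TiB")
```

Under these assumptions the default of 300 hits per query gives about 9.1 TiB, which matches the warning. The estimate grows linearly with both --max-seqs and the number of query ORFs, so lowering --max-seqs or searching the input in smaller batches should shrink it proportionally.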