Input fna file of 50G, resulting prefilter of estimated 9T?

Hi, 

I have started using MMseqs2 to functionally annotate genes and ran into a surprisingly large disk space requirement. I tested the Swiss-Prot database against a concatenated fna file using `nohup mmseqs easy-search /mnt/8T_2/zuo/gene_cluster_cohort/27_genes_cohort.fna /mnt/16T_2/mmseqs_db/swissprot alnResult.m8 tmp -e 0.01 --min-seq-id 0.3 --cov-mode 2 -c 0.8`.


The prefilter step then reported:
```
prefilter tmp/5432758783232164347/search_tmp/7264814417130636468/q_orfs_aa /mnt/16T_2/mmseqs_db/swissprot tmp/5432758783232164347/search_tmp/7264814417130636468/search/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -k 0 --target-search-mode 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 65535 --max-seqs 300 --split 0 --split-mode 2 --split-memory-limit 0 -c 0.8 --cov-mode 2 --comp-bias-corr 1 --comp-bias-corr-scale 1 --diag-score 1 --exact-kmer-matching 0 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 96 --compressed 0 -v 3 -s 5.7 

Query database size: 797809035 type: Aminoacid
Estimated memory consumption: 4G
Target database size: 572970 type: Aminoacid
Index table k-mer threshold: 112 at k-mer size 6 
Index table: counting k-mers
[=================================================================] 572.97K 2s 147ms
Index table: Masked residues: 0
Index table: fill
[=================================================================] 572.97K 1s 889ms
Index statistics
Entries:          197513212
DB size:          1618 MB
Avg k-mer size:   3.086144
Top 10 k-mers
    GPGGTL	1851
    GQSWTV	1705
    WGMFAT	1637
    PGVFEV	1637
    VLWQFW	1622
    AYIRPN	1586
    RSPKGV	1584
    TPHKWY	1559
    KPWFAY	1551
    ITLSPY	1540
Time for index table init: 0h 0m 5s 636ms
Hard disk might not have enough free space (717G left).The prefilter result might need up to 9T.
Process prefiltering step 1 of 1
```
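For what it's worth, here is a rough sketch of how a figure of about 9T could arise from the numbers in the log. The query database has 797809035 entries (presumably the ORFs extracted from the fna file, judging by the `q_orfs_aa` path) and `--max-seqs` is 300. The per-hit size of 21 bytes and the factor of 2 for merging are my assumptions, not something I have confirmed in the documentation, and `estimate_prefilter_bytes` is just a hypothetical helper name.

```python
# Back-of-the-envelope check of the "might need up to 9T" warning.
# ASSUMPTION: BYTES_PER_HIT and MERGE_FACTOR are guesses chosen for
# illustration; they are not confirmed MMseqs2 constants.

QUERY_ENTRIES = 797_809_035  # "Query database size" from the log
MAX_SEQS = 300               # --max-seqs from the prefilter command line
BYTES_PER_HIT = 21           # assumed size of one prefilter hit record
MERGE_FACTOR = 2             # assumed extra space needed while merging results


def estimate_prefilter_bytes(n_queries, max_hits,
                             per_hit=BYTES_PER_HIT, factor=MERGE_FACTOR):
    """Worst case: every query keeps max_hits hits of per_hit bytes each."""
    return n_queries * max_hits * per_hit * factor


estimate = estimate_prefilter_bytes(QUERY_ENTRIES, MAX_SEQS)
print(f"worst-case prefilter size: {estimate / 2**40:.2f} TiB")

# Under these assumptions the estimate scales linearly with --max-seqs.
smaller = estimate_prefilter_bytes(QUERY_ENTRIES, 30)
print(f"with 30 hits per query:    {smaller / 2**40:.2f} TiB")
```

If this reading is right, the estimate is a worst case driven by the number of query ORFs times the maximum number of hits per query, rather than by the 50G size of the input file itself.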

Is there any way to reduce the disk space requirement? An estimate this large seems unrealistic for only 50G of input. Any suggestions would be greatly appreciated. /(T o T)/~~

 

Input fna file of 50G, resulting prefilter of estimated 9T? #972
