Issues downloading databases using docker-compose #112

@eyal-converge


Hi there - I'm trying to set up an MMseqs2 server within my company.
We currently use it via Boltz, which calls the ColabFold API internally, and we are getting rate limited at scale.

I tried downloading the databases via docker-compose
(mounting a volume with -v so the datasets can be kept for future use and saved to AWS S3):

docker-compose run --rm -v /db:/opt/mmseqs-web/databases db-setup UniRef100 \
UniRef90 UniRef50 UniProtKB "UniProtKB/TrEMBL" \
"UniProtKB/Swiss-Prot" NR NT GTDB PDB PDB70 \
Pfam-A.full Pfam-A.seed Pfam-B CDD eggNOG VOGDB dbCAN2 SILVA Resfinder Kalamari

The error is: Can not allocate entries memory in IndexTable::initMemory - Error: indexdb died

Full logs (the beginning may be missing), e.g.:
createindex /opt/mmseqs-web/databases/UniRef100 /opt/mmseqs-web/databases/tmp_UniRef100 --split 1 

MMseqs Version:                         9c13275673343059cb7e4847c6c89f4b64ce4f9a
Seed substitution matrix                aa:VTML80.out,nucl:nucleotide.out
k-mer length                            0
Alphabet size                           aa:21,nucl:5
Compositional bias                      1
Compositional bias scale                1
Max sequence length                     65535
Max results per query                   300
Mask residues                           1
Mask residues probability               0.9
Mask lower case residues                0
Mask lower letter repeating N times     0
Spaced k-mers                           1
Spaced k-mer pattern               
Sensitivity                             7.5
k-score                                 seq:0,prof:0
Check compatible                        0
Search type                             0
Split database                          1
Split memory limit                      0
Index subset                            0
Verbosity                               3
Threads                                 48
Min codons in orf                       30
Max codons in length                    32734
Max orf gaps                            2147483647
Contig start mode                       2
Contig end mode                         2
Orf start mode                          1
Forward frames                          1,2,3
Reverse frames                          1,2,3
Translation table                       1
Translate orf                           0
Use all table starts                    false
Offset of numeric ids                   0
Create lookup                           0
Compressed                              0
Overlap between sequences               0
Sequence split mode                     1
Header split mode                       0
Translation mode                        0
Strand selection                        1
Remove temporary files                  false

indexdb /opt/mmseqs-web/databases/UniRef100 /opt/mmseqs-web/databases/UniRef100 --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -k 0 --alph-size aa:21,nucl:5 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-seq-len 65535 --max-seqs 300 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --mask-n-repeat 0 --spaced-kmer-mode 1 -s 7.5 --k-score seq:0,prof:0 --check-compatible 0 --search-type 0 --split 1 --split-memory-limit 0 --index-subset 0 -v 3 --threads 48 

Estimated memory consumption: 2T
Process needs more than 326G main memory.
Increase the size of --split or set it to 0 to automatically optimize target database split.
Write VERSION (0)
Write META (1)
Write SCOREMATRIXNAME (2)
Write SPACEDPATTERN (23)
Write GENERATOR (22)
Write DBR1INDEX (5)
Write DBR1DATA (6)
Write HDR1INDEX (18)
Write HDR1DATA (19)
Write SCOREMATRIX3MER (4)
Write SCOREMATRIX2MER (3)
Index table: counting k-mers
...
[=================================================================] 100.00% 458.07M 17m 21s 449ms
Index table: Masked residues: 3175422676
Can not allocate entries memory in IndexTable::initMemory
Error: indexdb died

My host specs (AWS r5d.12xlarge instance type):

  • 48 vCPUs
  • ~360GB RAM
  • 2.5TB SSD
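
For reference, the hint in the log ("Increase the size of --split or set it to 0") can be turned into a rough estimate and a retry command. This is only a sketch under assumptions: the log's rounded "2T" is treated as 2048G, the "326G" is read as the memory MMseqs2 considers usable on this host, and memory is assumed to scale roughly linearly with the number of splits, which is a simplification. The paths are copied from the log. How to pass the extra flag through the db-setup service is not shown, so the command is printed rather than executed.

```shell
#!/usr/bin/env bash
# Rough split estimate from the two numbers in the log above.
# Assumption: "2T" ~ 2048G estimated need, "326G" ~ usable memory limit.
est_gib=2048
limit_gib=326

# Ceiling division: smallest split count whose per-split share fits the limit
# (assumes memory shrinks linearly with the number of splits).
min_splits=$(( (est_gib + limit_gib - 1) / limit_gib ))
echo "minimum splits (rough estimate): ${min_splits}"

# Retry command, printed only. --split 0 asks MMseqs2 to pick the split
# count automatically, as the log message suggests.
retry_cmd="mmseqs createindex /opt/mmseqs-web/databases/UniRef100 /opt/mmseqs-web/databases/tmp_UniRef100 --split 0"
echo "${retry_cmd}"
```

With these numbers the estimate comes out at 7 splits, so the original run with --split 1 would not be expected to fit in memory on this host.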

(P.S. You guys are great - really enjoying seeing your lab's publications)
