jobs stuck in PENDING status with local mmseqs-web API #85

@reyjul

Hello,

I'm trying to make the mmseqs-web API work but I'm encountering several issues.

This is the Dockerfile I used to build the API:

FROM --platform=linux/amd64 golang:latest as builder
ARG TARGETARCH

WORKDIR /opt/build
ADD backend .
RUN GOOS=linux GOARCH=$TARGETARCH go build -o mmseqs-web

ADD https://mmseqs.com/latest/mmseqs-linux-avx2.tar.gz  .

ADD https://mmseqs.com/foldseek/foldseek-linux-avx2.tar.gz  .

ADD https://raw.githubusercontent.com/soedinglab/MMseqs2/678c82ac44f1178bf9a3d49bfab9d7eed3f17fbc/util/mmseqs_wrapper.sh binaries/mmseqs
ADD https://raw.githubusercontent.com/steineggerlab/foldseek/0a68e16214a6db745cee783128ccba8546ea5dc9/util/foldseek_wrapper.sh binaries/foldseek

RUN mkdir binaries; \
    if [ "$TARGETARCH" = "arm64" ]; then \
      for i in mmseqs foldseek; do \
        if [ -e "${i}-linux-arm64.tar.gz" ]; then \
          cat ${i}-linux-arm64.tar.gz | tar -xzvf- ${i}/bin/${i}; \
          mv ${i}/bin/${i} binaries/${i}; \
        fi; \
      done; \
    else \
      for i in mmseqs foldseek; do \
        for j in sse2 sse41 avx2; do \
          if [ -e "${i}-linux-${j}.tar.gz" ]; then \
            cat ${i}-linux-${j}.tar.gz | tar -xzvf- ${i}/bin/${i}; \
            mv ${i}/bin/${i} binaries/${i}_${j}; \
          fi; \
        done; \
      done; \
    fi;

RUN chmod -R +x binaries

FROM debian:stable-slim
LABEL maintainer="Milot Mirdita <milot@mirdita.de>"

RUN apt-get update && apt-get install -y ca-certificates wget aria2 && rm -rf /var/lib/apt/lists/*
COPY --from=builder /opt/build/mmseqs-web /opt/build/binaries/* /usr/local/bin/

ENTRYPOINT ["/usr/local/bin/mmseqs-web"]

I then downloaded the databases and created the indexes in the usual way:

mmseqs databases UniRef50 UniRef50 tmp --remove-tmp-files
mmseqs createindex UniRef50 tmp --split 1

and added the params files alongside the databases in the same directory (/local/banks):

{
  "name": "UniRef50",
  "path": "UniRef50",
  "version": "",
  "default": true,
  "order": 0,
  "index": "",
  "search": "",
  "status": "COMPLETE"
}

This is how I launch the API:

singularity exec --env MMSEQS_NUM_THREADS=2 --bind /local/banks:/local/banks /shared/software/singularity/images/mmseqs2-app-v7-8e1704f-rpbs.sif /usr/local/bin/mmseqs-web -local -config config.json -app mmseqs

This is the content of the config.json file:

{
    "app": "mmseqs",
    "verbose": true,
    "server" : {
        "address"    : "0.0.0.0:3000",
        "dbmanagment": false,
        "cors"       : true
    },
    "worker": {
        "gracefulexit" : true
    },
    "paths" : {
        "databases"    : "/local/banks/",
        "results"      : "/shared/home/rey/colabfold",
        "temporary"    : "/tmp",
        "colabfold"    : {
            "uniref"        : "/local/banks/UniRef50"
        },
        "mmseqs"       : "/usr/local/bin/mmseqs",
        "foldseek"     : "/usr/local/bin/foldseek"
    },
    "redis" : {
        "network"  : "tcp",
        "address"  : "mmseqs-web-redis:6379",
        "password" : "",
        "index"    : 0
    },
    "mail" : {
        "type"      : "null",
        "sender"    : "mail@example.org",
        "templates" : {
            "success" : {
                "subject" : "Done -- %s",
                "body"    : "Dear User,\nThe results of your submitted job are available now at https://search.mmseqs.com/queue/%s .\n"
            },
            "timeout" : {
                "subject" : "Timeout -- %s",
                "body"    : "Dear User,\nYour submitted job timed out. More details are available at https://search.mmseqs.com/queue/%s .\nPlease adjust the job and submit it again.\n"
            },
            "error"   : {
                "subject" : "Error -- %s",
                "body"    : "Dear User,\nYour submitted job failed. More details are available at https://search.mmseqs.com/queue/%s .\nPlease submit your job later again.\n"
            }
        }
    }
}

I get a response with curl, which seems to indicate that the API is running and listening on the correct port (3000):

curl -X GET http://10.0.1.246:3000/databases
{"databases":[{"name":"UniRef50","version":"","path":"UniRef50","default":true,"order":0,"taxonomy":false,"full_header":false,"index":"","search":"","status":"COMPLETE"},{"name":"UniRef30","version":"2103","path":"UniRef30","default":false,"order":1,"taxonomy":false,"full_header":false,"index":"","search":"","status":"COMPLETE"}]}

On a side note, I can't list the databases if the status in the params file is anything other than COMPLETE.
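
To check which params files carry a status other than COMPLETE, something like the following sketch could be used. The function name `non_complete_params` and the `*.params` glob pattern are assumptions for illustration; adjust the pattern to however the params files are actually named in /local/banks.

```python
import json
from pathlib import Path


def non_complete_params(directory, pattern="*.params"):
    """Return {file name: status} for params files whose status is not COMPLETE.

    The glob pattern is an assumption about how the params files are named.
    """
    result = {}
    for path in sorted(Path(directory).glob(pattern)):
        with path.open() as fh:
            params = json.load(fh)
        status = params.get("status")
        if status != "COMPLETE":
            result[path.name] = status
    return result
```

Calling `non_complete_params("/local/banks")` would then return an empty dict when every params file it finds is marked COMPLETE.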

If I try to submit a sequence with Python:

>>> from requests import get, post
>>> ticket = post('http://10.0.1.246:3000/ticket', {
...             'q' : '>FASTA\nMPKIIEAIYENGVFKPLQKVDLKEGE\n',
...             'database[]' : ["UniRef50"],
...             'mode' : 'all',
...         }).json()
>>> ticket
{'id': 'A5n_NyrysSRtH7tNN6uuYdS6LFkv2bhK3Z94IA', 'status': 'PENDING'}

The directory containing the job is created correctly. But then nothing happens: the job stays in the PENDING state forever.

Querying the job status again after a few hours shows no change:

>>> status = get('http://10.0.1.246:3000/ticket/' + ticket['id']).json()
>>> status
{'id': 'A5n_NyrysSRtH7tNN6uuYdS6LFkv2bhK3Z94IA', 'status': 'PENDING'}
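
Instead of checking by hand, the status check above can be wrapped in a small polling loop. This is only a sketch: `wait_for_ticket` is a hypothetical helper, and it assumes that any status other than PENDING means the job has been picked up or finished. The fetch function is passed in so the loop does not depend on a particular HTTP client.

```python
import time
from typing import Callable, Dict


def wait_for_ticket(
    fetch_status: Callable[[], Dict[str, str]],
    interval: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll until the ticket leaves PENDING or max_polls fetches were made.

    Assumption: a status other than PENDING means the job has moved on.
    Returns the last response seen, which is still PENDING if the job is stuck.
    """
    status = fetch_status()
    for _ in range(max_polls - 1):
        if status.get("status") != "PENDING":
            break
        sleep(interval)
        status = fetch_status()
    return status
```

With the session above, this would be called as `wait_for_ticket(lambda: get('http://10.0.1.246:3000/ticket/' + ticket['id']).json())`. In my case it would presumably just return the PENDING response after the last poll.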

Any ideas or advice are welcome.
