Terézia Slanináková edited this page Mar 5, 2025 · 1 revision

AlphaFind API - User Documentation

Overview

AlphaFind is a web-based search engine for structure-based search of proteins across the entire AlphaFold Protein Structure Database. The API provides endpoints that take a protein identifier (UniProt ID, PDB ID, or Gene Symbol) and return structurally similar proteins.

Base URL

When deployed locally, the API is accessible at:

http://localhost:8080

Endpoints

1. /ready (GET)

This endpoint checks if the API service is ready to handle requests.

Request Parameters: None

Response Format:

{
  "ready": "true|false"
}

Example Usage:

curl 'http://localhost:8080/ready'
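A client can wait for the service to come up before sending searches. The following is a minimal Python sketch that uses only the standard library. The function names are hypothetical. Because the response format above writes the value as "true|false", the sketch assumes the server may send either a JSON boolean or the strings "true"/"false", and it accepts both.

```python
import json
import time
import urllib.request


def is_ready(payload: dict) -> bool:
    """Interpret the JSON body returned by GET /ready.

    Assumption: "ready" may be a JSON boolean or the string "true"/"false".
    """
    value = payload.get("ready", False)
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return bool(value)


def wait_until_ready(base_url: str = "http://localhost:8080",
                     timeout_s: float = 60.0,
                     interval_s: float = 2.0) -> bool:
    """Poll /ready until it reports ready or timeout_s seconds have passed."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/ready", timeout=10) as resp:
                if is_ready(json.load(resp)):
                    return True
        except (OSError, ValueError):
            # Connection refused, HTTP error, timeout, or a non-JSON body: retry later.
            pass
        time.sleep(interval_s)
    return False
```

For example, a script can call `if not wait_until_ready(): raise SystemExit("AlphaFind API is not ready")` before its first search.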

2. /search (GET)

This is the main endpoint for searching for proteins similar to a given query protein.

Request Parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| query | string | The protein identifier (UniProt ID, PDB ID, or Gene Symbol) | Required |
| offset | integer | The offset from which to return results | 0 |
| limit | integer | The maximum number of results to return | 50 |

Notes:

  • offset must be non-negative and less than 50
  • limit must be non-negative and less than 50
  • The API may adjust the limit based on available results
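The parameters above can be assembled into a request URL as in this minimal sketch. build_search_url is a hypothetical helper: it rejects negative values, omits parameters that are not given so that the server defaults apply, and leaves the upper bounds to the server.

```python
from typing import Optional
from urllib.parse import urlencode


def build_search_url(query: str,
                     offset: Optional[int] = None,
                     limit: Optional[int] = None,
                     base_url: str = "http://localhost:8080") -> str:
    """Build a GET /search URL; omitted offset/limit fall back to server defaults."""
    if not query:
        raise ValueError("query must be a non-empty protein identifier")
    params: dict = {"query": query}
    for name, value in (("offset", offset), ("limit", limit)):
        if value is None:
            continue
        if value < 0:
            raise ValueError(f"{name} must be non-negative, got {value}")
        params[name] = value
    # Upper bounds are not checked here; the server enforces and may adjust them.
    return f"{base_url}/search?{urlencode(params)}"
```

For instance, `build_search_url("P69905", offset=10, limit=20)` returns `http://localhost:8080/search?query=P69905&offset=10&limit=20`, the same URL used in the pagination example further down.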

Response Format:

When results are available:

{
  "results": [
    {
      "object_id": "string",       
      "tm_score": "float",           
      "rmsd": "float",                
      "aligned_percentage": "float",
      "sequence_aligned_percentage": "float"
    }
  ],
  "search_time": "float"
}

When the query is queued (results are still being computed):

{
  "results": [],
  "queue_position": "integer"
}

When the protein is not found in the database:

{
  "message":"Protein not found in the database.",
  "results": []
}
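The three response shapes differ in which keys are present, so a client can tell them apart by inspecting the parsed JSON body. This sketch assumes that the presence of queue_position marks a queued query and that a message together with an empty results list marks the not-found case. classify_search_response is a hypothetical name.

```python
def classify_search_response(payload: dict) -> str:
    """Return "queued", "not_found", or "done" for a parsed /search response body.

    Assumption: the checks mirror the three documented shapes; any other body
    is treated as a finished (possibly empty) result set.
    """
    if "queue_position" in payload:
        return "queued"
    if "message" in payload and not payload.get("results"):
        return "not_found"
    return "done"
```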

Example Usage:

# Basic search
curl 'http://localhost:8080/search?query=A0A0C5PVI1'

# Search with pagination
curl 'http://localhost:8080/search?query=A0A0C5PVI1&offset=10&limit=20'

3. /metrics (GET)

This endpoint provides Prometheus metrics for monitoring the API service.

Request Parameters: None

Response Format: Prometheus text-based exposition format

Example Usage:

curl 'http://localhost:8080/metrics'
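For quick inspection without a Prometheus server, the metrics text can be read line by line. The following simplified parser is an illustration rather than a full implementation of the exposition format. It skips comment lines, keeps any {label} set as part of the key, ignores an optional trailing timestamp, and does not unescape label values. Which metrics appear depends on the deployment.

```python
from typing import Dict


def parse_prometheus_text(text: str) -> Dict[str, float]:
    """Map each sample (metric name plus any {labels}) to its float value."""
    samples: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Keep the full name{labels} as the key; the value follows the last "}".
            end = line.rindex("}") + 1
            key, rest = line[:end], line[end:].split()
        else:
            key, *rest = line.split()
        # rest[0] is the value; an optional timestamp in rest[1] is ignored.
        # float() also accepts "NaN", "+Inf", and "-Inf".
        samples[key] = float(rest[0])
    return samples
```

A possible usage is `parse_prometheus_text(urllib.request.urlopen("http://localhost:8080/metrics").read().decode())`, after importing `urllib.request`.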

Understanding Search Results

The search results include several metrics to evaluate structural similarity:

  • TM-score (tm_score): Template Modeling score, ranging from 0 to 1, where 1 indicates a perfect structural match. Scores above 0.5 generally indicate that the two structures share the same fold.
  • RMSD (rmsd, Root Mean Square Deviation): The root-mean-square of the distances between aligned atoms after the structures are superimposed. Lower values indicate closer similarity.
  • aligned_percentage: The percentage of the query protein that was aligned to the result protein.
  • sequence_aligned_percentage: The percentage of the sequence that was aligned, weighted by identity.
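To illustrate the 0.5 TM-score guideline, the sketch below keeps hits above a threshold and orders them by TM-score, breaking ties by lower RMSD. similar_hits is a hypothetical helper. It converts values with float() because the schema above lists numeric fields as "float" placeholders; this works whether the server sends numbers or numeric strings.

```python
from typing import List


def similar_hits(results: List[dict], min_tm_score: float = 0.5) -> List[dict]:
    """Keep hits with tm_score above min_tm_score, best first."""
    hits = []
    for r in results:
        tm = float(r["tm_score"])
        if tm > min_tm_score:
            hits.append({"object_id": r["object_id"],
                         "tm_score": tm,
                         "rmsd": float(r["rmsd"])})
    # Highest TM-score first; among equal TM-scores, lower RMSD first.
    hits.sort(key=lambda h: (-h["tm_score"], h["rmsd"]))
    return hits
```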

Asynchronous Processing

The API uses an asynchronous processing model:

  1. When a search is submitted, the API checks whether results already exist for the query.
  2. If they do, they are returned immediately.
  3. If not, the query is placed in a queue for processing.
  4. In that case, the response contains an empty results list and a queue_position field.
  5. Clients should repeat the same /search request until results become available.
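The polling flow described above might look like the following sketch. It assumes that repeating the same /search request is how a client polls and that any response without queue_position is final (either results or the not-found message). The names poll_search and fetch_json are hypothetical. fetch and sleep are injectable so the loop can be exercised without a running server. The delay doubles after each queued response, up to max_delay.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Optional


def fetch_json(url: str) -> dict:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def poll_search(query: str,
                base_url: str = "http://localhost:8080",
                fetch: Callable[[str], dict] = fetch_json,
                sleep: Callable[[float], None] = time.sleep,
                initial_delay: float = 1.0,
                max_delay: float = 60.0,
                max_attempts: int = 20) -> Optional[dict]:
    """Poll GET /search until the query leaves the queue.

    Returns the final response body, or None if the query is still queued
    after max_attempts requests.
    """
    url = f"{base_url}/search?{urllib.parse.urlencode({'query': query})}"
    delay = initial_delay
    for attempt in range(max_attempts):
        body = fetch(url)
        if "queue_position" not in body:
            # Results or a "not found" message: nothing more to wait for.
            return body
        if attempt + 1 < max_attempts:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
    return None
```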

Limitations

  • The maximum number of results per request is 1000.
  • Queries that have not been processed before are queued and may take some time to complete.
  • The API requires valid protein identifiers that exist in the AlphaFold database.

Example Use Cases

Example 1: Search for similar proteins

curl 'http://localhost:8080/search?query=P69905'

Example 2: Retrieve a subset of results with pagination

curl 'http://localhost:8080/search?query=P69905&offset=10&limit=20'

Example 3: Check if the server is ready to process requests

curl 'http://localhost:8080/ready'

Best Practices

  1. Always check that the server is ready using the /ready endpoint before making search requests.
  2. When a response contains a queue_position, poll for results using exponential backoff.
  3. Use the offset and limit parameters to restrict the amount of data returned.
  4. Include error handling in your client application for the "Protein not found in the database." response.
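When many clients poll at once, exponential backoff is often combined with random jitter so that retries do not arrive in lockstep. The generator below is a sketch of capped exponential backoff with "full jitter". The function name and default values are illustrative assumptions, not values recommended by AlphaFind.

```python
import random
from typing import Iterator, Optional


def backoff_delays(base: float = 1.0, cap: float = 60.0,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield delays drawn uniformly from [0, ceiling], doubling ceiling up to cap."""
    if rng is None:
        rng = random.Random()
    ceiling = min(base, cap)
    while True:
        yield rng.uniform(0, ceiling)
        # Stop growing at cap so the ceiling never overflows.
        ceiling = min(ceiling * 2, cap)
```

Each time a /search response still contains queue_position, take the next value from the generator and sleep for that long before polling again.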