Add AI21 support to Inference Plugin #131238


Open: Jan-Kazlouski-elastic wants to merge 10 commits into main


@Jan-Kazlouski-elastic commented Jul 14, 2025

Adds a new AI21 inference provider integration that allows completion (both streaming and non-streaming) and chat_completion (streaming only) tasks to be executed through the Inference API.

The changes were tested locally against the following models:
jamba-large
jamba-mini

Create Completion Endpoint

Success:

PUT {{base-url}}/_inference/completion/ai21-completion
RQ
{
    "service": "ai21",
    "service_settings": {
        "api_key": "{{ai21-api-key}}",
        "model_id": "jamba-mini"
    }
}
RS
{
    "inference_id": "ai21-completion",
    "task_type": "completion",
    "service": "ai21",
    "service_settings": {
        "model_id": "jamba-mini",
        "rate_limit": {
            "requests_per_minute": 200
        }
    }
}
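
For readers who want to reproduce the call outside a REST client, here is a minimal Python sketch of how the create-endpoint request above could be assembled. It only builds the request and does not send it. The helper name `build_create_endpoint_request`, the base URL `http://localhost:9200`, and the API key placeholder are assumptions for illustration and are not part of this PR; a real cluster would typically also need authentication headers.

```python
import json
import urllib.request


def build_create_endpoint_request(base_url, task_type, inference_id, api_key, model_id):
    """Build (but do not send) the PUT request that creates an AI21 inference endpoint."""
    body = {
        "service": "ai21",
        "service_settings": {"api_key": api_key, "model_id": model_id},
    }
    return urllib.request.Request(
        url=f"{base_url}/_inference/{task_type}/{inference_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder values: the base URL and API key are hypothetical.
req = build_create_endpoint_request(
    "http://localhost:9200", "completion", "ai21-completion", "<ai21-api-key>", "jamba-mini"
)
print(req.get_method(), req.full_url)
```

Passing `chat_completion` as the task type and a different inference id would produce the corresponding chat completion endpoint request.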

Invalid Model:

PUT {{base-url}}/_inference/completion/ai21-completion
RQ
{
    "service": "ai21",
    "service_settings": {
        "api_key": "{{ai21-api-key}}",
        "model_id": "invalid-model"
    }
}
RS
{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Received an input validation error response for request from inference entity id [ai21-completion] status [422]. Error message: [{\"detail\":\"The provided model is not supported. See https://docs.ai21.com/docs/jamba-foundation-models#api-versioning for a list of supported models\"}]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Received an input validation error response for request from inference entity id [ai21-completion] status [422]. Error message: [{\"detail\":\"The provided model is not supported. See https://docs.ai21.com/docs/jamba-foundation-models#api-versioning for a list of supported models\"}]"
        }
    },
    "status": 400
}

Auth Failed:

PUT {{base-url}}/_inference/completion/ai21-completion
RQ
{
    "service": "ai21",
    "service_settings": {
        "api_key": "invalid-key",
        "model_id": "jamba-mini"
    }
}
RS
{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Received an authentication error status code for request from inference entity id [ai21-completion] status [401]. Error message: [{\"detail\":\"Forbidden: Bad or missing Apikey/JWT.\"}]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Received an authentication error status code for request from inference entity id [ai21-completion] status [401]. Error message: [{\"detail\":\"Forbidden: Bad or missing Apikey/JWT.\"}]"
        }
    },
    "status": 400
}
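
In both failure cases above the top-level `status` is 400, while the status returned by AI21 (422 or 401) appears only inside the `reason` text. The sketch below shows one way a test client might recover that upstream status. The helper name `upstream_status` is hypothetical, and matching on a human-readable message is inherently brittle, so treat this as an illustration rather than a supported contract.

```python
import re

# Matches the "status [NNN]" fragment embedded in the reason text.
STATUS_PATTERN = re.compile(r"status \[(\d+)\]")


def upstream_status(error_response):
    """Return the provider status code found in error.caused_by.reason, or None."""
    reason = error_response.get("error", {}).get("caused_by", {}).get("reason", "")
    match = STATUS_PATTERN.search(reason)
    return int(match.group(1)) if match else None


# Trimmed copy of the "Auth Failed" response shown above (root_cause omitted).
auth_failure = {
    "error": {
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Received an authentication error status code for request from inference entity id [ai21-completion] status [401]. Error message: [{\"detail\":\"Forbidden: Bad or missing Apikey/JWT.\"}]",
        },
    },
    "status": 400,
}

print(auth_failure["status"], upstream_status(auth_failure))
```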

Perform Completion

Success Non Streaming:

POST {{base-url}}/_inference/completion/ai21-completion
RQ
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS
{
    "completion": [
        {
            "result": "That's a striking opening line from William Gibson's **Neuromancer**. It paints a vivid, dystopian image, evoking a sense of decay and the eerie beauty of a world on the edge of technological collapse. What drew you to this line? Are you exploring cyberpunk themes or looking for inspiration for writing?"
        }
    ]
}

Success Streaming:

POST {{base-url}}/_inference/completion/ai21-completion/_stream
RQ
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS
event: message
data: {"completion":[{"delta":"That"},{"delta":"'"},{"delta":"s"},{"delta":" a"},{"delta":" memorable"}]}

event: message
data: {"completion":[{"delta":" opening"},{"delta":" line"},{"delta":" from"},{"delta":" William"},{"delta":" Gibson"},{"delta":"'"}]}

event: message
data: {"completion":[{"delta":"s"},{"delta":" groundbreaking"},{"delta":" novel"},{"delta":" Ne"},{"delta":"ur"}]}

event: message
data: {"completion":[{"delta":"om"},{"delta":"ancer"},{"delta":"."},{"delta":" It"},{"delta":" sets"},{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" vivid"},{"delta":","},{"delta":" dyst"},{"delta":"opian"}]}

event: message
data: {"completion":[{"delta":" tone"},{"delta":" for"},{"delta":" the"},{"delta":" cyber"},{"delta":"punk"},{"delta":" world"}]}

event: message
data: {"completion":[{"delta":" he"}]}

event: message
data: {"completion":[{"delta":" created"},{"delta":"."},{"delta":" Would"},{"delta":" you"},{"delta":" like"}]}

event: message
data: {"completion":[{"delta":" to"},{"delta":" discuss"},{"delta":" the"},{"delta":" book"}]}

event: message
data: {"completion":[{"delta":","},{"delta":" the"},{"delta":" genre"},{"delta":","},{"delta":" or"}]}

event: message
data: {"completion":[{"delta":" something"},{"delta":" else"},{"delta":" related"},{"delta":" to"},{"delta":" it"}]}

event: message
data: {"completion":[{"delta":"?"}]}

event: message
data: [DONE]
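
The streamed response above is a sequence of server-sent events whose `data` payloads each carry a list of `delta` fragments, terminated by `[DONE]`. A minimal Python sketch of how a client might reassemble the text follows. The function name `collect_completion_stream` is an assumption for illustration; the sample lines are two consecutive events copied from the transcript above, with a terminator appended.

```python
import json


def collect_completion_stream(lines):
    """Concatenate the "delta" fragments from a streamed completion response."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip "event: message" lines and blank separators
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        for item in json.loads(payload)["completion"]:
            parts.append(item["delta"])
    return "".join(parts)


sample = [
    'event: message',
    'data: {"completion":[{"delta":" vivid"},{"delta":","},{"delta":" dyst"},{"delta":"opian"}]}',
    '',
    'event: message',
    'data: {"completion":[{"delta":" tone"},{"delta":" for"},{"delta":" the"},{"delta":" cyber"},{"delta":"punk"},{"delta":" world"}]}',
    '',
    'event: message',
    'data: [DONE]',
]
text = collect_completion_stream(sample)
print(repr(text))
```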

Create Chat Completion Endpoint

Success:

PUT {{base-url}}/_inference/chat_completion/ai21-chat-completion
RQ:
{
    "service": "ai21",
    "service_settings": {
        "api_key": "{{ai21-api-key}}",
        "model_id": "jamba-mini"
    }
}
RS:
{
    "inference_id": "ai21-chat-completion",
    "task_type": "chat_completion",
    "service": "ai21",
    "service_settings": {
        "model_id": "jamba-mini",
        "rate_limit": {
            "requests_per_minute": 200
        }
    }
}

Invalid Model:

PUT {{base-url}}/_inference/chat_completion/ai21-chat-completion
RQ:
{
    "service": "ai21",
    "service_settings": {
        "api_key": "{{ai21-api-key}}",
        "model_id": "invalid-model"
    }
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "unified_chat_completion_exception",
                "reason": "Received an input validation error response for request from inference entity id [ai21-chat-completion] status [422]. Error message: [{\"detail\":\"The provided model is not supported. See https://docs.ai21.com/docs/jamba-foundation-models#api-versioning for a list of supported models\"}]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "unified_chat_completion_exception",
            "reason": "Received an input validation error response for request from inference entity id [ai21-chat-completion] status [422]. Error message: [{\"detail\":\"The provided model is not supported. See https://docs.ai21.com/docs/jamba-foundation-models#api-versioning for a list of supported models\"}]"
        }
    },
    "status": 400
}

Auth Failed:

PUT {{base-url}}/_inference/chat_completion/ai21-chat-completion
RQ:
{
    "service": "ai21",
    "service_settings": {
        "api_key": "invalid-key",
        "model_id": "jamba-mini"
    }
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "unified_chat_completion_exception",
                "reason": "Received an authentication error status code for request from inference entity id [ai21-chat-completion] status [401]. Error message: [{\"detail\":\"Forbidden: Bad or missing Apikey/JWT.\"}]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "unified_chat_completion_exception",
            "reason": "Received an authentication error status code for request from inference entity id [ai21-chat-completion] status [401]. Error message: [{\"detail\":\"Forbidden: Bad or missing Apikey/JWT.\"}]"
        }
    },
    "status": 400
}

Perform Chat Completion

Success Simple:

POST {{base-url}}/_inference/chat_completion/ai21-chat-completion/_stream
RQ
{
    "model": "jamba-mini",
    "messages": [
        {
            "role": "user",
            "content": "What is deep learning?"
        }
    ],
    "max_completion_tokens": 10
}
RS
event: message
data: {"id":"chatcmpl-ca8f0d76-4633-50b1-971b-3821e4a9eea1","choices":[{"delta":{"role":"assistant"},"index":0}],"model":null,"object":null}

event: message
data: {"id":"chatcmpl-ca8f0d76-4633-50b1-971b-3821e4a9eea1","choices":[{"delta":{"content":"Deep"},"index":0}],"model":null,"object":null}

event: message
data: {"id":"chatcmpl-ca8f0d76-4633-50b1-971b-3821e4a9eea1","choices":[{"delta":{"content":" learning"},"index":0}],"model":null,"object":null}

event: message
data: {"id":"chatcmpl-ca8f0d76-4633-50b1-971b-3821e4a9eea1","choices":[{"delta":{"content":" is"},"index":0}],"model":null,"object":null}

event: message
data: {"id":"chatcmpl-ca8f0d76-4633-50b1-971b-3821e4a9eea1","choices":[{"delta":{"content":" a"},"index":0}],"model":null,"object":null}

event: message
data: {"id":"chatcmpl-ca8f0d76-4633-50b1-971b-3821e4a9eea1","choices":[{"delta":{"content":" subset"},"index":0}],"model":null,"object":null}

event: message
data: {"id":"chatcmpl-ca8f0d76-4633-50b1-971b-3821e4a9eea1","choices":[{"delta":{"content":" of"},"index":0}],"model":null,"object":null}

event: message
data: {"id":"chatcmpl-ca8f0d76-4633-50b1-971b-3821e4a9eea1","choices":[{"delta":{"content":" machine"},"index":0}],"model":null,"object":null}

event: message
data: {"id":"chatcmpl-ca8f0d76-4633-50b1-971b-3821e4a9eea1","choices":[{"delta":{"content":" learning"},"index":0}],"model":null,"object":null}

event: message
data: {"id":"chatcmpl-ca8f0d76-4633-50b1-971b-3821e4a9eea1","choices":[{"delta":{"content":" that"},"index":0}],"model":null,"object":null}

event: message
data: {"id":"chatcmpl-ca8f0d76-4633-50b1-971b-3821e4a9eea1","choices":[{"delta":{"content":" involves"},"finish_reason":"length","index":0}],"model":null,"object":null,"usage":{"completion_tokens":10,"prompt_tokens":15,"total_tokens":25}}

event: message
data: [DONE]
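
In the chat completion stream above, each chunk carries a `choices[].delta` object, and the final chunk adds `finish_reason` and `usage`. The sketch below shows one way a client might fold such a stream into a single summary. The function name `summarize_chat_stream` is hypothetical, and the sample is an excerpt of the transcript above with the middle chunks omitted, so the reassembled content is shorter than the full response.

```python
import json


def summarize_chat_stream(lines):
    """Collect content deltas, the finish reason, and usage from a chat completion stream."""
    content, finish_reason, usage = [], None, None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip "event: message" lines and blank separators
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                content.append(piece)
            finish_reason = choice.get("finish_reason") or finish_reason
        usage = chunk.get("usage") or usage
    return {"content": "".join(content), "finish_reason": finish_reason, "usage": usage}


# Excerpt of the stream above: the role chunk, two content chunks, and the final chunk.
sample = [
    'event: message',
    'data: {"id":"chatcmpl-ca8f0d76-4633-50b1-971b-3821e4a9eea1","choices":[{"delta":{"role":"assistant"},"index":0}],"model":null,"object":null}',
    '',
    'event: message',
    'data: {"id":"chatcmpl-ca8f0d76-4633-50b1-971b-3821e4a9eea1","choices":[{"delta":{"content":"Deep"},"index":0}],"model":null,"object":null}',
    '',
    'event: message',
    'data: {"id":"chatcmpl-ca8f0d76-4633-50b1-971b-3821e4a9eea1","choices":[{"delta":{"content":" learning"},"index":0}],"model":null,"object":null}',
    '',
    'event: message',
    'data: {"id":"chatcmpl-ca8f0d76-4633-50b1-971b-3821e4a9eea1","choices":[{"delta":{"content":" involves"},"finish_reason":"length","index":0}],"model":null,"object":null,"usage":{"completion_tokens":10,"prompt_tokens":15,"total_tokens":25}}',
    '',
    'event: message',
    'data: [DONE]',
]
summary = summarize_chat_stream(sample)
print(summary)
```

A `finish_reason` of `length` matches the request's `max_completion_tokens` of 10, which is why the response stops mid-sentence.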


Complex requests (structured content with tools) are not supported because of a restriction on the AI21 side: the API requires message content to be a string and rejects any other type. Direct request to AI21:

POST https://api.ai21.com/studio/v1/chat/completions
RQ:
{
    "model": "jamba-mini",
    "max_tokens": 10,
    "messages": [{
            "role": "user",
            "content": [{
                    "type": "text",
                    "text": "What's the price of a scarf?"
                }
            ]
        }
    ],
    "tools": [{
            "type": "function",
            "function": {
                "name": "get_current_price",
                "description": "Get the current price of a item",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "item": {
                            "id": "123"
                        }
                    }
                }
            }
        }
    ],
    "tool_choice": {
        "type": "function",
        "function": {
            "name": "get_current_price"
        }
    }
}
RS:
{
    "detail": [
        {
            "loc": [
                "body",
                "messages",
                0,
                "UserMessage",
                "content"
            ],
            "msg": "str type expected",
            "type": "type_error.str"
        }
    ]
}

Elasticsearch Inference API:

POST {{base-url}}/_inference/chat_completion/ai21-chat-completion/_stream
RQ:
{
    "model": "llama3.2:3b",
    "messages": [{
            "role": "user",
            "content": [{
                    "type": "text",
                    "text": "What's the price of a scarf?"
                }
            ]
        }
    ],
    "tools": [{
            "type": "function",
            "function": {
                "name": "get_current_price",
                "description": "Get the current price of a item",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "item": {
                            "id": "123"
                        }
                    }
                }
            }
        }
    ],
    "tool_choice": {
        "type": "function",
        "function": {
            "name": "get_current_price"
        }
    }
}
RS:
{
    "error": {
        "code": "unprocessable_entity",
        "message": "Received an input validation error response for request from inference entity id [ai21-chat-completion] status [422]. Error message: [{\"detail\":[{\"loc\":[\"body\",\"messages\",0,\"UserMessage\",\"content\"],\"msg\":\"str type expected\",\"type\":\"type_error.str\"}]}]",
        "type": "ai21_error"
    }
}
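
The 422 above is triggered by a message whose `content` is a list of parts rather than a plain string. As a purely illustrative client-side check, and not something this PR implements, a caller could flag such messages before sending the request. The helper name `non_string_content_indexes` is hypothetical.

```python
def non_string_content_indexes(messages):
    """Return the indexes of messages whose "content" is not a plain string."""
    return [
        index
        for index, message in enumerate(messages)
        if not isinstance(message.get("content"), str)
    ]


# Message shape from the rejected request above.
complex_messages = [
    {
        "role": "user",
        "content": [{"type": "text", "text": "What's the price of a scarf?"}],
    }
]
# Message shape from the successful simple request.
simple_messages = [{"role": "user", "content": "What is deep learning?"}]

print(non_string_content_indexes(complex_messages))
print(non_string_content_indexes(simple_messages))
```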

- Have you signed the contributor license agreement?
- Have you followed the contributor guidelines?
- If submitting code, have you built your formula locally prior to submission with gradle check?
- If submitting code, is your pull request against main? Unless there is a good reason otherwise, we prefer pull requests against main and will backport as needed.
- If submitting code, have you checked that your submission is for an OS and architecture that we support?
- If you are submitting this code for a class then read our policy for that.

@Jan-Kazlouski-elastic changed the title from "Ai21 chat completion" to "Add AI21 support to Inference Plugin" on Jul 14, 2025
@elasticsearchmachine added the v9.2.0 and external-contributor (pull request authored by a developer outside the Elasticsearch team) labels on Jul 14, 2025
…etionRequestEntity and Ai21ChatCompletionRequest
@Jan-Kazlouski-elastic marked this pull request as ready for review on July 16, 2025 15:35
@elasticsearchmachine added the needs:triage (requires assignment of a team area label) label on Jul 16, 2025

2 participants