
OpenAI-Compatible API Server with OpenVINO

This is an OpenAI-compatible API server that uses Intel OpenVINO for local LLM inference.

Endpoints

  • /v1/chat/completions: Creates a chat completion for a conversation.
  • /v1/models: Lists the available models.
  • /health: Reports the health of the API server.
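
For example, you can probe these endpoints from Python. This is a minimal sketch; the base URL http://localhost:8000 is an assumption, so substitute whatever address and port your deployment binds to:

import requests  # pip install requests

BASE_URL = "http://localhost:8000"  # assumed address; adjust to your deployment

# Liveness probe; a 200 status means the server is up.
health = requests.get(f"{BASE_URL}/health")
print(health.status_code, health.text)

# List the available models (an OpenAI-style {"object": "list", "data": [...]}
# response shape is assumed here, per the server's API parity).
print(requests.get(f"{BASE_URL}/v1/models").json())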

Usage

To use the API, send a POST request to the /v1/chat/completions endpoint with a JSON payload containing the following parameters:

  • model (string, optional): The name of the model to use. If not specified, the default model is used.
  • messages (array, required): A list of messages comprising the conversation so far.
    • role (string, required): The role of the message sender (e.g., "user" or "assistant").
    • content (string, required): The content of the message.
  • temperature (number, optional, default: 0.7): What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
  • top_p (number, optional, default: 1.0): An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
  • n (integer, optional, default: 1): How many chat completion choices to generate for each input message.
  • stream (boolean, optional, default: false): Whether to stream back partial progress. If set, tokens will be sent as data-only server-sent events.
  • max_tokens (integer, optional, default: 900): The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length.
  • presence_penalty (number, optional, default: 0.0): Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
  • frequency_penalty (number, optional, default: 0.0): Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same lines verbatim.
  • stop (string or array, optional, default: null): Up to 4 sequences where the API will stop generating further tokens.
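
For example, a minimal request from Python. This is a sketch; the base URL and the model name are assumptions, so substitute your own:

import json
import requests  # pip install requests

BASE_URL = "http://localhost:8000"  # assumed address; adjust to your deployment

payload = {
    "model": "my-model",  # hypothetical name; pick one returned by /v1/models
    "messages": [
        {"role": "user", "content": "Say hello in one sentence."},
    ],
    "temperature": 0.2,
    "max_tokens": 64,
}
resp = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])

With stream set to true, the server sends data-only server-sent events. Continuing the snippet above, each event carries a "data: " line with a JSON chunk; the trailing "[DONE]" sentinel follows the OpenAI convention and is assumed here:

payload["stream"] = True
with requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue  # skip blank separators between events
        data = line[len("data: "):]
        if data == "[DONE]":  # end-of-stream sentinel (OpenAI convention, assumed)
            break
        delta = json.loads(data)["choices"][0].get("delta", {})
        print(delta.get("content", ""), end="", flush=True)
print()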

You can find more information about the API in the api_documentation.md file.

Model Directory

Each model should be placed in its own subdirectory of the ./models directory. The directory structure should be as follows:

./models/
└── <model_name>/
    ├── openvino_model.bin
    ├── openvino_tokenizer.bin
    ├── openvino_detokenizer.bin
    └── tokenizer_config.json

  • openvino_model.bin: The OpenVINO model file.
  • openvino_tokenizer.bin: The OpenVINO tokenizer file.
  • openvino_detokenizer.bin: The OpenVINO detokenizer file.
  • tokenizer_config.json: The tokenizer configuration file.
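
One possible way to produce these files is Hugging Face Optimum together with openvino-tokenizers. This is a hedged sketch; the repository does not prescribe an export workflow here, so the tooling choice and the example model ID are assumptions:

import openvino as ov
from optimum.intel import OVModelForCausalLM       # pip install optimum[openvino]
from openvino_tokenizers import convert_tokenizer  # pip install openvino-tokenizers
from transformers import AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # example checkpoint, not prescribed by this repo
out_dir = "./models/tinyllama-1.1b-chat"

# Convert the checkpoint to OpenVINO IR and save it under ./models/<model_name>.
OVModelForCausalLM.from_pretrained(model_id, export=True).save_pretrained(out_dir)

# Save tokenizer_config.json alongside the model files.
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.save_pretrained(out_dir)

# Convert the tokenizer into OpenVINO tokenizer/detokenizer models;
# ov.save_model writes the matching .bin file next to each .xml.
ov_tokenizer, ov_detokenizer = convert_tokenizer(tokenizer, with_detokenizer=True)
ov.save_model(ov_tokenizer, f"{out_dir}/openvino_tokenizer.xml")
ov.save_model(ov_detokenizer, f"{out_dir}/openvino_detokenizer.xml")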
