Update Inference specification for Hugging Face's completion and chat completion tasks #4370

@Jan-Kazlouski-elastic Jan-Kazlouski-elastic commented May 14, 2025

This PR updates the specification to reflect the changes made in elastic/elasticsearch#127254:

Extended Task Support:

  • Added completion and chat_completion tasks to the list of supported Hugging Face tasks.

Model Requirements for Chat Tasks:

  • Updated documentation to describe specific requirements for using chat_completion and completion tasks, including model compatibility with the OpenAI API format and usage guidelines for serverless vs. dedicated endpoints.

New Configuration Parameters:

  • Introduced an optional model_id field in the Hugging Face service settings, applicable to the completion and chat_completion tasks.
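
As a rough illustration of where the new field sits, the sketch below builds the path and JSON body for creating a Hugging Face inference endpoint. The helper name, the inference ID, the URL, and the token are hypothetical placeholders, and the client-side check that rejects model_id for other task types is an assumption of this sketch, not confirmed server behavior.

```python
# Task types this PR names as accepting the optional model_id field.
CHAT_TASKS = {"completion", "chat_completion"}


def build_hugging_face_endpoint(task_type, inference_id, url, api_key, model_id=None):
    """Return (path, body) for a create-endpoint request (hypothetical helper)."""
    service_settings = {"url": url, "api_key": api_key}
    if model_id is not None:
        # Assumption: reject model_id early for tasks the PR does not list.
        if task_type not in CHAT_TASKS:
            raise ValueError("model_id applies only to completion and chat_completion")
        service_settings["model_id"] = model_id
    path = f"_inference/{task_type}/{inference_id}"
    body = {"service": "hugging_face", "service_settings": service_settings}
    return path, body


# Placeholder values only; none of these identify a real endpoint or model.
path, body = build_hugging_face_endpoint(
    "chat_completion",
    "example-chat",
    "https://example.invalid/v1/chat/completions",
    "hf_example_token",
    model_id="example-model",
)
```

The body would be sent with a PUT request to the returned path; omitting model_id leaves the service settings as they were before this change.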

Rate Limit Clarifications:

  • Updated rate_limit documentation to clarify default behavior and guidance for tuning based on deployment specifics.
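
A minimal sketch of overriding the rate limit, assuming the rate_limit object is nested inside service_settings with a requests_per_minute key. The helper name and the value 1000 are illustrative only; a suitable value depends on the deployment.

```python
def with_rate_limit(service_settings, requests_per_minute):
    """Return a copy of service_settings with an explicit rate limit (hypothetical helper)."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    updated = dict(service_settings)  # shallow copy; the input is left unchanged
    updated["rate_limit"] = {"requests_per_minute": requests_per_minute}
    return updated


# Placeholder settings; the URL and token are not real.
base_settings = {"url": "https://example.invalid", "api_key": "hf_example_token"}
tuned_settings = with_rate_limit(base_settings, 1000)
```

When rate_limit is omitted, the service's default applies.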

Documentation Fixes:

  • Corrected typos in existing text_embedding request examples.

Additional actions:

  • Signed the CLA

  • Executed make contrib
