Conversation

@daspartho (Contributor) commented Nov 29, 2023

Closes #370

Adds support for parallel sampling via the vLLM library when num_return_sequences in the generation kwargs is greater than 1 and the model is supported by vLLM (currently all Hugging Face models in LLM-VM).

TODO: handle dependencies
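A minimal sketch of the intended dispatch, assuming a hypothetical generate() helper and Hugging Face-style generation kwargs; the vLLM calls themselves use the library's public LLM/SamplingParams interface:

```python
# Minimal sketch only: routes a request to vLLM for parallel sampling when
# num_return_sequences > 1. The function and argument names are hypothetical;
# the vLLM calls (LLM, SamplingParams, LLM.generate) are the library's public API.
from vllm import LLM, SamplingParams

def generate(model_name, prompt, **generation_kwargs):
    n = generation_kwargs.get("num_return_sequences", 1)
    if n > 1:
        # Parallel sampling: ask vLLM for n completions of the same prompt.
        # (In practice the engine would be built once per model and reused.)
        engine = LLM(model=model_name)
        params = SamplingParams(
            n=n,
            temperature=generation_kwargs.get("temperature", 1.0),
            max_tokens=generation_kwargs.get("max_new_tokens", 256),
        )
        outputs = engine.generate([prompt], params)
        return [completion.text for completion in outputs[0].outputs]
    # Otherwise fall back to the regular single-sequence Hugging Face path.
    raise NotImplementedError("single-sequence path omitted in this sketch")
```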

@VictorOdede (Contributor) left a comment


Let's make vllm_support a property of the BaseOnsiteLLM class. We can set it to true by default, and set it to false for models that vLLM doesn't support.

@daspartho (Contributor, Author)

Made the suggested changes: vllm_support is now set to true by default and needs to be explicitly set to false for unsupported models.
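For illustration, a sketch of that pattern; the subclass and helper names are placeholders, not the actual llm-vm classes:

```python
# Sketch of the class-attribute pattern; the subclass and helper names here
# are illustrative, not the actual llm-vm classes.
class BaseOnsiteLLM:
    # vLLM parallel sampling is assumed to work unless a subclass opts out.
    vllm_support = True

class UnsupportedExampleLLM(BaseOnsiteLLM):
    # Explicitly opt out for models that vLLM cannot serve.
    vllm_support = False

def should_use_vllm(model, generation_kwargs):
    # Use vLLM only when multiple sequences are requested and the model allows it.
    return generation_kwargs.get("num_return_sequences", 1) > 1 and model.vllm_support
```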



Development

Successfully merging this pull request may close these issues.

parallel sampling with vLLM (#370)
