
Commit 8f6da3c

mkhludnev authored
Demonstrate passing "max_tokens" param (#34)
Co-authored-by: Olga Andreeva <124622579+oandreeva-nv@users.noreply.github.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
1 parent 5c03411 commit 8f6da3c

File tree: 1 file changed (+9, −1 lines)


samples/client.py

Lines changed: 9 additions & 1 deletion
@@ -99,8 +99,16 @@ async def process_stream(
         return success
 
     async def run(self):
+        # Sampling parameters for text generation
+        # including `temperature`, `top_p`, `top_k`, `max_tokens`, `early_stopping`.
+        # Full list available at:
+        # https://github.com/vllm-project/vllm/blob/5255d99dc595f9ae7647842242d6542aa4145a4f/vllm/sampling_params.py#L23
+        sampling_parameters = {
+            "temperature": "0.1",
+            "top_p": "0.95",
+            "max_tokens": "100",
+        }
         exclude_input_in_output = self._flags.exclude_inputs_in_outputs
-        sampling_parameters = {"temperature": "0.1", "top_p": "0.95"}
         with open(self._flags.input_prompts, "r") as file:
             print(f"Loading inputs from `{self._flags.input_prompts}`...")
             prompts = file.readlines()
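
For reference, the string-valued entries added above mirror fields of vLLM's SamplingParams class, which is what the linked sampling_params.py defines. The following is a minimal sketch (not part of this commit, and assuming the vllm package is installed) of the equivalent typed object:

    # Sketch only: shows the typed SamplingParams fields that the
    # client-side string parameters above correspond to.
    from vllm import SamplingParams

    params = SamplingParams(
        temperature=0.1,  # lower values make generation more deterministic
        top_p=0.95,       # nucleus sampling: keep tokens within 95% probability mass
        max_tokens=100,   # cap on the number of generated tokens
    )
    print(params)

The sample client quotes the numbers as strings because they travel as request parameters; the serving side is expected to cast them to the appropriate numeric types before building the sampling configuration.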
