Describe the bug
Clients need to ensure that embedding inputs stay within the model's size limit (max tokens), so the embedding service should also provide a tokenizer endpoint.
Most embedding models already ship with a tokenizer model, so exposing the tokenizer at the service level makes sense.
Currently, clients have to implement a local tokenizer to calculate the token count, which consumes significant client-side resources.
Other engines (e.g. TEI) provide a tokenize endpoint.
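For illustration, this is roughly what every client has to do today (a minimal sketch using the Hugging Face tokenizer; the model name and token limit are examples, not part of this report):

from transformers import AutoTokenizer

MODEL_ID = "Alibaba-NLP/gte-large-en-v1.5"  # example embedding model
MAX_TOKENS = 512                            # replace with the model's actual context length

# Every client has to download and load the tokenizer locally,
# duplicating work the embedding service already does.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

def fits_context(text: str) -> bool:
    # Count tokens locally before calling the embeddings endpoint.
    return len(tokenizer.encode(text, add_special_tokens=True)) <= MAX_TOKENS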
To Reproduce
Steps to reproduce the behavior:
Steps to prepare models repository '...'
OVMS launch command '....'
Client command (additionally client code if not using official client or demo) '....'
See error
Expected behavior
A clear and concise description of what you expected to happen.
Logs
Logs from OVMS, ideally with --log_level DEBUG. Logs from client.
Configuration
OVMS version
OVMS config.json file
CPU, accelerator's versions if applicable
Model repository directory structure
Model or publicly available similar model that reproduces the issue
Additional context
Add any other context about the problem here.
@gavinlichn The easiest way to get such functionality would be to deploy the individual tokenizer model and access it via the KServe API. When you deploy an embeddings instance, it also downloads and exposes the individual pipeline models for the tokenizer and embeddings.
You can list the models with a call to curl http://localhost:8000/v1/config
You can get the tokens by sending the text in a KServe infer call like this: curl -X POST http://localhost:2000/v2/models/Alibaba-NLP%2Fgte-large-en-v1.5_tokenizer_model/infer -H "Content-Type: application/json" -d '{"inputs" : [ {"name" : "Parameter_1", "shape" : [ 1 ], "datatype" : "BYTES", "data" : ["This is my test"]} ]}'
Note that the model name needs to be URL-encoded. The response will be similar to:
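(Illustrative shape only; the exact output names, shapes, and token values depend on the exported tokenizer model. The input_ids and attention_mask outputs are assumed here.)

{
  "model_name": "Alibaba-NLP/gte-large-en-v1.5_tokenizer_model",
  "outputs": [
    {"name": "input_ids", "shape": [1, 6], "datatype": "INT64", "data": [101, 2023, 2003, 2026, 3231, 102]},
    {"name": "attention_mask", "shape": [1, 6], "datatype": "INT64", "data": [1, 1, 1, 1, 1, 1]}
  ]
}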
Would it be sufficient?
Note that it is also possible to export the model with automatic truncation of the input text to match the model's context length. Check the --help of the export_models.py script.
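For completeness, a minimal client-side sketch of this workaround (it assumes the REST port from the config call above and the input_ids output name; adjust both to your deployment):

import requests
from urllib.parse import quote

OVMS_URL = "http://localhost:8000"                       # REST port of the OVMS instance (assumed)
MODEL = "Alibaba-NLP/gte-large-en-v1.5_tokenizer_model"  # tokenizer model exposed by the embeddings deployment

def count_tokens(text: str) -> int:
    # KServe v2 REST infer call; the model name must be URL-encoded.
    url = f"{OVMS_URL}/v2/models/{quote(MODEL, safe='')}/infer"
    payload = {"inputs": [{"name": "Parameter_1", "shape": [1], "datatype": "BYTES", "data": [text]}]}
    resp = requests.post(url, json=payload, timeout=10)
    resp.raise_for_status()
    # Assumes the tokenizer model returns an "input_ids" tensor; its last shape
    # dimension is the token count for a single input string.
    input_ids = next(o for o in resp.json()["outputs"] if o["name"] == "input_ids")
    return input_ids["shape"][-1]

With this, the client only needs an HTTP call instead of bundling the tokenizer locally.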