
[MODULE] - Tiktoken Length Classifier #347

Closed
jhoetter opened this issue Sep 26, 2023 · 4 comments
Labels
cognition enhancement New feature or request

Comments

@jhoetter
Member

Please describe the module you would like to add to bricks
See here: https://platform.openai.com/tokenizer

Do you already have an implementation?
Tiktoken has a basic implementation in its docs, e.g. for gpt-3.5-turbo. We just need to calculate the number of tokens in a text input.

Additional context
This could also be used to estimate the complexity of a paragraph for RAG; for instance, more than 1,000 tokens would indicate high complexity.

@jhoetter jhoetter added enhancement New feature or request cognition labels Sep 26, 2023
@jhoetter
Member Author

Thinking about it, these are two bricks: one that just calculates the number of tokens (i.e. returns an integer), and one that categorizes the text into discrete classes like "short", "medium", "long", etc.
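A minimal sketch of the classifier brick, built on top of a token counter. The names `classify_length` and `DEFAULT_BUCKETS` and the bucket boundaries are hypothetical: only the "more than 1,000 tokens" cut-off comes from this issue, and the 100-token boundary between "short" and "medium" is an assumption. The counter is passed in as a function so any tokenizer can be plugged in; when none is given, it falls back to tiktoken's gpt-3.5-turbo encoding.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical (inclusive upper bound, label) pairs; NOT taken from the brick.
# Anything above the last bound is labelled "long", matching the
# "> 1000 tokens means high complexity" rule of thumb from the issue.
DEFAULT_BUCKETS: List[Tuple[int, str]] = [(100, "short"), (1000, "medium")]


def classify_length(
    text: str,
    count_tokens: Optional[Callable[[str], int]] = None,
    buckets: List[Tuple[int, str]] = DEFAULT_BUCKETS,
    fallback: str = "long",
) -> str:
    """Map `text` to a discrete length class based on its token count."""
    if count_tokens is None:
        # Lazy import so callers supplying their own counter don't need tiktoken.
        import tiktoken

        encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

        def count_tokens(t: str) -> int:
            return len(encoding.encode(t, disallowed_special=()))

    n = count_tokens(text)
    # Buckets are checked in ascending order; the first bound that fits wins.
    for upper, label in buckets:
        if n <= upper:
            return label
    return fallback
```

Injecting the counter keeps the classifier independent of the counting brick, so the two can evolve separately, as suggested above.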

@LeonardPuettmannKern
Contributor

@jhoetter I will rename this issue to "Tiktoken Length Classifier" and open a separate issue for a "Tiktoken Token Counter" generator brick!

@LeonardPuettmannKern LeonardPuettmannKern changed the title [MODULE] - Tiktoken Length Enrichment [MODULE] - Tiktoken Length Classifier Sep 28, 2023
@LeonardPuettmannKern
Contributor

See here: #359

@LeonardPuettmannKern
Contributor

Implemented with #360
