This repository was archived by the owner on Jul 22, 2025. It is now read-only.

Conversation

@SamSaffron
Member

Adds context length controls to researcher (max tokens per post and batch)
Allow picking LLM for researcher
Fix bug where unicode usernames were not working
Fix documentation of OR logic

# Create test content with long text to test token truncation
topic = Fabricate(:topic, category: category, tags: [tag_research])
long_content = "zz " * 100 # This will exceed our token limit
Contributor

100 is a bit of a magic number here. I wonder if we can generate the long content based on a constant, so that updating the constant in the future doesn't break this test.

Member Author

It's fine, because later on we check that it only shows up 48 times. It is a bit magic and may break if we swap tokenizers, but overall it's pretty safe.
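For illustration, the constant-driven approach suggested above could look roughly like the sketch below. This is not the PR's test: `MAX_TOKENS_PER_POST` and `truncate_to_tokens` are hypothetical names, and the whitespace "tokenizer" is a toy stand-in, so the counts will not match what a real tokenizer produces (such as the 48 mentioned above).

```ruby
# Hypothetical limit; the real setting name and value in the PR may differ.
MAX_TOKENS_PER_POST = 50

# Toy tokenizer stand-in (assumption): one token per whitespace-separated word.
# Real tokenizers split text differently, so real counts will differ.
def truncate_to_tokens(text, max_tokens)
  text.split.first(max_tokens).join(" ")
end

# Derive the content length from the constant so it always exceeds the limit,
# whatever value the constant is changed to later.
long_content = "zz " * (MAX_TOKENS_PER_POST * 2)

truncated = truncate_to_tokens(long_content, MAX_TOKENS_PER_POST)
occurrences = truncated.scan("zz").size
puts occurrences
```

With the assertion written against `MAX_TOKENS_PER_POST` instead of a literal, changing the limit would not require touching the test. The trade-off raised in the reply still applies: with a real tokenizer the expected count depends on how it splits the text.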

Contributor

Comment on lines +156 to +161
register_filter(/\Ausernames?:(.+)\z/i) do |relation, username, filter|
  user_ids = User.where(username_lower: username.split(",").map(&:downcase)).pluck(:id)
  if user_ids.empty?
    relation.where("1 = 0") # No results if user doesn't exist
  else
    relation.where("posts.user_id IN (?)", user_ids)
  end
end
Contributor

I think we can simplify this to the following and let the Postgres planner optimize the query.

        register_filter(/\Ausernames?:(.+)\z/i) do |relation, username, filter|
          relation.where("posts.user_id IN (?)", User.where(username_lower: username.split(",").map(&:downcase)).select(:id))
        end

This is also slightly safer, since we are not plucking the user IDs into memory.

Member Author

I started off with that and it failed, so I ended up simplifying. I think the odds of hitting a memory problem here are very low: the LLM is streaming in the usernames, so the list is very unlikely to get too long.
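As a side note on the parsing step both versions share, here is a minimal plain-Ruby sketch of how the filter string is turned into lowercased usernames. `USERNAME_FILTER` and `usernames_from_filter` are hypothetical names for illustration only. The sketch leaves out `register_filter` and the database lookup, and it does not reproduce the Unicode bug the PR fixes.

```ruby
# Same pattern as in the diff: matches "username:" or "usernames:",
# case-insensitively, and captures everything after the colon.
USERNAME_FILTER = /\Ausernames?:(.+)\z/i

# Hypothetical helper (not part of the PR): returns the lowercased usernames
# named by a filter string, or nil when the string is not a username filter.
def usernames_from_filter(filter)
  match = USERNAME_FILTER.match(filter)
  return nil unless match

  # String#downcase applies Unicode case mapping by default (Ruby 2.4+),
  # so non-ASCII usernames are lowercased as well.
  match[1].split(",").map(&:downcase)
end

p usernames_from_filter("usernames:Sam,ÉRIC")
p usernames_from_filter("USERNAME:Łukasz")
p usernames_from_filter("category:support")
```

Because the list is split on commas without trimming, a filter such as `usernames:sam, eric` would yield `" eric"` with a leading space. Whether that matters depends on how the caller formats the filter.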

@SamSaffron SamSaffron merged commit 3e74eea into main Jun 4, 2025
6 checks passed
@SamSaffron SamSaffron deleted the bugfix-wed branch June 4, 2025 06:39

3 participants