203 add support for roman numerals within search #800

ProfessorAtomicManiac · 2025-06-04T07:18:58Z

@twangodev pls test on your end to make sure it's all good. I ended up changing a lot of the original parameters.

Copilot

Pull Request Overview

This PR adds support for Roman numeral handling in course searches and refines instructor indexing and search analyzers.

Introduces a roman_numerals_synonym_filter and course_analyzer for courses
Adds a min_length_filter and instructor_analyzer to clean up instructor tokens
Updates search queries to use the new analyzers and adjusts boost settings

Comments suppressed due to low confidence (6)

search/es_util.py:365

The slop parameter only applies to phrase-based queries. For multi_match with type: "best_fields", either remove slop or switch to type: "phrase"/"phrase_prefix" if phrase proximity is required.

"slop": 1
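For reference, a minimal sketch of a multi_match body in which slop actually takes effect. The helper name and the field list are assumptions for illustration; the real fields and boosts in es_util.py may differ.

```python
def build_phrase_query(search_term, fields, slop=1):
    """Build a multi_match body where `slop` is honored.

    `fields` is a hypothetical list of field names; the real ones in
    es_util.py are not shown in this thread.
    """
    return {
        "query": {
            "multi_match": {
                "query": search_term,
                "fields": list(fields),
                # slop is a phrase-matching option, so the type has to be
                # "phrase" or "phrase_prefix" for it to matter.
                "type": "phrase",
                "slop": slop,
            }
        }
    }
```

If phrase proximity is not needed, the simpler fix is to keep "best_fields" and drop the slop key.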

search/es_util.py:110

[nitpick] Update this function docstring to mention the addition of the Roman numeral synonym filter and course_analyzer so that maintainers are aware of the new indexing behavior.

Index courses into Elasticsearch.

search/es_util.py:279

[nitpick] Consider revising this comment for clarity and tone (e.g., remove "literally all it does it" and explain why single-character tokens are excluded).

# literally all it does it to remove tokens shorter than 2 characters after analyzer tokenizes the string

search/es_util.py:118

Add or update tests that verify the Roman numeral synonyms are correctly applied during indexing and that searches for "I" vs "1" return equivalent results.

"roman_numerals_synonym_filter": {
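As a starting point for such tests, here is a rough sketch of how the synonym rules could be generated and checked without a running cluster. The helper names, the lowercase form, and the upper bound of 10 are assumptions; the PR's actual synonym list is not shown in this thread.

```python
def to_roman(n):
    """Convert a positive integer (1..3999) to a lowercase Roman numeral."""
    pairs = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def roman_synonym_filter(max_value=10):
    """Build a synonym token filter that equates Roman numerals with digits.

    Assumes tokens are lowercased before this filter runs.
    """
    return {
        "type": "synonym",
        # Solr-style equivalent synonyms: "ii, 2" treats both forms as one term.
        "synonyms": [f"{to_roman(n)}, {n}" for n in range(1, max_value + 1)],
    }
```

A unit test can assert on the generated rules directly, while an integration test would index a course titled with "II" and confirm a search for "2" returns it.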

search/es_util.py:166

Deleting and recreating the index on each run will drop existing data. Consider using index templates or migrations to update mappings without data loss.

if es.indices.exists(index="courses"):
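One common alternative to delete-and-recreate is to build a new versioned index and then repoint an alias at it. The sketch below covers only the alias swap; the function name and the versioned index names are hypothetical, and it assumes the client accepts body= as the existing code does.

```python
def swap_alias(es, alias, new_index, old_index=None):
    """Point `alias` at `new_index` in a single update_aliases request.

    Assumes `new_index` has already been created and populated.
    """
    actions = []
    if old_index is not None:
        # Removing and adding in one request avoids a window with no index.
        actions.append({"remove": {"index": old_index, "alias": alias}})
    actions.append({"add": {"index": new_index, "alias": alias}})
    es.indices.update_aliases(body={"actions": actions})
    return actions
```

Note that an alias cannot share a name with a concrete index, so adopting this would mean renaming the current "courses" index first.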

search/es_util.py:287

A min_length_filter with min: 2 will drop valid single-character tokens (e.g., initials or names like 'O'). Confirm this behavior is acceptable or adjust the filter.

"min": 2
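To make the effect concrete, here is a small sketch of what a length filter with min: 2 looks like and which tokens it would drop. It assumes the PR uses Elasticsearch's built-in length token filter; the pure-Python function is only a stand-in for illustration, not the analyzer itself.

```python
# Assumed shape of the filter definition, based on the "min": 2 line above.
MIN_LENGTH_FILTER = {"type": "length", "min": 2}


def apply_min_length(tokens, min_length=2):
    """Pure-Python stand-in showing which tokens such a filter keeps."""
    return [t for t in tokens if len(t) >= min_length]
```

For example, apply_min_length(["john", "q", "smith"]) keeps "john" and "smith" and drops the middle initial, which is the intended effect, but a single-letter name would be dropped the same way.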

Copilot · 2025-06-04T07:21:11Z

search/es_util.py

+    if es.indices.exists(index="instructors"):
+        es.indices.delete(index="instructors")
+    es.indices.create(index="instructors", body=settings)


[nitpick] The index creation logic for courses and instructors is duplicated. Consider extracting a helper function to DRY this pattern and simplify future changes.

Suggested change

-    if es.indices.exists(index="instructors"):
-        es.indices.delete(index="instructors")
-    es.indices.create(index="instructors", body=settings)
+    create_index(es, "instructors", settings)
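The suggestion names create_index but does not define it. A minimal sketch of what that helper could look like, reusing the same three client calls from the diff (the signature is an assumption):

```python
def create_index(es, name, settings):
    """Drop the index `name` if it exists, then create it with `settings`.

    Mirrors the calls in the diff; like the original code, this discards
    any data already stored in the index.
    """
    if es.indices.exists(index=name):
        es.indices.delete(index=name)
    es.indices.create(index=name, body=settings)
```

Both the courses and instructors setup could then call this helper with their own settings.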

twangodev · 2025-06-04T18:25:46Z

I'll leave this PR open, there's a lot to review and I just want to make sure that this doesn't impact the current search scoring.

I think we should begin looking into CI/CD checks for search: define some test cases and measure how well search matches queries (e.g. 75% of queries contain an expected result; 50% or lower should fail the CI job). That should help us clearly define what makes the search better.
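A rough sketch of how such a check might be scored, assuming each test case is a (query, expected id) pair and that search is any callable returning a ranked list of ids. The function names, the top_k cutoff, and the exact threshold handling are assumptions, not an agreed design.

```python
def hit_rate(cases, search, top_k=10):
    """Fraction of (query, expected_id) cases found in the top_k results."""
    if not cases:
        raise ValueError("no test cases supplied")
    hits = sum(
        1 for query, expected in cases if expected in search(query)[:top_k]
    )
    return hits / len(cases)


def check_search_quality(cases, search, fail_at=0.50):
    """Return (rate, passed); a rate of `fail_at` or lower counts as a failure."""
    rate = hit_rate(cases, search)
    return rate, rate > fail_at
```

A CI job could run this against a freshly indexed test cluster and exit non-zero when passed is False, which would also give a baseline for comparing scoring changes like the ones in this PR.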

ProfessorAtomicManiac · 2025-06-07T21:48:45Z

I can try to look into CI/CD. I actually need to do CI/CD later for my job, so I'd really like to learn it.

ProfessorAtomicManiac added 3 commits June 4, 2025 00:07

added roman numeral synonyms

58e0cc6

made it so instructors middle initial cannot heavily influence results

5a162bb

tuned subject relevance

4e481bc

Copilot AI review requested due to automatic review settings June 4, 2025 07:18

ProfessorAtomicManiac linked an issue Jun 4, 2025 that may be closed by this pull request

add support for roman numerals within search #203

Open

Copilot AI reviewed Jun 4, 2025
