
Conversation

@alex-yung-github
Contributor

@alex-yung-github alex-yung-github commented Apr 27, 2025

Added an average GPA trend graph and a sentiment analysis graph for reviews.

Check out the avg_gpa_trend branch and view a course that has sufficient grade history.

Summary by CodeRabbit

  • New Features

    • Added sentiment analysis bar chart on course–professor pages, shown when sufficient reviews exist.
    • Added per-term GPA trend line chart with contextual messages when data is insufficient.
    • Enhanced pages to surface sentiment distribution and GPA history alongside existing details.
  • Style

    • Improved responsive layout for chart containers and adjusted font sizes for better readability across breakpoints.
  • Chores

    • Introduced tooling to compute and aggregate review sentiments.
    • Extended grade data to include semester information to support per-term GPA trends.

barrett-ruth and others added 30 commits October 21, 2024 15:54
feat(docker): upgrade docker to 3.12
fix(middleware): don't fetch DNE fields
deesig and others added 28 commits February 16, 2025 14:23
- cannot modify the offset amount for some reason
@coderabbitai

coderabbitai bot commented Oct 4, 2025

Walkthrough

Adds sentiment analysis tooling and GPA trend computation: new management commands to analyze reviews and average sentiments, model update linking CourseGrade to Semester, loader changes to include term context, view helpers to assemble sentiment distributions and GPA history, and template/CSS/JS to render sentiment bar and GPA line charts on the course-professor page.

Changes

Cohort / File(s): Summary of Changes

Sentiment analysis management commands
(tcf_website/management/commands/sentiment_analyze.py, tcf_website/management/commands/average_sentiments.py)
New commands: batch sentiment scoring of the reviews CSV using RoBERTa; CSV-to-DataFrame utilities to compute and query average sentiments by instructor and course; interactive querying loop.

Grades import with per-term differentiation
(tcf_website/management/commands/load_grades.py)
Keys extended to include term_desc; instructor key narrowed; semester field added to payload; per-term aggregation keys updated.

Model update
(tcf_website/models/models.py)
CourseGrade gains a foreign key to Semester (semester).

Views: analytics integration
(tcf_website/views/browse.py)
Adds CSV-driven helpers to load sentiments, categorize distribution, and compute per-term GPA trends; integrates sentiment_distribution and gpa_history into the course_instructor context.

Frontend charts and styles
(tcf_website/templates/course/course_professor.html, tcf_website/static/course/course_professor.css, tcf_website/static/course/javascript/course_data.js)
Template adds a sentiment bar chart and a GPA line chart with conditional rendering and a custom Chart.js plugin; responsive styles for the chart container and sentiment text; JS export line functionally unchanged.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User as User
  participant View as Django view: course_instructor
  participant CSV1 as reviews_data_with_sentiment.csv
  participant CSV2 as grade_data/csv/*
  participant Tmpl as Template: course_professor.html
  participant Chart as Chart.js (browser)

  User->>View: GET /course/<course_id>/instructor/<instructor_id>
  View->>CSV1: Load sentiments DataFrame
  View->>View: get_sentiments(instructor, course)
  View->>View: categorize_sentiments(scores)
  View->>CSV2: Load grade CSVs
  View->>View: get_course_term_gpa(course_id, instructor_id)
  View->>Tmpl: Render with {sentiment_distribution, gpa_history}
  Tmpl-->>User: HTML with chart canvases
  User->>Chart: Initialize charts (bar + line)
  Chart-->>User: Rendered sentiment and GPA trend
sequenceDiagram
  autonumber
  participant Cmd as manage.py sentiment_analyze
  participant FS as reviews_data/reviews_data.csv
  participant Model as RoBERTa sentiment model
  participant Out as reviews_data/reviews_data_with_sentiment.csv

  Cmd->>FS: Read reviews CSV
  loop For each review
    Cmd->>Cmd: split_text(text, max_tokens=512)
    loop For each chunk
      Cmd->>Model: Infer probabilities
      Model-->>Cmd: pos/neu/neg scores
      Cmd->>Cmd: get_sentiment_score(chunk)
    end
    Cmd->>Cmd: compute_average_sentiment(chunk_scores)
  end
  Cmd->>Out: Write CSV with sentiment_score
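The chunk-and-average flow in the diagram above can be sketched in plain Python. This is illustrative only: split_text and compute_average_sentiment here are simplified stand-ins, and the real command likely counts model tokens via the tokenizer rather than whitespace words.

```python
def split_text(text, max_tokens=512):
    """Split a review into chunks of at most max_tokens whitespace tokens.
    A sketch: the actual command may use tokenizer-based token counts."""
    words = text.split()
    if not words:
        return [""]
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


def compute_average_sentiment(chunk_scores):
    """Average the per-chunk scores; None when a review produced no scores."""
    if not chunk_scores:
        return None
    return sum(chunk_scores) / len(chunk_scores)
```

Each chunk is scored independently against the 512-token model limit, and the per-review score is the mean of its chunk scores.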

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

A whisker twitch, a chart takes flight,
Bars of mood and lines of light—
Semesters hop in tidy rows,
Sentiment blooms where data flows.
I nibble CSVs with glee,
And plot a GPA willow tree.
Thump! More insight for you and me. 🐇📈

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

Description Check: ⚠️ Warning
Explanation: The PR description does not follow the repository’s required template and is missing all key sections such as GitHub Issues addressed, What I did, Screenshots, Testing, and Questions/Discussions/Notes, leaving reviewers without structured context or testing instructions.
Resolution: Update the description to include each template heading with relevant details, including linked issue numbers, a summary of changes under “What I did,” before/after screenshots, testing steps, and any discussion notes to satisfy the repository’s guidelines.

Docstring Coverage: ⚠️ Warning
Explanation: Docstring coverage is 47.37%, which is below the required threshold of 80.00%.
Resolution: Run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)

Title Check: ✅ Passed
Explanation: The title succinctly summarizes the primary additions of an average GPA graph and a sentiment analysis section, directly reflecting the main UI enhancements introduced in this changeset. It is concise and specific, without extraneous details.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch avg_gpa_trend


@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tcf_website/management/commands/load_grades.py (1)

312-331: Don’t assign a string to the new semester FK.

row[-1] is the term description string from the identifier tuple, so CourseGrade(**course_grade_params) will try to stuff "Fall 2022" (etc.) into the FK column and explode on bulk_create. You need to resolve that descriptor to the actual Semester instance (or its PK) before populating the params—e.g. hydrate a self.semesters lookup keyed by term number/desc and use that entry here (and likewise for instructor rows).

Apply this diff after introducing a proper lookup (for illustration only):

-        course_grade_params = {
-            "course_id": self.courses.get(row[:2]),
-            "semester": row[-1],
+        course_grade_params = {
+            "course_id": self.courses.get(row[:2]),
+            "semester_id": self.semesters[row[-1]],
...
-        if is_instructor_grade:
-            course_grade_params["instructor_id"] = self.instructors.get(row[2:4])
+        if is_instructor_grade:
+            course_grade_params["instructor_id"] = self.instructors.get(row[2:4])

Make sure self.semesters[...] is populated during __init__ so every identifier resolves to a valid FK.

tcf_website/templates/course/course_professor.html (1)

205-698: Avoid crashing when the GPA chart isn’t rendered.

The template skips rendering <canvas id="gpaChart"> when gpa_history|length < 3, but the script still executes and unconditionally does gpaChartCanvas.classList.add(...). When the dataset has 2 terms, gpaChartCanvas is null, so this throws and the whole script dies (also preventing the sentiment chart from loading). Guard all DOM usage behind a presence check (or bail early when the elements aren’t there) so the page stays functional for shorter histories.

Wrap the GPA logic with:

-        if (gpaHistory.length <= 1) {
-            gpaChartCanvas.classList.add('d-none');
+        if (!gpaChartCanvas || gpaHistory.length <= 1) {
+            if (gpaChartCanvas) {
+                gpaChartCanvas.classList.add('d-none');
+            }
             gpaNoDataDiv.classList.remove('d-none');
             ...

(and similarly guard gpaNoDataDiv / chart construction).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e37a946 and a2861ae.

⛔ Files ignored due to path filters (2)
  • tcf_website/management/commands/reviews_data/reviews_data.csv is excluded by !**/*.csv
  • tcf_website/management/commands/reviews_data/reviews_data_with_sentiment.csv is excluded by !**/*.csv
📒 Files selected for processing (8)
  • tcf_website/management/commands/average_sentiments.py (1 hunks)
  • tcf_website/management/commands/load_grades.py (4 hunks)
  • tcf_website/management/commands/sentiment_analyze.py (1 hunks)
  • tcf_website/models/models.py (1 hunks)
  • tcf_website/static/course/course_professor.css (4 hunks)
  • tcf_website/static/course/javascript/course_data.js (1 hunks)
  • tcf_website/templates/course/course_professor.html (2 hunks)
  • tcf_website/views/browse.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
tcf_website/templates/course/course_professor.html (2)
tcf_website/migrations/0004_grades_data_overhaul.py (1)
  • Migration (6-144)
tcf_website/legacy_models.py (1)
  • Grades (316-344)
tcf_website/models/models.py (5)
tcf_website/migrations/0006_answer_semester_question_instructor_votequestion_and_more.py (1)
  • Migration (9-134)
tcf_website/migrations/0004_remove_coursegrade_d_remove_coursegrade_d_minus_and_more.py (1)
  • Migration (6-119)
tcf_website/legacy_models.py (2)
  • Grades (316-344)
  • Semesters (109-123)
tcf_website/migrations/0004_grades_data_overhaul.py (1)
  • Migration (6-144)
tcf_website/migrations/0006_alter_coursegrade_average_and_more.py (1)
  • Migration (6-23)
tcf_website/views/browse.py (2)
tcf_website/management/commands/average_sentiments.py (2)
  • extract_professor_name (5-6)
  • extract_course_mnemonic (8-9)
tcf_website/models/models.py (2)
  • full_name (272-274)
  • Course (536-872)
tcf_website/management/commands/average_sentiments.py (1)
tcf_website/views/browse.py (2)
  • extract_professor_name (310-311)
  • extract_course_mnemonic (314-315)
🪛 GitHub Check: eslint
tcf_website/static/course/javascript/course_data.js

[failure] 329-329:
Insert

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: pylint
  • GitHub Check: django

Comment on lines +5 to +9
def extract_professor_name(professor_full):
    return re.match(r'^[^()]+', professor_full).group().strip()

def extract_course_mnemonic(course_full):
    return course_full.split(' |')[0].strip()
⚠️ Potential issue | 🟠 Major

Handle missing instructor/course values safely

re.match(...).group() and .split() assume every CSV row has non-empty strings. Pandas will pass NaN (floats) for missing values, which makes re.match raise TypeError and breaks the command on real data. Guard the inputs or drop null rows before applying the regex/split so the pipeline continues even when data is incomplete.

Apply this diff to make the helpers resilient:

 def extract_professor_name(professor_full):
-    return re.match(r'^[^()]+', professor_full).group().strip()
+    if not isinstance(professor_full, str) or not professor_full.strip():
+        return ""
+    match = re.match(r"^[^()]+", professor_full)
+    return match.group().strip() if match else professor_full.strip()
 
 def extract_course_mnemonic(course_full):
-    return course_full.split(' |')[0].strip()
+    if not isinstance(course_full, str) or not course_full.strip():
+        return ""
+    return course_full.split(" |", 1)[0].strip()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-def extract_professor_name(professor_full):
-    return re.match(r'^[^()]+', professor_full).group().strip()
-
-def extract_course_mnemonic(course_full):
-    return course_full.split(' |')[0].strip()
+def extract_professor_name(professor_full):
+    if not isinstance(professor_full, str) or not professor_full.strip():
+        return ""
+    match = re.match(r"^[^()]+", professor_full)
+    return match.group().strip() if match else professor_full.strip()
+
+def extract_course_mnemonic(course_full):
+    if not isinstance(course_full, str) or not course_full.strip():
+        return ""
+    return course_full.split(" |", 1)[0].strip()

Comment on lines +27 to +38
if __name__ == "__main__":
    avg_sentiment_df = avg_sentiment_df_creator()
    print(avg_sentiment_df)

    while True:
        professor = input("Enter professor name (or type 'exit' to quit): ")
        if professor.lower() == 'exit':
            break
        course = input("Enter course name (or type 'exit' to quit): ")
        if course.lower() == 'exit':
            break
        query_average(avg_sentiment_df, professor, course)
⚠️ Potential issue | 🔴 Critical

Expose this logic via a proper Django management command

Because this module sits under management/commands/, Django expects a Command subclass. With only a __main__ block, python manage.py average_sentiments raises AttributeError: module ... has no attribute 'Command', so the command cannot run. Wrap the existing workflow in a BaseCommand implementation so it can be executed through Django’s management interface.

Apply this diff to add the command scaffold and optional interactive mode:

-import os
-import re
-import pandas as pd
+import os
+import re
+import pandas as pd
+from django.core.management.base import BaseCommand
 
@@
-def query_average(df, professor, course):
+def query_average(df, professor, course):
     result = df[(df["instructor_name_only"] == professor) & (df["course_code_only"] == course)]
     if result.empty:
         print("No data found for the given professor and course.")
     else:
         print(f"Average sentiment score for {professor} in {course}: {result['sentiment_score'].values[0]:.2f}")
 
-
-if __name__ == "__main__":
-    avg_sentiment_df = avg_sentiment_df_creator()
-    print(avg_sentiment_df)
-    
-    while True:
-        professor = input("Enter professor name (or type 'exit' to quit): ")
-        if professor.lower() == 'exit':
-            break
-        course = input("Enter course name (or type 'exit' to quit): ")
-        if course.lower() == 'exit':
-            break
-        query_average(avg_sentiment_df, professor, course)
+def interactive_loop(avg_sentiment_df):
+    print(avg_sentiment_df)
+    while True:
+        professor = input("Enter professor name (or type 'exit' to quit): ")
+        if professor.lower() == "exit":
+            break
+        course = input("Enter course name (or type 'exit' to quit): ")
+        if course.lower() == "exit":
+            break
+        query_average(avg_sentiment_df, professor, course)
+
+
+class Command(BaseCommand):
+    help = "Generate average sentiment scores for instructor-course pairs."
+
+    def add_arguments(self, parser):
+        parser.add_argument(
+            "--interactive",
+            action="store_true",
+            help="Print the dataframe and prompt for instructor/course lookups.",
+        )
+
+    def handle(self, *args, **options):
+        avg_sentiment_df = avg_sentiment_df_creator()
+        if options["interactive"]:
+            interactive_loop(avg_sentiment_df)
+        else:
+            self.stdout.write(avg_sentiment_df.to_string(index=False))

AutoTokenizer,
)

ssl._create_default_https_context = ssl._create_unverified_context
⚠️ Potential issue | 🟠 Major

Do not disable SSL certificate verification globally

Overriding ssl._create_default_https_context with ssl._create_unverified_context disables TLS verification for every HTTPS request in this process. That is a serious security regression—any network call (model download, analytics, etc.) becomes vulnerable to MITM attacks. Instead, keep verification enabled and, if a specific source requires a different context, scope the override to that single request.
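One way to keep verification on by default while scoping any exception to a single request (a sketch; the https_context/fetch helpers and their names are hypothetical, not part of this PR):

```python
import ssl
import urllib.request


def https_context(verify=True):
    """Return a per-request SSL context. Verification stays enabled unless
    the caller explicitly opts out for one specific, trusted-enough source."""
    if verify:
        return ssl.create_default_context()
    return ssl._create_unverified_context()


def fetch(url, verify=True):
    # Only this request is affected; the process-wide default is untouched.
    return urllib.request.urlopen(url, context=https_context(verify))
```

Compared with reassigning ssl._create_default_https_context, this keeps every other HTTPS call in the process (model downloads, analytics, etc.) verified.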

Comment on lines +309 to +315
# Converts professor name to solely full name without email from CSV
def extract_professor_name(professor_full):
    return re.match(r'^[^()]+', professor_full).group().strip()

# Converts course title in CSV to solely just mnemonic and number
def extract_course_mnemonic(course_full):
    return course_full.split(' |')[0].strip()
⚠️ Potential issue | 🟠 Major

Guard against null instructor/course strings

Same issue as in the management command: reviews with missing instructor/course values will pass NaN into these helpers, and re.match / .split() will throw, taking down the view. Please reuse the safer guards (or drop null rows) so production traffic doesn’t see 500s when data is incomplete.

🤖 Prompt for AI Agents
In tcf_website/views/browse.py around lines 309-315, the helpers assume
professor_full/course_full are valid strings and will raise when given null/NaN;
change them to guard first and return a safe default (e.g., empty string) for
missing values, and then perform string operations. For extract_professor_name,
check for falsy or non-string input (or wrap in a try) and return '' if invalid,
otherwise run the regex and safely handle a None match before calling .group();
for extract_course_mnemonic, check for falsy input and return '' if invalid,
otherwise convert to str (to handle non-string NaN) and then split/strip as
before. Ensure no exceptions propagate for null/NaN inputs so the view won’t
500.

Comment on lines +318 to +325
def sentiments_df_creator():
    reviews_data_path = 'tcf_website/management/commands/reviews_data/reviews_data_with_sentiment.csv'
    df = pd.read_csv(reviews_data_path)
    df["instructor_name_only"] = df["instructor"].apply(extract_professor_name)
    df["course_code_only"] = df["course"].apply(extract_course_mnemonic)
    sentiments = df[["instructor_name_only", "course_code_only", "sentiment_score"]]
    return sentiments

⚠️ Potential issue | 🟠 Major

Fail gracefully when the sentiment CSV is absent

pd.read_csv will raise FileNotFoundError if reviews_data_with_sentiment.csv hasn’t been generated yet (fresh environments, CI, etc.), causing the entire instructor page to error. Wrap the load in a try/except (or check os.path.exists) and return an empty dataframe so the page renders without sentiment data instead of crashing.

Apply this diff to add a safe fallback:

 def sentiments_df_creator():
-    reviews_data_path = 'tcf_website/management/commands/reviews_data/reviews_data_with_sentiment.csv'
-    df = pd.read_csv(reviews_data_path)
+    reviews_data_path = "tcf_website/management/commands/reviews_data/reviews_data_with_sentiment.csv"
+    if not os.path.exists(reviews_data_path):
+        return pd.DataFrame(columns=["instructor_name_only", "course_code_only", "sentiment_score"])
+    df = pd.read_csv(reviews_data_path)
     df["instructor_name_only"] = df["instructor"].apply(extract_professor_name)
     df["course_code_only"] = df["course"].apply(extract_course_mnemonic)
     sentiments = df[["instructor_name_only", "course_code_only", "sentiment_score"]]
     return sentiments
🤖 Prompt for AI Agents
In tcf_website/views/browse.py around lines 318 to 325, the call to pd.read_csv
will raise FileNotFoundError if reviews_data_with_sentiment.csv is missing and
crash the page; modify sentiments_df_creator to check for the file
(os.path.exists) or wrap pd.read_csv in try/except FileNotFoundError, and if the
file is absent return an empty DataFrame with the expected columns
("instructor_name_only", "course_code_only", "sentiment_score") so the view can
render without sentiment data.

Comment on lines +682 to +693
parts = term.split()
year, season = parts

season_map = {"Spring": 0, "Fall": 1}

transformed.append(
{
"semester_term": f"{season[0]}{year[2:]}",
"average_gpa": round(gpa, 2) if gpa is not None else None,
"year": year,
"season_order": season_map[season],
}
⚠️ Potential issue | 🔴 Critical

Fix year/season extraction in GPA trend

Here parts = term.split() (e.g., "Fall 2023" → ["Fall", "2023"]), but the assignment year, season = parts swaps them. That makes season equal the year string, so season_map[season] raises KeyError, and semester_term becomes nonsense. Swap the unpacking order (or explicitly index) before looking up season_map.

Apply this diff to correct the mapping:

-        parts = term.split()
-        year, season = parts
+        parts = term.split()
+        if len(parts) < 2:
+            continue
+        season, year = parts[0], parts[1]
@@
-                "semester_term": f"{season[0]}{year[2:]}",
+                "semester_term": f"{season[0]}{year[2:]}",
                 "average_gpa": round(gpa, 2) if gpa is not None else None,
-                "year": year,
-                "season_order": season_map[season],
+                "year": year,
+                "season_order": season_map[season],
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        parts = term.split()
-        year, season = parts
-
-        season_map = {"Spring": 0, "Fall": 1}
-
-        transformed.append(
-            {
-                "semester_term": f"{season[0]}{year[2:]}",
-                "average_gpa": round(gpa, 2) if gpa is not None else None,
-                "year": year,
-                "season_order": season_map[season],
-            }
+        parts = term.split()
+        if len(parts) < 2:
+            continue
+        season, year = parts[0], parts[1]
+
+        season_map = {"Spring": 0, "Fall": 1}
+
+        transformed.append(
+            {
+                "semester_term": f"{season[0]}{year[2:]}",
+                "average_gpa": round(gpa, 2) if gpa is not None else None,
+                "year": year,
+                "season_order": season_map[season],
+            }
🤖 Prompt for AI Agents
In tcf_website/views/browse.py around lines 682 to 693, the unpacking of parts =
term.split() is reversed (year, season = parts) which makes season contain the
year and causes KeyError and incorrect semester_term; change the unpacking to
season, year = parts (or use parts[0] and parts[1]) so season is the season
string and year is the year string, then leave semester_term, average_gpa, year,
and season_order assignments as-is so season_map[season] works and semester_term
uses season[0] and year[2:].
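The corrected parsing can also be isolated into a small, testable helper (a hypothetical refactor sketching the fix, not the suggested diff itself):

```python
SEASON_ORDER = {"Spring": 0, "Fall": 1}


def parse_term(term):
    """Turn a term like 'Fall 2023' into the chart fields used by the view.
    Returns None for malformed or unrecognized terms instead of raising."""
    parts = term.split()
    if len(parts) != 2:
        return None
    season, year = parts  # season first: 'Fall 2023' -> ('Fall', '2023')
    if season not in SEASON_ORDER:
        return None
    return {
        "semester_term": f"{season[0]}{year[2:]}",  # e.g. 'F23'
        "year": year,
        "season_order": SEASON_ORDER[season],
    }
```

Keeping the unpack order and the guard in one function makes the KeyError path impossible to reintroduce silently.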
