Average GPA Graph and Sentiment Analysis Section #1120
base: dev
Conversation
Yu Cao - Feature 101 About Page
Adding vi to team alums
added my picture and about
feat(docker): upgrade docker to 3.12
revert: gcp on 3.11
Update Fontawesome Kit
Add extra error logs
fix(middleware): don't fetch DNE fields
October 2024 Master Merge
About ronit
… into avg_gpa_trend
…owse.py for a custom query
… History Found for 1 or less points found
- cannot modify the offset amount for some reason
Walkthrough
Adds sentiment analysis tooling and GPA trend computation: new management commands to analyze reviews and average sentiments, a model update linking CourseGrade to Semester, loader changes to include term context, view helpers to assemble sentiment distributions and GPA history, and template/CSS/JS to render sentiment bar and GPA line charts on the course-professor page.
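For context, the CourseGrade-to-Semester link described above would look roughly like the sketch below; the field options, null handling, and related_name are assumptions for illustration, not the PR's exact definition.

# Hypothetical sketch of the described model change, not the PR's actual code.
from django.db import models

class CourseGrade(models.Model):
    course = models.ForeignKey("Course", on_delete=models.CASCADE)
    # New link so each grade row knows which term it belongs to; null/blank are
    # assumed here only to allow backfilling legacy rows without a term.
    semester = models.ForeignKey(
        "Semester",
        on_delete=models.CASCADE,
        null=True,
        blank=True,
        related_name="course_grades",
    )
    average = models.FloatField(null=True, blank=True)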
Sequence Diagram(s)
sequenceDiagram
autonumber
actor User as User
participant View as Django view: course_instructor
participant CSV1 as reviews_data_with_sentiment.csv
participant CSV2 as grade_data/csv/*
participant Tmpl as Template: course_professor.html
participant Chart as Chart.js (browser)
User->>View: GET /course/<course_id>/instructor/<instructor_id>
View->>CSV1: Load sentiments DataFrame
View->>View: get_sentiments(instructor, course)
View->>View: categorize_sentiments(scores)
View->>CSV2: Load grade CSVs
View->>View: get_course_term_gpa(course_id, instructor_id)
View->>Tmpl: Render with {sentiment_distribution, gpa_history}
Tmpl-->>User: HTML with chart canvases
User->>Chart: Initialize charts (bar + line)
Chart-->>User: Rendered sentiment and GPA trend
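As a rough illustration of the view helpers named in the diagram above, a categorize_sentiments like the one below would turn per-review scores into the distribution the bar chart consumes; the bucket labels and the ±0.05 thresholds are assumptions, not values taken from the PR.

# Hypothetical sketch of categorize_sentiments; thresholds and labels are assumed.
def categorize_sentiments(scores):
    distribution = {"Negative": 0, "Neutral": 0, "Positive": 0}
    for score in scores:
        if score is None:
            continue  # reviews without a computed sentiment are skipped
        if score < -0.05:
            distribution["Negative"] += 1
        elif score > 0.05:
            distribution["Positive"] += 1
        else:
            distribution["Neutral"] += 1
    return distribution

# e.g. categorize_sentiments([0.8, -0.3, 0.0]) -> {"Negative": 1, "Neutral": 1, "Positive": 1}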
sequenceDiagram
autonumber
participant Cmd as manage.py sentiment_analyze
participant FS as reviews_data/reviews_data.csv
participant Model as RoBERTa sentiment model
participant Out as reviews_data/reviews_data_with_sentiment.csv
Cmd->>FS: Read reviews CSV
loop For each review
Cmd->>Cmd: split_text(text, max_tokens=512)
loop For each chunk
Cmd->>Model: Infer probabilities
Model-->>Cmd: pos/neu/neg scores
Cmd->>Cmd: get_sentiment_score(chunk)
end
Cmd->>Cmd: compute_average_sentiment(chunk_scores)
end
Cmd->>Out: Write CSV with sentiment_score
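The command flow above can be read as roughly the following Python. This is a hedged sketch only: the function names come from the diagram, but the specific RoBERTa checkpoint, the whitespace-based chunking, and the signed scoring scheme are assumptions rather than the PR's actual implementation.

# Hypothetical sketch of the per-review scoring in sentiment_analyze.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",  # assumed model choice
)

def split_text(text, max_tokens=512):
    # Naive whitespace chunking; the real command may count tokens with the tokenizer.
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]

def get_sentiment_score(chunk):
    # truncation=True guards against chunks that still exceed the model's token limit.
    result = classifier(chunk, truncation=True)[0]
    # Map the top label to a signed score in [-1, 1]; scoring scheme is assumed.
    sign = {"positive": 1, "neutral": 0, "negative": -1}.get(result["label"].lower(), 0)
    return sign * result["score"]

def compute_average_sentiment(chunk_scores):
    return sum(chunk_scores) / len(chunk_scores) if chunk_scores else None

# Example usage for one review:
# chunk_scores = [get_sentiment_score(c) for c in split_text(review_text)]
# sentiment_score = compute_average_sentiment(chunk_scores)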
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tcf_website/management/commands/load_grades.py (1)
312-331: Don’t assign a string to the new semester FK.
row[-1] is the term description string from the identifier tuple, so CourseGrade(**course_grade_params) will try to stuff "Fall 2022" (etc.) into the FK column and explode on bulk_create. You need to resolve that descriptor to the actual Semester instance (or its PK) before populating the params, e.g. hydrate a self.semesters lookup keyed by term number/desc and use that entry here (and likewise for instructor rows).
Apply this diff after introducing a proper lookup (for illustration only):

-        course_grade_params = {
-            "course_id": self.courses.get(row[:2]),
-            "semester": row[-1],
+        course_grade_params = {
+            "course_id": self.courses.get(row[:2]),
+            "semester_id": self.semesters[row[-1]],
         ...
-        if is_instructor_grade:
-            course_grade_params["instructor_id"] = self.instructors.get(row[2:4])
+        if is_instructor_grade:
+            course_grade_params["instructor_id"] = self.instructors.get(row[2:4])

Make sure self.semesters[...] is populated during __init__ so every identifier resolves to a valid FK.
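A rough illustration of the kind of lookup the comment suggests is sketched below. It assumes Semester exposes season and year fields and that the CSV's term descriptors look like "Fall 2022"; neither assumption is confirmed by this diff.

# Hypothetical helper for load_grades.py: build a term-description -> Semester
# PK map once so course_grade_params can use "semester_id" directly.
from tcf_website.models import Semester

def build_semester_lookup():
    lookup = {}
    for semester in Semester.objects.all():
        # Key format is assumed ("Fall 2022"); adjust to match the real CSV data.
        key = f"{str(semester.season).title()} {semester.year}"
        lookup[key] = semester.pk
    return lookup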
tcf_website/templates/course/course_professor.html (1)
205-698: Avoid crashing when the GPA chart isn’t rendered.
The template skips rendering <canvas id="gpaChart"> when gpa_history|length < 3, but the script still executes and unconditionally does gpaChartCanvas.classList.add(...). When the dataset has 2 terms, gpaChartCanvas is null, so this throws and the whole script dies (also preventing the sentiment chart from loading). Guard all DOM usage behind a presence check (or bail early when the elements aren’t there) so the page stays functional for shorter histories.
Wrap the GPA logic with:

-    if (gpaHistory.length <= 1) {
-        gpaChartCanvas.classList.add('d-none');
+    if (!gpaChartCanvas || gpaHistory.length <= 1) {
+        if (gpaChartCanvas) {
+            gpaChartCanvas.classList.add('d-none');
+        }
         gpaNoDataDiv.classList.remove('d-none');
     ...

(and similarly guard gpaNoDataDiv / chart construction).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- tcf_website/management/commands/reviews_data/reviews_data.csv is excluded by !**/*.csv
- tcf_website/management/commands/reviews_data/reviews_data_with_sentiment.csv is excluded by !**/*.csv
📒 Files selected for processing (8)
- tcf_website/management/commands/average_sentiments.py (1 hunks)
- tcf_website/management/commands/load_grades.py (4 hunks)
- tcf_website/management/commands/sentiment_analyze.py (1 hunks)
- tcf_website/models/models.py (1 hunks)
- tcf_website/static/course/course_professor.css (4 hunks)
- tcf_website/static/course/javascript/course_data.js (1 hunks)
- tcf_website/templates/course/course_professor.html (2 hunks)
- tcf_website/views/browse.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
tcf_website/templates/course/course_professor.html (2)
- tcf_website/migrations/0004_grades_data_overhaul.py (1): Migration (6-144)
- tcf_website/legacy_models.py (1): Grades (316-344)
tcf_website/models/models.py (5)
- tcf_website/migrations/0006_answer_semester_question_instructor_votequestion_and_more.py (1): Migration (9-134)
- tcf_website/migrations/0004_remove_coursegrade_d_remove_coursegrade_d_minus_and_more.py (1): Migration (6-119)
- tcf_website/legacy_models.py (2): Grades (316-344), Semesters (109-123)
- tcf_website/migrations/0004_grades_data_overhaul.py (1): Migration (6-144)
- tcf_website/migrations/0006_alter_coursegrade_average_and_more.py (1): Migration (6-23)
tcf_website/views/browse.py (2)
- tcf_website/management/commands/average_sentiments.py (2): extract_professor_name (5-6), extract_course_mnemonic (8-9)
- tcf_website/models/models.py (2): full_name (272-274), Course (536-872)
tcf_website/management/commands/average_sentiments.py (1)
- tcf_website/views/browse.py (2): extract_professor_name (310-311), extract_course_mnemonic (314-315)
🪛 GitHub Check: eslint
tcf_website/static/course/javascript/course_data.js
[failure] 329-329: Insert ⏎
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: pylint
- GitHub Check: django
def extract_professor_name(professor_full):
    return re.match(r'^[^()]+', professor_full).group().strip()


def extract_course_mnemonic(course_full):
    return course_full.split(' |')[0].strip()
Handle missing instructor/course values safely
re.match(...).group() and .split() assume every CSV row has non-empty strings. Pandas will pass NaN (floats) for missing values, which makes re.match raise TypeError and breaks the command on real data. Guard the inputs or drop null rows before applying the regex/split so the pipeline continues even when data is incomplete.
Apply this diff to make the helpers resilient:
 def extract_professor_name(professor_full):
-    return re.match(r'^[^()]+', professor_full).group().strip()
+    if not isinstance(professor_full, str) or not professor_full.strip():
+        return ""
+    match = re.match(r"^[^()]+", professor_full)
+    return match.group().strip() if match else professor_full.strip()

 def extract_course_mnemonic(course_full):
-    return course_full.split(' |')[0].strip()
+    if not isinstance(course_full, str) or not course_full.strip():
+        return ""
+    return course_full.split(" |", 1)[0].strip()

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def extract_professor_name(professor_full):
    if not isinstance(professor_full, str) or not professor_full.strip():
        return ""
    match = re.match(r"^[^()]+", professor_full)
    return match.group().strip() if match else professor_full.strip()


def extract_course_mnemonic(course_full):
    if not isinstance(course_full, str) or not course_full.strip():
        return ""
    return course_full.split(" |", 1)[0].strip()
if __name__ == "__main__":
    avg_sentiment_df = avg_sentiment_df_creator()
    print(avg_sentiment_df)

    while True:
        professor = input("Enter professor name (or type 'exit' to quit): ")
        if professor.lower() == 'exit':
            break
        course = input("Enter course name (or type 'exit' to quit): ")
        if course.lower() == 'exit':
            break
        query_average(avg_sentiment_df, professor, course)
Expose this logic via a proper Django management command
Because this module sits under management/commands/, Django expects a Command subclass. With only a __main__ block, python manage.py average_sentiments raises AttributeError: module ... has no attribute 'Command', so the command cannot run. Wrap the existing workflow in a BaseCommand implementation so it can be executed through Django’s management interface.
Apply this diff to add the command scaffold and optional interactive mode:
-import os
-import re
-import pandas as pd
+import os
+import re
+import pandas as pd
+from django.core.management.base import BaseCommand
@@
-def query_average(df, professor, course):
+def query_average(df, professor, course):
     result = df[(df["instructor_name_only"] == professor) & (df["course_code_only"] == course)]
     if result.empty:
         print("No data found for the given professor and course.")
     else:
         print(f"Average sentiment score for {professor} in {course}: {result['sentiment_score'].values[0]:.2f}")
-
-if __name__ == "__main__":
-    avg_sentiment_df = avg_sentiment_df_creator()
-    print(avg_sentiment_df)
-
-    while True:
-        professor = input("Enter professor name (or type 'exit' to quit): ")
-        if professor.lower() == 'exit':
-            break
-        course = input("Enter course name (or type 'exit' to quit): ")
-        if course.lower() == 'exit':
-            break
-        query_average(avg_sentiment_df, professor, course)
+def interactive_loop(avg_sentiment_df):
+    print(avg_sentiment_df)
+    while True:
+        professor = input("Enter professor name (or type 'exit' to quit): ")
+        if professor.lower() == "exit":
+            break
+        course = input("Enter course name (or type 'exit' to quit): ")
+        if course.lower() == "exit":
+            break
+        query_average(avg_sentiment_df, professor, course)
+
+
+class Command(BaseCommand):
+    help = "Generate average sentiment scores for instructor-course pairs."
+
+    def add_arguments(self, parser):
+        parser.add_argument(
+            "--interactive",
+            action="store_true",
+            help="Print the dataframe and prompt for instructor/course lookups.",
+        )
+
+    def handle(self, *args, **options):
+        avg_sentiment_df = avg_sentiment_df_creator()
+        if options["interactive"]:
+            interactive_loop(avg_sentiment_df)
+        else:
+            self.stdout.write(avg_sentiment_df.to_string(index=False))

    AutoTokenizer,
)

ssl._create_default_https_context = ssl._create_unverified_context
Do not disable SSL certificate verification globally
Overriding ssl._create_default_https_context with ssl._create_unverified_context disables TLS verification for every HTTPS request in this process. That is a serious security regression—any network call (model download, analytics, etc.) becomes vulnerable to MITM attacks. Instead, keep verification enabled and, if a specific source requires a different context, scope the override to that single request.
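If a single download truly needs a relaxed context, one way to scope it to that request rather than patching the process-wide default is sketched below; the helper name and URL handling are illustrative only, and keeping verification enabled remains the right default.

import ssl
import urllib.request

# Hypothetical sketch: only this one request uses the relaxed context, so every
# other HTTPS call in the process keeps normal certificate verification.
unverified_ctx = ssl._create_unverified_context()

def fetch_without_verification(url):
    # Prefer fixing the certificate chain instead; this is a last resort.
    with urllib.request.urlopen(url, context=unverified_ctx) as response:
        return response.read()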
# Converts professor name to solely full name without email from CSV
def extract_professor_name(professor_full):
    return re.match(r'^[^()]+', professor_full).group().strip()


# Converts course title in CSV to solely just mnemonic and number
def extract_course_mnemonic(course_full):
    return course_full.split(' |')[0].strip()
Guard against null instructor/course strings
Same issue as in the management command: reviews with missing instructor/course values will pass NaN into these helpers, and re.match / .split() will throw, taking down the view. Please reuse the safer guards (or drop null rows) so production traffic doesn’t see 500s when data is incomplete.
🤖 Prompt for AI Agents
In tcf_website/views/browse.py around lines 309-315, the helpers assume
professor_full/course_full are valid strings and will raise when given null/NaN;
change them to guard first and return a safe default (e.g., empty string) for
missing values, and then perform string operations. For extract_professor_name,
check for falsy or non-string input (or wrap in a try) and return '' if invalid,
otherwise run the regex and safely handle a None match before calling .group();
for extract_course_mnemonic, check for falsy input and return '' if invalid,
otherwise convert to str (to handle non-string NaN) and then split/strip as
before. Ensure no exceptions propagate for null/NaN inputs so the view won’t
500.
def sentiments_df_creator():
    reviews_data_path = 'tcf_website/management/commands/reviews_data/reviews_data_with_sentiment.csv'
    df = pd.read_csv(reviews_data_path)
    df["instructor_name_only"] = df["instructor"].apply(extract_professor_name)
    df["course_code_only"] = df["course"].apply(extract_course_mnemonic)
    sentiments = df[["instructor_name_only", "course_code_only", "sentiment_score"]]
    return sentiments
Fail gracefully when the sentiment CSV is absent
pd.read_csv will raise FileNotFoundError if reviews_data_with_sentiment.csv hasn’t been generated yet (fresh environments, CI, etc.), causing the entire instructor page to error. Wrap the load in a try/except (or check os.path.exists) and return an empty dataframe so the page renders without sentiment data instead of crashing.
Apply this diff to add a safe fallback:
 def sentiments_df_creator():
-    reviews_data_path = 'tcf_website/management/commands/reviews_data/reviews_data_with_sentiment.csv'
-    df = pd.read_csv(reviews_data_path)
+    reviews_data_path = "tcf_website/management/commands/reviews_data/reviews_data_with_sentiment.csv"
+    if not os.path.exists(reviews_data_path):
+        return pd.DataFrame(columns=["instructor_name_only", "course_code_only", "sentiment_score"])
+    df = pd.read_csv(reviews_data_path)
     df["instructor_name_only"] = df["instructor"].apply(extract_professor_name)
     df["course_code_only"] = df["course"].apply(extract_course_mnemonic)
     sentiments = df[["instructor_name_only", "course_code_only", "sentiment_score"]]
     return sentiments

🤖 Prompt for AI Agents
In tcf_website/views/browse.py around lines 318 to 325, the call to pd.read_csv
will raise FileNotFoundError if reviews_data_with_sentiment.csv is missing and
crash the page; modify sentiments_df_creator to check for the file
(os.path.exists) or wrap pd.read_csv in try/except FileNotFoundError, and if the
file is absent return an empty DataFrame with the expected columns
("instructor_name_only", "course_code_only", "sentiment_score") so the view can
render without sentiment data.
parts = term.split()
year, season = parts

season_map = {"Spring": 0, "Fall": 1}

transformed.append(
    {
        "semester_term": f"{season[0]}{year[2:]}",
        "average_gpa": round(gpa, 2) if gpa is not None else None,
        "year": year,
        "season_order": season_map[season],
    }
Fix year/season extraction in GPA trend
Here parts = term.split() (e.g., "Fall 2023" → ["Fall", "2023"]), but the assignment year, season = parts swaps them. That makes season equal the year string, so season_map[season] raises KeyError, and semester_term becomes nonsense. Swap the unpacking order (or explicitly index) before looking up season_map.
Apply this diff to correct the mapping:
- parts = term.split()
- year, season = parts
+ parts = term.split()
+ if len(parts) < 2:
+ continue
+ season, year = parts[0], parts[1]
@@
- "semester_term": f"{season[0]}{year[2:]}",
+ "semester_term": f"{season[0]}{year[2:]}",
"average_gpa": round(gpa, 2) if gpa is not None else None,
- "year": year,
- "season_order": season_map[season],
+ "year": year,
+ "season_order": season_map[season],📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
parts = term.split()
if len(parts) < 2:
    continue
season, year = parts[0], parts[1]

season_map = {"Spring": 0, "Fall": 1}

transformed.append(
    {
        "semester_term": f"{season[0]}{year[2:]}",
        "average_gpa": round(gpa, 2) if gpa is not None else None,
        "year": year,
        "season_order": season_map[season],
    }
🤖 Prompt for AI Agents
In tcf_website/views/browse.py around lines 682 to 693, the unpacking of parts =
term.split() is reversed (year, season = parts) which makes season contain the
year and causes KeyError and incorrect semester_term; change the unpacking to
season, year = parts (or use parts[0] and parts[1]) so season is the season
string and year is the year string, then leave semester_term, average_gpa, year,
and season_order assignments as-is so season_map[season] works and semester_term
uses season[0] and year[2:].
Added an average GPA trend graph and a sentiment analysis graph for reviews.
Go to the avg_gpa_trend branch and view a course that has sufficient history.
Summary by CodeRabbit
New Features
Style
Chores