A machine learning pipeline to classify IMDB movie reviews as positive or negative using NLP preprocessing, TF-IDF vectorization, and a Logistic Regression model.
This project builds a text sentiment analysis model using the IMDB Reviews Dataset. The pipeline involves preprocessing, vectorization, training, evaluation, and live prediction.
This project uses the IMDB Reviews Dataset:
➡️ Download from Kaggle
Steps:
- Download and extract the dataset.
- Make sure the file is named IMDB_Dataset.csv (rename it if necessary).
- Place it in the project root directory.
⚠️ The dataset is not included in the repo due to GitHub file size limits.
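Once the file is in place, loading it might look like the sketch below. It assumes the CSV has `review` and `sentiment` columns, which is how the Kaggle file is commonly distributed; `load_reviews` is a hypothetical helper, not a function from `sentiment_analysis.py`. A tiny in-memory sample stands in for the real file so the snippet runs on its own.

```python
import io

import pandas as pd


def load_reviews(source):
    """Load the reviews CSV and check the columns this sketch assumes."""
    df = pd.read_csv(source)
    # Assumption: the file exposes "review" (text) and "sentiment" (label).
    missing = {"review", "sentiment"} - set(df.columns)
    if missing:
        raise ValueError(f"missing expected columns: {sorted(missing)}")
    # Drop rows with no text or no label, then renumber the index.
    return df.dropna(subset=["review", "sentiment"]).reset_index(drop=True)


# In-memory stand-in so the sketch runs without the real dataset.
sample_csv = io.StringIO(
    "review,sentiment\n"
    '"A wonderful, heartfelt story.",positive\n'
    '"Boring and far too long.",negative\n'
)
sample = load_reviews(sample_csv)
print(sample["sentiment"].value_counts().to_dict())
```

With the real dataset, the call would be `load_reviews("IMDB_Dataset.csv")` from the project root.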
- Python 3
- Pandas
- Scikit-learn
- NLTK
- TF-IDF Vectorizer
- Logistic Regression
- Clean and normalize text using NLP techniques
- Convert reviews into numerical features using TF-IDF
- Train and evaluate a logistic regression model
- Save trained model and vectorizer for reuse
- Predict sentiment of custom reviews in real-time
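The features above can be sketched end to end as follows. This is a minimal illustration, not the code in `sentiment_analysis.py`: the regex-based `clean_text` stands in for whatever NLTK steps the script applies, the toy reviews are made up, and the settings (`stop_words="english"`, `max_iter=1000`) are assumptions. It also skips the train/test split, and it writes the pickles to a temporary directory so it does not overwrite the real `sentiment_model.pkl` and `tfidf_vectorizer.pkl`.

```python
import pickle
import re
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

TAG_RE = re.compile(r"<[^>]+>")          # HTML remnants such as <br />
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # digits, punctuation, symbols


def clean_text(text):
    """Lowercase, drop HTML tags and non-letters, collapse whitespace."""
    text = TAG_RE.sub(" ", text.lower())
    text = NON_LETTER_RE.sub(" ", text)
    return " ".join(text.split())


# Hypothetical toy data; the real script trains on IMDB_Dataset.csv.
reviews = [
    "An excellent, moving film.<br />Loved every minute.",
    "Terrible plot and wooden acting.",
    "A wonderful, heartfelt story.",
    "Boring, predictable and far too long.",
]
labels = ["positive", "negative", "positive", "negative"]

cleaned = [clean_text(r) for r in reviews]

# Turn the cleaned reviews into a sparse TF-IDF matrix.
vectorizer = TfidfVectorizer(stop_words="english")
features = vectorizer.fit_transform(cleaned)

# Fit the classifier on the TF-IDF features.
model = LogisticRegression(max_iter=1000)
model.fit(features, labels)

# Save both objects, then load them back as a later run would.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "sentiment_model.pkl"
    vectorizer_path = Path(tmp) / "tfidf_vectorizer.pkl"
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    with open(vectorizer_path, "wb") as f:
        pickle.dump(vectorizer, f)
    with open(model_path, "rb") as f:
        restored_model = pickle.load(f)
    with open(vectorizer_path, "rb") as f:
        restored_vectorizer = pickle.load(f)

# Reuse the restored pair on a new review.
new_review = clean_text("What a wonderful film!")
prediction = restored_model.predict(restored_vectorizer.transform([new_review]))[0]
print(prediction)
```

The vectorizer is saved alongside the model because new text has to be mapped onto the same vocabulary and weights that the model was trained on.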
| Metric | Score |
|---|---|
| Accuracy | 85.13% |
| F1-Score | 85% |
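For readers unfamiliar with the two metrics, the sketch below spells out their definitions in plain Python on a made-up set of eight predictions. The README does not say which F1 averaging was used, so treating "positive" as the target class here is an assumption. In the binary case, scikit-learn's `accuracy_score` and `f1_score` compute the same quantities.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1(y_true, y_pred, positive="positive"):
    """Harmonic mean of precision and recall for the chosen class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 3 true positives, 3 true negatives, 1 miss each way.
y_true = ["positive", "positive", "positive", "negative",
          "negative", "negative", "positive", "negative"]
y_pred = ["positive", "positive", "negative", "negative",
          "negative", "positive", "positive", "negative"]

acc = accuracy(y_true, y_pred)
score = f1(y_true, y_pred)
print(f"accuracy={acc:.2%} f1={score:.2%}")
```

Here 6 of 8 predictions are correct, and precision and recall are both 3/4, so accuracy and F1 both come out at 0.75.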
text-sentiment-analysis/
├── IMDB_Dataset.csv
├── sentiment_analysis.py
├── sentiment_model.pkl
├── tfidf_vectorizer.pkl
└── README.md
- Clone the repo
  git clone https://github.com/ahsankhizar5/text-sentiment-analysis.git
  cd text-sentiment-analysis
- Install dependencies
  pip install -r requirements.txt
- Run the script
  python sentiment_analysis.py
- Enter your own review for live prediction!
Try your own review:
Enter a movie review: this seems to be bad one
Predicted Sentiment: Negative
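A prediction step like the one shown above could be built on the saved artifacts roughly as follows. The helper names `load_artifacts` and `predict_sentiment` are hypothetical, and the sketch assumes the model's labels are either the strings `positive`/`negative` or the integers 1/0; adjust the mapping if the script encodes them differently.

```python
import pickle
from pathlib import Path

MODEL_PATH = Path("sentiment_model.pkl")
VECTORIZER_PATH = Path("tfidf_vectorizer.pkl")


def load_artifacts(model_path=MODEL_PATH, vectorizer_path=VECTORIZER_PATH):
    """Load the pickled model and vectorizer from disk."""
    # Only unpickle files you trust: pickle can run code on load.
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    with open(vectorizer_path, "rb") as f:
        vectorizer = pickle.load(f)
    return model, vectorizer


def predict_sentiment(review, model, vectorizer):
    """Return "Positive" or "Negative" for a single review string."""
    features = vectorizer.transform([review])
    label = model.predict(features)[0]
    # Assumption: labels are "positive"/"negative" or 1/0.
    return "Positive" if str(label).lower() in {"positive", "1"} else "Negative"
```

After running the script once so the two `.pkl` files exist, usage would be `model, vectorizer = load_artifacts()` followed by `predict_sentiment("this seems to be bad one", model, vectorizer)`. Any cleaning applied to the training text should be applied to the review before it is passed in.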
MIT License
For queries or collaboration, feel free to reach out: Ahsan Khizar (GitHub, LinkedIn)
"Code is not just about solving problems. It's about building trust, clarity, and real-world impact, one line at a time." - Ahsan Khizar