This project performs sentiment analysis on IMDB movie reviews with a deep learning model built around an LSTM (Long Short-Term Memory) network. The dataset contains 50,000 movie reviews, each labeled as positive or negative.
Technologies used:
- Python
- TensorFlow/Keras
- Pandas
- Scikit-learn
- Kaggle API
The dataset used in this project is the IMDB Dataset of 50K Movie Reviews, which was downloaded from Kaggle.
Dataset Downloading:
- The dataset is downloaded using the Kaggle API.
- The zip file is extracted to access the CSV file.
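A minimal sketch of the download and extraction steps, assuming the official `kaggle` package with an API token in `~/.kaggle/kaggle.json`. The dataset slug and the function names `download_dataset` and `extract_csv` are assumptions for illustration, not taken from the project's notebook.

```python
import zipfile
from pathlib import Path

# Assumed Kaggle slug for "IMDB Dataset of 50K Movie Reviews"; check it on Kaggle.
DATASET_SLUG = "lakshmi25npathi/imdb-dataset-of-50k-movie-reviews"


def download_dataset(dest="data"):
    """Download the dataset archive via the Kaggle API and return the zip path.

    Requires the `kaggle` package and an API token in ~/.kaggle/kaggle.json.
    """
    # Imported here because importing the kaggle package authenticates at once.
    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()
    api.dataset_download_files(DATASET_SLUG, path=dest, unzip=False)
    # Assumed file name: the client names the archive after the slug's dataset part.
    return Path(dest) / f"{DATASET_SLUG.split('/')[1]}.zip"


def extract_csv(zip_path, dest="data"):
    """Extract the archive into `dest` and return the path of the first CSV in it."""
    with zipfile.ZipFile(zip_path) as archive:
        csv_names = [name for name in archive.namelist() if name.endswith(".csv")]
        if not csv_names:
            raise FileNotFoundError(f"no CSV file found in {zip_path}")
        archive.extractall(dest)
    return Path(dest) / csv_names[0]
```

Usage would be `csv_path = extract_csv(download_dataset())`. The extraction step is kept separate so it can be rerun without downloading the archive again.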
Data Preprocessing:
- The dataset is loaded using Pandas.
- Sentiment labels are converted to numerical values (positive: 1, negative: 0).
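The loading and label-encoding steps might look like the sketch below. It assumes the CSV has `review` and `sentiment` columns (the layout of the Kaggle file). `load_reviews` is a hypothetical helper name, and the tiny in-memory CSV only stands in for the real file.

```python
import io

import pandas as pd

LABELS = {"positive": 1, "negative": 0}


def load_reviews(csv_source):
    """Load the reviews CSV and encode sentiment as positive -> 1, negative -> 0.

    Assumes the Kaggle CSV layout with `review` and `sentiment` columns.
    """
    df = pd.read_csv(csv_source)
    df["sentiment"] = df["sentiment"].map(LABELS)
    # map() yields NaN for any label not in LABELS, so fail loudly instead.
    if df["sentiment"].isna().any():
        raise ValueError("found sentiment labels other than 'positive'/'negative'")
    df["sentiment"] = df["sentiment"].astype(int)
    return df


sample = io.StringIO('review,sentiment\n"A wonderful film",positive\n"Dull and slow",negative\n')
df = load_reviews(sample)
print(df["sentiment"].tolist())  # [1, 0]
```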
Train-Test Splitting:
- The dataset is split into training (80%) and testing (20%) sets.
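An 80/20 split can be done with scikit-learn's `train_test_split`, sketched below on a toy DataFrame that stands in for the preprocessed reviews. The `random_state=42` seed is an assumption added for reproducibility.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the preprocessed reviews DataFrame.
df = pd.DataFrame({
    "review": [f"review number {i}" for i in range(10)],
    "sentiment": [i % 2 for i in range(10)],
})

# 80% training / 20% testing; rows are shuffled before splitting.
train_data, test_data = train_test_split(df, test_size=0.2, random_state=42)
print(len(train_data), len(test_data))  # 8 2
```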
Tokenization and Padding:
- Tokenization converts each review into a sequence of integer token IDs.
- Padding gives every sequence the same length.
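One way to sketch tokenization and padding in a single step is Keras' `TextVectorization` layer. This swaps in a different API than the `Tokenizer` plus `pad_sequences` pair the project may use. The vocabulary size (5000), sequence length (200), and the helper name `build_vectorizer` are assumptions. The layer lowercases and strips punctuation by default and reserves ID 0 for padding and ID 1 for out-of-vocabulary words. Unlike `pad_sequences`, which pads at the front by default, it pads at the end.

```python
import numpy as np
import tensorflow as tf

MAX_TOKENS = 5000  # assumed vocabulary size
MAX_LEN = 200      # assumed sequence length after padding/truncation


def build_vectorizer(train_texts, max_tokens=MAX_TOKENS, max_len=MAX_LEN):
    """Learn a vocabulary from the training reviews and return the fitted layer."""
    vectorizer = tf.keras.layers.TextVectorization(
        max_tokens=max_tokens,           # keep only the most frequent words
        output_mode="int",               # map each word to an integer ID
        output_sequence_length=max_len,  # zero-pad (or cut) every review to max_len
    )
    # Fit the vocabulary on training text only, so test reviews stay unseen.
    vectorizer.adapt(tf.constant(list(train_texts)))
    return vectorizer


texts = ["This movie was great", "Terrible plot and bad acting"]
vectorizer = build_vectorizer(texts, max_len=8)
sequences = np.asarray(vectorizer(tf.constant(texts)))
print(sequences.shape)  # (2, 8)
```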
LSTM Model Building:
- An LSTM model is created with the following layers:
  - Embedding layer
  - LSTM layer
  - Dense output layer with a sigmoid activation function
- The model is compiled with binary cross-entropy loss and the Adam optimizer.
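The layer stack above can be sketched with the Keras `Sequential` API. The embedding size, LSTM width, vocabulary size, and sequence length below are assumed values, not the project's settings, and `build_lstm_model` is a hypothetical helper name.

```python
import tensorflow as tf


def build_lstm_model(vocab_size=5000, max_len=200, embedding_dim=128, lstm_units=128):
    """Embedding -> LSTM -> Dense(sigmoid) binary classifier (sizes are assumptions)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(max_len,)),
        # Turns each integer token ID into a dense vector of length embedding_dim.
        tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim),
        # Reads the sequence of vectors and returns its final hidden state.
        tf.keras.layers.LSTM(lstm_units),
        # Outputs the probability that the review is positive.
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model


model = build_lstm_model()
model.summary()
```

A single sigmoid unit paired with binary cross-entropy is the standard setup for a two-class problem. The output reads directly as P(positive).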
Model Training:
- The model is trained for 5 epochs with a batch size of 64.
- 20% of the training data is held out as a validation split.
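The training settings correspond to a single `model.fit` call, sketched below. `train_model` is a hypothetical helper. The demonstration trains a tiny stand-in model on random token IDs for 2 epochs only to show the shape of the returned history, not real results.

```python
import numpy as np
import tensorflow as tf


def train_model(model, X_train, y_train, epochs=5, batch_size=64):
    """Fit the model and hold out 20% of the training data for validation.

    Keras takes the validation samples from the end of the arrays, before any
    shuffling, so the training data should already be shuffled.
    """
    return model.fit(
        X_train,
        y_train,
        epochs=epochs,
        batch_size=batch_size,
        validation_split=0.2,
    )


# Toy demonstration: random token IDs and labels stand in for the padded reviews.
rng = np.random.default_rng(0)
X_toy = rng.integers(1, 100, size=(320, 20))
y_toy = rng.integers(0, 2, size=(320,)).astype("float32")
toy_model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Embedding(100, 16),
    tf.keras.layers.LSTM(8),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
toy_model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
history = train_model(toy_model, X_toy, y_toy, epochs=2)
print(sorted(history.history))
```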
Model Evaluation:
- The model is evaluated on the test data.
- Accuracy and loss metrics are reported.
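Reporting test loss and accuracy can be sketched as a thin wrapper around `model.evaluate`. It assumes the model was compiled with `metrics=["accuracy"]`, so `evaluate` returns `[loss, accuracy]`. `evaluate_model` is a hypothetical name.

```python
def evaluate_model(model, X_test, y_test):
    """Evaluate on held-out data, print the metrics, and return (loss, accuracy).

    Assumes the model was compiled with metrics=["accuracy"].
    """
    loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
    print(f"Test loss: {loss:.4f}")
    print(f"Test accuracy: {accuracy:.4f}")
    return loss, accuracy
```

It would be called as `evaluate_model(model, X_test, y_test)` on the padded test sequences and their labels.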
Prediction Function:
- A function is implemented to predict sentiment based on user input reviews.
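A sketch of such a prediction function, assuming the review is encoded with the same fitted text vectorizer used in training (for example a `TextVectorization` layer) and that probabilities at or above an assumed threshold of 0.5 count as positive. `predict_sentiment` and its parameters are hypothetical names. With a `Tokenizer`-based pipeline, the `vectorizer` call would be replaced by `texts_to_sequences` followed by `pad_sequences`.

```python
import numpy as np


def predict_sentiment(review, vectorizer, model, threshold=0.5):
    """Return "Positive" or "Negative" for one raw review string."""
    # Encode the review exactly as the training data was encoded: shape (1, max_len).
    sequence = np.asarray(vectorizer(np.array([review])))
    # The sigmoid output is the predicted probability that the review is positive.
    probability = float(model.predict(sequence, verbose=0)[0, 0])
    return "Positive" if probability >= threshold else "Negative"
```

A call would look like `predict_sentiment("This movie was very amazing.", vectorizer, model)`.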
Results:
- The model achieved satisfactory accuracy on the test set.
- Example predictions:
  - "This movie was not so interesting." -> Negative
  - "This movie was very amazing." -> Positive
How to run:
- Clone the repository from GitHub.
- Install the required dependencies: `pip install -r requirements.txt`
- Run the Jupyter Notebook to train and evaluate the model.
Future improvements:
- Use a larger dataset to improve model performance.
- Tune hyperparameters for better accuracy.
- Experiment with different neural network architectures.
This project is licensed under the MIT License.
Author: Arpan Pramanik