A from-scratch application of Naive Bayes to spam email filtering.
The classifier works by:
- Reading a dataset of emails labeled as spam or not spam.
- Calculating the prior probabilities of spam and not-spam emails.
- Calculating word frequencies and per-class word probabilities for spam and not-spam emails.
- Using Laplace smoothing so that words unseen in a class do not receive zero probability.
- Applying Naive Bayes classification with log probabilities, which avoids numerical underflow when many small word probabilities are combined.
- Predicting new emails as spam (1) or not spam (0).
- Evaluating model performance on a separate test dataset.
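The steps above can be sketched as a minimal multinomial Naive Bayes in plain Python. This sketch assumes each email has already been tokenized into a list of lowercase words. The function names `train` and `predict`, the smoothing constant `alpha`, and the choice to skip words never seen during training are illustrative assumptions, not necessarily what this project does. The decision rule it implements is

$$\hat{y} = \arg\max_{c \in \{0,1\}} \Big[\log P(c) + \sum_{w \in \text{email}} \log P(w \mid c)\Big], \qquad P(w \mid c) = \frac{\text{count}(w, c) + \alpha}{\sum_{w'} \text{count}(w', c) + \alpha\,|V|}$$

where $|V|$ is the size of the training vocabulary.

```python
import math
from collections import Counter


def train(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on tokenized emails.

    docs: list of token lists; labels: list of 0 (not spam) / 1 (spam).
    alpha: Laplace smoothing constant (1.0 = add-one smoothing).
    Assumes both classes appear at least once in the training data.
    """
    classes = (0, 1)
    n_docs = len(docs)

    # Prior log P(c): fraction of training emails in each class.
    log_prior = {
        c: math.log(sum(1 for y in labels if y == c) / n_docs) for c in classes
    }

    # Word counts per class, plus the shared vocabulary.
    counts = {c: Counter() for c in classes}
    for tokens, y in zip(docs, labels):
        counts[y].update(tokens)
    vocab = set(counts[0]) | set(counts[1])
    vocab_size = len(vocab)

    # Laplace-smoothed log P(w | c): a word absent from one class still gets
    # a small non-zero probability instead of zeroing out the whole product.
    log_likelihood = {}
    for c in classes:
        denom = sum(counts[c].values()) + alpha * vocab_size
        log_likelihood[c] = {
            w: math.log((counts[c][w] + alpha) / denom) for w in vocab
        }
    return log_prior, log_likelihood, vocab


def predict(model, tokens):
    """Return 1 (spam) or 0 (not spam) for one tokenized email."""
    log_prior, log_likelihood, vocab = model
    scores = {}
    for c in log_prior:
        # Sum log probabilities instead of multiplying raw probabilities,
        # which would underflow to 0.0 for long emails.
        score = log_prior[c]
        for w in tokens:
            if w in vocab:  # assumption: words never seen in training are skipped
                score += log_likelihood[c][w]
        scores[c] = score
    return max(scores, key=scores.get)


# Tiny illustrative training set (made up for this example).
train_docs = [
    "win money now".split(),
    "free money offer".split(),
    "meeting schedule for monday".split(),
    "lunch meeting today".split(),
]
train_labels = [1, 1, 0, 0]
model = train(train_docs, train_labels)
print(predict(model, "free money".split()))      # 1 (spam)
print(predict(model, "monday meeting".split()))  # 0 (not spam)
```

Working in log space turns the product of per-word probabilities into a sum, so the comparison between classes stays numerically stable even for emails with hundreds of words.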
| Dataset Name | Description | Source |
|---|---|---|
| Emails Training Dataset | Labeled email features for training the Naive Bayes model | Kaggle Link |
| Emails Test Dataset | Raw email text with spam labels for testing | Kaggle Link |
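Because the training set is described as pre-extracted email features while the test set is raw text, the test emails presumably have to be mapped onto the same feature columns before classification. The sketch below assumes the training features are per-word count columns; the regex tokenizer and the helper names `tokenize`, `to_feature_counts`, and `accuracy` are hypothetical and may not match how the Kaggle features were originally built.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the email and split it into alphabetic word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def to_feature_counts(text, vocabulary):
    """Map raw email text onto the training feature columns.

    vocabulary: list of word columns from the training data (hypothetical layout).
    Words outside the vocabulary are dropped because the model has no feature for them.
    """
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocabulary]  # Counter returns 0 for missing words


def accuracy(predictions, labels):
    """Fraction of test emails whose predicted label matches the true label."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


vocabulary = ["free", "money", "meeting"]
print(to_feature_counts("FREE money, free offer!", vocabulary))  # [2, 1, 0]
print(accuracy([1, 0, 1, 0], [1, 0, 0, 0]))                      # 0.75
```

Keeping the test-time tokenizer consistent with whatever produced the training columns matters more than the specific regex: a mismatch silently shifts word counts and degrades accuracy.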