Jigsaw Toxic Comment Classification leverages the Jigsaw dataset to build and benchmark text moderation systems.
The aim is to compare traditional machine learning models with modern deep learning models and to explore explainable AI (XAI) for trustworthy moderation.
- Data preprocessing of the Jigsaw public dataset
- Baseline ML models (e.g., Logistic Regression, SVM)
- Enhanced deep learning models (candidates: LSTM, CNN, Transformers)
- Explainable AI (XAI) integration for model transparency
- Clean and documented codebase for reproducibility
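As a rough illustration of the baseline item above, one common starting point for this multi-label task is TF-IDF features feeding one logistic regression per label. This is a minimal sketch under assumptions, not the project's pipeline: `build_baseline`, the n-gram range, and the `max_features` value are hypothetical choices, and the toy comments are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline


def build_baseline(max_features=50_000):
    """Hypothetical baseline: TF-IDF (word unigrams + bigrams), then one
    independent logistic regression per toxicity label (one-vs-rest)."""
    return make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), max_features=max_features, sublinear_tf=True),
        OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    )


if __name__ == "__main__":
    # Made-up toy data; each row of Y is a 0/1 vector over two labels (toxic, threat).
    texts = ["you are an idiot", "thanks for the helpful edit", "idiot go away",
             "great article thanks", "I will hurt you", "nice work"]
    Y = [[1, 0], [0, 0], [1, 0], [0, 0], [1, 1], [0, 0]]
    model = build_baseline().fit(texts, Y)
    # predict_proba returns one independent probability per label for each comment.
    print(model.predict_proba(["you idiot", "thanks nice work"]))
```

One-vs-rest fits a separate binary classifier per label, which suits the Jigsaw setup where a single comment can carry several labels at once. An SVM baseline could be swapped in the same way, for example with `LinearSVC` in place of `LogisticRegression` (at the cost of no `predict_proba`).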
- Python 3.8+
- pandas, NumPy, and scikit-learn for the ML pipeline
- TensorFlow/Keras or PyTorch for deep learning models
- LIME/SHAP for Explainable AI (XAI)
- Notebooks for exploratory data analysis (EDA) and experiments
This project uses the Jigsaw Toxic Comment Classification Challenge dataset. More details are available on Kaggle.
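For the preprocessing step, a minimal sketch might normalize each comment and pull out its label vector. It assumes the Kaggle `train.csv` layout (a `comment_text` column plus the six 0/1 label columns listed below); the cleaning rules and the names `clean_comment` and `to_example` are hypothetical, not the project's actual preprocessing.

```python
import re
from typing import Dict, List, Tuple

# The six label columns in the Kaggle challenge's train.csv.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

_URL_RE = re.compile(r"https?://\S+|www\.\S+")
_NON_ALNUM_RE = re.compile(r"[^a-z0-9' ]+")  # keeps letters, digits, apostrophes, spaces
_SPACES_RE = re.compile(r"\s+")


def clean_comment(text: str) -> str:
    """Lowercase, drop URLs, replace other characters with spaces, collapse whitespace."""
    text = text.lower()
    text = _URL_RE.sub(" ", text)
    text = _NON_ALNUM_RE.sub(" ", text)
    return _SPACES_RE.sub(" ", text).strip()


def to_example(row: Dict[str, str]) -> Tuple[str, List[int]]:
    """Turn one train.csv row (e.g. from csv.DictReader) into (clean_text, label_vector)."""
    return clean_comment(row["comment_text"]), [int(row[label]) for label in LABELS]
```

Rows could be streamed with `csv.DictReader` or loaded with pandas; whether aggressive cleaning helps is worth checking per model, since Transformer tokenizers usually prefer raw text.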
Setup instructions and sample usage will be added once the pipeline is finalized.
Examples and details of training, evaluation, and inference will be provided once the initial model implementation is complete.
- Please open an issue to suggest features or report bugs.
- Pull requests are welcome.
- All contributors are expected to follow the code of conduct.
This project is licensed under the MIT License.
Rehan Abdul Gani Shaikh
Aspiring Data Scientist | B.Tech Student
🔗 Connect with me: LinkedIn
📬 Email: rehansk.3107@gmail.com
README will be updated to reflect setup, data usage, and model details as the project evolves.