A lightweight proof of concept (PoC) for text classification using a fine-tuned BERT model. The project demonstrates hate speech detection by classifying raw input text into predefined labels via a simple web interface.
A minimal machine learning pipeline for text classification based on BERT (bert-base-uncased). It includes training, evaluation, and a small Flask-based web application for real-time inference.
- Load CSV data and train a BERT-based classifier
- Evaluate classification performance
- Run a Flask server for real-time text classification
- Predict the type of hate speech based on user input
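As a rough illustration of the evaluation step listed above, the sketch below computes per-label precision, recall, and F1 from true and predicted labels using only the standard library. It is not taken from the repository: the function name `per_label_metrics` and the sample labels (apart from 'violence', which the project mentions) are assumptions made for this example.

```python
from collections import Counter


def per_label_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # this label was predicted, but wrongly
            fn[truth] += 1  # the true label was missed
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Hypothetical labels, for illustration only.
y_true = ["neutral", "violence", "neutral", "racism"]
y_pred = ["neutral", "neutral", "neutral", "racism"]

for label, (p, r, f1) in per_label_metrics(y_true, y_pred).items():
    print(f"{label:10s} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Per-label scores are usually more informative than overall accuracy here, because a classifier that always answers "neutral" can look accurate on an unbalanced dataset while missing every hateful comment.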
This PoC serves as the technical foundation for a larger future project that will include OCR integration, database support, and real-time communication via WebSockets.
git clone https://github.com/NathanGrs00/ml-poc-classification.git
cd ml-poc-classification
Use pip to install all the required packages:
pip install -r requirements.txt
💡 Python 3.8+ is recommended.
Start the Flask app:
python main.py
Click the localhost link that Flask prints. It opens a simple HTML page with an input field where you can type or paste text to classify.
- Go to http://localhost:5000
- Enter any text (e.g., a tweet or a comment)
- Click the "Submit" button
- The model returns a label with the type of hate speech (if any)
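The flow above could be wired up roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual `main.py`: the route `/`, the form field name `text`, the helper names `classify_text` and `create_app`, and the keyword-based `stub_predict` (standing in for the fine-tuned BERT model) are all hypothetical.

```python
def classify_text(text, predict):
    """Validate user input and return a label, or None for empty input."""
    cleaned = text.strip()
    if not cleaned:
        return None
    return predict(cleaned)


def create_app(predict):
    """Build a small Flask app around a `predict(text) -> label` callable."""
    # Imported here so the helper above can be used without Flask installed.
    from flask import Flask, render_template_string, request

    # Assumed page layout: one text field and a "Submit" button.
    page = """
    <form method="post">
      <textarea name="text"></textarea>
      <button type="submit">Submit</button>
    </form>
    {% if label %}<p>Result: {{ label }}</p>{% endif %}
    """
    app = Flask(__name__)

    @app.route("/", methods=["GET", "POST"])
    def index():
        label = None
        if request.method == "POST":
            label = classify_text(request.form.get("text", ""), predict)
        return render_template_string(page, label=label)

    return app


def stub_predict(text):
    """Placeholder for the real model; NOT how the project classifies text."""
    return "violence" if "attack" in text.lower() else "neutral"


print(classify_text("They will attack tomorrow", stub_predict))
print(classify_text("   ", stub_predict))
```

To try the sketch in a browser, call `create_app(stub_predict).run(port=5000)`. Keeping the model behind a plain `predict` callable means the web layer can be exercised without loading BERT.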
Peer students reviewed the Proof of Concept and suggested the following changes:
- The model is not reliable enough.
  - Extend the dataset with more data.
  - Balance the percentage of hateful and neutral comments.
  - Verify the labels currently in the dataset.
- The UI looks too simple and the result is hard to read.
  - Show the labels in a more readable form. Instead of just displaying 'violence', display a little more information.
  - Give each result a colored label.
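Two of these feedback items lend themselves to a short sketch: balancing the classes and presenting results with a description and a color. The code below is one possible approach, not the project's implementation. The row layout (dictionaries with a `label` key), the function names, and the descriptions and colors in `DISPLAY` are assumptions chosen for illustration.

```python
import random
from collections import Counter, defaultdict


def downsample_to_balance(rows, label_key="label", seed=0):
    """Randomly drop rows so every label has as many rows as the rarest one."""
    if not rows:
        return []
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Hypothetical display metadata: raw label -> (description, color).
DISPLAY = {
    "neutral": ("No hate speech detected", "green"),
    "violence": ("Hate speech: threatens or incites violence", "red"),
}


def format_result(label):
    """Return a readable description and color, with a fallback for unknown labels."""
    return DISPLAY.get(label, (f"Hate speech: {label}", "orange"))


# Made-up rows: 8 neutral comments and 2 hateful ones.
rows = [{"text": f"n{i}", "label": "neutral"} for i in range(8)]
rows += [{"text": f"v{i}", "label": "violence"} for i in range(2)]

balanced = downsample_to_balance(rows)
print(Counter(row["label"] for row in balanced))
print(format_result("violence"))
```

Downsampling throws data away, so it is best seen as a quick baseline. Collecting more examples of the rare classes, as the feedback also suggests, keeps the dataset larger.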
This PoC is a stepping stone towards a larger application. Future plans are:
- 🧾 Adding OCR to extract text from scanned documents and images
- 🗄️ Database connection for saving classification output and user feedback
- 🔌 WebSockets for real-time updates
I love feedback and contributions!
Report bugs or request features via GitHub Issues.
Fork the repository and make a Pull Request to contribute.
Feel free to propose improvements!
This project is licensed under the MIT License.
See the LICENSE file for more information.
Contributions are always welcome!
To contribute:
- Fork the repository
- Create a new branch for your feature or bugfix
- Commit your changes with clear messages
- Push to your fork
- Open a Pull Request describing your changes
For major changes, consider opening an issue first to discuss your proposal.