This project uses machine learning to predict the likelihood of an individual being a smoker based on biometric and health signals. The application includes a trained model, a Flask web server, and a simple HTML frontend.
- 📊 Predicts smoking probability using 26 biometric and health metrics
- 🧠 Trained using `scikit-learn` Linear Regression
- 💾 Model serialized using `pickle`
- 🌐 Flask-powered web interface with user input form
- 🧼 Basic data preprocessing and encoding steps
smoking-predictor/
├── app.py # Flask web server
├── model_train.py # ML training script
├── smoking.csv # Dataset
├── mrs/
│ └── mr.pkl # Saved trained model
├── templates/
│ └── index.html # Web form interface
├── static/ # (Optional) CSS/JS assets
└── README.md # Project documentation
Run the training script to generate mr.pkl:
python model_train.py

This script:
- Loads and cleans the dataset
- Encodes categorical variables (gender, oral, tartar)
- Trains a Linear Regression model
- Saves the model in the mrs/ directory
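
The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of model_train.py: the target column name `smoking`, the raw category codes (`F`/`M`, `Y`/`N`), the use of `dropna` for cleaning, and the helper name `train_and_save` are all assumptions.

```python
import os
import pickle

import pandas as pd
from sklearn.linear_model import LinearRegression

# Assumed raw codes for the categorical columns; the real dataset may differ.
CATEGORICAL_MAPS = {
    "gender": {"F": 0, "M": 1},
    "oral": {"N": 0, "Y": 1},
    "tartar": {"N": 0, "Y": 1},
}
TARGET = "smoking"  # assumed name of the label column


def train_and_save(df, model_path):
    """Encode, clean, fit a Linear Regression model, and pickle it."""
    df = df.copy()
    for column, mapping in CATEGORICAL_MAPS.items():
        df[column] = df[column].map(mapping)  # unknown codes become NaN
    df = df.dropna()  # simplest possible cleaning: drop incomplete rows
    features = df.drop(columns=[TARGET])
    model = LinearRegression().fit(features, df[TARGET])
    # Create the output directory (e.g. mrs/) if it does not exist yet.
    os.makedirs(os.path.dirname(model_path) or ".", exist_ok=True)
    with open(model_path, "wb") as handle:
        pickle.dump(model, handle)
    return model
```

Under these assumptions, calling `train_and_save(pd.read_csv("smoking.csv"), "mrs/mr.pkl")` would produce the saved model file.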
Install required libraries:
pip install -r requirements.txt

requirements.txt:
pandas
numpy
scikit-learn
flask
Start the Flask server:
python app.py

Then open your browser at: http://127.0.0.1:5000
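
For orientation, a server like this could be structured along the following lines. This is a hedged sketch rather than the actual app.py: the `/predict` route, the `create_app` factory, and the assumption that form fields are submitted in the same order as the training columns are all hypothetical.

```python
from flask import Flask, request


def create_app(model):
    """Build a Flask app around any object with a scikit-learn style predict()."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])  # hypothetical route name
    def predict():
        # Form fields arrive as strings; convert them in submission order.
        # Assumption: that order matches the column order used in training.
        values = [float(value) for value in request.form.values()]
        score = model.predict([values])[0]
        return f"Prediction of smoking is {score:.2f}"

    return app
```

In a real run, `model` would be the object unpickled from mrs/mr.pkl, and `create_app(model).run()` would serve on Flask's default address, 127.0.0.1:5000.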
The form takes in 26 numeric input values including:
- Gender (0 = Female, 1 = Male)
- Age, Height, Weight, Waist, Eyesight, Hearing
- Blood pressure, Sugar, Cholesterol levels, etc.
- Oral Health (Oral exam, Tartar, Dental Caries)
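After submission, it displays the model's predicted score for the person being a smoker. Because the model is a Linear Regression, this score is not a true probability and can fall outside the 0 to 1 range.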
📝 Note: All inputs are required and must be numerical.
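
One way to enforce the "required and numerical" rule on the server side is sketched below. The helper name `parse_inputs` is hypothetical and not part of the project; it accepts any mapping of field names to raw strings, such as Flask's `request.form`.

```python
import math


def parse_inputs(form, expected=26):
    """Return the form values as floats, or raise ValueError naming the problem."""
    if len(form) != expected:
        raise ValueError(f"expected {expected} values, got {len(form)}")
    numbers = []
    for name, raw in form.items():
        try:
            number = float(raw)  # empty strings and text both fail here
        except (TypeError, ValueError):
            raise ValueError(f"field {name!r} must be numeric, got {raw!r}") from None
        if not math.isfinite(number):  # reject "nan" and "inf"
            raise ValueError(f"field {name!r} must be a finite number")
        numbers.append(number)
    return numbers
```

Raising a single exception type keeps the calling route simple: it can catch `ValueError` and show the message next to the form.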
Example output: Prediction of smoking is 0.83
- Switch from regression to classification (e.g., Logistic Regression or RandomForestClassifier)
- Add input validation and better UI/UX
- Deploy online using Render, Heroku, or Vercel
- Add real-world visualizations or a dashboard
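
As an illustration of the first item in the list above, a classifier-based variant might look like the sketch below. It assumes the features are already numerically encoded and the label is 0 (non-smoker) or 1 (smoker). The function names and the added scaling step are illustrative choices, not part of the current project.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_classifier(features, labels):
    """Fit a scaled logistic regression; labels are 0 (non-smoker) or 1 (smoker)."""
    pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return pipeline.fit(features, labels)


def smoker_probability(classifier, row):
    """Probability of class 1 for one row of already-encoded feature values."""
    return float(classifier.predict_proba([row])[0][1])
```

Unlike the regression score, the value returned by `predict_proba` always lies between 0 and 1.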
Contributions and questions are welcome! Raise an issue or submit a pull request.