This challenge involves preparing a Machine Learning model for deployment, ensuring it can handle predictions on request. The task includes creating all necessary files, documentation, and code to make the model production-ready while allowing flexibility for improvements in the pipeline. The solution should work locally, with the possibility of being adapted for any cloud provider.
Requirements:
- Deploy the `model.py` file provided by teammate Mike to production.
- Ensure the model can receive data and return predictions on request.
- The solution should work for any cloud provider.
- Consider this as version 1 of a series of future model versions.
- Optional: Propose and implement a better model pipeline if the provided one is unsuitable.
Guidance:
- Add files, code, and documentation to improve the repository.
- Ensure the solution is functional locally; cloud deployment is not required.
- Provide reasoning for decisions made during the process.
- Highlight assumptions or questions for the team in the documentation.
- Consider the model's binary classification use case with imputation requirements.
Workflow
- Training Pipeline (generate imputation and model dependencies)
- Inference Pipeline (real-time prediction through an API request)
- Streamlit (Interface to request an inference)
The requirement 'The solution should work for any cloud provider' is satisfied by the Docker implementation (cloud provider agnostic).
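As a rough illustration of that cloud-agnostic packaging, a container image for the inference API could look like the sketch below. The base image, file layout, and entry point are assumptions; the repository's actual Dockerfile may differ. Only the exposed port (8089) comes from the `docker run` command documented further down.

```dockerfile
# Hypothetical sketch — the real Dockerfile in this repo may differ.
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Port matches the docker run example (-p 8089:8089)
EXPOSE 8089
CMD ["python", "app/inference.py"]
```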
Next Steps
- Batch Inference Pipeline (the workflow currently handles only one prediction at a time)
- ML Improvements (Feature Engineering, MICE Imputation, Model Evaluation)
Improvements
- Data Validation: Added validation checks for input data to ensure robustness.
- Improved Imputation: Replaced constant imputation with mean imputation, since `Age == 0` is an odd value to impute. Mean is not ideal, but it is an improvement over constant imputation.
- Model Upgrade: Switched from `LogisticRegression` to `RandomForestClassifier`, avoiding the dependency on `StandardScaler` that `LogisticRegression` required.
- Model Evaluation: Implemented 5-Fold Stratified Cross-Validation (without data leakage) to ensure robust metrics during training, replacing a simple hold-out approach.
- Complete Training: Finalized the training process after thorough model validation.
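The leakage-free evaluation described above can be sketched as follows: fitting the mean imputer inside a scikit-learn `Pipeline` ensures each cross-validation fold computes its own imputation statistics, so no information from the held-out fold leaks into training. The data below is synthetic and the hyperparameters are illustrative, not the project's actual configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the real dataset: 200 rows, 4 features,
# ~10% missing values (e.g. missing Age) and a binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[rng.random(X.shape) < 0.1] = np.nan
y = rng.integers(0, 2, size=200)

# The imputer lives inside the pipeline, so its mean is re-fit per fold.
pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),       # replaces constant imputation
    ("model", RandomForestClassifier(n_estimators=100, random_state=0)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")
print(scores.mean())
```

Keeping preprocessing inside the pipeline is also what lets a single serialized artifact carry both the imputation and the model dependencies mentioned in the Workflow section.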
- Navigate to the repository folder: `cd <repository_folder>`
- Install the required dependencies: `pip install --no-cache-dir -r requirements.txt`
- Install the project in editable mode (`setup.py`): `pip install -e .`
To run the training pipeline:
- Run from the terminal: `python app/train.py`
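For context, a training entry point typically ends by persisting the fitted pipeline so the inference API can reload it. A minimal sketch, assuming `pickle` for serialization — the artifact path and helper names are hypothetical, and the real `app/train.py` may use `joblib` instead:

```python
import pickle
from pathlib import Path

def save_artifact(model, path):
    """Serialize a fitted pipeline to disk, creating parent folders as needed."""
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(model, f)

def load_artifact(path):
    """Reload the persisted pipeline for inference."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Persisting one pipeline object (imputer + model together) keeps training and inference in sync across the model versions planned in the requirements.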
You have two options to deploy your inference API:
- Run from the terminal: `python app/inference.py`
- Execute a container (build the image, then run it):
  `docker build -t mle-challenge-deploy .`
  `docker run --name inference-app -p 8089:8089 mle-challenge-deploy`
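Whichever option is used, the request/response contract can be sketched with the standard library alone. The `/predict` route, the payload shape, and `DummyModel` below are assumptions standing in for the real `app/inference.py`, which would load the persisted pipeline instead:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class DummyModel:
    """Stand-in for the trained pipeline (the real app loads it from disk)."""
    def predict(self, rows):
        return [1 for _ in rows]

model = DummyModel()

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expect {"features": [...]} — one record per request, matching the
        # current single-prediction workflow.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        prediction = model.predict([payload["features"]])[0]
        body = json.dumps({"prediction": int(prediction)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet
```

A server would then be started with `HTTPServer(("0.0.0.0", 8089), InferenceHandler).serve_forever()`, matching the port exposed by the Docker container.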
To launch the Streamlit interface:
- Run from the terminal: `streamlit run app/main.py`
To install Docker Desktop on a Windows machine, you need to:
- Turn on `Virtualization` in your computer's BIOS
- Enable the following Windows features: `Hyper-V` and `Virtual Machine Platform`
- Install `WSL 2`
- Install `Docker Desktop`
Once installed, Docker Desktop should show the Docker engine running (if the engine is stopped, something in the setup went wrong).
For more details, follow the Docker Documentation Guide.