This repository contains implementations and experiments conducted as part of the CCE2502 unit, Pattern Recognition and Machine Learning at the University of Malta. The work spans two comprehensive assignments:
- Assignment I – Classification using synthetic datasets and kNN benchmarking
- Assignment II – Logistic regression from scratch using gradient descent and regularisation
Each assignment is structured as a separate notebook, with annotated explanations and visual outputs.
Both assignments were implemented and completed during Semester II, 2023/2024.
Task 1:
- Generate and visualise synthetic datasets with `make_circles` and `make_blobs`
- Inspect feature distributions, means, and standard deviations
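The dataset-generation step above can be sketched as follows; the sample sizes and noise levels here are illustrative, not the assignment's exact settings.

```python
import numpy as np
from sklearn.datasets import make_circles, make_blobs

# Two synthetic datasets: concentric circles (non-linearly separable)
# and Gaussian blobs (linearly separable).
X_c, y_c = make_circles(n_samples=500, noise=0.05, factor=0.5, random_state=0)
X_b, y_b = make_blobs(n_samples=500, centers=2, cluster_std=1.0, random_state=0)

# Inspect per-feature means and standard deviations
print("circles mean:", X_c.mean(axis=0), "std:", X_c.std(axis=0))
print("blobs   mean:", X_b.mean(axis=0), "std:", X_b.std(axis=0))
```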
Task 2:
- Implement custom `Shuffle_SplitDataset()` and `ClassificationMetrics()` functions
- Evaluate metrics: Accuracy, Recall, Precision, F1-score
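A minimal sketch of what these two helpers might look like — the function names come from the assignment, but the internals and signatures here are assumptions:

```python
import numpy as np

def Shuffle_SplitDataset(X, y, train_frac=0.7, seed=0):
    """Shuffle the rows once, then split into train/test partitions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(train_frac * len(X))
    tr, te = idx[:cut], idx[cut:]
    return X[tr], y[tr], X[te], y[te]

def ClassificationMetrics(y_true, y_pred):
    """Binary accuracy, recall, precision, and F1 from confusion counts."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, recall, precision, f1
```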
Task 3:
- Apply the k-Nearest Neighbours (k-NN) algorithm using `sklearn`
- Tune `k` for optimal accuracy and F1-score
- Train/test/validation pipeline
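A sketch of such a pipeline, assuming validation-set selection of `k` by F1-score; the split fractions and `k` grid are illustrative, not the assignment's exact choices.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import f1_score

# 60/20/20 train/validation/test split on an illustrative dataset
X, y = make_blobs(n_samples=600, centers=2, cluster_std=2.0, random_state=1)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=1)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=1)

# Pick k on the validation set
scores = {}
for k in range(1, 16, 2):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    scores[k] = f1_score(y_val, knn.predict(X_val))
best_k = max(scores, key=scores.get)

# Final, unbiased estimate on the held-out test set
final = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
test_f1 = f1_score(y_test, final.predict(X_test))
print(f"best k = {best_k}, test F1 = {test_f1:.3f}")
```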
Task 4:
- Compare brute-force vs kd-tree methods for inference timing
- Scale experiments to datasets of size `10^2` to `2.5×10^5`
- Empirical validation of the complexity analysis for kNN
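The timing comparison can be sketched like this; the assignment scales to 2.5×10^5 points, while this sketch uses smaller sizes so it runs quickly.

```python
import time
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def query_time(n, algorithm, k=5, n_queries=200):
    """Fit a k-NN classifier on n points and time a batch of predictions."""
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] > 0).astype(int)
    knn = KNeighborsClassifier(n_neighbors=k, algorithm=algorithm).fit(X, y)
    Q = rng.normal(size=(n_queries, 2))
    t0 = time.perf_counter()
    knn.predict(Q)
    return time.perf_counter() - t0

for n in (100, 1_000, 10_000):
    t_brute = query_time(n, "brute")
    t_tree = query_time(n, "kd_tree")
    print(f"n={n:>6}  brute={t_brute:.4f}s  kd_tree={t_tree:.4f}s")
```

Brute force scans all n training points per query, while a kd-tree prunes most of the space in low dimensions, which is what the scaling experiment is designed to expose.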
See `assignment_I.pdf` for full code, metrics, and plotted results.
Implemented from scratch using NumPy:
- Logistic loss function (binary cross-entropy)
- Prediction via sigmoid activation
- Accuracy computation
- Regularised gradient descent
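The four components above can be sketched in NumPy as follows; function names, learning rate, and iteration count are illustrative, not the assignment's exact implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(w, X, y, lam=0.0):
    """Binary cross-entropy with L2 regularisation (bias excluded)."""
    p = sigmoid(X @ w)
    eps = 1e-12  # numerical safety inside the logs
    ce = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return ce + lam * np.sum(w[1:] ** 2)

def fit_logistic(X, y, lam=0.0, lr=0.5, n_iters=2000):
    """Regularised batch gradient descent; X already has a bias column."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / len(y)
        grad[1:] += 2 * lam * w[1:]  # do not regularise the bias weight
        w -= lr * grad
    return w

def accuracy(w, X, y):
    """Fraction of thresholded sigmoid predictions matching the labels."""
    return np.mean((sigmoid(X @ w) >= 0.5).astype(int) == y)
```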
Visualisation & Validation:
- Decision boundary plotted against training data
- Convergence plots of training and validation loss
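Decision-boundary plots are typically produced by evaluating the classifier on a dense meshgrid over the feature space; a minimal sketch, assuming any `predict` callable over 2-D inputs:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

def plot_decision_boundary(predict, X, y, ax, steps=200):
    """Evaluate `predict` on a meshgrid covering the data and contour it."""
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, steps),
                         np.linspace(y_min, y_max, steps))
    zz = predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, zz, alpha=0.3)          # shaded class regions
    ax.scatter(X[:, 0], X[:, 1], c=y, s=10)     # training points on top

# Example with a trivial linear rule standing in for a trained model
X = np.random.default_rng(0).normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
fig, ax = plt.subplots()
plot_decision_boundary(lambda P: (P[:, 0] + P[:, 1] > 0).astype(int), X, y, ax)
fig.savefig("boundary.png")
```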
Dataset Analysis:
- Compare linearly separable vs non-separable datasets
- Polynomial feature expansion (degree 2 to 4)
- Evaluation of regularisation impact
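The point of the expansion is that circles are not linearly separable in the raw features but become separable at degree 2, since x₁² + x₂² encodes the radius. A sketch using scikit-learn (hyperparameters illustrative):

```python
from sklearn.datasets import make_circles
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression

X, y = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)

# Compare a plain linear model (degree 1) with a degree-2 expansion
for degree in (1, 2):
    Z = PolynomialFeatures(degree=degree, include_bias=False).fit_transform(X)
    clf = LogisticRegression(max_iter=1000).fit(Z, y)
    print(f"degree={degree}: train accuracy={clf.score(Z, y):.3f}")
```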
Bonus: Comparison with `sklearn.neural_network.MLPClassifier` on advanced datasets
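A sketch of such a comparison baseline; the architecture and hyperparameters here are illustrative, not the assignment's.

```python
from sklearn.datasets import make_circles
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Small MLP on a non-linear dataset that defeats a plain linear model
X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
mlp.fit(X_tr, y_tr)
print("test accuracy:", mlp.score(X_te, y_te))
```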
See `CCE2502_Assignment_II.pdf` for all implementation details and analysis.
Manual implementation of logistic regression solidified understanding of:
- Gradient descent dynamics
- Regularisation and weight decay
- Decision boundary intuition
Reinforced ability to:
- Generate, visualise, and split datasets
- Evaluate classifier performance using custom metrics
- Scale and benchmark ML algorithms (e.g., kNN)
- Clone this repository
- Open the `.ipynb` notebooks in Jupyter or VS Code
- Install any missing dependencies: `pip install matplotlib numpy scikit-learn`
- Run each notebook sequentially (e.g., `Assignment I`, `Assignment II`)
Topics: logistic-regression, knn, ml-from-scratch, classification, benchmarking, sklearn, pattern-recognition, cce2502, university-of-malta
Graham Pellegrini
University of Malta – Department of Computer Engineering
GitHub: @GrahamPellegrini
Report Files: