seuwenfei/README.md

Hi there, I am Wen Fei 👋


👩‍💻 About Me :

  • ✨ Statistics graduate & IT Cloud Ops Analyst based in Singapore.
  • 🔭 Interested in data, cloud, analytics, machine learning, and related fields.
  • 🌱 Growing skills in AWS, GCP, Python, SQL, and machine learning.
  • ⚡ Experienced in data analytics, building data pipelines, automating processes, and creating dashboards.
  • 📚 Published research in statistical quality control (acceptance sampling) and COVID-19 survival regression analysis.
  • 📫 How to reach me: LinkedIn Badge

🛠️ Languages and Tools :

Python  MySQL  Pandas  Sklearn  NumPy  Sea  SPSS  Kaggle  Jupyter  Tableau  Excel  Word


📑 Projects

Here are selected projects completed using Python, Power BI, Tableau, SQL, Looker Studio, and Java:

  Note: The dates indicate the month and year when each project was completed.

  • Parcel Delivery Time Prediction (Regression Modeling)   |   Oct 2025   |   Show project
    • Tools: Jupyter Notebook, Python (Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, XGBoost, TensorFlow, PyTorch, Joblib)
    • Description: Built regression models to predict parcel delivery times using historical Amazon order data.
    • Techniques: EDA, feature engineering (date-time transformations, distance calculations), outlier detection, multicollinearity check, mixed scaling (StandardScaler + RobustScaler), feature encoding, and model comparison (Random Forest, XGBoost, TensorFlow DNN, PyTorch DNN).
    • Result: Selected TensorFlow DNN as the final model due to its stability and generalization, achieving competitive MAE, RMSE, and R². Deployed a pipeline with preprocessing and model serialization for inference on new parcel orders.
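
The mixed scaling step mentioned above can be illustrated with a small, dependency-free sketch. The project itself used Scikit-learn's StandardScaler and RobustScaler; the helper functions and sample values below are hypothetical and only mirror the idea: standard scaling for well-behaved features, robust (median and IQR) scaling for features with outliers.

```python
import statistics

def standard_scale(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # raises ZeroDivisionError below if all values are equal
    return [(v - mean) / std for v in values]

def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]

# Hypothetical feature columns, not taken from the project data.
distance_km = [2.0, 4.0, 6.0, 8.0]           # no outliers: standard scaling
prep_minutes = [1.0, 2.0, 3.0, 4.0, 100.0]   # one extreme value: robust scaling

print(standard_scale(distance_km))
print(robust_scale(prep_minutes))
```

Because the median and IQR ignore the extreme value, the four typical entries in the second column stay in a narrow range while the outlier remains visibly far away.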

  • SQL Driven Business Analytics Framework   |   Aug 2025   |   Show project
    • Tools: SQL Server, Power BI (DAX)
    • Designed queries for customer, sales, and profitability insights.
    • Built dashboards with KPIs (revenue, profit margin, repeat purchase rate), maps, bar charts, and trend visualizations.
    • Created calculated metrics in Power BI using DAX measures (e.g., Revenue per Order, Profit per Order, Margin per Order, Repeat Purchase Rate).
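
As a rough illustration of the per-order metrics listed above, here is a Python sketch rather than the DAX used in the project. The order rows, the function name `order_kpis`, and the exact metric definitions (for example, repeat purchase rate as the share of customers with more than one order) are assumptions made for this example.

```python
from collections import Counter

# Hypothetical order rows: (order_id, customer_id, revenue, profit)
orders = [
    ("O1", "C1", 100.0, 20.0),
    ("O2", "C1", 50.0, 10.0),
    ("O3", "C2", 200.0, 50.0),
    ("O4", "C3", 150.0, 30.0),
]

def order_kpis(rows):
    """Compute simple per-order KPIs from a non-empty list of order rows."""
    n_orders = len(rows)
    total_revenue = sum(r[2] for r in rows)
    total_profit = sum(r[3] for r in rows)
    # Count orders per customer to find repeat buyers.
    orders_per_customer = Counter(r[1] for r in rows)
    repeat_customers = sum(1 for n in orders_per_customer.values() if n > 1)
    return {
        "revenue_per_order": total_revenue / n_orders,
        "profit_per_order": total_profit / n_orders,
        "profit_margin": total_profit / total_revenue,
        "repeat_purchase_rate": repeat_customers / len(orders_per_customer),
    }

print(order_kpis(orders))
```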

  • Identification of Disaster-Related Tweets (NLP Classification)   |   May 2023   |   Show project
    • Tools: Jupyter Notebook, Python (Pandas, NumPy, Seaborn, Matplotlib, SciPy, Plotly, NLTK, re, collections, wordcloud, TensorFlow, Scikit-learn)
    • Description: Developed an NLP classification model to predict disaster-related tweets.
    • Techniques: EDA, Text Preprocessing, Classification Model Comparison (Linear SVC, Multinomial NB, Neural Network).
    • Result: Achieved AUC 0.86 with Linear SVC, showing strong separation between disaster and non-disaster tweets.

  • Churn Prediction (IBM Telco Dataset)   |   Apr 2023   |   Show project
    • Tools: Jupyter Notebook, Python (Pandas, NumPy, Seaborn, Matplotlib, Plotly, H3, Folium, TensorFlow, imblearn, Scikit-learn, XGBoost).
    • Description: Built ML models to predict customer churn.
    • Techniques: EDA, Visualization, Classification Model Comparison (Random Forest, Logistic Regression, AdaBoost, XGBoost).
    • Result: Achieved AUC 0.86 with XGBoost.

  • Web Scraping Booking.com   |   Apr 2023   |   Show project
    • Tools: Python (Pandas, Requests, BeautifulSoup, RegEx)
    • Description: Scraped hotel data (name, rating, reviews, distance from city center, prices) and processed structured datasets for further analytics.

  • Titanic Survival Prediction (Kaggle Competition)   |   Mar 2023   |   Show project
    • Tools: Jupyter Notebook, Python (Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, TensorFlow).
    • Description: Developed ML models to predict survival outcomes.
    • Techniques: EDA, Feature Engineering, Visualization, Classification Model Comparison (Random Forest, Logistic Regression, Complement Naive Bayes).
    • Result: Achieved stratified k-fold CV score of 0.85 with Random Forest.

  • Feature Engineering - Convert UTC to Local time   |   Mar 2023   |   Show project
    • Tools: Python (Pandas, DateTime, Dateutil, pytz)
    • Converted UTC time to Malaysia Standard Time for analytics workflows.
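
A minimal sketch of the UTC to Malaysia Standard Time conversion, using only the standard library instead of the pytz and Dateutil packages listed above. It relies on Malaysia Standard Time being a fixed UTC+8 offset with no daylight saving time; the input format and the function name `utc_to_myt` are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# Malaysia Standard Time: fixed UTC+8 offset, no daylight saving time.
MYT = timezone(timedelta(hours=8), name="MYT")

def utc_to_myt(utc_text: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM:SS' UTC string (assumed format) and return it in MYT."""
    naive = datetime.strptime(utc_text, "%Y-%m-%d %H:%M:%S")
    aware_utc = naive.replace(tzinfo=timezone.utc)  # mark the parsed value as UTC
    return aware_utc.astimezone(MYT)

local = utc_to_myt("2023-03-15 18:30:00")
print(local.isoformat())  # 2023-03-16T02:30:00+08:00
```

Attaching the UTC timezone before converting keeps the result timezone-aware, so date rollovers such as the one in this example are handled by the library rather than by manual arithmetic.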

  • Worldwide Movie Series Visualization   |   Jan 2023   |   Show project
    • Tools: Python (Pandas, NumPy, Seaborn, Matplotlib, wordcloud)
    • Created visualizations highlighting patterns and trends in movie series data.
    • Techniques: EDA, Feature Engineering, Visualization.

  • Online Payment Fraud Detection   |   Dec 2022   |   Show project
    • Tools: Jupyter Notebook, Python (Pandas, NumPy, Seaborn, Matplotlib, Tabulate, Scikit-learn)
    • Description: Trained ML models to classify fraudulent vs. non-fraudulent transactions.
    • Techniques: EDA, Visualization, Classification Model Comparison (Random Forest, Logistic Regression).
    • Result: Achieved stratified k-fold CV F1 score of 0.985 using a Random Forest model.
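
To show what a stratified k-fold F1 evaluation measures, here is a dependency-free sketch. The project used Scikit-learn; the functions below are simplified, hypothetical stand-ins (no shuffling, binary labels only) and not the library's implementation. Stratification keeps the class ratio similar in every fold, which matters when positive labels such as fraud cases are rare.

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Assign each sample index to one of k folds, spreading every class evenly."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal the indices of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: 8 legitimate (0) and 4 fraudulent (1) transactions.
labels = [0] * 8 + [1] * 4
for fold in stratified_folds(labels, k=4):
    print(sorted(fold), "positives:", sum(labels[i] for i in fold))
```

In a full evaluation, each fold would serve once as the validation set while a model is trained on the remaining folds, and the per-fold F1 scores would be averaged.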

  • Cookies Sales Dashboard   |   May 2023   |   Show project
    • Tools: Power BI (Power Query, DAX).
    • Description: Built a dashboard to analyze sales, cost, profit, lead time, and customer trends.

  • Flight Ticket Sales Analysis Dashboard   |   Jan 2023   |   Show project
    • Tools: PostgreSQL, Tableau
    • Description: Queried airline ticket data and built dashboards for sales, booking periods, and fare conditions.

  • KPMG Data Analytics Consulting Virtual Internship   |   Nov 2022   |   Show project
    • Tools: Python (Jupyter Notebook), Tableau.
    • Conducted data quality assessment and insights analysis.
    • Built Tableau dashboards for customer segmentation and insights presentation.

  • Non-parametric Test for Patient Health Status   |   Mar 2022   |   Show project
    • Tools: SAS Studio
    • Description: Applied Shapiro-Wilk, Wilcoxon, Kolmogorov-Smirnov, Kruskal-Wallis, and Spearman’s correlation to patient health data.

  • E-commerce Dashboard   |   Jul 2021
    • Tools: Looker Studio (Google Data Studio)
    • Description: Built dashboards displaying sessions, transactions, revenue, checkout behavior, AOV, and conversion rate.

  • Java Application - Simple Student Information System   |   Nov 2019   |   Show project
    • Tools: Java (NetBeans)
    • Built a Java application to represent a simple student information system.
