This repository contains implementations of three distinct machine learning tasks: GANs, VAEs, and phishing detection.
- Generative Adversarial Networks (GANs) for image generation.
- Variational Autoencoders (VAEs) for both image generation and latent space analysis.
- Phishing Detection using VAEs, which applies anomaly detection techniques to identify phishing URLs.
Additionally, the repository includes a detailed documentation manual with explanations and related questions.
```
.
├── data_phishing.csv       # Phishing dataset used for training the VAE model
├── AI_GenAI_Manual.pdf     # Manual containing questions and explanations
├── 21L_6225_A1_GenAI.ipynb # Jupyter Notebook containing the implementation
└── 21L_6225_A1_GenAI.pdf   # Detailed assignment documentation
```
- Consists of a Generator and a Discriminator.
- The Generator learns to create realistic images from random noise.
- The Discriminator distinguishes between real and generated images.
- Used for generating images from MNIST, FashionMNIST, and Digits datasets.
- Evaluated using loss curves and image quality comparisons.
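The Generator/Discriminator pair described above can be sketched in PyTorch roughly as follows. The layer sizes, `latent_dim = 100`, and the 28×28 image shape are assumptions for illustration, not the notebook's exact architecture:

```python
import torch
import torch.nn as nn

latent_dim = 100  # assumed size of the random-noise input

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        # Maps random noise to a flattened 28x28 image.
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 28 * 28), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z).view(-1, 1, 28, 28)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        # Outputs a probability that the input image is real.
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

G, D = Generator(), Discriminator()
fake = G(torch.randn(4, latent_dim))  # 4 generated images
scores = D(fake)                      # discriminator's real/fake probabilities
```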
- Uses an encoder-decoder architecture.
- The Encoder maps input data to a probabilistic latent space.
- The Decoder reconstructs inputs from the latent representation.
- Uses PCA for latent space visualization.
- Applied to MNIST, FashionMNIST, and Digits datasets.
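A minimal sketch of the encoder-decoder structure with the reparameterization trick, again with assumed layer sizes (a 2-D latent space is convenient for visualization but is an assumption, not necessarily what the notebook uses):

```python
import torch
import torch.nn as nn

latent_dim = 2  # assumed; small latent spaces are easy to visualize

class VAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(28 * 28, 256), nn.ReLU())
        self.fc_mu = nn.Linear(256, latent_dim)      # mean of q(z|x)
        self.fc_logvar = nn.Linear(256, latent_dim)  # log-variance of q(z|x)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 28 * 28), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.enc(x.view(-1, 28 * 28))
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization: sample z = mu + sigma * eps, eps ~ N(0, I).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

vae = VAE()
x = torch.rand(4, 1, 28, 28)
recon, mu, logvar = vae(x)
```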
- Uses an anomaly detection approach.
- The VAE is trained on legitimate URLs, learning to reconstruct them.
- Phishing URLs are detected based on high reconstruction errors.
- Preprocessing includes feature extraction and standardization.
- Achieved 70% accuracy based on reconstruction error thresholds.
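The thresholding step can be sketched as below. The error values are made up, and the mean + 2·std rule is an assumed way to pick the threshold; the notebook only states that a reconstruction-error threshold is used:

```python
import numpy as np

# Hypothetical per-URL reconstruction errors from a trained VAE:
# the first three are legitimate URLs, the last two are phishing.
errors = np.array([0.02, 0.03, 0.01, 0.45, 0.50])

# Assumed rule: threshold = mean + 2 std of errors on legitimate URLs.
legit_errors = errors[:3]
threshold = legit_errors.mean() + 2 * legit_errors.std()

# URLs the VAE reconstructs poorly are flagged as phishing.
is_phishing = errors > threshold
```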
- Image Datasets (MNIST, FashionMNIST, Digits)
- Normalized pixel values for stable training.
- Used PCA to visualize latent space representation.
- Phishing Dataset (data_phishing.csv)
- Extracted relevant features from URLs.
- Applied `StandardScaler` for normalization.
- Divided dataset into training and validation sets.
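The preprocessing steps above can be sketched with scikit-learn. The data here is random stand-in data, and the latent dimension, feature count, and 80/20 split are assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# --- Image side: normalize pixels, project latent codes with PCA ---
pixels = rng.integers(0, 256, size=(500, 784)).astype(np.float32)
pixels /= 255.0                       # normalized pixel values in [0, 1]
latents = rng.normal(size=(500, 16))  # hypothetical 16-D latent codes
coords = PCA(n_components=2).fit_transform(latents)  # 2-D for plotting

# --- Phishing side: standardize URL features, split the dataset ---
X = rng.random((100, 3))  # hypothetical features (e.g. length, digits, dots)
X_train, X_val = train_test_split(X, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)  # fit on training data only
X_train_s = scaler.transform(X_train)
X_val_s = scaler.transform(X_val)
```

Fitting the scaler on the training split only (then transforming both splits) avoids leaking validation statistics into training.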
- Generator: Converts random noise into realistic images.
- Discriminator: Determines real vs. fake images.
- Loss Function: Binary cross-entropy with adversarial training.
- Evaluation: Image quality and loss curve analysis.
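One adversarial training step with binary cross-entropy can be sketched as follows; the tiny linear networks stand in for the notebook's actual models, and the learning rates are assumptions:

```python
import torch
import torch.nn as nn

G = nn.Linear(8, 16)                              # noise -> "image" stand-in
D = nn.Sequential(nn.Linear(16, 1), nn.Sigmoid()) # image -> real probability
bce = nn.BCELoss()
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.rand(4, 16)
noise = torch.randn(4, 8)

# Discriminator step: push real -> 1, fake -> 0 (detach to freeze G).
opt_D.zero_grad()
d_loss = bce(D(real), torch.ones(4, 1)) + bce(D(G(noise).detach()), torch.zeros(4, 1))
d_loss.backward()
opt_D.step()

# Generator step: fool the discriminator (fake -> 1).
opt_G.zero_grad()
g_loss = bce(D(G(noise)), torch.ones(4, 1))
g_loss.backward()
opt_G.step()
```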
- Encoder: Compresses input data into a latent space representation.
- Decoder: Reconstructs the original input from latent variables.
- Loss Function: Reconstruction loss + KL divergence.
- Evaluation: Reconstruction quality, latent space visualization.
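The "reconstruction loss + KL divergence" objective can be written as below. Binary cross-entropy is an assumed choice of reconstruction term (MSE is also common); the KL term is the closed form for a diagonal Gaussian posterior against a standard normal prior:

```python
import torch
import torch.nn.functional as F

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term: how well the decoder reproduces the input.
    recon_loss = F.binary_cross_entropy(recon, x, reduction="sum")
    # KL divergence between N(mu, sigma^2) and N(0, I), summed over batch.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl

# Dummy tensors with the shapes a 784-pixel, 2-D-latent VAE would produce.
x = torch.rand(4, 784)
recon = torch.rand(4, 784)
mu = torch.zeros(4, 2)
logvar = torch.zeros(4, 2)
loss = vae_loss(recon, x, mu, logvar)
```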
- VAE-based anomaly detection.
- Trained only on legitimate URLs to reconstruct normal patterns.
- Anomalies (phishing URLs) identified based on high reconstruction errors.
- Evaluation metrics: ROC curve, classification report.
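The ROC-curve and classification-report evaluation can be sketched with scikit-learn, using reconstruction error as the anomaly score. The labels, errors, and the 0.09 decision threshold are all made up for illustration:

```python
import numpy as np
from sklearn.metrics import classification_report, roc_auc_score, roc_curve

# Hypothetical labels (1 = phishing) and per-URL reconstruction errors.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
errors = np.array([0.02, 0.05, 0.03, 0.10, 0.40, 0.08, 0.55, 0.30])

# ROC curve over all possible error thresholds.
fpr, tpr, thresholds = roc_curve(y_true, errors)
auc = roc_auc_score(y_true, errors)

# Classification report at one assumed operating threshold.
y_pred = (errors > 0.09).astype(int)
report = classification_report(y_true, y_pred)
```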
- GANs produced sharper images.
- VAEs generated images with more structured latent space representations.
- Latent space analysis showed clear clustering patterns in VAEs.
- Datasets used: MNIST, FashionMNIST, Digits.
- ROC curve analysis demonstrated performance trade-offs.
- Achieved 70% accuracy, balancing false positives and false negatives.
- Challenges: Handling imbalanced datasets, improving feature extraction.
- Python 3.8+
- Jupyter Notebook / Google Colab / VS Code
- PyTorch
- NumPy
- Pandas
- Scikit-learn
- Matplotlib
- Open `21L_6225_A1_GenAI.ipynb` in Jupyter Notebook, Google Colab, or VS Code.
- Run the notebook cells in sequence.
- Visualize results for GANs, VAEs, and phishing detection.
- Improve phishing detection by integrating additional URL-based features (e.g., SSL certificates, domain age).
- Explore advanced generative models such as Wasserstein GANs (WGANs) and β-VAEs.
- Enhance dataset diversity for more robust anomaly detection models.