Ovarian Cyst Classification

Objective: This project aims to classify ovarian cysts as benign or malignant using deep learning models, leveraging transfer learning in convolutional neural networks (CNNs).

Both the TensorFlow and PyTorch frameworks were used to train on the same dataset with similar model architectures. These two projects are included in this repository to:

Compare the model performances across both frameworks.
Ensure that the model is effectively capturing patterns in the medical images.

Data

The original dataset was sourced from medical journals and provided by doctors. It contained a mix of .jpg and .pdf files. To ensure the quality and relevance of the dataset, a preprocessing step was performed to eliminate certain files based on the following criteria:

Non-image files (e.g., PDFs, text documents)
Images with multiple photos in a single file (e.g., collages, comparison images)
Annotated images (i.e., images with bounding boxes, arrows, or labels)
Images containing embedded text (words or letters)
Other irrelevant or noisy files that could interfere with model training

Despite these filtering steps, some annotated images were retained to avoid reducing the dataset size too much. After cleaning, the dataset contained:

Benign images: Reduced from 168 to 148
Malignant images: Reduced from 103 to 94

The images were structured into a folder format to facilitate preprocessing and model training:

e2e/
- Benign/
  - benign_image1.jpg
  - benign_image2.jpg
  - ...
- Malignant/
  - malignant_image1.jpg
  - malignant_image2.jpg
  - ...

The entire dataset folder was zipped and uploaded to Google Drive, which was then mounted in Google Colab for training.

TensorFlow Implementation

Notebook: TensorFlow_Ovarian_Cyst_Classification.ipynb

Data Preprocessing

Loading images from respective directories
Assigning labels: Benign images labeled as [1,0], and malignant images labeled as [0,1]
Resizing images to 224x224 pixels to maintain uniformity
Normalizing pixel values to the range 0-1 (originally 0-255)
Converting images and labels into NumPy arrays for efficient processing
Splitting data into 80% training and 20% validation sets
Computing class weights to address class imbalance
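
The label encoding and class-weight steps above can be sketched in plain Python. This is a minimal illustration, not the notebook's code: the helper names `one_hot` and `balanced_class_weights` are hypothetical, and the weighting formula (the common "balanced" heuristic, n_samples / (n_classes * class_count)) is an assumption, since the exact formula used is not stated here. The counts are taken from the cleaned dataset before the train/validation split.

```python
from collections import Counter

def one_hot(label, classes=("benign", "malignant")):
    """Encode benign as [1, 0] and malignant as [0, 1]."""
    return [1 if label == c else 0 for c in classes]

def balanced_class_weights(labels):
    """Assumed heuristic: weight_c = n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}

# Class counts of the cleaned dataset (148 benign, 94 malignant).
labels = ["benign"] * 148 + ["malignant"] * 94
weights = balanced_class_weights(labels)
```

Under this heuristic the minority class (malignant) receives the larger weight, so its misclassifications contribute more to the loss.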

Image Augmentation

Shifting width by 10% (translation)
Shifting height by 10% (translation)
Zoom in/out by 20%
Flip horizontally
Exactly one of the transformations above is applied to each augmented image
Pixels left empty after a transformation are filled with the nearest pixel value
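
One way to pick a single transformation per augmented image is sketched below. The names `AUGMENTATIONS` and `pick_augmentation` are hypothetical and the notebook may do this differently; the dictionary keys are the standard Keras `ImageDataGenerator` argument names.

```python
import random

# One entry per augmentation listed above, keyed by a descriptive name.
AUGMENTATIONS = {
    "width_shift": {"width_shift_range": 0.1},
    "height_shift": {"height_shift_range": 0.1},
    "zoom": {"zoom_range": 0.2},
    "horizontal_flip": {"horizontal_flip": True},
}

def pick_augmentation(rng):
    """Choose exactly one augmentation and add the nearest-pixel fill mode."""
    name = rng.choice(sorted(AUGMENTATIONS))
    params = dict(AUGMENTATIONS[name], fill_mode="nearest")
    return name, params

name, params = pick_augmentation(random.Random(0))
```

The resulting `params` could then be passed as keyword arguments to `ImageDataGenerator(**params)`, so that each generator applies only the selected transformation.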

Model Architecture

Sequential Model:

A frozen, pretrained DenseNet121 as the feature extractor, with its top layer excluded for transfer learning
A global average pooling layer to reduce dimensionality
A fully connected layer with 256 neurons and ReLU activation
A dropout layer with a rate of 0.3 to reduce overfitting
An output layer with 2 neurons and softmax activation, ensuring the sum of probabilities for benign and malignant classifications equals 1.
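
To see why the softmax output always sums to 1, here is a small standalone sketch. The input scores are made-up values for illustration, not model outputs.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for [benign, malignant].
benign_prob, malignant_prob = softmax([2.0, 0.5])
```

Because every exponential is divided by the same total, the two outputs are positive and add up to 1, which lets them be read as class probabilities.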

Training Configurations:

Optimizer: Adam with a learning rate of 0.001 (default)
Loss Function: Categorical Cross-Entropy
Callbacks:
- Early Stopping: stops training when validation AUC has not improved for 5 consecutive epochs
- Reduce LR on Plateau: multiplies the learning rate by 0.2 when validation loss has not decreased for 3 epochs, down to a minimum LR of 1e-6
Training Duration: up to 50 epochs (fewer if early stopping triggers)
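
The behaviour of the two callbacks can be imitated with a few lines of plain Python. This is a simplified sketch, not the Keras implementation: it ignores options such as `min_delta` and `cooldown`, and the function names are hypothetical.

```python
def reduce_lr_on_plateau(val_losses, lr=1e-3, factor=0.2, patience=3, min_lr=1e-6):
    """Return the learning rate in effect after each epoch."""
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)  # never go below min_lr
                wait = 0
        history.append(lr)
    return history

def early_stopping_epoch(val_aucs, patience=5):
    """Return the 1-based epoch at which training stops, or None."""
    best = float("-inf")
    wait = 0
    for epoch, auc in enumerate(val_aucs, start=1):
        if auc > best:
            best, wait = auc, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None
```

For example, a validation AUC that peaks at epoch 2 and never improves again would stop training at epoch 7, five epochs after the last improvement.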

Finetuning

Fine-tuning is performed only if the baseline model achieves a validation AUC of at least 0.75.
The last 50 layers of DenseNet121 are unfrozen and trained for 20 additional epochs with a lower learning rate (1e-5).
Early stopping halts fine-tuning if validation AUC does not improve for 3 consecutive epochs.
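
The gating and unfreezing logic can be sketched as a small helper. The function name `fine_tune_plan` is hypothetical, and the layer count in the usage line is a placeholder rather than the real depth of DenseNet121.

```python
def fine_tune_plan(val_auc, n_layers, threshold=0.75, n_unfreeze=50):
    """Return per-layer trainable flags, or None when fine-tuning is skipped."""
    if val_auc < threshold:
        return None  # baseline too weak: skip fine-tuning
    first_trainable = max(n_layers - n_unfreeze, 0)
    # Only the last n_unfreeze layers become trainable.
    return [i >= first_trainable for i in range(n_layers)]

# Placeholder layer count for illustration only.
flags = fine_tune_plan(val_auc=0.80, n_layers=400)
```

In Keras, flags like these would typically be applied by setting `layer.trainable` on each layer of the base model and recompiling with the lower learning rate before continuing training.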

Final Model

The trained models are evaluated using the following metrics:

Training vs Validation Accuracy
Training vs Validation Loss
Training vs Validation AUC

Performance comparisons are visualized through plots, and the best-performing model is selected and saved as a .h5 file.

To load the model for inference, use:

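from tensorflow.keras.models import load_model

# Load the saved model from its .h5 file.
model = load_model("model.h5")
# img must be preprocessed like the training data: a batch of shape
# (1, 224, 224, 3) with pixel values scaled to the range 0-1.
prediction = model.predict(img)
# Output order follows the label encoding: [benign, malignant].
benign_prob, malignant_prob = prediction[0]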

PyTorch Implementation

Notebook: PyTorch_Ovarian_Cyst_Classification.ipynb

This implementation aims to replicate the findings of a research study, as the original work was not open-sourced. The configurations and parameters used in this project are carefully tuned to align with the details provided in the paper:

Deep learning-enabled pelvic ultrasound images for accurate diagnosis of ovarian cancer in China: a retrospective, multicentre, diagnostic study

Data Preprocessing

Loading images from respective directories
Assigning labels: Benign images labeled as 1, Malignant images labeled as 0
Resizing images to 224x224 pixels
Normalizing pixel values to the range 0-1
Converting images and labels into PyTorch tensors
Splitting data into 80% training and 20% validation sets
Computing class weights to address class imbalance

Image Augmentation

Random horizontal flip
Random rotation (up to 30 degrees)
Random affine transformations (translation and scaling)

Model Architecture

Sequential Model:

A frozen, pretrained DenseNet121 as the feature extractor, with its original top layer excluded for transfer learning
A replacement classifier head: one fully connected layer with 1024 input features and 1 output neuron for binary classification
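
Because the head emits a single raw score (logit) and benign is labeled 1, a sigmoid turns that score into the probability of the benign class. The sketch below shows one way to read the output; the 0.5 decision threshold and the function names are assumptions, not values taken from the notebook.

```python
import math

def sigmoid(logit):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

def interpret(logit, threshold=0.5):
    """Return (label, benign_prob, malignant_prob) for one logit."""
    benign_prob = sigmoid(logit)  # label 1 = benign in this encoding
    label = "benign" if benign_prob >= threshold else "malignant"
    return label, benign_prob, 1.0 - benign_prob

label, benign_prob, malignant_prob = interpret(-2.0)
```

A negative logit therefore points towards malignant and a positive one towards benign, which is the opposite orientation from the usual "positive = disease" convention and worth keeping in mind when reading metrics.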

Training Configurations:

Optimizer: Stochastic Gradient Descent with 0.001 learning rate, 0.9 momentum, 1e-4 weight decay
Loss Function: Binary Cross-Entropy with Logits Loss
Scheduler: StepLR, which multiplies the learning rate by 0.1 every 20 epochs
Training Duration: 30 epochs
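
The effect of the step schedule over the 30 training epochs can be worked out directly. This is a plain-Python sketch of the closed form of a step decay (the function name `step_lr` is hypothetical), using the values listed above.

```python
def step_lr(epoch, base_lr=0.001, step_size=20, gamma=0.1):
    """Learning rate at a 0-based epoch under step decay."""
    return base_lr * gamma ** (epoch // step_size)

# Learning rate for each of the 30 training epochs.
schedule = [step_lr(e) for e in range(30)]
```

With these settings the learning rate stays at 0.001 for the first 20 epochs and drops once, to 0.0001, for the remaining 10.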
