This project fine-tunes a BERT-based model for classifying underwater objects based on textual descriptions. The model is trained on a custom dataset and evaluated for performance. The classification task assigns objects into one of three categories:
- Man-made object (Label: `0`)
- Round/spherical object (Label: `1`)
- Natural formation (Label: `2`)
- Download a Pretrained Model (`download_model.py`)
- Prepare Dataset (`prepare_dataset.py`)
- Fine-tune the Model (`finetune_model.py`)
- Verify the Model's Performance (`verify_model.py`)
- Deploy a Classifier for Object Recognition (`auv_object_classifier.py`)
Clone the Repository:
```bash
git clone https://github.com/yourusername/yourrepository.git
cd yourrepository
```
Set Up Python Virtual Environment:
```bash
python3 -m venv auv_llm_env
source auv_llm_env/bin/activate  # (For Windows: auv_llm_env\Scripts\activate)
pip install -r requirements.txt
```
This script downloads a BERT-based model (`bert-base-uncased`) and saves it locally.
Steps:
- Fetches the tokenizer and model from Hugging Face.
- Saves them in the `./original_model/` directory.
Run Command:
```bash
python download_model.py
```
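Under the hood, the download step likely amounts to a few `transformers` calls. The sketch below is an assumption based on the steps above, not the script itself; in particular, `num_labels=3` is inferred from the three categories:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "bert-base-uncased"
SAVE_DIR = "./original_model/"

# Fetch the tokenizer and model from Hugging Face.
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# num_labels=3 is an assumption, matching the three object categories.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)

# Save both locally so the later scripts can load them from disk.
tokenizer.save_pretrained(SAVE_DIR)
model.save_pretrained(SAVE_DIR)
```

The classification head on top of BERT is randomly initialized at this point; fine-tuning in the next steps is what makes it useful.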
This script generates a sample dataset, splits it into training and testing sets, and converts it into Hugging Face Dataset format.
Steps:
- Defines a dataset with text descriptions and their respective labels.
- Splits the dataset into train (80%) and test (20%).
- Saves the dataset in `./underwater_data/train/` and `./underwater_data/test/`.
Run Command:
```bash
python prepare_dataset.py
```
This script fine-tunes the BERT model on the prepared dataset using Hugging Face's Trainer API.
Steps:
- Loads the training and test datasets.
- Tokenizes the text using `AutoTokenizer`.
- Defines evaluation metrics (`accuracy`, `F1-score`).
- Configures the training settings (`TrainingArguments`).
- Trains the model and saves it to `./fine_tuned_model/`.
Run Command:
```bash
python finetune_model.py
```
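The metrics step is worth showing concretely. Below is a minimal sketch of a `compute_metrics` callback in the shape the `Trainer` API expects (a `(logits, labels)` pair in, a dict of metrics out), assuming scikit-learn is used for accuracy and F1:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    """Metrics callback in the shape Trainer expects: (logits, labels) -> dict."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        # Weighted F1 handles any class imbalance across the three labels.
        "f1": f1_score(labels, preds, average="weighted"),
    }

# Hypothetical logits for three test examples (one row per example, 3 classes).
logits = np.array([[2.0, 0.1, -1.0], [0.2, 1.5, 0.3], [-0.5, 0.0, 1.8]])
labels = np.array([0, 1, 2])
print(compute_metrics((logits, labels)))  # all three predicted correctly
```

Passing this function as `compute_metrics=compute_metrics` to `Trainer` makes the metrics appear in every evaluation pass.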
This script compares the original and fine-tuned models by making predictions on test examples.
Steps:
- Loads both models (original and fine-tuned).
- Predicts categories for sample descriptions.
- Prints model outputs to compare logits and final classifications.
Run Command:
```bash
python verify_model.py
```
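Comparing the two models comes down to comparing their logits. The numpy sketch below uses hypothetical logit values to show how a softmax turns them into the confidence scores seen in the sample output: an untuned head tends to produce near-uniform probabilities, while a fine-tuned one separates the classes:

```python
import numpy as np

LABELS = ["man-made object", "round/spherical object", "natural formation"]

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical logits for one description from each model.
original_logits = np.array([0.3, 0.2, 0.1])     # pretrained head: near-uniform
finetuned_logits = np.array([3.1, -0.4, -1.2])  # fine-tuned: confident

for name, logits in [("original", original_logits), ("fine-tuned", finetuned_logits)]:
    probs = softmax(logits)
    idx = int(np.argmax(probs))
    print(f"{name}: {LABELS[idx]} (confidence {probs[idx]:.2f})")
```

A large gap between the top logit and the rest is what shows up as high confidence after the softmax.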
This script provides a Python class for classifying underwater objects using the fine-tuned model.
Steps:
- Loads the fine-tuned model and tokenizer.
- Defines a `classify()` function that predicts the object category.
- Runs test cases to classify new descriptions.
Run Command:
```bash
python auv_object_classifier.py
```
```
Description: Square metallic box with antennas
Classification: man-made object (Confidence: 0.89)
--------------------------------------------------
Description: Smooth round object reflecting sonar signals
Classification: round/spherical object (Confidence: 0.93)
--------------------------------------------------
Description: Irregular formation with plant growth
Classification: natural formation (Confidence: 0.85)
```
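A minimal sketch of what the classifier could look like. The class name and method signature are assumptions for illustration; only the model directory and the idea of a `classify()` function that returns a category and confidence come from the steps above:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

LABELS = {0: "man-made object", 1: "round/spherical object", 2: "natural formation"}

class AUVObjectClassifier:
    """Hypothetical wrapper around the fine-tuned model; names are assumptions."""

    def __init__(self, model_dir="./fine_tuned_model/"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_dir)
        self.model.eval()  # disable dropout for deterministic inference

    def classify(self, description: str):
        inputs = self.tokenizer(description, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = self.model(**inputs).logits
        probs = torch.softmax(logits, dim=-1)[0]
        idx = int(probs.argmax())
        return LABELS[idx], float(probs[idx])
```

On an AUV, a wrapper like this would let mission code call `classify()` on incoming sonar-derived descriptions without touching the model internals.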
- Check Dataset Quality: Ensure training data is correctly labeled.
- Increase Training Epochs: Modify `num_train_epochs` in `TrainingArguments`.
- Adjust Tokenization: Experiment with different `max_length` settings.
- Inspect Model Outputs: Run `verify_model.py` to compare logit differences.
Install dependencies using:
```bash
pip install transformers datasets torch scikit-learn evaluate pandas
```
This project demonstrates how to fine-tune and deploy an NLP model for classifying underwater objects based on text descriptions. The fine-tuned model improves predictions over the pretrained model and can be integrated into an AUV mission system.