This model leverages pre-trained ImageNet weights for the EfficientNet-based CNN component, while the Transformer component is trained from scratch to generate captions. The training data consists of 40K preprocessed images and captions from the Coco-Flickr Farsi dataset:
This project's dataset is a subset of the Coco-Flickr Farsi dataset, totaling 19 GB. To train the model effectively, the dataset has been filtered by keeping only captions between 10 and 25 tokens long (a minimal filtering sketch follows the plots below). The resulting filtered dataset comprises 40,000 images. Histograms and plots illustrating the distribution of caption lengths in the dataset are as follows:
Caption: Distribution of original caption lengths
Caption: Distribution of caption lengths in the filtered dataset
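The preprocessing code itself is not shown in this section; the sketch below illustrates the kind of length filter described above, assuming the captions are held in a plain Python dict mapping each image path to its list of Farsi caption strings (the function name and data layout are illustrative assumptions, not taken from the project code):

```python
# Illustrative sketch of the caption-length filter (assumed data layout:
# a dict mapping each image path to a list of Farsi caption strings).
MIN_LEN, MAX_LEN = 10, 25  # token bounds described above

def filter_by_caption_length(captions_map):
    """Keep only images whose captions contain between MIN_LEN and MAX_LEN tokens."""
    filtered = {}
    for image_path, captions in captions_map.items():
        kept = [c for c in captions if MIN_LEN <= len(c.split()) <= MAX_LEN]
        if kept:  # drop images left with no valid caption
            filtered[image_path] = kept
    return filtered
```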
The PIC model is designed with a three-part architecture: a Convolutional Neural Network (CNN) feature extractor, a Transformer encoder, and a Transformer decoder:
- CNN: The EfficientNetB0 model is employed as the initial layer to extract meaningful features from input images. The pre-trained ImageNet weights are used, and the feature extractor is frozen during training (see the feature-extractor sketch after this list).
- Encoder: The extracted image features are passed through a Transformer-based encoder. This encoder enhances the representation of the inputs, using self-attention mechanisms for better context understanding (see the encoder sketch after this list).
- Decoder: This model takes the encoder output and text data (sequences) as inputs. It is trained to generate captions using self-attention and cross-attention mechanisms. The decoder incorporates positional embeddings for sequence information and employs dropout layers for regularization (see the decoder sketch after this list).
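For reference, here is a minimal sketch of the frozen EfficientNetB0 feature extractor, assuming a TensorFlow/Keras implementation; the input resolution and names here are assumptions for illustration, not the project's actual values:

```python
from tensorflow import keras
from tensorflow.keras.applications import efficientnet

IMAGE_SIZE = (299, 299)  # assumed input resolution

def build_cnn_feature_extractor():
    """EfficientNetB0 with ImageNet weights, frozen and used purely as a feature extractor."""
    base = efficientnet.EfficientNetB0(
        input_shape=(*IMAGE_SIZE, 3),
        include_top=False,      # drop the ImageNet classification head
        weights="imagenet",     # pre-trained weights, as described above
    )
    base.trainable = False      # the extractor stays frozen during training
    # Flatten the spatial grid into a sequence of feature vectors for the encoder.
    outputs = keras.layers.Reshape((-1, base.output.shape[-1]))(base.output)
    return keras.Model(base.input, outputs, name="cnn_feature_extractor")
```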
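The encoder could look roughly like the block below (again a Keras-style sketch, not the project's exact code); dimensions and head counts are placeholders, and the block assumes the image features have already been projected to `embed_dim`:

```python
from tensorflow import keras
from tensorflow.keras import layers

class TransformerEncoderBlock(layers.Layer):
    """Self-attention over the image features followed by a feed-forward projection."""

    def __init__(self, embed_dim, ff_dim, num_heads, **kwargs):
        super().__init__(**kwargs)
        self.attention = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
        self.ffn = keras.Sequential(
            [layers.Dense(ff_dim, activation="relu"), layers.Dense(embed_dim)]
        )
        self.layernorm_1 = layers.LayerNormalization()
        self.layernorm_2 = layers.LayerNormalization()

    def call(self, inputs, training=False):
        # Self-attention lets every image-feature vector attend to all the others.
        attn_out = self.attention(query=inputs, value=inputs, key=inputs, training=training)
        x = self.layernorm_1(inputs + attn_out)   # residual connection + normalization
        ffn_out = self.ffn(x)
        return self.layernorm_2(x + ffn_out)      # residual connection + normalization
```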
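Finally, a sketch of the decoder side with learned positional embeddings, causal self-attention over the tokens, cross-attention to the encoder output, and dropout; it assumes TensorFlow 2.10+ (for `use_causal_mask`) and uses illustrative hyperparameter names rather than the project's actual configuration:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class PositionalEmbedding(layers.Layer):
    """Token embedding plus a learned positional embedding for the caption sequence."""

    def __init__(self, sequence_length, vocab_size, embed_dim, **kwargs):
        super().__init__(**kwargs)
        self.token_embeddings = layers.Embedding(vocab_size, embed_dim)
        self.position_embeddings = layers.Embedding(sequence_length, embed_dim)

    def call(self, inputs):
        positions = tf.range(start=0, limit=tf.shape(inputs)[-1], delta=1)
        return self.token_embeddings(inputs) + self.position_embeddings(positions)


class TransformerDecoderBlock(layers.Layer):
    """Causal self-attention over caption tokens, cross-attention to the encoder
    output, a feed-forward projection, and dropout for regularization."""

    def __init__(self, embed_dim, ff_dim, num_heads, dropout_rate=0.1, **kwargs):
        super().__init__(**kwargs)
        self.self_attention = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
        self.cross_attention = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
        self.ffn = keras.Sequential(
            [layers.Dense(ff_dim, activation="relu"), layers.Dense(embed_dim)]
        )
        self.layernorm_1 = layers.LayerNormalization()
        self.layernorm_2 = layers.LayerNormalization()
        self.layernorm_3 = layers.LayerNormalization()
        self.dropout = layers.Dropout(dropout_rate)

    def call(self, inputs, encoder_outputs, training=False):
        # Causal self-attention: each token position attends only to earlier tokens.
        self_attn = self.self_attention(
            query=inputs, value=inputs, key=inputs, use_causal_mask=True, training=training
        )
        x = self.layernorm_1(inputs + self_attn)
        # Cross-attention: caption tokens attend to the encoded image features.
        cross_attn = self.cross_attention(
            query=x, value=encoder_outputs, key=encoder_outputs, training=training
        )
        x = self.layernorm_2(x + cross_attn)
        ffn_out = self.dropout(self.ffn(x), training=training)
        return self.layernorm_3(x + ffn_out)
```

In a full model, the decoder input would be the positional embedding of the (shifted) caption tokens, one or more such decoder blocks would be stacked, and a final softmax layer over the vocabulary would produce the next-token probabilities.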