Author: Biao Wu*, Yutong Xie*, Zeyu Zhang*, Minh Hieu Phan, Qi Chen, Ling Chen, Qi Wu**
*Contributed Equally. **Corresponding author: qi.wu01@adelaide.edu.au.
[Paper Link] [Papers With Code]
Demo video: *MMCLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training* (mp4)
[07/30/2024] 🎉🎉 Our paper has been promoted by CVer!
Vision-and-language pretraining (VLP) in the medical field utilizes contrastive learning on image-text pairs to achieve effective transfer across tasks. Yet, current VLP approaches with the masked modeling strategy face two challenges when applied to the medical domain. First, current models struggle to accurately reconstruct key pathological features due to the scarcity of medical data. Second, most methods adopt either paired image-text data or image-only data alone, failing to exploit the combination of both paired and unpaired data. To this end, this paper proposes the MMCLIP (Masked Medical Contrastive Language-Image Pre-Training) framework to enhance pathological learning and feature learning via unpaired data. First, we introduce the attention-masked image modeling (AttMIM) module and the entity-driven masked language modeling (EntMLM) module, which learn to reconstruct pathological visual and textual tokens via multi-modal feature interaction, thereby strengthening medical feature learning. The AttMIM module masks the portion of the image features that is most responsive to textual features, allowing MMCLIP to efficiently reconstruct the highly similar images common in medical data. Second, MMCLIP capitalizes on unpaired data to enhance multimodal learning by introducing disease-kind prompts. Experimental results show that MMCLIP achieves SOTA zero-shot and fine-tuned classification performance on five datasets.
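The AttMIM masking step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `attention_mask_indices`, the raw dot-product as a cross-attention proxy, and the mask ratio are all hypothetical choices for demonstration.

```python
import numpy as np

def attention_mask_indices(image_feats, text_feats, mask_ratio=0.5):
    """Select the image patches most responsive to the text.

    image_feats: (num_patches, dim) patch embeddings
    text_feats:  (num_tokens, dim) text token embeddings
    Returns indices of the patches to mask for reconstruction.
    """
    # Cross-modal attention scores: each patch against every text token.
    scores = image_feats @ text_feats.T          # (num_patches, num_tokens)
    # A patch's responsiveness = its strongest alignment with any token.
    responsiveness = scores.max(axis=1)          # (num_patches,)
    num_mask = int(round(mask_ratio * len(responsiveness)))
    # Mask the most text-responsive patches (highest scores first).
    return np.argsort(-responsiveness)[:num_mask]

# Toy example: 8 patches, 4 text tokens, 16-dim features.
rng = np.random.default_rng(0)
patches = rng.normal(size=(8, 16))
tokens = rng.normal(size=(4, 16))
masked = attention_mask_indices(patches, tokens, mask_ratio=0.25)
print(masked)  # indices of the two most text-responsive patches
```

Masking these text-aligned patches forces the reconstruction objective to focus on pathology-relevant regions rather than uninformative background, which is the intuition behind AttMIM.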
MMCLIP (Masked Medical Contrastive Language-Image Pre-Training) is a multimodal model designed to bridge the gap between medical text and X-ray images. Inspired by OpenAI's CLIP, MMCLIP provides a unified feature space for both text and images, specifically tailored to the medical domain.
```bibtex
@article{wu2024xlip,
  title={XLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training},
  author={Wu, Biao and Xie, Yutong and Zhang, Zeyu and Phan, Minh Hieu and Chen, Qi and Chen, Ling and Wu, Qi},
  journal={arXiv preprint arXiv:2407.19546},
  year={2024}
}
```
- Multimodal Understanding: MMCLIP is trained to understand both text and X-ray images, facilitating various downstream medical tasks.
- Domain-Specific: Unlike general-purpose models, MMCLIP is trained on a specialized dataset of medical text and X-ray images.
- Easy to Use: With a simple API and clear documentation, integrating MMCLIP into your workflow is straightforward.
- Python >= 3.7
- PyTorch >= 1.8
- CUDA-compatible GPU (optional but recommended)
To install MMCLIP, you can clone this repository and install the required packages.
```shell
git clone https://github.com/your_username/MMCLIP.git
cd MMCLIP
pip install -r requirements.txt
```
To encode an X-ray image and a medical text snippet into the same feature space, you can use the following code:
```python
from MMCLIP import MMCLIPModel

# Initialize the model
model = MMCLIPModel()

# Sample X-ray image and report text
xray_image = "path/to/xray/image.jpg"
medical_text = "This X-ray shows signs of pneumonia."

# Encode both modalities into the shared feature space
image_features, text_features = model.encode(xray_image, medical_text)
```
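Once image and text embeddings live in the same feature space, disease-kind prompts enable zero-shot classification by ranking prompt embeddings against an image embedding. The sketch below is a hypothetical illustration of that mechanism: `zero_shot_classify`, the toy embeddings, and the prompt template in the comments are assumptions for demonstration, not this repository's API.

```python
import numpy as np

def zero_shot_classify(image_feat, class_text_feats):
    """Score disease prompts against an image embedding via cosine similarity.

    image_feat:       (dim,) image embedding from the vision encoder
    class_text_feats: (num_classes, dim) one embedding per disease prompt,
                      e.g. encodings of "An X-ray image of {disease}."
    Returns softmax probabilities over the candidate diseases.
    """
    img = image_feat / np.linalg.norm(image_feat)
    txt = class_text_feats / np.linalg.norm(class_text_feats, axis=1, keepdims=True)
    logits = txt @ img                 # cosine similarity per class
    logits -= logits.max()             # numerical stability before softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    return probs

# Toy demo with hand-made 3-d embeddings for two disease prompts.
image_feat = np.array([0.9, 0.1, 0.0])
prompts = np.array([[1.0, 0.0, 0.0],   # "An X-ray image of pneumonia."
                    [0.0, 1.0, 0.0]])  # "An X-ray image of atelectasis."
print(zero_shot_classify(image_feat, prompts))
```

Because the class set is defined purely by the text prompts, new diseases can be added at inference time without retraining, which is what makes the prompt-based zero-shot evaluation in the paper possible.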
We welcome contributions to MMCLIP! If you have a feature request, bug report, or want to contribute code, please open an issue or pull request.
This project is licensed under the MIT License - see the LICENSE file for details.