
priyankaav123/intellisegment

 
 


IntelliSegment

An Integrated Framework for Multi-Object Detection and Mask Generation using SAM2, CLIP, and YOLO-NAS




About the Project

IntelliSegment is a computer vision framework for multi-object detection and mask generation. It combines YOLO-NAS for real-time detection, Meta SAM2 for boundary refinement, and CLIP for semantic labeling, with the goal of accurate and efficient segmentation in complex, dynamic environments.


Motivation

While recent developments in computer vision have greatly improved segmentation performance, challenges remain in detecting small or occluded objects and maintaining accuracy under varying conditions. IntelliSegment addresses these challenges by integrating multiple state-of-the-art models, resulting in a versatile and high-performing solution for applications such as autonomous systems, video surveillance, and medical imaging.


Key Features

  • Multi-object detection: Efficient, real-time object detection using YOLO-NAS.
  • Boundary refinement: Precise and detailed segmentation masks generated with Meta SAM2.
  • Semantic consistency: Enhanced object differentiation and contextual understanding using CLIP.
  • Robust performance: High accuracy and efficiency in complex and dynamic visual environments.
  • State-of-the-art results: Reported to outperform leading models on pixel-wise segmentation error, with a Mean Absolute Error (MAE) of 0.051 on PASCAL-S and 0.030 on ECSSD.
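
The features above suggest a three-stage flow: detect, segment, then label. The sketch below shows one way such stages could be chained. It assumes detections are handed to the segmenter as box prompts and that labeling runs on each mask; the repository may wire the models differently. Every name here (Detection, SegmentedObject, run_pipeline, and the stub_* functions) is hypothetical, and the stubs stand in for YOLO-NAS, SAM2, and CLIP so the example runs without any model weights.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[int]]          # toy grayscale image as nested lists
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), x2/y2 exclusive
Mask = List[List[int]]           # binary mask with the image's height/width


@dataclass
class Detection:
    box: Box
    score: float


@dataclass
class SegmentedObject:
    box: Box
    score: float
    mask: Mask
    label: str


def run_pipeline(
    image: Image,
    detect: Callable[[Image], List[Detection]],
    segment: Callable[[Image, Box], Mask],
    label: Callable[[Image, Mask], str],
    min_score: float = 0.5,
) -> List[SegmentedObject]:
    """Chain detector -> segmenter -> labeler, dropping low-confidence boxes."""
    results = []
    for det in detect(image):
        if det.score < min_score:
            continue  # skip detections below the confidence threshold
        mask = segment(image, det.box)  # box acts as the segmentation prompt
        results.append(
            SegmentedObject(det.box, det.score, mask, label(image, mask))
        )
    return results


# --- Stubs standing in for the real models (illustration only) ---
def stub_detect(image: Image) -> List[Detection]:
    return [Detection((0, 0, 2, 2), 0.9), Detection((1, 1, 3, 3), 0.2)]


def stub_segment(image: Image, box: Box) -> Mask:
    x1, y1, x2, y2 = box
    return [
        [1 if x1 <= x < x2 and y1 <= y < y2 else 0 for x in range(len(image[0]))]
        for y in range(len(image))
    ]


def stub_label(image: Image, mask: Mask) -> str:
    return "object"


image = [[0] * 4 for _ in range(4)]
objects = run_pipeline(image, stub_detect, stub_segment, stub_label)
```

Passing the stages in as callables keeps the orchestration independent of any one model library, so a stub can be replaced by a real detector, segmenter, or labeler without touching run_pipeline.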

Methodology and Results

Framework Overview

IntelliSegment integrates several core components to enhance object detection and segmentation:

  1. YOLO-NAS for Object Detection

    • Anchor-Free Detection: YOLO-NAS operates without predefined anchor boxes, allowing flexible and adaptive bounding box predictions.
    • Multi-Scale Feature Extraction: Incorporates Feature Pyramid Networks (FPN) and Path Aggregation Networks (PAN) to capture and aggregate features at multiple scales, improving detection of objects of various sizes.
    • Pre-training: The model is pre-trained on a large dataset, enabling it to generate bounding boxes, class labels, and confidence scores for each detected object.
  2. Multi-Scale Feature Extraction with Feature Pyramid Network (FPN)

    • Contextual Feature Capture: FPN uses a top-down architecture with lateral connections to capture semantic information at different resolutions, combining high- and low-level features for robust object localization.
    • Anchor-Free Integration: Integrating FPN into an anchor-free framework reduces computational complexity and improves efficiency.
  3. Meta SAM2 for Object Segmentation

    • Mask Generation: Meta SAM2 is trained on a diverse dataset of images and videos, enabling it to generate precise masks even in complex or occluded scenes.
    • Enhanced Precision: SAM2 provides improved localization and precision while requiring fewer prompt interactions than previous models, especially in intricate scenarios.
  4. Image-Label Similarity Matching with CLIP

    • Dual Modality Learning: CLIP learns from both images and text descriptions, embedding them into a shared space for improved semantic understanding.
    • Semantic Labeling: This shared embedding lets the framework label segmented objects by comparing image regions with text descriptions, which adapts across various tasks.
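
The image-label matching in step 4 can be illustrated with plain vectors. The sketch below assumes an image-region embedding and one text embedding per candidate label already exist (in practice these would come from CLIP's image and text encoders), and picks the label with the highest cosine similarity. The function names, the toy 3-dimensional embeddings, and the scale factor of 100 are assumptions made for illustration, not values taken from this repository.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the two unit-normalized vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def match_label(
    image_embedding: Sequence[float],
    text_embeddings: Dict[str, Sequence[float]],
    scale: float = 100.0,  # assumed similarity scale, not from the repository
) -> Dict[str, float]:
    """Return a probability per label from scaled cosine similarities."""
    labels = list(text_embeddings)
    sims = [cosine_similarity(image_embedding, text_embeddings[l]) for l in labels]
    return dict(zip(labels, softmax([scale * s for s in sims])))


# Toy embeddings (hypothetical); real CLIP embeddings have hundreds of dimensions.
probs = match_label(
    [0.9, 0.1, 0.0],
    {"a photo of a cat": [1.0, 0.0, 0.0], "a photo of a dog": [0.0, 1.0, 0.0]},
)
best_label = max(probs, key=probs.get)
```

Because both modalities live in the same space, changing the candidate labels only requires new text embeddings, not a retrained classifier.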

Results

Dataset     Mean Absolute Error (MAE)
PASCAL-S    0.051
ECSSD       0.030

Lower MAE is better. IntelliSegment achieves low segmentation error on both benchmarks while maintaining robust object localization. Its refined boundary segmentation, powered by Meta SAM2, is what sets it apart from other state-of-the-art models.
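
For reference, MAE on a single image is commonly computed as the average absolute difference between the predicted mask and the ground-truth mask over all pixels, with both scaled to [0, 1]. The sketch below follows that common definition and averages per-image scores for a dataset-level figure; it is an assumption that the numbers above were produced this way, and the function names are hypothetical.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[float]]  # values in [0, 1], row-major


def mean_absolute_error(pred: Mask, gt: Mask) -> float:
    """Average |pred - gt| over every pixel of one image."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("prediction and ground truth must have the same shape")
    total, count = 0.0, 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            total += abs(p - g)
            count += 1
    if count == 0:
        raise ValueError("masks must contain at least one pixel")
    return total / count


def dataset_mae(preds: List[Mask], gts: List[Mask]) -> float:
    """Mean of the per-image MAE values."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("need the same non-zero number of predictions and masks")
    return sum(mean_absolute_error(p, g) for p, g in zip(preds, gts)) / len(preds)


# One uncertain pixel (0.5 vs. 1) out of four pixels.
pred = [[1.0, 0.0], [0.5, 0.0]]
gt = [[1.0, 0.0], [1.0, 0.0]]
score = mean_absolute_error(pred, gt)
```

In this toy case the only error is 0.5 on one of four pixels, so the score is 0.5 / 4 = 0.125.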


Requirements

  • Python: 3.8 or higher
  • PyTorch: Latest stable version
  • OpenCV
  • YOLO-NAS
  • Meta SAM2
  • CLIP
  • Additional dependencies: numpy, matplotlib, tqdm

About

This repository uses SAM2's capabilities to create a segmentation and classification model

