BIT-DA/ABS


From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection

[ICML2025] Official Code of ABS

Lincan Cai, Jingxuan Kang, Shuang Li, Wenxuan Ma, Binhui Xie, Zhida Qin, Jian Liang


Contribution

  • We propose Attention-Based Selection (ABS) to guide the cropping process, focusing crops on the main objects in the image and reducing the risk of cropping background objects.
  • We introduce a feature selection mechanism that crops on the feature map of the original image, supplementing the cropped images with global information so that the model retains semantic understanding while focusing on local features.
  • We propose a soft matching approach that enables targeted matching of text descriptions to different patches. ABS achieves state-of-the-art performance on zero-shot classification and out-of-distribution benchmarks, even outperforming methods that require fine-tuning.
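The selection step in the first bullet can be sketched as follows: score each candidate crop window by the total attention it covers and keep the highest-scoring one. The 14x14 patch grid, the 7x7 window size, and the `select_crop` helper are illustrative assumptions for this sketch, not the repository's actual implementation.

```python
import random

def select_crop(attn_map, crop=7):
    """Return the top-left (row, col) of the crop x crop patch window
    that covers the largest total attention mass."""
    grid = len(attn_map)
    best, best_ij = float("-inf"), (0, 0)
    for i in range(grid - crop + 1):
        for j in range(grid - crop + 1):
            s = sum(attn_map[r][c]
                    for r in range(i, i + crop)
                    for c in range(j, j + crop))
            if s > best:
                best, best_ij = s, (i, j)
    return best_ij

# Toy attention map: weak background noise plus a strong object region
# around patches (8..11, 8..11), mimicking attention on a main object.
random.seed(0)
amap = [[random.random() * 0.1 for _ in range(14)] for _ in range(14)]
for r in range(8, 12):
    for c in range(8, 12):
        amap[r][c] += 1.0

print(select_crop(amap))
```

Because the window is scored by covered attention, the chosen crop always encloses the high-attention object region rather than a random background patch, which is the intuition behind attention-guided cropping.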

Requirements

  • Environment

Please ensure Python is installed, then create the environment and install all required dependencies by running the following command from the project root directory:

conda create --name <env> --file requirements.txt
  • Datasets

To proceed with the experiments, please follow these steps:

  1. Download the required datasets; for download instructions, please refer to Dataset.md.
  2. Update the corresponding data_path parameter in the configuration files located in the cfgs folder to point to your local dataset directory.
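Step 2 above can also be scripted. The sketch below rewrites the data_path entry in every config file under a given folder; the .yaml extension, the exact data_path key format, and the demo file contents are assumptions for illustration, so check the actual files in the cfgs folder before relying on it.

```python
import tempfile
from pathlib import Path

def set_data_path(cfg_dir, new_root):
    """Point the data_path entry of every cfg file at new_root."""
    for cfg in sorted(Path(cfg_dir).glob("*.yaml")):
        lines = cfg.read_text().splitlines()
        out = [f"data_path: {new_root}"
               if ln.strip().startswith("data_path:") else ln
               for ln in lines]
        cfg.write_text("\n".join(out) + "\n")

# Demo on a throwaway config file (hypothetical format).
demo = Path(tempfile.mkdtemp())
(demo / "imagenet.yaml").write_text("data_path: /old/path\nbatch_size: 32\n")
set_data_path(demo, "/data/imagenet")
print((demo / "imagenet.yaml").read_text())
```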

Experiments on zero-shot and OOD benchmarks

Run the following command:

python main.py --dataset_name imagenet

where dataset_name specifies the dataset to be used.
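To sweep several benchmarks, the same command can be generated per dataset and handed to a process runner. This is a hedged sketch: only imagenet appears in the instructions above, so the other dataset names here are placeholders that must match the config files in the cfgs folder.

```python
import shlex

# Placeholder dataset names; replace with the cfg names shipped in cfgs/.
datasets = ["imagenet", "imagenet_a", "imagenet_r"]

commands = [f"python main.py --dataset_name {name}" for name in datasets]
for cmd in commands:
    # Pass each argument list to subprocess.run(...) to actually execute it.
    print(shlex.split(cmd))
```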

Acknowledgements

This project is based on the project: WCA. We thank the authors for making the source code publicly available.
