# From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection

[ICML 2025] Official code of ABS

Lincan Cai, Jingxuan Kang, Shuang Li, Wenxuan Ma, Binhui Xie, Zhida Qin, Jian Liang

- We propose Attention-Based Selection (ABS) to guide the cropping process, focusing on the main objects in the image and minimizing the risk of cropping background objects.
- We introduce a feature selection strategy that crops on the feature map of the original image, supplementing the cropped images with global information so that the model retains semantic understanding while focusing on local features.
- We propose a soft matching approach that enables targeted matching of text descriptions to different patches. ABS achieves state-of-the-art performance on zero-shot classification and out-of-distribution benchmarks, even outperforming methods that require fine-tuning.
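As a rough illustration of the ideas above, the sketch below shows (a) deriving a crop box from a ViT attention map and (b) soft matching, where each text description's score is a softmax-weighted aggregate over crops rather than a hard assignment. All names, shapes, and hyperparameters here are our own illustrative assumptions, not the actual ABS implementation; see the source code for the real logic.

```python
import numpy as np

def attention_crop_box(attn, patch_size=16, keep_ratio=0.05):
    """Illustrative only: turn a [CLS]-to-patch attention map of shape
    (H, W) into a pixel-space box covering the highest-attention patches."""
    thresh = np.quantile(attn, 1.0 - keep_ratio)
    ys, xs = np.where(attn >= thresh)
    top, left = ys.min() * patch_size, xs.min() * patch_size
    bottom, right = (ys.max() + 1) * patch_size, (xs.max() + 1) * patch_size
    return top, left, bottom, right

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def soft_match(crop_feats, text_feats, temperature=0.07):
    """Illustrative only: weight every crop-text similarity by a softmax
    over crops, so each class attends to the crops that match it best."""
    sims = crop_feats @ text_feats.T               # (n_crops, n_classes)
    weights = softmax(sims / temperature, axis=0)  # attention over crops
    return (weights * sims).sum(axis=0)            # (n_classes,) scores
```

With a 14x14 attention map concentrated on a region, `attention_crop_box` returns the pixel box around that region, and `soft_match` pools crop-text similarities without discarding weaker crops entirely.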
## Environment

Please ensure Python is installed, then configure all required dependencies by running the following command from the project root directory:

```
conda create --name <env> --file requirements.txt
```
## Datasets

To run the experiments, please follow these steps:

- Download the required datasets; please refer to Dataset.md.
- Update the corresponding `data_path` parameter in the configuration files located in the `cfgs` folder to point to your local dataset directory.
- Run the following command:

```
python main.py --dataset_name imagenet
```

where `dataset_name` specifies the dataset to be used.
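For reference, a dataset entry in one of the `cfgs` files might look roughly like the following. This is a hypothetical illustration; the actual keys and layout are defined by the files shipped in the `cfgs` folder, and only the `data_path` value needs to be edited:

```yaml
# hypothetical config fragment -- only data_path needs changing
dataset_name: imagenet
data_path: /path/to/your/imagenet
```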
This project is based on WCA. We thank the authors for making their source code publicly available.