There are three steps to train the classification network with selected high-quality pseudo labels.

Download the HPH dataset here
Download the LC25K dataset here
Download the CRC100K dataset here
Download the DigestPath dataset here
Split each dataset 4:1 into training and test sets.
First, run zero-shot inference on the training set with an off-the-shelf VLM, and use our proposed method to filter out noisy samples.
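The 4:1 split can be sketched as below. This is only an illustration: the function name `split_4_to_1`, the `(path, label)` sample format, the fixed seed, and the per-class (stratified) shuffling are all assumptions, not something taken from this repository.

```python
import random
from collections import defaultdict


def split_4_to_1(samples, seed=0):
    """Split (path, label) pairs 4:1 into train/test, class by class.

    Hypothetical helper: splitting per class keeps the class balance
    roughly equal in both subsets.
    """
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append((path, label))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_class):
        items = by_class[label]
        rng.shuffle(items)
        cut = len(items) * 4 // 5  # first 80% of each class go to training
        train.extend(items[:cut])
        test.extend(items[cut:])
    return train, test


if __name__ == "__main__":
    # Toy file list standing in for a real dataset directory.
    demo = [(f"img_{i}.png", i % 2) for i in range(100)]
    train, test = split_4_to_1(demo)
    print(len(train), len(test))
```

Only the training part should be passed to the filtering step; the test part is kept aside for evaluation.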
In the vlm_cpl_LC25K.py file, there are two main functions, MVC and Prompt_feature_consensus.
You can apply both MVC and Prompt_feature_consensus, or either one alone. You can also change the order in which these two filters run.
python vlm_cpl_LC25K.py --gpu 0
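To see why the order of the two filters can matter, here is a minimal sketch of chaining sample filters. The two toy filters and the sample fields (`views_agree`, `conf`) are hypothetical stand-ins and do not reproduce the functions in vlm_cpl_LC25K.py; the point is only that a filter whose decision depends on the current pool gives different results depending on what ran before it.

```python
def keep_if_views_agree(pool):
    """Toy per-sample filter: keep samples flagged as consistent."""
    return [s for s in pool if s["views_agree"]]


def keep_top_half_by_conf(pool):
    """Toy pool-dependent filter: keep the more confident half of the pool."""
    n_keep = (len(pool) + 1) // 2
    top = sorted(pool, key=lambda s: s["conf"], reverse=True)[:n_keep]
    top_ids = {s["id"] for s in top}
    return [s for s in pool if s["id"] in top_ids]  # preserve original order


def apply_filters(samples, filters):
    """Run the filters one after another; each sees the previous survivors."""
    kept = list(samples)
    for f in filters:
        kept = f(kept)
    return kept


# Made-up samples for illustration only.
SAMPLES = [
    {"id": 0, "views_agree": True, "conf": 0.90},
    {"id": 1, "views_agree": False, "conf": 0.95},
    {"id": 2, "views_agree": True, "conf": 0.60},
    {"id": 3, "views_agree": True, "conf": 0.40},
]

if __name__ == "__main__":
    a_then_b = apply_filters(SAMPLES, [keep_if_views_agree, keep_top_half_by_conf])
    b_then_a = apply_filters(SAMPLES, [keep_top_half_by_conf, keep_if_views_agree])
    print([s["id"] for s in a_then_b], [s["id"] for s in b_then_a])
```

In this toy data the two orders keep different samples, so it is worth comparing the size and accuracy of the retained set when you reorder the filters.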
Second, after obtaining high-quality pseudo-labels, you can train a classification network.
python train_pseudo.py --gpu 0 --pseudo_csv <your_csv>