# CustomConcept101

We release a dataset of 101 concepts, with 3-15 images per concept, for evaluating model customization methods.

<br>
<div>
<p align="center">
<img src='../assets/sample_images.png' align="center" width=800>
</p>
</div>

## Download dataset

```
git clone https://github.com/adobe-research/custom-diffusion.git
cd custom-diffusion/customconcept101
wget https://www.cs.cmu.edu/~custom-diffusion/assets/benchmark_dataset.zip
unzip benchmark_dataset.zip
```

## Evaluation

We provide a set of text prompts for each concept in the [prompts](prompts/) folder. The prompt file corresponding to each concept is listed in [dataset.json](dataset.json) and [dataset_multiconcept.json](dataset_multiconcept.json). The CLIP-feature-based image and text similarities can be computed as:

```
python evaluate.py --sample_root {folder} --target_path {target-folder} --numgen {numgen}
```

* `sample_root`: the root folder of generated images. It should contain a subfolder `samples` with the generated images, along with a `prompts.json` file mapping `{'imagename.stem': 'text prompt'}` for each image in the `samples` subfolder (a minimal sketch of constructing this file is shown after this list).
* `target_path`: path to the folder of target real images.
* `numgen`: number of images in the `sample_root/samples` folder.
* `outpkl`: the location to save evaluation results (default: `evaluation.pkl`).
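
The following is a minimal sketch of building the `prompts.json` file that `evaluate.py` expects; the folder name and the `<new1>` modifier token are illustrative placeholders, and the example assumes all samples were generated from a single prompt:

```python
# Sketch: build the prompts.json expected by evaluate.py, mapping each
# generated image's file stem to the prompt used to generate it.
# "outputs/cat" and "<new1>" are hypothetical placeholders.
import json
from pathlib import Path

sample_root = Path("outputs/cat")    # hypothetical sample_root folder
prompt = "photo of a <new1> cat"     # prompt used for every sample here

samples = sorted((sample_root / "samples").iterdir())
mapping = {path.stem: prompt for path in samples}

with open(sample_root / "prompts.json", "w") as f:
    json.dump(mapping, f, indent=2)
```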
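
Both alignment scores are similarities in feature space. As a reference for what the script measures, here is a minimal sketch of the CLIP text- and image-alignment for a single generated image, assuming the `transformers` library and the `openai/clip-vit-base-patch32` checkpoint (`evaluate.py` itself may use a different backbone or implementation):

```python
# Sketch: CLIP-based alignment scores for one generated image.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_scores(gen_path, target_path, prompt):
    images = [Image.open(gen_path).convert("RGB"),
              Image.open(target_path).convert("RGB")]
    inputs = processor(text=[prompt], images=images,
                       return_tensors="pt", padding=True)
    img = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt = model.get_text_features(input_ids=inputs["input_ids"],
                                  attention_mask=inputs["attention_mask"])
    img = img / img.norm(dim=-1, keepdim=True)  # L2-normalize features
    txt = txt / txt.norm(dim=-1, keepdim=True)
    text_align = (img[0] @ txt[0]).item()   # generated image vs. prompt
    image_align = (img[0] @ img[1]).item()  # generated vs. target image
    return text_align, image_align
```

The DINO image-alignment score in the tables below is analogous, with features from a self-supervised DINO ViT in place of the CLIP image encoder.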

## Results
We compare our method (Custom Diffusion) with [DreamBooth](https://dreambooth.github.io) and [Textual Inversion](https://textual-inversion.github.io) on this dataset. We trained DreamBooth and Textual Inversion with the hyperparameters suggested in the respective papers. Both our method and DreamBooth are trained with generated images as regularization.

**Single concept**

<table>
  <tr>
    <td></td>
    <td colspan=3 align=center> 200 DDPM steps </td>
    <td colspan=3 align=center> 50 DDPM steps </td>
  </tr>
  <tr>
    <td></td>
    <td>Text-alignment (CLIP)</td>
    <td>Image-alignment (CLIP)</td>
    <td>Image-alignment (DINO)</td>
    <td>Text-alignment (CLIP)</td>
    <td>Image-alignment (CLIP)</td>
    <td>Image-alignment (DINO)</td>
  </tr>
  <tr>
    <td>Textual Inversion</td>
    <td> 0.6126 </td>
    <td> 0.7524 </td>
    <td> 0.5111 </td>
    <td> 0.6117 </td>
    <td> 0.7530 </td>
    <td> 0.5128 </td>
  </tr>
  <tr>
    <td>DreamBooth</td>
    <td> 0.7522 </td>
    <td> 0.7520 </td>
    <td> 0.5533 </td>
    <td> 0.7514 </td>
    <td> 0.7521 </td>
    <td> 0.5541 </td>
  </tr>
  <tr>
    <td>Custom Diffusion (Ours)</td>
    <td> 0.7602 </td>
    <td> 0.7440 </td>
    <td> 0.5311 </td>
    <td> 0.7583 </td>
    <td> 0.7456 </td>
    <td> 0.5335 </td>
  </tr>
</table>

**Multiple concepts**

<table>
  <tr>
    <td></td>
    <td colspan=3 align=center> 200 DDPM steps </td>
    <td colspan=3 align=center> 50 DDPM steps </td>
  </tr>
  <tr>
    <td></td>
    <td>Text-alignment (CLIP)</td>
    <td>Image-alignment (CLIP)</td>
    <td>Image-alignment (DINO)</td>
    <td>Text-alignment (CLIP)</td>
    <td>Image-alignment (CLIP)</td>
    <td>Image-alignment (DINO)</td>
  </tr>
  <tr>
    <td>DreamBooth</td>
    <td> 0.7383 </td>
    <td> 0.6625 </td>
    <td> 0.3816 </td>
    <td> 0.7366 </td>
    <td> 0.6636 </td>
    <td> 0.3849 </td>
  </tr>
  <tr>
    <td>Custom Diffusion (Opt)</td>
    <td> 0.7627 </td>
    <td> 0.6577 </td>
    <td> 0.3650 </td>
    <td> 0.7599 </td>
    <td> 0.6595 </td>
    <td> 0.3684 </td>
  </tr>
  <tr>
    <td>Custom Diffusion (Joint)</td>
    <td> 0.7567 </td>
    <td> 0.6680 </td>
    <td> 0.3760 </td>
    <td> 0.7534 </td>
    <td> 0.6704 </td>
    <td> 0.3799 </td>
  </tr>
</table>


## License
Images taken from Unsplash are under the [Unsplash License](https://unsplash.com/license). Images captured by ourselves are released under the [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/deed.en) license. Flower category images are downloaded from Wikimedia/Flickr/Pixabay, and links to the original images can be found [here](https://www.cs.cmu.edu/~custom-diffusion/assets/urls.txt).


## Acknowledgments
We are grateful to Sheng-Yu Wang, Songwei Ge, Daohan Lu, Ruihan Gao, Roni Shechtman, Avani Sethi, Yijia Wang, Shagun Uppal, and Zhizhuo Zhou for helping with the dataset collection, and Nick Kolkin for the feedback.