This is the repository for the paper:
VaseVL: Multimodal Agent and Benchmark for Ancient Greek Pottery
Jinchao Ge*, Biao Wu*, Zeyu Zhang*†, Shiya Huang, Judith Bishop, Gillian Shepherd, Ling Chen, Meng Fang, Yang Zhao**
*Equal contribution. †Project lead. **Corresponding author.
If you use any content of this repo in your work, please cite our paper.
We present VaseVL, a pioneering Multi-Modal Large Language Model (MLLM) agent for ancient Greek pottery that understands and analyzes visual and textual data in support of cultural heritage preservation. To further support the research community, we introduce VaseVQA, a comprehensive question-answering benchmark for evaluating the reasoning and interpretative capabilities of MLLMs on ancient artifacts. The dataset contains 31,773 multi-view vase images, from which we select 11,693 as single-view images. The benchmark consists of vision-language (VL) visual question answering tasks. VaseVL achieves state-of-the-art performance in stylistic classification and historical attribution, providing critical tools for authentication, forgery detection, and digital archiving. Final fine-tuning of the 7B checkpoint uses 9,354 available vase samples and finishes in 3 to 4 hours. Beyond its academic contributions, VaseVL fosters global heritage conservation by mitigating cultural erosion and promoting public engagement with ancient Greek artistry.
To get the full experience of the VaseVL UI, you need to deploy it locally by following the steps below:
- Clone the VaseVL repository to your local machine.
  git clone https://github.com/AIGeeksGroup/VaseVL.git
- Navigate to the ui directory, which contains the front-end source code.
  cd ui
- Install all required Node.js dependencies.
  npm install
- Build the UI project for production.
  npm run build
- Start the local server to launch the VaseVL Demo UI.
  npm run start
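The steps above can be chained into a single script so that a failure at any step stops the rest. This is only a sketch: the function names `repo_dir_from_url` and `deploy_vasevl` are hypothetical, and it assumes that the ui directory sits at the top level of the directory created by `git clone`, which the steps above do not state explicitly.

```shell
#!/bin/sh
# Sketch only: chains the README steps; each step runs only if the previous one succeeded.

REPO_URL="https://github.com/AIGeeksGroup/VaseVL.git"

# Derive the directory name that `git clone` creates from a repository URL
# (the last path component with the trailing ".git" removed).
repo_dir_from_url() {
  basename "$1" .git
}

# Run clone, install, build and start in order.
# The parenthesized body runs in a subshell, so `cd` does not change
# the caller's working directory.
deploy_vasevl() (
  git clone "$REPO_URL" &&
    # Assumption: ui/ is directly inside the cloned repository.
    cd "$(repo_dir_from_url "$REPO_URL")/ui" &&
    npm install &&
    npm run build &&
    npm run start
)
```

Defining the functions has no side effects; call `deploy_vasevl` from the directory where the clone should be created to run the whole sequence.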
Once the server starts, you can access the VaseVL Demo UI in your browser at http://localhost:1717/projects/1743242682314/playground by default.
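To confirm from a terminal that the server is reachable before opening the browser, a small polling helper along these lines may be useful. It is a sketch under assumptions: the helper names `demo_url` and `wait_for_demo` are hypothetical, it requires `curl`, and the port and project ID are simply copied from the default URL above and may differ on other installs.

```shell
#!/bin/sh
# Sketch only: values copied from the default URL in this README.
# They can be overridden through the environment if an install differs.
DEMO_PORT="${DEMO_PORT:-1717}"
DEMO_PROJECT_ID="${DEMO_PROJECT_ID:-1743242682314}"

# Print the playground URL built from the port and project ID.
demo_url() {
  printf 'http://localhost:%s/projects/%s/playground\n' "$DEMO_PORT" "$DEMO_PROJECT_ID"
}

# Poll the URL once per second until curl receives any HTTP response,
# or until the given number of attempts (default 30) is used up.
wait_for_demo() {
  attempts="${1:-30}"
  while [ "$attempts" -gt 0 ]; do
    if curl -s -o /dev/null --max-time 2 "$(demo_url)"; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1
}
```

For example, `wait_for_demo 60 && echo "UI is up at $(demo_url)"` waits up to roughly a minute while `npm run start` is running in another terminal.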
Our data is released under an NC-ND (non-commercial, no-derivatives) license. Commercial use is not permitted, and the data must not be modified to build another dataset.