SEA-VL is a large and ambitious initiative, so we have decided to split it into two phases. In Phase 1, we collected self-taken, culturally relevant images, each accompanied by a description written in the contributor's local language. These images were compiled into an open-access, SEA-relevant image dataset, the largest of its kind to date. This dataset will serve as the foundation for Phase 2, in which we’ll use it to develop instruction-tuning vision-language (VL) datasets and build a SEA-specific vision-language model (VLM).