Commit a6f56f9

Tianwei Zhao authored and committed
update
1 parent bcf7e02 commit a6f56f9

File tree

1 file changed (+1, −1 lines changed)


index.html

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ <h1 class="text-3xl md:text-4xl font-bold text-slate-800 mb-8 max-w-4xl mx-auto"
   </p>
   <p>
     To explore the core knowledge representation in MLLMs, we introduce <strong>CoreCognition</strong>, a large-scale benchmark encompassing 12 core knowledge concepts grounded in developmental cognitive science.
-    We evaluate 230 models with 11 different prompts, leading to a total of 1503 data points for analysis. Our experiments uncover four key findings, collectively demonstrating core knowledge deficits in MLLMs: they consistently underperform and show reduced, or even absent, scalability on low-level abilities relative to high-level ones.
+    We evaluate 230 models with 11 different prompts. Our experiments uncover four key findings, collectively demonstrating core knowledge deficits in MLLMs: they consistently underperform and show reduced, or even absent, scalability on low-level abilities relative to high-level ones.
   </p>
   <p>
     Finally, we propose <strong>Concept Hacking</strong>, a novel controlled evaluation method, that reveals MLLMs fail to progress toward genuine core knowledge understanding, but instead rely on shortcut learning as they scale.
