To explore the core knowledge representation in MLLMs, we introduce <strong>CoreCognition</strong>, a large-scale benchmark encompassing 12 core knowledge concepts grounded in developmental cognitive science.
We evaluate 230 models with 11 different prompts. Our experiments uncover four key findings, collectively demonstrating core knowledge deficits in MLLMs: they consistently underperform and show reduced, or even absent, scalability on low-level abilities relative to high-level ones.
</p>
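<p>
As a rough sketch of what an evaluation at this scale involves (not the authors' actual harness), the Python snippet below sweeps every model over every prompt variant and aggregates per-concept accuracy. The model list, prompt names, and <code>evaluate</code> stub are all hypothetical placeholders.
</p>
<pre><code>
import random
from collections import defaultdict

# Hypothetical placeholders: the real benchmark spans 230 MLLMs and its
# own prompt templates, neither of which is reproduced here.
MODELS = ["model-a", "model-b"]
PROMPTS = [f"prompt-{i}" for i in range(11)]  # 11 prompt variants

def evaluate(model, prompt, item):
    """Stub standing in for an actual model query; returns 1 if the
    model answers `item` correctly under `prompt`, else 0."""
    return random.choice([0, 1])  # dummy outcome for illustration

def run_benchmark(items):
    # Accuracy per (model, core-knowledge concept), pooled over prompts.
    scores = defaultdict(list)
    for model in MODELS:
        for prompt in PROMPTS:
            for item in items:
                scores[(model, item["concept"])].append(
                    evaluate(model, prompt, item)
                )
    return {key: sum(v) / len(v) for key, v in scores.items()}

items = [{"concept": "object permanence", "question": "..."}]
print(run_benchmark(items))
</code></pre>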
<p>
Finally, we propose <strong>Concept Hacking</strong>, a novel controlled evaluation method that reveals MLLMs fail to progress toward genuine core knowledge understanding but instead rely on shortcut learning as they scale.
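<p>
One way to picture the idea, under the assumption that each benchmark item has a manipulated twin that keeps the superficial cues but flips the ground-truth answer, is the sketch below; the pair structure and the <code>ask</code> helper are illustrative, not the paper's implementation. A model that scores well on controls but near or below chance on the manipulated twins is likely exploiting shortcuts rather than core knowledge.
</p>
<pre><code>
def concept_hacking_gap(model, pairs, ask):
    """Accuracy gap between control items and their manipulated twins.

    `pairs` is a list of dicts holding a control item and a "hacked"
    item with the same surface features but the opposite answer; `ask`
    is a placeholder for querying the model. A large positive gap
    suggests shortcut learning rather than genuine understanding.
    """
    n = len(pairs)
    control = sum(ask(model, p["control"]) == p["control_answer"] for p in pairs)
    hacked = sum(ask(model, p["hacked"]) == p["hacked_answer"] for p in pairs)
    return control / n - hacked / n
</code></pre>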