Commit bcf7e02

Tianwei Zhao authored and committed
update
1 parent d5a9473 commit bcf7e02

3 files changed: +43 -36 lines changed

assets/images/Data_curation.pdf

30.2 MB
Binary file not shown.

assets/images/Data_curation.png

1.56 MB

index.html

Lines changed: 43 additions & 36 deletions
@@ -18,7 +18,7 @@ <h1 class="text-3xl md:text-4xl font-bold text-slate-800 mb-8 max-w-4xl mx-auto"
 </p>
 <p>
 To explore the core knowledge representation in MLLMs, we introduce <strong>CoreCognition</strong>, a large-scale benchmark encompassing 12 core knowledge concepts grounded in developmental cognitive science.
-We evaluate 230 models with 11 different prompts, leading to a total of 2,530 data points for analysis. Our experiments uncover four key findings, collectively demonstrating core knowledge deficits in MLLMs: they consistently underperform and show reduced, or even absent, scalability on low-level abilities relative to high-level ones.
+We evaluate 230 models with 11 different prompts, leading to a total of 1503 data points for analysis. Our experiments uncover four key findings, collectively demonstrating core knowledge deficits in MLLMs: they consistently underperform and show reduced, or even absent, scalability on low-level abilities relative to high-level ones.
 </p>
 <p>
 Finally, we propose <strong>Concept Hacking</strong>, a novel controlled evaluation method, that reveals MLLMs fail to progress toward genuine core knowledge understanding, but instead rely on shortcut learning as they scale.
@@ -68,9 +68,9 @@ <h1 class="text-3xl md:text-4xl font-bold text-slate-800 mb-8 max-w-4xl mx-auto"
 <sup>1</sup>University of California San Diego &emsp;
 <sup>2</sup>Johns Hopkins University &emsp;
 <sup>3</sup>Emory University &emsp;
-<sup>4</sup>University of North Carolina at Chapel Hill &emsp;
 </div>
 <div class="mb-4 font-bold">
+<sup>4</sup>University of North Carolina at Chapel Hill &emsp;
 <sup>5</sup>Stanford University &emsp;
 <sup>6</sup>Ben-Gurion University of the Negev &emsp;
 </div>
@@ -135,7 +135,7 @@ <h3 class="text-2xl font-bold text-gray-900">Dataset Curation</h3>
 </div>
 <h4 class="text-lg font-bold text-gray-900">Discriminativeness</h4>
 </div>
-<p class="text-sm text-gray-700 leading-relaxed text-justify">
+<p class="text-sm text-gray-700 leading-relaxed text-left">
 Instances should be structured such that models lacking the targeted core knowledge necessarily select the <strong class="text-red-600">incorrect answers</strong>, thereby ensuring the discriminative power.
 </p>
 </div>
@@ -150,7 +150,7 @@ <h4 class="text-lg font-bold text-gray-900">Discriminativeness</h4>
 </div>
 <h4 class="text-lg font-bold text-gray-900">Minimal Confounding</h4>
 </div>
-<p class="text-sm text-gray-700 leading-relaxed text-justify">
+<p class="text-sm text-gray-700 leading-relaxed text-left">
 Questions should minimize reliance on confounding capabilities, such as <strong>object recognition</strong>, and must avoid conceptual overlap with other core knowledge included in the benchmark.
 </p>
 </div>
@@ -166,7 +166,7 @@ <h4 class="text-lg font-bold text-gray-900">Minimal Confounding</h4>
 </div>
 <h4 class="text-lg font-bold text-gray-900">Minimal Text Shortcut</h4>
 </div>
-<p class="text-sm text-gray-700 leading-relaxed text-justify">
+<p class="text-sm text-gray-700 leading-relaxed text-left">
 Instances should be crafted so that answers cannot be derived through textual shortcuts alone but require <strong class="text-blue-600">genuine multimodal comprehension</strong>.
 </p>
 </div>
@@ -190,6 +190,13 @@ <h4 class="text-lg font-bold text-gray-900 mb-2">Expert Collaboration</h4>
 </div>
 </div>
 
+<!-- Data Curation Process Figure -->
+<div class="paper-figure-container mx-auto max-w-5xl mb-12 hover:shadow-xl transition-all duration-300 transform hover:-translate-y-1 cursor-pointer">
+<img src="{{ '/assets/images/Data_curation.png' | relative_url }}"
+alt="Data Curation Process and Methodology"
+class="w-full h-auto rounded-lg shadow-sm">
+</div>
+
 <!-- Twelve Core Concepts Card -->
 <div class="bg-white rounded-2xl p-8 shadow-xl border border-gray-100 hover:shadow-2xl transition-all duration-300 transform hover:-translate-y-1">
 <div class="flex items-center mb-6">
@@ -210,7 +217,7 @@ <h3 class="text-2xl font-bold text-gray-900">Twelve Core Concepts</h3>
 </div>
 <div>
 <h4 class="text-lg font-bold text-gray-900 mb-2">Permanence</h4>
-<p class="text-sm text-gray-700 leading-relaxed text-justify">Objects do not cease to exist when they are no longer perceived.</p>
+<p class="text-sm text-gray-700 leading-relaxed text-left">Objects do not cease to exist when they are no longer perceived.</p>
 </div>
 </div>
 </div>
@@ -223,7 +230,7 @@ <h4 class="text-lg font-bold text-gray-900 mb-2">Permanence</h4>
 </div>
 <div>
 <h4 class="text-lg font-bold text-gray-900 mb-2">Continuity</h4>
-<p class="text-sm text-gray-700 leading-relaxed text-justify">Objects persist as unified, cohesive entities across space and time.</p>
+<p class="text-sm text-gray-700 leading-relaxed text-left">Objects persist as unified, cohesive entities across space and time.</p>
 </div>
 </div>
 </div>
@@ -236,7 +243,7 @@ <h4 class="text-lg font-bold text-gray-900 mb-2">Continuity</h4>
 </div>
 <div>
 <h4 class="text-lg font-bold text-gray-900 mb-2">Boundary</h4>
-<p class="text-sm text-gray-700 leading-relaxed text-justify">The transition from one object to another.</p>
+<p class="text-sm text-gray-700 leading-relaxed text-left">The transition from one object to another.</p>
 </div>
 </div>
 </div>
@@ -249,10 +256,10 @@ <h4 class="text-lg font-bold text-gray-900 mb-2">Boundary</h4>
 </div>
 <div>
 <h4 class="text-lg font-bold text-gray-900 mb-2">Spatiality</h4>
-<p class="text-sm text-gray-700 leading-relaxed text-justify">The <em>a priori</em> understanding of the Euclidean properties of the world.</p>
+<p class="text-sm text-gray-700 leading-relaxed text-left">The <em>a priori</em> understanding of the Euclidean properties of the world.</p>
+</div>
+</div>
 </div>
-</div>
-</div>
 
 <!-- 5. Perceptual Constancy -->
 <div class="p-6 rounded-xl hover:shadow-lg transition-all duration-300 shadow-md" style="background-color: #D7F0FE;">
@@ -262,7 +269,7 @@ <h4 class="text-lg font-bold text-gray-900 mb-2">Spatiality</h4>
 </div>
 <div>
 <h4 class="text-lg font-bold text-gray-900 mb-2">Perceptual Constancy</h4>
-<p class="text-sm text-gray-700 leading-relaxed text-justify">Changes in appearances don't mean changes in physical properties.</p>
+<p class="text-sm text-gray-700 leading-relaxed text-left">Changes in appearances don't mean changes in physical properties.</p>
 </div>
 </div>
 </div>
@@ -272,10 +279,10 @@ <h4 class="text-lg font-bold text-gray-900 mb-2">Perceptual Constancy</h4>
 <div class="flex items-start">
 <div class="rounded-lg mr-4 mt-1 flex-shrink-0" style="background-color: #BEE4FD;">
 <span class="w-14 h-14 flex items-center justify-center font-bold text-4xl text-gray-800">6</span>
-</div>
-<div>
+</div>
+<div>
 <h4 class="text-lg font-bold text-gray-900 mb-2">Intuitive Physics</h4>
-<p class="text-sm text-gray-700 leading-relaxed text-justify">Intuitions about the laws of how things interact in the physical world.</p>
+<p class="text-sm text-gray-700 leading-relaxed text-left">Intuitions about the laws of how things interact in the physical world.</p>
 </div>
 </div>
 </div>
@@ -288,9 +295,9 @@ <h4 class="text-lg font-bold text-gray-900 mb-2">Intuitive Physics</h4>
 </div>
 <div>
 <h4 class="text-lg font-bold text-gray-900 mb-2">Perspective</h4>
-<p class="text-sm text-gray-700 leading-relaxed text-justify">To see what others see.</p>
-</div>
-</div>
+<p class="text-sm text-gray-700 leading-relaxed text-left">To see what others see.</p>
+</div>
+</div>
 </div>
 
 <!-- 8. Hierarchy -->
@@ -301,7 +308,7 @@ <h4 class="text-lg font-bold text-gray-900 mb-2">Perspective</h4>
 </div>
 <div>
 <h4 class="text-lg font-bold text-gray-900 mb-2">Hierarchy</h4>
-<p class="text-sm text-gray-700 leading-relaxed text-justify">Understanding of inclusion and exclusion of objects and categories.</p>
+<p class="text-sm text-gray-700 leading-relaxed text-left">Understanding of inclusion and exclusion of objects and categories.</p>
 </div>
 </div>
 </div>
@@ -314,7 +321,7 @@ <h4 class="text-lg font-bold text-gray-900 mb-2">Hierarchy</h4>
 </div>
 <div>
 <h4 class="text-lg font-bold text-gray-900 mb-2">Conservation</h4>
-<p class="text-sm text-gray-700 leading-relaxed text-justify">Invariances of properties despite transformations.</p>
+<p class="text-sm text-gray-700 leading-relaxed text-left">Invariances of properties despite transformations.</p>
 </div>
 </div>
 </div>
@@ -327,7 +334,7 @@ <h4 class="text-lg font-bold text-gray-900 mb-2">Conservation</h4>
 </div>
 <div>
 <h4 class="text-lg font-bold text-gray-900 mb-2">Tool Use</h4>
-<p class="text-sm text-gray-700 leading-relaxed text-justify">The capacity to manipulate specific objects to achieve goals.</p>
+<p class="text-sm text-gray-700 leading-relaxed text-left">The capacity to manipulate specific objects to achieve goals.</p>
 </div>
 </div>
 </div>
@@ -340,10 +347,10 @@ <h4 class="text-lg font-bold text-gray-900 mb-2">Tool Use</h4>
 </div>
 <div>
 <h4 class="text-lg font-bold text-gray-900 mb-2">Intentionality</h4>
-<p class="text-sm text-gray-700 leading-relaxed text-justify">To see what others want.</p>
-</div>
-</div>
-</div>
+<p class="text-sm text-gray-700 leading-relaxed text-left">To see what others want.</p>
+</div>
+</div>
+</div>
 
 <!-- 12. Mechanical Reasoning -->
 <div class="p-6 rounded-xl hover:shadow-lg transition-all duration-300 shadow-md" style="background-color: #CFFAFE;">
@@ -353,10 +360,10 @@ <h4 class="text-lg font-bold text-gray-900 mb-2">Intentionality</h4>
 </div>
 <div>
 <h4 class="text-lg font-bold text-gray-900 mb-2">Mechanical Reasoning</h4>
-<p class="text-sm text-gray-700 leading-relaxed text-justify">Inferring actions from system states and vice versa.</p>
+<p class="text-sm text-gray-700 leading-relaxed text-left">Inferring actions from system states and vice versa.</p>
 </div>
 </div>
-</div>
+</div>
 </div>
 </div>
 
@@ -386,14 +393,14 @@ <h3 class="text-2xl font-bold text-gray-900">Dataset Statistics</h3>
 
 <!-- Statistic 3 -->
 <div>
-<div class="text-5xl font-bold text-purple-500 mb-2">&gt;26k</div>
-<div class="text-gray-700 font-medium">Total Judgments</div>
+<div class="text-5xl font-bold text-blue-500 mb-2">1503</div>
+<div class="text-gray-700 font-medium">Image-Question Pairs</div>
 </div>
 
 <!-- Statistic 4 -->
 <div>
-<div class="text-5xl font-bold text-blue-500 mb-2">2,530</div>
-<div class="text-gray-700 font-medium">Image-Question Pairs</div>
+<div class="text-5xl font-bold text-purple-500 mb-2">&gt;3800k</div>
+<div class="text-gray-700 font-medium">Total Judgments</div>
 </div>
 </div>
 </div>
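
Reviewer note: the statistics touched in this hunk can be cross-checked with a little arithmetic. The sketch below assumes that "Total Judgments" counts one judgment per model, per prompt, per image-question pair; the page does not define the term, so that reading is an assumption, and the variable names are illustrative only.

```python
# Counts quoted in the page text of this diff.
NUM_MODELS = 230
NUM_PROMPTS = 11
NUM_IMAGE_QUESTION_PAIRS = 1503

# Assumption: one judgment per (model, prompt, image-question pair).
model_prompt_configs = NUM_MODELS * NUM_PROMPTS
total_judgments = model_prompt_configs * NUM_IMAGE_QUESTION_PAIRS

print(f"model x prompt configurations: {model_prompt_configs:,}")
print(f"total judgments: {total_judgments:,}")
```

Under this assumption the product is 3,802,590, which is consistent with the new "&gt;3800k" figure. The same arithmetic gives 230 x 11 = 2,530 model-prompt configurations, the number the abstract used before this commit, while 1503 matches the image-question pair count.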
@@ -412,7 +419,7 @@ <h3 class="text-2xl font-bold text-gray-900">Dataset Statistics</h3>
 <div class="max-w-7xl mx-auto px-4 sm:px-6 lg:px-8">
 <div class="text-center mb-12">
 <h2 class="text-3xl md:text-4xl font-bold text-gray-900 mb-4">Key Findings</h2>
-<p class="text-xl text-gray-600 max-w-4xl mx-auto leading-relaxed text-justify">
+<p class="text-xl text-gray-600 max-w-4xl mx-auto leading-relaxed text-center">
 Our study uncovers <strong>four primary shortcomings</strong> shared by state-of-the-art MLLMs:
 </p>
 </div>
@@ -575,7 +582,7 @@ <h2 class="text-3xl md:text-4xl font-bold text-gray-900 mb-4">Concept Hacking: A
 </div>
 <h4 class="text-lg font-bold text-gray-900">Core Knowledge</h4>
 </div>
-<p class="text-sm text-gray-700 leading-relaxed text-justify">
+<p class="text-sm text-gray-700 leading-relaxed text-left">
 Correct responses on both controlled and manipulated tasks indicate genuine conceptual understanding.
 </p>
 </div>
@@ -590,7 +597,7 @@ <h4 class="text-lg font-bold text-gray-900">Core Knowledge</h4>
 </div>
 <h4 class="text-lg font-bold text-gray-900">Shortcut-taking</h4>
 </div>
-<p class="text-sm text-gray-700 leading-relaxed text-justify">
+<p class="text-sm text-gray-700 leading-relaxed text-left">
 Models exploiting training data similarities perform well on controlled tasks but fail when familiar patterns are paired with inverted labels.
 </p>
 </div>
@@ -605,7 +612,7 @@ <h4 class="text-lg font-bold text-gray-900">Shortcut-taking</h4>
 </div>
 <h4 class="text-lg font-bold text-gray-900">Core Deficits</h4>
 </div>
-<p class="text-sm text-gray-700 leading-relaxed text-justify">
+<p class="text-sm text-gray-700 leading-relaxed text-left">
 Incorrect responses to controlled tasks, regardless of manipulation performance, indicate the absence of core knowledge.
 </p>
 </div>
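
Reviewer note: the three outcome cards in the Concept Hacking section (Core Knowledge, Shortcut-taking, Core Deficits) amount to a small decision rule over two results per item. A minimal sketch, assuming each item yields one boolean for the controlled task and one for the manipulated task; the function name `classify_outcome` and the boolean encoding are hypothetical and not taken from the project code.

```python
def classify_outcome(controlled_correct: bool, manipulated_correct: bool) -> str:
    """Map a (controlled, manipulated) result pair to one of the three page labels."""
    if not controlled_correct:
        # Failing the controlled task signals a deficit, whatever happens
        # on the manipulated task.
        return "Core Deficits"
    if manipulated_correct:
        # Correct on both tasks.
        return "Core Knowledge"
    # Correct on the controlled task only: the familiar pattern was matched,
    # but the inverted label was missed.
    return "Shortcut-taking"


for pair in [(True, True), (True, False), (False, True), (False, False)]:
    print(pair, "->", classify_outcome(*pair))
```

Checking the controlled task first mirrors the "regardless of manipulation performance" wording on the Core Deficits card, so both failing combinations land in the same bucket.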
@@ -625,7 +632,7 @@ <h4 class="text-lg font-bold text-gray-900">Core Deficits</h4>
 <div class="max-w-7xl mx-auto px-4 sm:px-6 lg:px-8">
 <div class="text-center mb-8">
 <h2 class="text-3xl md:text-4xl font-bold mb-4">Citation</h2>
-<p class="text-xl text-gray-300 max-w-3xl mx-auto text-justify">
+<p class="text-xl text-gray-300 max-w-3xl mx-auto text-center">
 If you find this project useful in your research, please consider citing:
 </p>
 </div>
