|
1 |
| -# RewardAnything GitHub Pages |
| 1 | +# Core Knowledge Deficits in Multi-Modal Language Models |
2 | 2 |
|
3 |
| -This directory contains the GitHub Pages website for the Core Knowledge Deficits in Multi-Modal Language Models project. |
| 3 | +**Official website for the ICML 2025 paper submission** |
4 | 4 |
|
5 |
| -## 🏗️ Structure |
| 5 | +🌐 **Website**: [https://williamium3000.github.io/core-knowledge](https://williamium3000.github.io/core-knowledge) |
| 6 | +📄 **Paper**: [https://arxiv.org/abs/2410.10855](https://arxiv.org/abs/2410.10855) |
| 7 | +🤗 **Dataset**: [https://huggingface.co/grow-ai-like-a-child](https://huggingface.co/grow-ai-like-a-child) |
6 | 8 |
|
7 |
| -``` |
8 |
| -pages/ |
9 |
| -├── _config.yml # Jekyll configuration |
10 |
| -├── _layouts/ |
11 |
| -│ └── default.html # Main layout template |
12 |
| -├── index.html # Homepage content |
13 |
| -├── assets/ |
14 |
| -│ ├── images/ # Logo and image placeholders |
15 |
| -│ └── favicon.svg # Site favicon |
16 |
| -├── Gemfile # Ruby dependencies |
17 |
| -├── setup.sh # Local setup script |
18 |
| -└── README.md # This file |
19 |
| -``` |
| 9 | +## 📖 About |
20 | 10 |
|
21 |
| -## 🚀 Automatic Deployment |
| 11 | +This repository contains the official website for our paper "Core Knowledge Deficits in Multi-Modal Language Models". The website presents our comprehensive evaluation of 230 multi-modal language models using the **CoreCognition** benchmark, which assesses 12 foundational cognitive concepts grounded in developmental cognitive science. |
22 | 12 |
|
23 |
| -1. **Changes are pushed** to the `main` branch in the `pages/` directory |
24 |
| -2. **Manual trigger** via GitHub Actions tab |
| 13 | +## 🔍 Key Findings |
25 | 14 |
|
26 |
| -The deployment is handled by the GitHub Actions workflow in `.github/workflows/deploy-pages.yml`. |
| 15 | +Our research reveals four critical shortcomings in state-of-the-art Multi-modal Large Language Models (MLLMs): |
27 | 16 |
|
28 |
| -## 🏠 Local Development |
| 17 | +1. **Core Knowledge Deficits**: MLLMs excel at higher-level abilities but struggle with lower-level cognitive abilities |
| 18 | +2. **Misaligned Dependency**: Core abilities show weak cross-stage correlations, lacking developmental scaffolding |
| 19 | +3. **Predictability**: Performance on core knowledge predicts higher-level abilities |
| 20 | +4. **Limited Scaling**: MLLMs improve far less with scale on low-level abilities than on high-level ones |
29 | 21 |
|
30 |
| -### Quick Setup |
| 22 | +## 🧠 CoreCognition Benchmark |
31 | 23 |
|
32 |
| -```bash |
33 |
| -# Navigate to pages directory |
34 |
| -cd pages |
| 24 | +The **CoreCognition** benchmark evaluates twelve foundational cognitive concepts: |
| 25 | + |
| 26 | +1. **Permanence** - Objects persist when not perceived |
| 27 | +2. **Continuity** - Objects remain unified across space and time |
| 28 | +3. **Boundary** - Transitions between objects |
| 29 | +4. **Spatiality** - Understanding Euclidean properties |
| 30 | +5. **Perceptual Constancy** - Appearance changes ≠ property changes |
| 31 | +6. **Intuitive Physics** - Laws of physical interaction |
| 32 | +7. **Perspective** - Seeing what others see |
| 33 | +8. **Hierarchy** - Inclusion/exclusion of objects and categories |
| 34 | +9. **Conservation** - Property invariances despite transformations |
| 35 | +10. **Tool Use** - Manipulating objects to achieve goals |
| 36 | +11. **Intentionality** - Understanding what others want |
| 37 | +12. **Mechanical Reasoning** - Inferring actions from system states |
| 38 | + |
| 39 | +## 🔬 Concept Hacking |
35 | 40 |
|
36 |
| -# Run setup script (macOS/Linux) |
37 |
| -chmod +x setup.sh |
38 |
| -./setup.sh |
| 41 | +We introduce **Concept Hacking**, a novel controlled evaluation method that systematically manipulates task-relevant features while preserving task-irrelevant conditions. The results show that MLLMs fail to develop genuine core knowledge and instead rely on shortcut learning as they scale. |
| 42 | + |
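The paired-comparison idea behind Concept Hacking can be illustrated with a minimal sketch. It assumes each item comes as a control version and a manipulated version, and that a model's answer to each has already been marked correct or incorrect. The names `PairedResult`, `label_pair`, and `summarize`, and the four labels, are hypothetical and are not taken from the paper or its codebase.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical labels for the four possible outcomes of one item pair.
LABELS = ("consistent", "shortcut_suspect", "manipulated_only", "both_wrong")


@dataclass(frozen=True)
class PairedResult:
    """Correctness of one model on a control item and its manipulated twin."""
    control_correct: bool
    manipulated_correct: bool


def label_pair(result: PairedResult) -> str:
    """Map one control/manipulated outcome to a descriptive label."""
    if result.control_correct and result.manipulated_correct:
        return "consistent"
    if result.control_correct:
        # Right on the control, wrong once the task-relevant feature changes:
        # compatible with reliance on a shortcut (an assumption of this sketch).
        return "shortcut_suspect"
    if result.manipulated_correct:
        return "manipulated_only"
    return "both_wrong"


def summarize(results: list[PairedResult]) -> dict[str, float]:
    """Return the fraction of pairs that fall under each label."""
    if not results:
        return {}
    counts = Counter(label_pair(r) for r in results)
    return {label: counts[label] / len(results) for label in LABELS}


if __name__ == "__main__":
    demo = [
        PairedResult(True, True),
        PairedResult(True, False),
        PairedResult(True, False),
        PairedResult(False, False),
    ]
    print(summarize(demo))
```

A high `shortcut_suspect` fraction in this sketch would only flag pairs worth inspecting; it is not the paper's scoring protocol.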
| 43 | +## 📊 Evaluation Scale |
| 44 | + |
| 45 | +- **230 MLLMs** evaluated across different model families and sizes |
| 46 | +- **11 different prompts** to ensure robust evaluation |
| 47 | +- **>26,000 total judgments** across all models and tasks |
| 48 | +- **2,530 image-question pairs** in the benchmark |
| 49 | + |
| 50 | +## 🏗️ Website Structure |
39 | 51 |
|
40 |
| -# Start development server |
41 |
| -bundle exec jekyll serve |
42 | 52 | ```
|
| 53 | +├── _config.yml # Jekyll configuration |
| 54 | +├── _layouts/ |
| 55 | +│ └── default.html # Main layout template |
| 56 | +├── index.html # Homepage with full paper content |
| 57 | +├── assets/ |
| 58 | +│ ├── images/ # Paper figures and illustrations |
| 59 | +│ ├── growai.png # Site favicon |
| 60 | +│ └── favicon.svg # Backup favicon |
| 61 | +├── Gemfile # Ruby dependencies |
| 62 | +└── README.md # This file |
| 63 | +``` |
| 64 | + |
| 65 | +## 🚀 Local Development |
43 | 66 |
|
44 |
| -### Manual Setup |
| 67 | +To run the website locally: |
45 | 68 |
|
46 | 69 | ```bash
|
47 |
| -# Install Ruby dependencies |
| 70 | +# Install dependencies |
48 | 71 | gem install jekyll bundler
|
49 | 72 | bundle install
|
50 | 73 |
|
51 | 74 | # Serve the site locally
|
52 | 75 | bundle exec jekyll serve --livereload
|
53 | 76 | ```
|
54 | 77 |
|
55 |
| -Then visit: `http://localhost:4000/RewardAnything` |
| 78 | +Then visit: `http://localhost:4000/core-knowledge` |
56 | 79 |
|
57 |
| -## 📝 Configuration |
| 80 | +## 👥 Authors |
58 | 81 |
|
59 |
| -### GitHub Pages Settings |
| 82 | +**Yijiang Li¹**, **Qingying Gao²,§**, **Tianwei Zhao²,§**, **Bingyang Wang³,§**, **Haoran Sun²**, **Haiyun Lyu⁴**, **Robert D. Hawkins⁵**, **Nuno Vasconcelos¹**, **Tal Golan⁶**, **Dezhi Luo⁷,⁸,†**, **Hokin Deng⁹,†** |
60 | 83 |
|
61 |
| -1. Go to **Repository Settings** → **Pages** |
62 |
| -2. Source: **GitHub Actions** |
63 |
| -3. The workflow will handle the rest automatically |
| 84 | +¹University of California San Diego, ²Johns Hopkins University, ³Emory University, ⁴University of North Carolina at Chapel Hill, ⁵Stanford University, ⁶Ben-Gurion University of the Negev, ⁷University of Michigan, ⁸University College London, ⁹Carnegie Mellon University |
64 | 85 |
|
65 |
| -### Environment Variables |
| 86 | +§Equal Contribution, †Corresponding author |
66 | 87 |
|
67 |
| -The following are configured in `_config.yml`: |
| 88 | +## 📄 Citation |
68 | 89 |
|
69 |
| -- `github_username`: Your GitHub username |
70 |
| -- `paper_url`: Link to your arXiv paper |
71 |
| -- `huggingface_url`: Link to model weights |
72 |
| -- `pypi_url`: Link to PyPI package |
| 90 | +If you find this work useful in your research, please consider citing: |
73 | 91 |
|
74 |
| -## 🎨 Customization |
75 |
| - |
76 |
| -### Replacing Placeholder Images |
77 |
| - |
78 |
| -Replace the SVG placeholders in `assets/images/` with your actual logos: |
79 |
| - |
80 |
| -- `logo-placeholder.svg` → Navigation logo |
81 |
| -- `logo-placeholder-white.svg` → Footer logo (white version) |
82 |
| -- `hero-logo-placeholder.svg` → Large hero section logo |
83 |
| -- `favicon.svg` → Browser favicon |
84 |
| - |
85 |
| -### Updating Content |
86 |
| - |
87 |
| -- **Homepage**: Edit `index.html` |
88 |
| -- **Navigation**: Modify `_layouts/default.html` |
89 |
| -- **Site settings**: Update `_config.yml` |
90 |
| -- **Styling**: Customize Tailwind classes in templates |
91 |
| - |
92 |
| -### Adding New Pages |
93 |
| - |
94 |
| -Create new `.html` or `.md` files with front matter: |
95 |
| - |
96 |
| -```yaml |
97 |
| ---- |
98 |
| -layout: default |
99 |
| -title: "Page Title" |
100 |
| -description: "Page description" |
101 |
| ---- |
102 |
| - |
103 |
| -Your content here... |
| 92 | +```bibtex |
| 93 | +@article{li2025core, |
| 94 | + title={Core Knowledge Deficits in Multi-Modal Language Models}, |
| 95 | + author={Li, Yijiang and Gao, Qingying and Zhao, Tianwei and Wang, Bingyang and Sun, Haoran and Lyu, Haiyun and Luo, Dezhi and Deng, Hokin}, |
| 96 | + journal={arXiv preprint arXiv:2410.10855}, |
| 97 | + year={2025} |
| 98 | +} |
104 | 99 | ```
|
105 | 100 |
|
106 |
| -## 🔧 Troubleshooting |
| 101 | +## 📧 Contact |
107 | 102 |
|
108 |
| -### Local Development Issues |
| 103 | +For questions about the paper or dataset, please contact the corresponding authors: |
| 104 | +- Dezhi Luo: [dezhi@umich.edu](mailto:dezhi@umich.edu) |
| 105 | +- Hokin Deng: [hokindeng@cmu.edu](mailto:hokindeng@cmu.edu) |
109 | 106 |
|
110 |
| -```bash |
111 |
| -# Clean build files |
112 |
| -bundle exec jekyll clean |
113 |
| - |
114 |
| -# Rebuild dependencies |
115 |
| -bundle install --force |
116 |
| - |
117 |
| -# Verbose build for debugging |
118 |
| -bundle exec jekyll serve --verbose |
119 |
| -``` |
120 |
| - |
121 |
| -### Deployment Issues |
122 |
| - |
123 |
| -1. Check **Actions** tab for build logs |
124 |
| -2. Ensure `pages/` directory changes are pushed to `main` |
125 |
| -3. Verify GitHub Pages settings are correct |
| 107 | +## 🔧 Technical Details |
126 | 108 |
|
127 |
| -## 📊 Performance |
128 |
| - |
129 |
| -The site is optimized for: |
130 |
| -- ✅ Mobile responsiveness |
131 |
| -- ✅ Fast loading (Tailwind CSS via CDN) |
132 |
| -- ✅ SEO optimization |
133 |
| -- ✅ Accessibility |
134 |
| -- ✅ Modern browsers |
135 |
| - |
136 |
| -## 🤝 Contributing |
137 |
| - |
138 |
| -When making changes: |
139 |
| - |
140 |
| -1. Test locally first: `bundle exec jekyll serve` |
141 |
| -2. Commit changes to `pages/` directory |
142 |
| -3. Push to `main` branch |
143 |
| -4. Automatic deployment will trigger |
| 109 | +The website is built with: |
| 110 | +- **Jekyll** for static site generation |
| 111 | +- **Tailwind CSS** for styling |
| 112 | +- **GitHub Pages** for hosting |
| 113 | +- **Responsive design** optimized for all devices |
| 114 | +- **SEO optimization** for better discoverability |
144 | 115 |
|
145 | 116 | ---
|
| 117 | + |
| 118 | +*This website presents the official results and findings from our comprehensive evaluation of multi-modal language models on core cognitive abilities.* |