feat: Add Zebra Grid dataset support with ZeroEval alignment by max-yue · Pull Request #2234 · open-compass/opencompass

max-yue · 2025-08-11T03:59:00Z

📝 Description

Add support for the Zebra Grid logic puzzle dataset from allenai/ZebraLogicBench-private, with prompting and answer extraction aligned exactly with ZeroEval.

🚀 Features

- ZebraGridDataset class for loading the dataset from HuggingFace
- ZebraGridEvaluator reporting puzzle accuracy and cell accuracy metrics
- ZeroEval's exact prompt template and JSON extraction logic
- Complete evaluation configuration with an example Qwen3-8B model
- 82.4% puzzle accuracy on 1000 samples, close to the official 84.8%
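The description mentions ZeroEval's JSON extraction logic without showing it. As a rough illustration only, the sketch below pulls the last parseable JSON object out of a model response (so JSON-like fragments in a thinking model's reasoning are superseded by the final answer) and reads its `solution` field. This is not ZeroEval's or this PR's code: the function names `extract_last_json` and `extract_solution` and the `{"reasoning": ..., "solution": {...}}` response shape are assumptions.

```python
import json
from typing import Any, Optional


def extract_last_json(text: str) -> Optional[dict]:
    """Return the last top-level JSON object in ``text`` that parses, or None.

    Hypothetical helper: a sketch of "last JSON object wins" extraction,
    not ZeroEval's actual implementation.
    """
    decoder = json.JSONDecoder()
    last: Optional[dict] = None
    idx = text.find("{")
    while idx != -1:
        try:
            # raw_decode parses one JSON value starting at idx and returns where it ended.
            obj, end = decoder.raw_decode(text, idx)
        except json.JSONDecodeError:
            # Not valid JSON at this brace (e.g. "{x}" in prose); try the next brace.
            idx = text.find("{", idx + 1)
            continue
        if isinstance(obj, dict):
            last = obj
        # Skip past the parsed object so its nested dicts are not treated as candidates.
        idx = text.find("{", end)
    return last


def extract_solution(response: str) -> Optional[dict[str, Any]]:
    """Return the assumed ``solution`` grid from the last JSON object, or None."""
    obj = extract_last_json(response)
    if obj is None:
        return None
    solution = obj.get("solution")
    return solution if isinstance(solution, dict) else None


if __name__ == "__main__":
    reply = 'Try {x} first... Final: {"reasoning": "r", "solution": {"House 1": {"Name": "Arnold"}}}'
    print(extract_solution(reply))
```

Returning None when no usable `solution` is found lets an evaluator score that puzzle as entirely wrong rather than crashing on malformed output.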

🧪 Testing

Tested on 1000 samples with Qwen3-8B in thinking mode:

- Puzzle Accuracy: 82.40%
- Cell Accuracy: 83.74%
- Puzzle accuracy is 2.4 points below the official ZeroEval result of 84.8%
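For readers unfamiliar with the two metrics: puzzle accuracy counts a puzzle as correct only if every cell of the predicted grid matches, while cell accuracy gives partial credit per cell. The sketch below assumes micro-averaged cell accuracy (correct cells over all cells), case-insensitive matching after stripping whitespace, and a hypothetical house-to-attribute grid shape for both prediction and ground truth; it is not the PR's ZebraGridEvaluator, and the function names are illustrative.

```python
from typing import Optional

# Hypothetical grid shape: {"House 1": {"Name": "Arnold", "Pet": "cat"}, ...}
Grid = dict[str, dict[str, str]]


def score_puzzle(pred: Optional[Grid], gold: Grid) -> tuple[int, int]:
    """Return (correct_cells, total_cells) for one puzzle."""
    total = sum(len(attrs) for attrs in gold.values())
    if not isinstance(pred, dict):
        # No usable prediction (e.g. extraction failed): every cell counts as wrong.
        return 0, total
    correct = 0
    for house, attrs in gold.items():
        pred_attrs = pred.get(house)
        if not isinstance(pred_attrs, dict):
            continue
        for column, truth in attrs.items():
            guess = pred_attrs.get(column)
            if isinstance(guess, str) and guess.strip().lower() == truth.strip().lower():
                correct += 1
    return correct, total


def zebra_metrics(preds: list[Optional[Grid]], golds: list[Grid]) -> dict[str, float]:
    """Puzzle accuracy and micro-averaged cell accuracy, in percent."""
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    solved = correct_cells = total_cells = 0
    for pred, gold in zip(preds, golds):
        correct, total = score_puzzle(pred, gold)
        correct_cells += correct
        total_cells += total
        solved += int(correct == total)  # solved only if every cell matches
    n = len(golds)
    return {
        "puzzle_accuracy": 100.0 * solved / n if n else 0.0,
        "cell_accuracy": 100.0 * correct_cells / total_cells if total_cells else 0.0,
    }
```

Because a failed extraction scores zero cells, cell accuracy can sit only slightly above puzzle accuracy when most errors come from unparseable or missing answers rather than near-miss grids.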

📚 Usage

python run.py opencompass/configs/eval_zebra_grid.py --work-dir ./outputs/zebra_grid

🔗 Related

- Based on the allenai/ZebraLogicBench-private dataset
- Aligned with the ZeroEval evaluation framework
- Compatible with the vLLM backend for efficient inference

- Add ZebraGridDataset class for loading allenai/ZebraLogicBench-private
- Implement ZebraGridEvaluator with puzzle accuracy and cell accuracy metrics
- Use exact ZeroEval prompt template and JSON extraction logic
- Include complete evaluation configuration and model setup
- Achieve 82.4% accuracy on 1000 samples, close to official 84.8%

Features:

- Supports HuggingFace datasets integration
- Compatible with vLLM backend for efficient inference
- Provides detailed evaluation metrics (puzzle_accuracy, cell_accuracy)
- Includes example Qwen3-8B thinking model configuration

Usage:

python run.py opencompass/configs/eval_zebra_grid.py --work-dir ./outputs/zebra_grid

mm-assistant bot assigned bittersweet1999 Aug 11, 2025

max-yue wants to merge 1 commit into open-compass:main from max-yue:add-zebra-grid-dataset