Pull requests: EleutherAI/lm-evaluation-harness
Feat: Add team Shaikespear submission from NeurIPS E2LM Competition
#3437
opened Nov 29, 2025 by
younesbelkada
Fix pretty_print_task for external custom task configs
#3436
opened Nov 28, 2025 by
safikhanSoofiyani
Refactor: Decouple ContextSampler from Task; build_qa_turn
#3429
opened Nov 25, 2025 by
baberabb
Fix wrong gpqa_diamond_generative_n_shot answer template
#3407
opened Nov 15, 2025 by
fxmarty-amd
Fix: Prevent infinite loop when max_seq_lengths < 4096 in prepare_niah.py
#3372
opened Oct 28, 2025 by
vnayakde
Add support for configurable chrF metric parameters in task YAML, fix…
#3363
opened Oct 23, 2025 by
augustlakia
[AIME24 | AIME25] Enable Multiple Generation Repeats with Pass@k and Majority@k Metrics
#3351
opened Oct 17, 2025 by
ihebchaa
feat: Add support for accelerate-wrapped models in simple_evaluate()
#3313
opened Sep 26, 2025 by
DhruvaKashyap
Support empty response for Completions and ChatCompletions API
#3309
opened Sep 22, 2025 by
tboerstad
Adding New Task SLR-Bench : Scalable Logical Reasoning Benchmark
#3305
opened Sep 20, 2025 by
Ahmad21Omar
Add long-context evaluation benchmarks (LongBench v2, Babilong, InfiniteBench, Phonebook)
#3256
opened Aug 21, 2025 by
Mariani-code