Adds lm_eval to evaluations #282
Open
bigximik wants to merge 49 commits into main from denis/lm_eval
Commits (49)
cb744b2 copy from sandbox (bigximik)
0967483 changes for loss test for new tests structure (bigximik)
71ff61a lm_eval integration changes for the new api (bigximik)
79fd43e made lm_eval dependency lazily imported as an optional dependency (bigximik)
2d9f479 removed hard-coded batch size (bigximik)
7c62100 removed unnecessary set to evaluation (bigximik)
c89d269 commit wandb step after finishing logging (bigximik)
9455cd5 support for env variables for lm_eval integration (bigximik)
69180a3 merge from main (bigximik)
c9a3b18 user guide for evaluators added (bigximik)
426b5e3 fix tensor concatenation for logits from different gpus (bigximik)
0bf8282 docs update (bigximik)
68f524b removed manual test configs (bigximik)
a36e0be added debug prints (bigximik)
9baa512 fix for gather_list and remove debug print (bigximik)
21678ab removed debug print (bigximik)
7cccf9a moved returned logits to cpu in lm_eval wrapper (bigximik)
7cd681a fix to move all logits computations to cpu (bigximik)
59ff1e5 Merge branch 'main' of github.com:ServiceNow/Fast-LLM into denis/lm_eval (bigximik)
27e5de8 Merge branch 'main' of github.com:ServiceNow/Fast-LLM into denis/lm_eval (bigximik)
88faca0 fix typo (bigximik)
e3a4a6e removed commented code, obsolete todo (bigximik)
89e67d2 changes to wrapper (bigximik)
6871359 refactored lm_eval integration (bigximik)
6b74739 import change (bigximik)
c398444 zero stage 3 inference warning added and TODO (bigximik)
62846d2 removed docstrings (bigximik)
e61cc3e removed unused fields, change generate call (bigximik)
6a2ab35 changed all fields to be private, removed properties which are use… (bigximik)
6e1704f Simplify scatter/gather (jlamypoirier)
2499b4e clean up, more comments (bigximik)
44aa138 fixed typo (bigximik)
f81a673 moved setting of NUMEXPR_MAX_THREADS (bigximik)
d56ce57 Evaluators renames (bigximik)
b32c91f return change (bigximik)
93091dd change local function to lambda (bigximik)
50e65ee some speedup (bigximik)
d32258e fix not to log absent head output (bigximik)
98d1d77 added lm_eval integration tests (bigximik)
9f2de97 fix not removal comment for import (bigximik)
b451543 docs update (bigximik)
910d54e scatter fix (bigximik)
077f2ac fix offset normalization in validation (bigximik)
ac9025d tests polishing (bigximik)
30d85df more tests polishing (bigximik)
f60fa35 fixes (jlamypoirier)
ada41ca Merge branch 'main' of github.com:ServiceNow/Fast-LLM into denis/lm_eval (bigximik)
2f5d2d0 changed prepare function to just copy training runs (bigximik)
f05db2c disabled test (bigximik)
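One commit above makes lm_eval a lazily imported optional dependency. The PR's actual implementation is not shown here, but a minimal sketch of that general pattern (the helper name `lazy_import` is hypothetical, not Fast-LLM's API) looks like this:

```python
import importlib
import importlib.util


def lazy_import(module_name: str):
    """Import an optional dependency only when it is actually needed.

    Hypothetical helper: raises ImportError with a helpful message if the
    package is missing, instead of failing at program startup.
    """
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"Optional dependency '{module_name}' is not installed; "
            f"install it to enable this feature."
        )
    return importlib.import_module(module_name)
```

With this pattern, code paths that need the evaluator would call something like `lm_eval = lazy_import("lm_eval")` at use time, so importing the main package never requires lm_eval to be installed.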
New file, +85 lines (lm_eval integration tests):

```python
import huggingface_hub
import pytest
import transformers

from tests.models.test_checkpoint import _prepare_resume_fn
from tests.utils.model_configs import ModelTestingGroup
from tests.utils.utils import requires_cuda, requires_lm_eval

# NOTE: These tests only verify that the functionality runs without crashing.
# NOTE: The tokenizer is from a LLaMA-style model, which may not be suitable for all models,
# but it should be sufficient since we are not concerned with actual accuracy in these tests.


@pytest.fixture(scope="module")
def model_path(result_path):
    return huggingface_hub.snapshot_download(
        repo_id="HuggingFaceTB/SmolLM2-135M-Instruct",
        local_dir=result_path / "lm_eval/model",
    )


def get_lm_eval_config(base_path, tokenizer_path):
    tokenizer = transformers.AutoTokenizer.from_pretrained(tokenizer_path)
    return [
        f"data.tokenizer.path={tokenizer_path}",
        f"model.base_model.vocab_size={tokenizer.vocab_size}",
        "training.evaluators.evaluation_test.interval=1",
        "training.evaluators.evaluation_test.evaluator.type=lm_eval",
        "training.evaluators.evaluation_test.evaluator.cli_args="
        f'["--tasks","gsm8k,xnli_en,wikitext","--output_path","{str(base_path / "lm_eval")}","--limit","10"]',
    ]


@pytest.mark.extra_slow
@requires_lm_eval
@requires_cuda
@pytest.mark.model_testing_group(ModelTestingGroup.basic)
def test_lm_eval_in_training(run_test_script_for_all_models, run_test_script_base_path, model_path):
    run_test_script_for_all_models(
        get_lm_eval_config(run_test_script_base_path / "test_lm_eval_in_training", model_path)
        + ["training.checkpoint.interval=1"]
    )


@pytest.mark.extra_slow
@requires_lm_eval
@requires_cuda
@pytest.mark.depends_on(on=["test_lm_eval_in_training[{model_testing_config}]"])
@pytest.mark.model_testing_group(ModelTestingGroup.basic)
def test_lm_eval_evaluation(run_test_script_for_all_models, run_test_script_base_path, model_path):
    run_test_script_for_all_models(
        get_lm_eval_config(run_test_script_base_path / "test_lm_eval_evaluation", model_path),
        compare="test_lm_eval_in_training",
        prepare_fn=_prepare_resume_fn,
        do_compare=False,
        task="evaluate",
    )


@pytest.mark.extra_slow
@requires_lm_eval
@requires_cuda
@pytest.mark.model_testing_group(ModelTestingGroup.distributed)
def test_lm_eval_in_training_dp2(run_test_script_for_all_models, run_test_script_base_path, model_path):
    run_test_script_for_all_models(
        get_lm_eval_config(run_test_script_base_path / "test_lm_eval_in_training_dp2", model_path)
        + ["training.checkpoint.interval=1"],
        num_gpus=2,
    )


@pytest.mark.extra_slow
@requires_lm_eval
@requires_cuda
@pytest.mark.depends_on(on=["test_lm_eval_in_training_dp2[{model_testing_config}]"])
@pytest.mark.model_testing_group(ModelTestingGroup.distributed)
def test_lm_eval_evaluation_dp2(run_test_script_for_all_models, run_test_script_base_path, model_path):
    run_test_script_for_all_models(
        get_lm_eval_config(run_test_script_base_path / "test_lm_eval_evaluation_dp2", model_path),
        compare="test_lm_eval_in_training_dp2",
        prepare_fn=_prepare_resume_fn,
        do_compare=False,
        num_gpus=2,
        task="evaluate",
    )
```

Review discussion on the `@pytest.mark.extra_slow` marker:
Reviewer: How long does this take? It would be worrying not to have any tests other than extra-slow.
Author (bigximik): Very long, 40-80 sec per test.
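The overrides built by `get_lm_eval_config` are flat `dotted.key=value` strings. As a hedged illustration of how such strings map onto a nested configuration (the function `apply_overrides` below is a hypothetical sketch, not Fast-LLM's actual parser, which also handles typing and validation):

```python
def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Fold flat "a.b.c=value" override strings into a nested dict.

    Hypothetical sketch of dotted-key override handling; values are kept
    as strings here, whereas a real config parser would also coerce types.
    """
    for item in overrides:
        # Split on the first "=" only, so values containing "=" survive intact.
        key, _, value = item.partition("=")
        node = config
        parts = key.split(".")
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return config
```

Under this reading, `"training.evaluators.evaluation_test.interval=1"` sets `config["training"]["evaluators"]["evaluation_test"]["interval"]`, which is why the tests can wire up an lm_eval evaluator purely through command-line-style strings.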