Adds new metrics calculated #64

flaviabeo · 2025-06-18T13:32:14Z

Adds/updates metrics for the models. The shapes are:

max_new_tokens = [128]
batch_sizes = [1,2,4,8]
sequence_lengths = [64, 2048]
default_dtypes = ["fp16"]
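
As a rough cross-check, the shape lists above multiply out to eight combinations. That lines up with the `found 8 metric files` lines in the output below, assuming one metric file per combination and metric (an assumption; the file layout is not shown in this PR). The sketch only enumerates the grid, and `shape_grid` is a hypothetical helper name.

```python
from itertools import product

# Shape lists copied from the PR description.
max_new_tokens = [128]
batch_sizes = [1, 2, 4, 8]
sequence_lengths = [64, 2048]
default_dtypes = ["fp16"]


def shape_grid():
    """Return every (dtype, batch_size, seq_length, max_new_tokens) combination."""
    return list(
        product(default_dtypes, batch_sizes, sequence_lengths, max_new_tokens)
    )


grid = shape_grid()
print(len(grid))  # number of shape combinations
for combo in grid:
    print(combo)
```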

Models:

mistralai/Mistral-7B-Instruct-v0.3

$ python3 get_thresholds.py --models mistralai/Mistral-7B-Instruct-v0.3 --metrics diff_mean ce --file_base /fms-generate-metrics/output
found 8 metric files
mistralai--Mistral-7B-Instruct-v0.3 diff_mean 0.0008768103783950205
found 8 metric files
mistralai--Mistral-7B-Instruct-v0.3 ce 2.846206340789795

micro: /fms-generate-metrics/granite-3.2-8b-layers-3-step-100000

$ python3 get_thresholds.py --models /fms-generate-metrics/granite-3.2-8b-layers-3-step-100000 --metrics diff_mean ce --file_base /fms-generate-metrics/output
found 8 metric files
--fms-generate-metrics--granite-3.2-8b-layers-3-step-100000 diff_mean 0.00018840670207282534
found 8 metric files
--fms-generate-metrics--granite-3.2-8b-layers-3-step-100000 ce 2.7449850964546205
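
The model labels in the output look like the `--models` argument with each `/` replaced by `--`, which would explain why the local path gets a leading `--`. A minimal sketch of that mapping, inferred only from the output above; `model_label` is a hypothetical name and the actual logic in `get_thresholds.py` may differ.

```python
def model_label(model: str) -> str:
    """Turn a model id or local path into the label seen in the output.

    Assumption: the label is the input with every "/" replaced by "--".
    """
    return model.replace("/", "--")


print(model_label("mistralai/Mistral-7B-Instruct-v0.3"))
print(model_label("/fms-generate-metrics/granite-3.2-8b-layers-3-step-100000"))
```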

meta-llama/Llama-3.1-8B-Instruct

$ python3 get_thresholds.py --models meta-llama/Llama-3.1-8B-Instruct --metrics diff_mean ce --file_base /fms-generate-metrics/output
found 8 metric files
meta-llama--Llama-3.1-8B-Instruct diff_mean 0.0004068055667448795
found 8 metric files
meta-llama--Llama-3.1-8B-Instruct ce 2.7080255031585696

ibm-granite/granite-20b-code-instruct-8k

$ python3 get_thresholds.py --models ibm-granite/granite-20b-code-instruct-8k --metrics diff_mean ce --file_base /fms-generate-metrics/output
found 8 metric files
ibm-granite--granite-20b-code-instruct-8k diff_mean 0.0003458251833217223
found 8 metric files
ibm-granite--granite-20b-code-instruct-8k ce 2.646075320243838
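
For reference, the values above can be collected into entries shaped like the threshold entries quoted in the review of `tests/models/test_decoders.py`. This is only a sketch: the tuple order `(ce, diff_mean)` is inferred from the magnitudes in the existing entry, the `False` key component is copied from the diff without knowing its meaning, and `new_thresholds` and `within_thresholds` are hypothetical names. Treating each value as an upper bound is also an assumption.

```python
# Hypothetical container; numbers are copied from the get_thresholds.py output above.
# Key: (model id, flag as seen in the existing entries).
# Value: (ce, diff_mean); this order is assumed, not confirmed.
new_thresholds = {
    ("mistralai/Mistral-7B-Instruct-v0.3", False): (
        2.846206340789795,
        0.0008768103783950205,
    ),
    ("/fms-generate-metrics/granite-3.2-8b-layers-3-step-100000", False): (
        2.7449850964546205,
        0.00018840670207282534,
    ),
    ("meta-llama/Llama-3.1-8B-Instruct", False): (
        2.7080255031585696,
        0.0004068055667448795,
    ),
    ("ibm-granite/granite-20b-code-instruct-8k", False): (
        2.646075320243838,
        0.0003458251833217223,
    ),
}


def within_thresholds(key, ce, diff_mean):
    """Return True if both metrics are at or below the stored values.

    Assumption: the stored values act as upper bounds for each metric.
    """
    ce_max, diff_mean_max = new_thresholds[key]
    return ce <= ce_max and diff_mean <= diff_mean_max
```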

TODO: Llama 70B - need more GPUs to run it.

Signed-off-by: Flavia Beo <flavia.beo@ibm.com>

JRosenkranz · 2025-06-19T12:47:05Z

tests/models/test_decoders.py

    (LLAMA_3p1_70B_INSTRUCT, False): (
        2.841279556751251,
        0.0044301633024588115,
    ),
+    (MISTRAL_0p3_7B_INSTRUCT, False): (


Can this be added to #60 (rather than directly in this file)?

flaviabeo · 2025-06-20

Hi! OK, so should I create a new file like the one in PR #60, or is the idea to append all models to that same file?

JRosenkranz · 2025-06-19T12:47:27Z

tests/models/test_decoders.py

@@ -42,6 +42,7 @@
 GRANITE_3p3_8B_INSTRUCT = "ibm-granite/granite-3.3-8b-instruct"
 GRANITE_20B_CODE_INSTRUCT_8K = "ibm-granite/granite-20b-code-instruct-8k"
 LLAMA_3p1_70B_INSTRUCT = "meta-llama/Llama-3.1-70B-Instruct"
+MISTRAL_0p3_7B_INSTRUCT = "mistralai/Mistral-7B-Instruct-v0.3"


Same comment as above: this should be added as part of the config in #60.

Adds new metrics calculated

9f02efc

Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
