Generate metrics from model's layers #63
base: main
Conversation
Force-pushed from 26f5a26 to f644864
Force-pushed from f644864 to f212b8a
scripts/generate_layers_metrics.py
Outdated
print("abs_diff list appended") | ||
print(len(absolute_differences)) | ||
|
||
prefix = get_default_validation_prefix(model_id, max_new_token, batch_size, 0, 'float16') |
model_id is not defined in this function; this looks like it might be a bug.
I propose changing model_id to model_path, since that is what is used throughout the code:
prefix = get_default_validation_prefix(model_path, max_new_token, batch_size, 0, 'float16')
scripts/generate_layers_metrics.py
Outdated
input_ids, padding_kwargs = pad_input_ids(prompt_list, min_pad_length=seq_length)
return input_ids, padding_kwargs

def __infer_layer(warmup, model, max_len, device,
Add with torch.no_grad(): inside __infer_layer to prevent unnecessary autograd graph construction.
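A minimal sketch of the suggested pattern, with a simplified signature (the argument names here are illustrative, not the script's actual ones):

```python
import torch

def __infer_layer(warmup, model, max_len, device, input_ids, padding_kwargs):
    # Disable autograd so the forward pass does not build a gradient graph;
    # this saves memory and time in an inference-only measurement run.
    with torch.no_grad():
        return model.forward(input_ids, **padding_kwargs)
```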
scripts/generate_layers_metrics.py
Outdated
# without ntk scaling, extending the seq length too far gives bogus results.
max_seq_len = model.config.max_expected_seq_len

result = generate(
I think it is better to use model.forward() instead of generate to:
- Trigger one full forward pass without sampling or token iteration.
- See all intermediate activations, since hooks will fire exactly once per layer.
- Avoid introducing noise from sampling, past key caching, etc.
Using generate alone may mask issues inside specific layers, because it is a high-level API that wraps many operations (forward pass, KV cache logic, sampling or greedy decoding, post-processing). It may skip certain branches inside model.forward() depending on the decoding logic (e.g., only the decoder path, or only the first token). It may also use optimized inference paths (e.g., with contiguous_cache=True) that bypass logic like residual addition, attention masking, or past key handling.
@jjhursey is this how you typically compare models at a macro level, relying only on generate?
Let us know what you think.
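A sketch of what a single hooked forward pass could look like, assuming forward hooks are registered on every module (the bookkeeping names are illustrative, and model, input_ids, and padding_kwargs are assumed to come from the surrounding script):

```python
import torch

captured = {}

def make_hook(name):
    # Record each layer's output; with a single forward pass each hook fires once.
    def hook(module, inputs, output):
        captured[name] = output
    return hook

handles = [
    module.register_forward_hook(make_hook(name))
    for name, module in model.named_modules()
]

with torch.no_grad():
    # One full forward pass: no sampling, no KV-cache iteration.
    logits = model.forward(input_ids, **padding_kwargs)

for handle in handles:
    handle.remove()
```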
Thank you! Added the model.forward mode.
scripts/generate_layers_metrics.py
Outdated
if not warmup:
    for i in range(result.shape[0]):
        print(result[i])
In the script we have several print() statements with no control over log levels, formatting, or file redirection. It would be better to have a structured logging interface using dprint() or Python's built-in logging module.
What is the logging mechanism that FMS uses? @jjhursey @ani300
Example:
import logging
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)
logger.info("Saving file...")
logger.debug(f"Layer: {layer}, Output: {output}")
logger.warning("Some layers were skipped due to missing output")
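The level could also be made configurable, for example via an environment variable (a sketch; the LOG_LEVEL variable name is an assumption, not something the script currently defines):

```python
import logging
import os

# e.g. LOG_LEVEL=DEBUG python3 generate_layers_metrics.py
log_level = os.environ.get("LOG_LEVEL", "INFO").upper()
logging.basicConfig(
    level=getattr(logging, log_level, logging.INFO),
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger(__name__)
```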
if "generate" in mode:
    with torch.no_grad():
        result = generate(
Is this saving all of the layers per iteration of generate?
The way I wrote it, it saves by layer name, so only the first completed iteration has its output measured and saved in CSV files. Do you think we should save by iteration as well?
Yes, for this I think we would want to save per iteration of generate; otherwise we are only saving prefill, which is essentially the same as just running the forward case.
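One way to keep the decode steps as well would be to append every firing of each layer's hook, so the list index corresponds to the generate step (a sketch under that assumption, not the PR's current code):

```python
from collections import defaultdict

# Every firing of a layer's hook is kept; index i in the list then
# corresponds to the i-th step of generate (prefill plus each decode step).
layer_outputs = defaultdict(list)

def make_hook(layer_name):
    def hook(module, inputs, output):
        layer_outputs[layer_name].append(output)
    return hook

# After generate() finishes, metrics can be written per (layer, step):
# for layer_name, outputs in layer_outputs.items():
#     for step, out in enumerate(outputs):
#         write_csv_row(layer_name, step, out)  # hypothetical helper
```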
@@ -166,3 +167,19 @@ def sample_squad_v2_qa_requests(
        prompt_length_max,
        seed,
    )

def prepare_inputs(batch_size, seq_length, tokenizer, sharegpt_path, seed=0):
Can we add documentation for this? Also, is it possible to make the sample requests configurable (as we have other sample-request methods)?
Sure! I made the changes for the other files to use the utils method in PR #77. I thought it would be better not to mix those changes in here; once this is merged I can rebase the other PR. Is this ok?
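For reference, a sketch of what the documented helper could look like (the docstring wording is only a suggestion, and the body is elided):

```python
def prepare_inputs(batch_size, seq_length, tokenizer, sharegpt_path, seed=0):
    """Build a padded batch of input ids from ShareGPT-style prompts.

    Args:
        batch_size: number of prompts to sample.
        seq_length: pad/truncate length for each prompt.
        tokenizer: tokenizer used to encode the sampled prompts.
        sharegpt_path: path to the ShareGPT dataset file.
        seed: RNG seed so the sampled prompts are reproducible.

    Returns:
        Tuple of (input_ids, padding_kwargs) ready to pass to the model.
    """
    ...
```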
This adds the piece for generating metrics by layer. We can leverage get_thresholds.py, with some modifications, to later use the mean diff in the pytests.


The idea is to run the prompts through the model with pre- and post-hooks added, and then compute the metrics for the outputs intercepted at each layer, as in the diagrams attached to this PR. Then we can have a CPU/GPU baseline for a failure threshold in AIU tests. It is the same idea as test_decoders.py, but for each layer. This way we can measure the discrepancies in the outputs and use the thresholds for detailed debugging of problems on AIU.
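At a high level, the flow could be sketched like this (the hook wiring and metric computation are simplified assumptions, not the script's exact code):

```python
import torch

def register_hooks(model, store):
    # Capture each layer's output so CPU and device runs can be compared layer by layer.
    handles = []
    for name, module in model.named_modules():
        handles.append(module.register_forward_hook(
            lambda mod, inp, out, name=name: store.setdefault(name, []).append(out)
        ))
    return handles

cpu_out, dev_out = {}, {}
# ... run the same prompts on CPU and on the device, each with hooks registered ...

for name in cpu_out:
    a = cpu_out[name][0].float()
    b = dev_out[name][0].float().to("cpu")
    abs_diff = (a - b).abs().mean().item()
    cos_sim = torch.nn.functional.cosine_similarity(a.flatten(), b.flatten(), dim=0).item()
    # These per-layer values are what get_thresholds.py aggregates into thresholds.
```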
How to run
Files should get created at the /tmp/output dir:
cd /aiu-fms-testing-utils/tests/resources
python3 get_thresholds.py --models ibm-granite/granite-3.2-8b-instruct --metrics abs_diff cos_sim --file_base /tmp/output --layer_io
This should print the metric of each layer. Also, a JSON file is saved to the same output dir; a sample file can be found at sample_layer_th.json.
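Downstream, the tests could then load the per-layer thresholds and compare them against the measured diffs, along these lines (the JSON schema and helper names are assumptions, not the actual sample_layer_th.json format):

```python
import json

def load_layer_thresholds(path):
    # Hypothetical structure: {"<layer name>": {"abs_diff": float, "cos_sim": float}, ...}
    with open(path) as f:
        return json.load(f)

def check_layer(layer_name, measured_abs_diff, thresholds):
    # Fail the layer if its mean absolute difference exceeds the stored baseline.
    assert measured_abs_diff <= thresholds[layer_name]["abs_diff"], (
        f"{layer_name}: abs_diff {measured_abs_diff} exceeds threshold"
    )
```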