
Conversation

@ani300 ani300 (Contributor) commented Jul 11, 2025

No description provided.

Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
@@ -140,8 +140,8 @@ def generate(
     from fms_mo.aiu_addons.fp8.fp8_utils import ScaledTensor
     kwargs["past_key_value_states"] = [
         (
-            ScaledTensor(torch.zeros(NUM_BLOCKS, BLOCK_SIZE, kvheads, head_size, dtype=torch.float8_e4m3fn), torch.tensor(1.0), False),
-            ScaledTensor(torch.zeros(NUM_BLOCKS, BLOCK_SIZE, kvheads, head_size, dtype=torch.float8_e4m3fn), torch.tensor(1.0), False),
+            ScaledTensor(torch.zeros(NUM_BLOCKS, BLOCK_SIZE, kvheads, head_size, dtype=torch.float8_e4m3fn), torch.tensor([1.0] * input_ids.shape[0], dtype=torch.float32), False),
A reviewer (Contributor) asked:

why is the dtype torch.float32 here?

@ani300 (Contributor, author) replied:

FP8 scales are always stored in fp32.
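The reply above can be illustrated with a toy sketch (hypothetical, not the fms_mo `ScaledTensor` implementation; the class, function names, and use of plain Python floats are all illustrative). FP8 e4m3 has a maximum finite magnitude of roughly 448, so values are kept as a quantized payload plus one higher-precision (fp32) scale per sequence, mirroring the per-batch scale tensor introduced in the diff:

```python
# Toy sketch of per-sequence scaling for an FP8 KV cache. Hypothetical;
# plain Python floats stand in for the fp8 payloads and fp32 scales.
from dataclasses import dataclass

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in e4m3

@dataclass
class ScaledCache:
    data: list    # per-sequence payloads, already divided by their scale
    scales: list  # one high-precision scale per sequence in the batch
    scaled: bool  # False until real KV values have been written

def quantize(rows):
    """Pick one scale per row (sequence) so its payload fits the e4m3 range."""
    scales = [max(max(abs(v) for v in row), 1e-12) / FP8_E4M3_MAX for row in rows]
    data = [[v / s for v in row] for row, s in zip(rows, scales)]
    return ScaledCache(data, scales, True)

def dequantize(cache):
    """Recover approximate originals: payload * per-sequence scale."""
    return [[v * s for v in row] for row, s in zip(cache.data, cache.scales)]
```

With a batch of two sequences of very different magnitude, each row stays within range independently, which is the point of promoting the scale from a scalar to a per-sequence tensor in the diff above.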

@JRosenkranz (Contributor) commented:

bot:test
TEST_FILE=test_decoders.py MODEL_ID=ibm-granite/granite-3.3-8b-instruct BATCH_SIZE=8 SEQUENCE_LENGTH=64 USE_TINY_MODEL=1

@JRosenkranz (Contributor) commented:

bot:test
TEST_FILE=test_decoders.py MODEL_ID=ibm-granite/granite-3.3-8b-instruct BATCH_SIZE=8 SEQUENCE_LENGTH=64 USE_TINY_MODEL=0

@JRosenkranz JRosenkranz (Contributor) left a review:

lgtm

@JRosenkranz JRosenkranz merged commit f43cc04 into main Jul 18, 2025
1 of 2 checks passed
@tharapalanivel tharapalanivel deleted the fp8_sequence branch August 12, 2025 05:42