Loss is NAN, stopping training during quantizing Mistral-7B-v0.1 #3

@lybbill

I am trying to reproduce the results presented in the paper by quantizing Mistral-7B-v0.1 to 2 bits using the default parameters on the Wikitext2-raw-v1 dataset. However, the loss becomes NaN and training stops. Could you help me?
My environment is as follows:
Package Version Editable project location


absl-py 2.3.1
accelerate 1.8.1
aiohappyeyeballs 2.6.1
aiohttp 3.12.13
aiosignal 1.4.0
annotated-types 0.7.0
antlr4-python3-runtime 4.9.3
anyio 4.9.0
apiq 0.1.0
async-timeout 5.0.1
attributedict 0.3.0
attrs 25.3.0
blessings 1.7
cachetools 6.1.0
certifi 2025.6.15
chardet 5.2.0
charset-normalizer 3.4.2
click 8.2.1
codecov 2.1.13
colorama 0.4.6
coloredlogs 15.0.1
colour-runner 0.1.1
coverage 7.9.2
DataProperty 1.1.0
datasets 3.6.0
deepdiff 8.5.0
dill 0.3.8
distlib 0.3.9
distro 1.9.0
einops 0.8.1
evaluate 0.4.4
exceptiongroup 1.3.0
filelock 3.18.0
frozenlist 1.7.0
fsspec 2025.3.0
h11 0.16.0
hf-xet 1.1.5
httpcore 1.0.9
httpx 0.28.1
huggingface-hub 0.33.2
humanfriendly 10.0
idna 3.10
inspecta 0.1.3
Jinja2 3.1.6
jiter 0.10.0
joblib 1.5.1
jsonlines 4.0.0
MarkupSafe 3.0.2
mbstrdecoder 1.1.4
mpmath 1.3.0
multidict 6.6.3
multiprocess 0.70.16
networkx 3.4.2
nltk 3.9.1
numexpr 2.11.0
numpy 2.2.6
nvidia-cublas-cu12 12.6.4.1
nvidia-cuda-cupti-cu12 12.6.80
nvidia-cuda-nvrtc-cu12 12.6.77
nvidia-cuda-runtime-cu12 12.6.77
nvidia-cudnn-cu12 9.5.1.17
nvidia-cufft-cu12 11.3.0.4
nvidia-cufile-cu12 1.11.1.6
nvidia-curand-cu12 10.3.7.77
nvidia-cusolver-cu12 11.7.1.2
nvidia-cusparse-cu12 12.5.4.2
nvidia-cusparselt-cu12 0.6.3
nvidia-nccl-cu12 2.26.2
nvidia-nvjitlink-cu12 12.6.85
nvidia-nvtx-cu12 12.6.77
omegaconf 2.3.0
openai 1.93.0
orderly-set 5.4.1
packaging 25.0
pandas 2.3.0
pathvalidate 3.3.1
peft 0.16.0
pillow 11.3.0
pip 25.1.1
platformdirs 4.3.8
pluggy 1.6.0
portalocker 3.2.0
propcache 0.3.2
protobuf 6.31.1
psutil 7.0.0
pyarrow 20.0.0
pybind11 2.13.6
pycountry 24.6.1
pydantic 2.11.7
pydantic_core 2.33.2
Pygments 2.19.2
pyproject-api 1.9.1
pytablewriter 1.2.1
python-dateutil 2.9.0.post0
pytz 2025.2
PyYAML 6.0.2
regex 2024.11.6
requests 2.32.4
rootpath 0.1.1
rouge_score 0.1.2
sacrebleu 1.5.0
safetensors 0.5.3
scikit-learn 1.7.0
scipy 1.15.3
sentencepiece 0.2.0
setuptools 75.1.0
six 1.17.0
sniffio 1.3.1
sqlitedict 2.1.0
sympy 1.14.0
tabledata 1.3.4
tcolorpy 0.1.7
termcolor 3.1.0
texttable 1.7.0
threadpoolctl 3.6.0
tokenizers 0.20.3
toml 0.10.2
tomli 2.2.1
torch 2.7.1
torchvision 0.22.1
tox 4.27.0
tqdm 4.67.1
tqdm-multiprocess 0.0.11
transformers 4.45.1
triton 3.3.1
typepy 1.3.4
typing_extensions 4.14.0
typing-inspection 0.4.1
tzdata 2025.2
urllib3 2.5.0
virtualenv 20.31.2
wheel 0.44.0
xxhash 3.5.0
yarl 1.20.1
zstandard 0.23.0
