
Can't load tokenizer #384

@dannychantszfong

Description

I am attempting to load the Meta-LLaMA 3.3 70B Instruct model locally using the Hugging Face transformers library. While I have downloaded the required files, I am encountering an issue when trying to load the tokenizer.

My code:

from transformers import LlamaTokenizer, AutoModelForCausalLM
import torch

model_path = "G:\AI_models\models--meta-llama--Llama-3.3-70B-Instruct"

# Load the tokenizer and model
tokenizer = LlamaTokenizer.from_pretrained(model_path, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    local_files_only=True,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Test the model
input_text = "Explain the benefits of artificial intelligence."
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=50)

output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(output_text)

Error:

Traceback (most recent call last):
File "g:\Coding\AI\huggingface\llama3.3_70b\v1.py", line 7, in
tokenizer = LlamaTokenizer.from_pretrained(model_path, local_files_only=True)
File "C:\Users\ctz20\anaconda3\envs\rl_trading_bot\lib\site-packages\transformers\tokenization_utils_base.py", line 2020, in from_pretrained
raise EnvironmentError(
OSError: Can't load tokenizer for 'G:\AI_models\models--meta-llama--Llama-3.3-70B-Instruct'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'G:\AI_models\models--meta-llama--Llama-3.3-70B-Instruct' is the correct path to a directory containing all relevant files for a LlamaTokenizer tokenizer.
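
For reference, here is a small diagnostic I can run separately to see what transformers actually finds in that directory (just a sanity-check sketch, not part of the script above; the filenames below are the tokenizer files I would expect, and I have not confirmed exactly which of them this model ships with):

from pathlib import Path

model_path = Path(r"G:\AI_models\models--meta-llama--Llama-3.3-70B-Instruct")

# List the top-level contents of the directory passed to from_pretrained
for entry in sorted(model_path.iterdir()):
    print(entry.name)

# Check for the usual tokenizer files
for name in ["tokenizer.json", "tokenizer_config.json", "tokenizer.model"]:
    print(name, "found" if (model_path / name).is_file() else "missing")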

Additional context:
OS: Windows 10
Python: 3.9 (Conda environment)
Model Source: Hugging Face (Meta-LLaMA 3.3 70B Instruct)
transformers version: Latest (as of 2025-01-27)

What I Need Help With:
Why does the LlamaTokenizer fail to load the tokenizer files from the specified directory, despite all required files being present?
Is there a specific step I’m missing in configuring the tokenizer for this model?
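
In case it helps with the diagnosis, this is the variant I plan to try next. It rests on two assumptions I have not verified: that Llama 3.x ships only a fast tokenizer (tokenizer.json) and no SentencePiece tokenizer.model file, which is what the slow LlamaTokenizer class looks for, and that a models--... folder is a Hugging Face hub cache whose actual files live under snapshots\<hash> (the <hash> below is a placeholder for the real snapshot folder name):

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Assumption: point at the snapshot folder inside the cache, not the cache root
model_path = r"G:\AI_models\models--meta-llama--Llama-3.3-70B-Instruct\snapshots\<hash>"

# AutoTokenizer selects the fast (tokenizer.json-based) tokenizer automatically,
# whereas LlamaTokenizer requires the SentencePiece tokenizer.model file
tokenizer = AutoTokenizer.from_pretrained(model_path, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    local_files_only=True,
    torch_dtype=torch.float16,
    device_map="auto",
)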
