Incorrect example explanation

The description of unigram tokenization in the article seems to be incorrect. See [this section](https://huggingface.co/learn/llm-course/chapter6/7#tokenization-algorithm):

> Here are the frequencies of all the possible subwords in the vocabulary:
> `("h", 15) ("u", 36) ("g", 20) ("hu", 15) ("ug", 20) ("p", 17) ("pu", 17) ("n", 16)
("un", 16) ("b", 4) ("bu", 4) ("s", 5) ("hug", 15) ("gs", 5) ("ugs", 5)`

> The tokenization probability of ["p", "u", "g"] for "pug" is 5/210 * 36/210 * 20/210

Shouldn't it be 17/210 * 36/210 * 20/210, since the listed frequency of "p" is 17 (5 is the frequency of "s")? I'm a beginner too, so I'm not sure whether this is correct...
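For reference, here is a minimal sketch that recomputes the product straight from the frequency table quoted above. It assumes the unigram probability of a token is its frequency divided by the sum of all listed frequencies, as the article describes; the names `freqs`, `total`, and `segmentation_probability` are my own and not from the course code.

```python
from math import prod

# Frequencies copied from the quoted passage of the article.
freqs = {
    "h": 15, "u": 36, "g": 20, "hu": 15, "ug": 20,
    "p": 17, "pu": 17, "n": 16, "un": 16, "b": 4,
    "bu": 4, "s": 5, "hug": 15, "gs": 5, "ugs": 5,
}

# Denominator used in the article: the sum of all frequencies.
total = sum(freqs.values())


def segmentation_probability(tokens):
    """Product of per-token unigram probabilities (hypothetical helper)."""
    return prod(freqs[t] / total for t in tokens)


# Probability computed from the table, where "p" has frequency 17.
p_from_table = segmentation_probability(["p", "u", "g"])

# Probability as written in the article, with 5 as the first numerator.
p_in_article = (5 / total) * (36 / total) * (20 / total)

print(f"total = {total}")
print(f"from table: {p_from_table:.6f}")
print(f"in article: {p_in_article:.6f}")
```

If the table is right, the first factor in the article's product looks like it picked up the count for "s" instead of "p".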

Incorrect example explanation #1057