Updated Albert model Card #37753
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the "Ready for review" button.
Nice, thanks for your contribution!
docs/source/en/model_doc/albert.md
```py
print(tokenizer.decode(outputs.logits[0].argmax(-1)))
```

> ALBERT is not compatible with `AttentionMaskVisualizer` because it uses masked (bidirectional) self-attention rather than causal attention, so it does not have an `_update_causal_mask` method.
> ALBERT is not compatible with `AttentionMaskVisualizer` because it uses masked (bidirectional) self-attention rather than causal attention, so it does not have an `_update_causal_mask` method.
Since almost every model's card will have an AttentionMaskVisualizer but ALBERT isn't getting one, wouldn't it be good to explain why it doesn't have it?
The AttentionMaskVisualizer has only been implemented for ~50/300 models so far, so I think it's okay to leave it out at the moment.
Ok, alright. Thanks for your response.
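As an aside for readers following this thread, here is a minimal, self-contained sketch (not part of the PR) of the distinction being discussed: a causal mask only lets a position attend to earlier positions, while ALBERT-style bidirectional attention lets every token attend to every non-padding token.

```py
import torch

seq_len = 5

# Causal (decoder-style) mask: position i may only attend to positions <= i.
causal_mask = torch.ones(seq_len, seq_len).tril().bool()

# Bidirectional (encoder-style) mask, as in ALBERT/BERT: every token attends to
# every non-padding token; only padding positions are masked out.
attention_mask = torch.tensor([1, 1, 1, 1, 0]).bool()  # last position is padding
bidirectional_mask = attention_mask.unsqueeze(0) & attention_mask.unsqueeze(1)

print(causal_mask.int())
print(bidirectional_mask.int())
```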
- ALBERT supports a maximum sequence length of 512 tokens.
- Cannot be used for autoregressive generation (unlike GPT).
- ALBERT requires absolute positional embeddings, and it expects right-padding (i.e., pad tokens should be added at the end, not the beginning).
- ALBERT uses token_type_ids, just like BERT. So you should indicate which token belongs to which segment (e.g., sentence A vs. sentence B) when doing tasks like question answering or sentence-pair classification.
- ALBERT uses a different pretraining objective called Sentence Order Prediction (SOP) instead of Next Sentence Prediction (NSP), so fine-tuned models might behave slightly differently from BERT when modeling inter-sentence relationships.
[Suggested change: remove these notes from the model card.]
Could you please briefly explain why it's better not to mention internal differences like SOP or token_type_ids in the model card? I want to make sure I follow the best style when contributing next time.
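For reference, a small sketch (not part of the model card) of what the token_type_ids and right-padding points refer to, assuming the albert/albert-base-v2 checkpoint:

```py
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("albert/albert-base-v2")

# Encoding a sentence pair yields token_type_ids that mark segment A vs. segment B,
# and padding is added on the right by default.
encoded = tokenizer(
    "How many parameters does ALBERT base have?",
    "ALBERT base has about 12M parameters.",
    padding="max_length",
    max_length=24,
    return_tensors="pt",
)
print(encoded["token_type_ids"])
print(encoded["attention_mask"])
```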
<hfoption id="AutoModel">

```py
import torch
```
This can also be simplified a bit without the comments and include SDPA:
```py
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("albert/albert-base-v2")
model = AutoModelForMaskedLM.from_pretrained(
    "albert/albert-base-v2",
    torch_dtype=torch.float16,
    attn_implementation="sdpa",
    device_map="auto"
)

prompt = "Plants create energy through a process known as [MASK]."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model(**inputs)

mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
predictions = outputs.logits[0, mask_token_index]

top_k = torch.topk(predictions, k=5).indices.tolist()
for token_id in top_k[0]:
    print(f"Prediction: {tokenizer.decode([token_id])}")
```
it is done
Instead of `.to("cuda")` I wrote `.to(model.device)`. I hope that's okay.
added the quotes in <hfoption id="Pipeline"> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
updated checkpoints Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
changed !Tips description Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
updated text Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
updated transformer-cli implementation Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
changed text Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
removed repeated description Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
removed lines
updated pipeline code
updated auto model code, removed quantization as model size is not large, removed the attention visualizer part
updated notes Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
reduced a repeating point in notes
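For context on the Pipeline-related commits above, a fill-mask Pipeline snippet for ALBERT would look roughly like the sketch below; the exact code added in the PR may differ.

```py
import torch
from transformers import pipeline

# Hypothetical fill-mask pipeline for ALBERT, assuming a CUDA GPU is available (device=0).
fill_mask = pipeline(
    task="fill-mask",
    model="albert/albert-base-v2",
    torch_dtype=torch.float16,
    device=0,
)
print(fill_mask("Plants create energy through a process known as [MASK]."))
```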
What does this PR do?
Updates the ALBERT model card as per #36979.

Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@stevhliu Please check the PR and see if it's alright 😄