Updated Albert model Card #37753


Draft · souvikchand wants to merge 13 commits into main

Conversation

souvikchand

What does this PR do?

Updates the ALBERT model card as per #36979.

Who can review?

@stevhliu Please check the PR and see if it's alright 😄

github-actions bot marked this pull request as draft April 24, 2025 12:44

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

stevhliu (Member) left a comment


Nice, thanks for your contribution!

```py
print(tokenizer.decode(outputs.logits[0].argmax(-1)))
```

> ALBERT is not compatible with `AttentionMaskVisualizer` because it uses masked self-attention rather than causal attention, so it does not have a `_update_causal_mask` method.
stevhliu (Member)

Suggested change
> ALBERT is not compatible with `AttentionMaskVisualizer` because it uses masked self-attention rather than causal attention, so it does not have a `_update_causal_mask` method.

souvikchand (Author)

Since almost every model's model card will have an AttentionMaskVisualizer but ALBERT isn't getting one, isn't it good to explain why it doesn't have one?

stevhliu (Member)

The AttentionMaskVisualizer has only been implemented for ~50/300 models so far, so I think it's okay to leave it out at the moment.

souvikchand (Author)

Ok, alright. Thanks for your response.
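For context, model cards that do support the visualizer call it roughly like this; a minimal sketch, assuming the `transformers.utils.attention_visualizer` import path and using an illustrative decoder checkpoint:

```py
from transformers.utils.attention_visualizer import AttentionMaskVisualizer

# Only works for models whose attention implements _update_causal_mask;
# per the discussion above, ALBERT's masked self-attention does not.
visualizer = AttentionMaskVisualizer("meta-llama/Llama-3.2-1B")
visualizer("Plants create energy through photosynthesis.")
```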

Comment on lines +150 to +154
- ALBERT supports a maximum sequence length of 512 tokens.
- It cannot be used for autoregressive generation (unlike GPT).
- ALBERT requires absolute positional embeddings, and it expects right-padding (i.e., pad tokens should be added at the end, not the beginning).
- ALBERT uses `token_type_ids`, just like BERT, so you should indicate which token belongs to which segment (e.g., sentence A vs. sentence B) for tasks like question answering or sentence-pair classification.
- ALBERT uses a different pretraining objective called Sentence Order Prediction (SOP) instead of Next Sentence Prediction (NSP), so fine-tuned models might behave slightly differently from BERT when modeling inter-sentence relationships.
stevhliu (Member)

Suggested change
- ALBERT supports a maximum sequence length of 512 tokens.
- It cannot be used for autoregressive generation (unlike GPT).
- ALBERT requires absolute positional embeddings, and it expects right-padding (i.e., pad tokens should be added at the end, not the beginning).
- ALBERT uses `token_type_ids`, just like BERT, so you should indicate which token belongs to which segment (e.g., sentence A vs. sentence B) for tasks like question answering or sentence-pair classification.
- ALBERT uses a different pretraining objective called Sentence Order Prediction (SOP) instead of Next Sentence Prediction (NSP), so fine-tuned models might behave slightly differently from BERT when modeling inter-sentence relationships.

souvikchand (Author)

Could you please briefly explain why it's better not to mention internal differences like SOP or `token_type_ids` in the model card? I want to make sure I follow the best style when contributing next time.
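For what it's worth, the `token_type_ids` behavior mentioned above is easy to see directly; a minimal sketch using the same checkpoint as this PR:

```py
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("albert/albert-base-v2")

# For a sentence pair, token_type_ids marks segment A tokens with 0
# and segment B tokens with 1
encoded = tokenizer("Where is the Eiffel Tower?", "It is in Paris.")
print(encoded["token_type_ids"])
```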

<hfoption id="AutoModel">

```py
import torch
```
stevhliu (Member)

This can also be simplified a bit without the comments, and it can include SDPA:

```py
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("albert/albert-base-v2")
model = AutoModelForMaskedLM.from_pretrained(
    "albert/albert-base-v2",
    torch_dtype=torch.float16,
    attn_implementation="sdpa",
    device_map="auto"
)

prompt = "Plants create energy through a process known as [MASK]."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model(**inputs)
    mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
    predictions = outputs.logits[0, mask_token_index]

top_k = torch.topk(predictions, k=5).indices.tolist()
for token_id in top_k[0]:
    print(f"Prediction: {tokenizer.decode([token_id])}")
```

souvikchand (Author)

It is done.

souvikchand (Author)

Instead of `.to("cuda")` I wrote `.to(model.device)`. I hope it's okay.
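A minimal sketch of that change against the suggested snippet above:

```py
# Device-agnostic: reuses whatever device the model weights were placed on,
# instead of hardcoding "cuda"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
```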

souvikchand and others added 12 commits April 25, 2025 01:21
added the quotes in <hfoption id="Pipeline">

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
updated checkpoints

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
changed !Tips description

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
updated text

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
updated transformer-cli implementation

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
changed text

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
removed repeated description

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
removed lines
updated pipeline code
updated auto model code, removed quantization as model size is not large, removed the attention visualizer part
updated notes

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
reduced a repeating point in notes