Updated Albert model Card #37753


Draft · souvikchand wants to merge 13 commits into main

Conversation

souvikchand

What does this PR do?

Updates the ALBERT model card as per #36979.

Who can review?

@stevhliu Please check the PR and see if it's alright 😄

github-actions bot marked this pull request as draft April 24, 2025 12:44

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

stevhliu (Member) left a comment


Nice, thanks for your contribution!

```py
print(tokenizer.decode(outputs.logits[0].argmax(-1)))
```

> ALBERT is not compatible with `AttentionMaskVisualizer` because it uses masked self-attention rather than causal attention, so it does not have a `_update_causal_mask` method.
stevhliu (Member)

Suggested change
> ALBERT is not compatible with `AttentionMaskVisualizer` because it uses masked self-attention rather than causal attention, so it does not have a `_update_causal_mask` method.

souvikchand (Author)

Since almost every model's model card will have an AttentionMaskVisualizer but ALBERT isn't getting one, isn't it good to explain why it doesn't have one?

stevhliu (Member)

The AttentionMaskVisualizer has only been implemented for ~50/300 models so far, so I think it's okay to leave it out at the moment.

souvikchand (Author)

Ok, alright. Thanks for your response.
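For context, model cards that do support the visualizer call it roughly like this; a minimal sketch, assuming the `transformers.utils.attention_visualizer` import path and using an illustrative decoder checkpoint:

```py
from transformers.utils.attention_visualizer import AttentionMaskVisualizer

# Only works for models whose attention implements _update_causal_mask;
# per the discussion above, ALBERT's masked self-attention does not.
visualizer = AttentionMaskVisualizer("meta-llama/Llama-3.2-1B")
visualizer("Plants create energy through photosynthesis.")
```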

Comment on lines +150 to +154
- ALBERT supports a maximum sequence length of 512 tokens.
- It cannot be used for autoregressive generation (unlike GPT).
- ALBERT requires absolute positional embeddings, and it expects right-padding (i.e., pad tokens should be added at the end, not the beginning).
- ALBERT uses `token_type_ids`, just like BERT, so you should indicate which token belongs to which segment (e.g., sentence A vs. sentence B) for tasks like question answering or sentence-pair classification.
- ALBERT uses a different pretraining objective called Sentence Order Prediction (SOP) instead of Next Sentence Prediction (NSP), so fine-tuned models might behave slightly differently from BERT when modeling inter-sentence relationships.
stevhliu (Member)

Suggested change
- ALBERT supports a maximum sequence length of 512 tokens.
- It cannot be used for autoregressive generation (unlike GPT).
- ALBERT requires absolute positional embeddings, and it expects right-padding (i.e., pad tokens should be added at the end, not the beginning).
- ALBERT uses `token_type_ids`, just like BERT, so you should indicate which token belongs to which segment (e.g., sentence A vs. sentence B) for tasks like question answering or sentence-pair classification.
- ALBERT uses a different pretraining objective called Sentence Order Prediction (SOP) instead of Next Sentence Prediction (NSP), so fine-tuned models might behave slightly differently from BERT when modeling inter-sentence relationships.

souvikchand (Author)

Could you please briefly explain why it's better not to mention internal differences like SOP or `token_type_ids` in the model card? I want to make sure I follow the best style when contributing next time.
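For what it's worth, the `token_type_ids` behavior mentioned above is easy to see directly; a minimal sketch using the same checkpoint as this PR:

```py
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("albert/albert-base-v2")

# For a sentence pair, token_type_ids marks segment A tokens with 0
# and segment B tokens with 1
encoded = tokenizer("Where is the Eiffel Tower?", "It is in Paris.")
print(encoded["token_type_ids"])
```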

<hfoption id="AutoModel">

```py
import torch
```
stevhliu (Member)

This can also be simplified a bit without the comments, and it can include SDPA:

```py
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("albert/albert-base-v2")
model = AutoModelForMaskedLM.from_pretrained(
    "albert/albert-base-v2",
    torch_dtype=torch.float16,
    attn_implementation="sdpa",
    device_map="auto"
)

prompt = "Plants create energy through a process known as [MASK]."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model(**inputs)
    mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
    predictions = outputs.logits[0, mask_token_index]

top_k = torch.topk(predictions, k=5).indices.tolist()
for token_id in top_k[0]:
    print(f"Prediction: {tokenizer.decode([token_id])}")
```

souvikchand (Author)

It is done.

souvikchand (Author)

Instead of `.to("cuda")` I wrote `.to(model.device)`. I hope it's okay.
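A minimal sketch of that change against the suggested snippet above:

```py
# Device-agnostic: reuses whatever device the model weights were placed on,
# instead of hardcoding "cuda"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
```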

souvikchand and others added 12 commits April 25, 2025 01:21
added the quotes in <hfoption id="Pipeline">

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
updated checkpoints

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
changed !Tips description

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
updated text

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
updated transformer-cli implementation

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
changed text

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
removed repeated description

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
removed lines
updated pipeline code
updated auto model code, removed quantization as model size is not large, removed the attention visualizer part
updated notes

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
reduced a repeating point in notes