
New Model: Ava #37770


Closed · wants to merge 2 commits

Conversation

Kuduxaaa

🚀 Add AvaModel: A Scalable & Efficient LLM Architecture

Overview

This PR introduces AvaModel, a new transformer-based language model architecture designed to perform well across scales from 100M to 100B parameters.

Architecture Highlights

```
AvaForCausalLM(
  (model): AvaModel(
    (embed_tokens): Embedding(32000, 1280)
    (layers): ModuleList(
      (0-11): 12 x AvaDecoderLayer(
        (self_attn): AvaAttention(
          (q_proj): Linear(in_features=1280, out_features=1280, bias=False)
          (k_proj): Linear(in_features=1280, out_features=640, bias=False)
          (v_proj): Linear(in_features=1280, out_features=640, bias=False)
          (o_proj): Linear(in_features=1280, out_features=1280, bias=False)
          (dropout): Dropout(p=0.0, inplace=False)
        )
        (mlp): AvaMLP(
          (gate_proj): Linear(in_features=1280, out_features=5120, bias=False)
          (up_proj): Linear(in_features=1280, out_features=5120, bias=False)
          (down_proj): Linear(in_features=5120, out_features=1280, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): AvaRMSNorm()
        (post_attention_layernorm): AvaRMSNorm()
      )
    )
    (norm): AvaRMSNorm()
    (rotary_emb): AvaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=1280, out_features=32000, bias=False)
)
```
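In the printout above, k_proj and v_proj are half the width of q_proj (640 vs. 1280 output features), which points to grouped-query attention, and the gate/up/down projections in AvaMLP are the SwiGLU pattern: down_proj(silu(gate_proj(x)) * up_proj(x)). Below is a minimal sketch of the attention wiring those shapes imply. The head counts are assumptions (head_dim=64, so 20 query heads sharing 10 KV heads; the PR does not state them), the module name is hypothetical, and rotary embeddings, dropout, and KV caching are omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GQASketch(nn.Module):
    """Hypothetical sketch of the attention shapes printed above (not the PR's code)."""

    def __init__(self, hidden_size=1280, num_heads=20, num_kv_heads=10):
        super().__init__()
        self.head_dim = hidden_size // num_heads  # assumed to be 64
        self.num_heads = num_heads
        self.num_kv_heads = num_kv_heads
        # Matches the printed shapes: q/o are 1280 -> 1280, k/v are 1280 -> 640.
        self.q_proj = nn.Linear(hidden_size, num_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(hidden_size, num_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(hidden_size, num_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(num_heads * self.head_dim, hidden_size, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.num_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.num_kv_heads, self.head_dim).transpose(1, 2)
        # Each KV head serves a group of query heads (here 2 query heads per KV head).
        k = k.repeat_interleave(self.num_heads // self.num_kv_heads, dim=1)
        v = v.repeat_interleave(self.num_heads // self.num_kv_heads, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))

x = torch.randn(2, 16, 1280)
print(GQASketch()(x).shape)  # torch.Size([2, 16, 1280])
```

Storing 10 KV heads instead of 20 halves the KV cache at inference time, which is the usual motivation for this layout.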

More info: Kuduxaaa/ava-llm

github-actions bot marked this pull request as draft April 24, 2025 19:41

Hi 👋, thank you for opening this pull request! Pull requests are converted to draft by default, and CI is paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page); this will assign reviewers and trigger CI.

Kuduxaaa marked this pull request as ready for review April 24, 2025 20:00
@Rocketknight1
Member

Hi @Kuduxaaa, we generally don't add architectures to transformers until a significant pre-trained model exists using that architecture! The architecture looks good and we appreciate the educational goal, but we probably won't be able to merge this PR. I'm sorry!

@Kuduxaaa
Author

Thanks for the clarification! Totally understandable. I appreciate the feedback and the kind words about the architecture. Once a pre-trained model is ready and properly evaluated, I’ll revisit this with a new PR 🥳🧠

Kuduxaaa closed this by deleting the head repository Apr 28, 2025