Conversation

@Kuduxaaa
🚀 Add AvaModel: A Scalable & Efficient LLM Architecture

Overview

This PR introduces AvaModel, a new transformer-based language model architecture designed to perform well across scales from 100M to 100B parameters.

Architecture Highlights

AvaForCausalLM(
  (model): AvaModel(
    (embed_tokens): Embedding(32000, 1280)
    (layers): ModuleList(
      (0-11): 12 x AvaDecoderLayer(
        (self_attn): AvaAttention(
          (q_proj): Linear(in_features=1280, out_features=1280, bias=False)
          (k_proj): Linear(in_features=1280, out_features=640, bias=False)
          (v_proj): Linear(in_features=1280, out_features=640, bias=False)
          (o_proj): Linear(in_features=1280, out_features=1280, bias=False)
          (dropout): Dropout(p=0.0, inplace=False)
        )
        (mlp): AvaMLP(
          (gate_proj): Linear(in_features=1280, out_features=5120, bias=False)
          (up_proj): Linear(in_features=1280, out_features=5120, bias=False)
          (down_proj): Linear(in_features=5120, out_features=1280, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): AvaRMSNorm()
        (post_attention_layernorm): AvaRMSNorm()
      )
    )
    (norm): AvaRMSNorm()
    (rotary_emb): AvaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=1280, out_features=32000, bias=False)
)
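From the printed module dimensions alone (hidden size 1280, 12 layers, KV projections at half width, a 5120-wide gated MLP), the parameter count of this particular configuration can be sketched. This is an illustration derived only from the printout; whether `embed_tokens` and `lm_head` share weights is not stated in the PR, so both are counted separately, exactly as the modules are listed.

```python
# Parameter count for the printed Ava configuration, derived purely from
# the module dimensions shown above. Sketch only: assumes an untied
# lm_head, since the printout lists embed_tokens and lm_head separately.
vocab, hidden, layers = 32_000, 1_280, 12
kv_dim, ffn = 640, 5_120  # half-width k/v projections, MLP inner width

embed = vocab * hidden                              # embed_tokens
attn = 2 * hidden * hidden + 2 * hidden * kv_dim    # q/o full width, k/v half width
mlp = 3 * hidden * ffn                              # gate_proj, up_proj, down_proj
norms = 2 * hidden                                  # two RMSNorm weight vectors per layer

per_layer = attn + mlp + norms
# total = embeddings + decoder stack + final norm + lm_head
total = embed + layers * per_layer + hidden + vocab * hidden

print(f"per layer: {per_layer:,}")  # 24,578,560
print(f"total:     {total:,}")      # 376,864,000  (~377M parameters)
```

The k/v output width of 640 (half of the 1280 hidden size) indicates grouped-query attention with half as many key/value heads as query heads, which is what keeps the KV cache smaller than in vanilla multi-head attention; the gate/up/down projection trio with SiLU is the familiar SwiGLU-style MLP.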

More info: Kuduxaaa/ava-llm

@github-actions github-actions bot marked this pull request as draft April 24, 2025 19:41
@github-actions
Contributor

Hi 👋, thank you for opening this pull request! Pull requests are converted to draft by default, and CI is paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@Kuduxaaa Kuduxaaa marked this pull request as ready for review April 24, 2025 20:00
@Rocketknight1
Member

Hi @Kuduxaaa, we generally don't add architectures to transformers until a significant pre-trained model exists using that architecture! The architecture looks good and we appreciate the educational goal, but we probably won't be able to merge this PR. I'm sorry!

@Kuduxaaa
Author

Thanks for the clarification! Totally understandable. I appreciate the feedback and the kind words about the architecture. Once a pre-trained model is ready and properly evaluated, I’ll revisit this with a new PR 🥳🧠

@Kuduxaaa Kuduxaaa closed this by deleting the head repository Apr 28, 2025