Conversation

@Kuduxaaa
🚀 Add AvaModel: A Scalable & Efficient LLM Architecture

Overview

This PR introduces AvaModel, a new transformer-based language model architecture designed to perform well across scales from 100M to 100B parameters.

Architecture Highlights

AvaForCausalLM(
  (model): AvaModel(
    (embed_tokens): Embedding(32000, 1280)
    (layers): ModuleList(
      (0-11): 12 x AvaDecoderLayer(
        (self_attn): AvaAttention(
          (q_proj): Linear(in_features=1280, out_features=1280, bias=False)
          (k_proj): Linear(in_features=1280, out_features=640, bias=False)
          (v_proj): Linear(in_features=1280, out_features=640, bias=False)
          (o_proj): Linear(in_features=1280, out_features=1280, bias=False)
          (dropout): Dropout(p=0.0, inplace=False)
        )
        (mlp): AvaMLP(
          (gate_proj): Linear(in_features=1280, out_features=5120, bias=False)
          (up_proj): Linear(in_features=1280, out_features=5120, bias=False)
          (down_proj): Linear(in_features=5120, out_features=1280, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): AvaRMSNorm()
        (post_attention_layernorm): AvaRMSNorm()
      )
    )
    (norm): AvaRMSNorm()
    (rotary_emb): AvaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=1280, out_features=32000, bias=False)
)
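From the printed module dimensions alone (hidden size 1280, 12 layers, KV projections at half width, a 5120-wide gated MLP), the parameter count of this particular configuration can be sketched. This is an illustration derived only from the printout; whether `embed_tokens` and `lm_head` share weights is not stated in the PR, so both are counted separately, exactly as the modules are listed.

```python
# Parameter count for the printed Ava configuration, derived purely from
# the module dimensions shown above. Sketch only: assumes an untied
# lm_head, since the printout lists embed_tokens and lm_head separately.
vocab, hidden, layers = 32_000, 1_280, 12
kv_dim, ffn = 640, 5_120  # half-width k/v projections, MLP inner width

embed = vocab * hidden                              # embed_tokens
attn = 2 * hidden * hidden + 2 * hidden * kv_dim    # q/o full width, k/v half width
mlp = 3 * hidden * ffn                              # gate_proj, up_proj, down_proj
norms = 2 * hidden                                  # two RMSNorm weight vectors per layer

per_layer = attn + mlp + norms
# total = embeddings + decoder stack + final norm + lm_head
total = embed + layers * per_layer + hidden + vocab * hidden

print(f"per layer: {per_layer:,}")  # 24,578,560
print(f"total:     {total:,}")      # 376,864,000  (~377M parameters)
```

The k/v output width of 640 (half of the 1280 hidden size) indicates grouped-query attention with half as many key/value heads as query heads, which is what keeps the KV cache smaller than in vanilla multi-head attention; the gate/up/down projection trio with SiLU is the familiar SwiGLU-style MLP.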

More info: Kuduxaaa/ava-llm

@github-actions github-actions bot marked this pull request as draft April 24, 2025 19:41
@github-actions
Contributor

Hi 👋, thank you for opening this pull request! Pull requests are converted to draft by default, and CI is paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@Kuduxaaa Kuduxaaa marked this pull request as ready for review April 24, 2025 20:00
@Rocketknight1
Member

Hi @Kuduxaaa, we generally don't add architectures to transformers until a significant pre-trained model exists using that architecture! The architecture looks good and we appreciate the educational goal, but we probably won't be able to merge this PR. I'm sorry!

@Kuduxaaa
Author

Thanks for the clarification! Totally understandable. I appreciate the feedback and the kind words about the architecture. Once a pre-trained model is ready and properly evaluated, I’ll revisit this with a new PR 🥳🧠

@Kuduxaaa Kuduxaaa closed this by deleting the head repository Apr 28, 2025