
LLM from Scratch

A project to build and train a Large Language Model (LLM) from scratch, implementing core components and training procedures to understand how modern language models work.

🎯 Objective

The final goal is to train a complete LLM from scratch, scaling to whatever size your hardware allows. This project focuses on understanding the fundamentals of transformer architectures, tokenization, training loops, and model optimization.

This project is a learning exercise to understand LLMs at a fundamental level. The implementation prioritizes clarity and educational value over raw performance.

Prerequisites

  • Python 3.8+
  • CUDA-capable GPU (recommended for training)
  • Sufficient RAM/VRAM for your target model size

🛠️ Planned Components (not finalized)

  • Tokenizer implementation (BPE/WordPiece); a toy BPE merge step is sketched after this list
  • Transformer architecture (attention, feed-forward, layer norm); see the attention sketch below
  • Positional encoding
  • Training loop with gradient accumulation; a minimal accumulation loop is sketched below
  • Data loading and preprocessing pipeline
  • Model checkpointing and resuming
  • Inference engine
  • Model quantization (for deployment)
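
To make the tokenizer item concrete, here is a minimal sketch of one byte-pair-encoding (BPE) training step in plain Python: count adjacent symbol pairs across a toy corpus and merge the most frequent one. The function names (`most_frequent_pair`, `merge_pair`) and the toy corpus are illustrative placeholders, not part of this repository.

```python
from collections import Counter

def most_frequent_pair(token_sequences):
    """Count adjacent symbol pairs across the corpus and return the most frequent one."""
    pairs = Counter()
    for seq in token_sequences:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
    return max(pairs, key=pairs.get)

def merge_pair(token_sequences, pair, new_symbol):
    """Replace every occurrence of `pair` with `new_symbol` in each sequence."""
    merged = []
    for seq in token_sequences:
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(new_symbol)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        merged.append(out)
    return merged

# One BPE training step on a toy corpus of character sequences
corpus = [list("low"), list("lower"), list("lowest")]
pair = most_frequent_pair(corpus)               # e.g. ('l', 'o')
corpus = merge_pair(corpus, pair, "".join(pair))
```

A full tokenizer would repeat this merge loop until reaching a target vocabulary size and record the merges so text can be encoded and decoded consistently.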
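
The transformer item centers on self-attention. Below is a minimal single-head causal self-attention sketch, assuming PyTorch; the class name and hyperparameters are placeholders rather than this project's actual modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Single-head causal self-attention: each position attends only to itself and earlier positions."""

    def __init__(self, d_model: int, max_len: int = 1024):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)    # project input to queries, keys, values
        self.proj = nn.Linear(d_model, d_model)       # output projection
        # Lower-triangular mask blocks attention to future tokens
        self.register_buffer("mask", torch.tril(torch.ones(max_len, max_len)).bool())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape                              # batch, sequence length, embedding dim
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        att = (q @ k.transpose(-2, -1)) / (C ** 0.5)   # scaled dot-product scores
        att = att.masked_fill(~self.mask[:T, :T], float("-inf"))
        att = F.softmax(att, dim=-1)
        return self.proj(att @ v)
```

A full transformer block would wrap this in layer norm, add a feed-forward sublayer, and use multiple heads, but the masking and scaled dot-product above are the core mechanism.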
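
Gradient accumulation lets a small-VRAM GPU simulate a larger batch size by accumulating gradients over several micro-batches before each optimizer step. The loop below is a minimal sketch assuming PyTorch; `model`, `data_loader`, `optimizer`, and `accum_steps` are hypothetical placeholders, not names defined in this repository.

```python
import torch
import torch.nn.functional as F

def train_epoch(model, data_loader, optimizer, accum_steps: int = 8):
    """One epoch with gradient accumulation: step the optimizer every `accum_steps` micro-batches."""
    model.train()
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(data_loader):
        logits = model(inputs)                                    # (batch, seq_len, vocab_size)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        (loss / accum_steps).backward()                           # scale so accumulated grads average out
        if (step + 1) % accum_steps == 0:
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # keep gradient norms bounded
            optimizer.step()
            optimizer.zero_grad()
```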
