This repository guides you through the process of building a GPT-style Large Language Model (LLM) from scratch using PyTorch. The structure and approach are inspired by the book Build a Large Language Model (From Scratch) by Sebastian Raschka.
- Title: Build a Large Language Model (From Scratch)
- Author: Sebastian Raschka
- Publisher: Manning Publications
- Link: manning.com/books/build-a-large-language-model-from-scratch
- Free Version: GitHub Gist
- Download PDF: PDF Version
- Resource: see the material I learned from.
- Set up the environment and tools.
- Learn about tokenization, embeddings, and the idea of a "language model".
- Encode input/output sequences and build basic forward models.
- Understand unidirectional processing and causal language modeling.
- Explore Transformer components: attention, multi-head attention, and positional encoding.
- Implement residual connections, normalization, and feedforward layers.
- Build a GPT-style decoder-only transformer architecture.
- Load and preprocess datasets like TinyShakespeare.
- Implement batch creation, context windows, and training routines.
- Use cross-entropy loss, optimizers, and learning rate schedulers.
- Monitor perplexity and improve generalization.
- Generate text using greedy, top-k, top-p, and temperature sampling.
- Evaluate and tune generation.
- Export and convert the model for Hugging Face compatibility.
- Deploy via Hugging Face Hub and Gradio Space.
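As a taste of the tokenization step above, here is a minimal character-level tokenizer in plain Python. The class and method names are illustrative sketches, not code from this repo:

```python
class CharTokenizer:
    """Maps each unique character in a corpus to an integer id."""

    def __init__(self, text):
        chars = sorted(set(text))                           # stable vocabulary order
        self.stoi = {ch: i for i, ch in enumerate(chars)}   # char -> id
        self.itos = {i: ch for ch, i in self.stoi.items()}  # id -> char

    def encode(self, s):
        return [self.stoi[c] for c in s]

    def decode(self, ids):
        return "".join(self.itos[i] for i in ids)


tok = CharTokenizer("hello world")
ids = tok.encode("hello")
print(ids)              # → [3, 2, 4, 4, 5]
print(tok.decode(ids))  # → "hello" (encode/decode round-trips)
```

Real GPT-style models use subword tokenizers (e.g. BPE), but the character-level version is the simplest way to see the text-to-ids mapping.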
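The causal (masked) attention at the heart of the decoder-only architecture can be sketched in a few lines. NumPy is used here instead of PyTorch to keep the sketch dependency-light; the function and variable names are illustrative, not from the repo:

```python
import numpy as np

def causal_attention(Q, K, V):
    """Scaled dot-product attention where each position may only
    attend to itself and earlier positions."""
    T, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                      # (T, T) similarity scores
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)   # True above the diagonal
    scores = np.where(mask, -np.inf, scores)           # block attention to the future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V, weights


rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))    # 4 positions, 8-dim embeddings
out, w = causal_attention(Q, K, V)     # w is lower-triangular: no peeking ahead
```

The upper-triangular mask is what makes the model *causal*: position 0 can only attend to itself, so its output is exactly `V[0]`.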
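The sampling strategies listed above boil down to reshaping the model's output distribution before drawing a token. A minimal sketch of temperature plus top-k sampling (NumPy; `sample_next` is a hypothetical helper name, not from the repo):

```python
import numpy as np

def sample_next(logits, temperature=1.0, top_k=None, rng=None):
    """Pick the next token id from raw logits."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float) / temperature  # <1 sharpens, >1 flattens
    if top_k is not None:
        cutoff = np.sort(logits)[-top_k]                    # k-th largest logit
        logits = np.where(logits < cutoff, -np.inf, logits) # drop everything below it
    probs = np.exp(logits - logits.max())                   # stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))


# top_k=1 degenerates to greedy decoding: always the argmax token
next_id = sample_next([0.1, 2.0, 0.5], top_k=1)  # → 1
```

Top-p (nucleus) sampling follows the same pattern, but keeps the smallest set of tokens whose cumulative probability exceeds p instead of a fixed k.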
- Python 3.8+
- PyTorch
- NumPy
- Matplotlib
- JupyterLab or Notebooks
- Hugging Face libraries: `transformers`, `datasets`, `huggingface_hub`
- `gradio` for deployment
```bash
git clone https://github.com/codewithdark-git/Building-LLMs-from-scratch.git
cd Building-LLMs-from-scratch
pip install -r requirements.txt
```
```text
Building-LLMs-from-scratch/
├── notebooks/          # Weekly learning notebooks
├── models/             # Model architectures & checkpoints
├── data/               # Preprocessing and datasets
├── hf_deploy/          # Hugging Face config & deployment scripts
├── theoretical/        # Podcast & theoretical discussions
├── utils/              # Helper scripts
├── requirements.txt
└── README.md
```
This project includes:
- Scripts to convert the model for 🤗 Transformers compatibility
- Uploading to Hugging Face Hub
- Launching an interactive demo on Hugging Face Spaces using Gradio
You’ll find detailed instructions inside the `hf_deploy/` folder.
MIT License; see the `LICENSE` file for details.