
Conversation

carrycooldude

  • Added all the new NanoGPT implementation files
  • Created a commit with a descriptive message that references issue [Feature request]: Implement NanoGPT and required layers #3
  • Pushed the changes to the remote repository on the feature/nanogpt branch
  • The implementation includes:
    • Core GPT model with causal self-attention
    • Training utilities with JAX/Flax
    • Data processing and tokenization
    • Configuration management
    • Example script for training


@gemini-code-assist (bot) left a comment


Hello @carrycooldude, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request introduces a NanoGPT model implemented using JAX and Flax. It includes the core GPT model with causal self-attention, training utilities, data processing and tokenization, configuration management, and an example training script. The goal is a simple, understandable implementation of the popular NanoGPT architecture.

Highlights

  • Core GPT Model: Implements the GPT model with causal self-attention using JAX and Flax. The model includes token and position embeddings, transformer blocks, and a language modeling head.
  • Training Utilities: Provides training utilities such as a TrainState, functions for creating the training state, performing training and evaluation steps, and creating a learning rate schedule.
  • Data Processing: Includes data processing utilities for loading and preparing the Shakespeare dataset, creating a character-level tokenizer, and generating training and validation batches.
  • Configuration Management: Introduces a configuration system using dataclasses to manage model and training parameters.
  • Example Training Script: Provides an example script demonstrating how to use the NanoGPT implementation to train a model on the Shakespeare dataset.

Changelog

  • examples/nanogpt_example.py
    • Added an example script demonstrating NanoGPT usage, including data loading, model initialization, training loop, and evaluation.
    • Implements the main training loop, including learning rate scheduling, batch sampling, loss estimation, and model saving.
  • jax_layers/nanogpt/__init__.py
    • Added imports for all the new NanoGPT implementation files, including config, data, model, and train modules.
    • Defined the __all__ list to expose the public API of the nanogpt package.
  • jax_layers/nanogpt/config.py
    • Defined dataclasses for GPTConfig, TrainConfig, and Config to manage model and training parameters.
    • These configurations include parameters such as vocab_size, block_size, n_layer, learning_rate, and batch_size.
  • jax_layers/nanogpt/data.py
    • Implemented data processing utilities, including a Tokenizer class for character-level tokenization.
    • Added functions for downloading and loading the Shakespeare dataset, and creating training and validation datasets.
  • jax_layers/nanogpt/model.py
    • Implemented the core GPT model, including CausalSelfAttention and Block modules.
    • The GPT model includes token and position embeddings, transformer blocks, and a language modeling head.
  • jax_layers/nanogpt/train.py
    • Implemented training utilities, including a TrainState, and functions for creating the training state.
    • Added functions for performing training and evaluation steps, and creating a learning rate schedule.
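
For reference, a character-level tokenizer like the one described for jax_layers/nanogpt/data.py can be as small as the sketch below. This is an illustrative reconstruction, not the PR's exact class.

```python
class Tokenizer:
    """Minimal character-level tokenizer (illustrative sketch)."""

    def __init__(self, text: str):
        chars = sorted(set(text))
        self.vocab_size = len(chars)
        self.stoi = {ch: i for i, ch in enumerate(chars)}  # char -> id
        self.itos = {i: ch for ch, i in self.stoi.items()}  # id -> char

    def encode(self, s: str) -> list[int]:
        return [self.stoi[c] for c in s]

    def decode(self, ids) -> str:
        return "".join(self.itos[int(i)] for i in ids)
```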
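The model side (CausalSelfAttention, Block, and the GPT module with token/position embeddings and a language-modeling head) roughly corresponds to the Flax Linen structure sketched below. Hyperparameter names mirror the config sketch above; the actual code in jax_layers/nanogpt/model.py may differ in its details.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn


class CausalSelfAttention(nn.Module):
    n_head: int
    n_embd: int
    dropout: float = 0.1

    @nn.compact
    def __call__(self, x, deterministic: bool = True):
        B, T, C = x.shape
        # Project to queries, keys, values in one dense layer, then split heads.
        qkv = nn.Dense(3 * self.n_embd)(x)
        q, k, v = jnp.split(qkv, 3, axis=-1)
        q = q.reshape(B, T, self.n_head, C // self.n_head).transpose(0, 2, 1, 3)
        k = k.reshape(B, T, self.n_head, C // self.n_head).transpose(0, 2, 1, 3)
        v = v.reshape(B, T, self.n_head, C // self.n_head).transpose(0, 2, 1, 3)
        # Scaled dot-product attention with a causal (lower-triangular) mask.
        att = (q @ k.transpose(0, 1, 3, 2)) / jnp.sqrt(C // self.n_head)
        mask = jnp.tril(jnp.ones((T, T), dtype=bool))
        att = jnp.where(mask, att, -1e9)
        att = jax.nn.softmax(att, axis=-1)
        att = nn.Dropout(self.dropout)(att, deterministic=deterministic)
        y = (att @ v).transpose(0, 2, 1, 3).reshape(B, T, C)
        return nn.Dense(self.n_embd)(y)


class Block(nn.Module):
    n_head: int
    n_embd: int
    dropout: float = 0.1

    @nn.compact
    def __call__(self, x, deterministic: bool = True):
        # Pre-norm transformer block: attention and MLP, each with a residual connection.
        x = x + CausalSelfAttention(self.n_head, self.n_embd, self.dropout)(
            nn.LayerNorm()(x), deterministic)
        h = nn.Dense(4 * self.n_embd)(nn.LayerNorm()(x))
        h = nn.Dense(self.n_embd)(nn.gelu(h))
        return x + nn.Dropout(self.dropout)(h, deterministic=deterministic)


class GPT(nn.Module):
    vocab_size: int
    block_size: int
    n_layer: int
    n_head: int
    n_embd: int
    dropout: float = 0.1

    @nn.compact
    def __call__(self, idx, deterministic: bool = True):
        B, T = idx.shape
        # Token and position embeddings.
        tok = nn.Embed(self.vocab_size, self.n_embd)(idx)
        pos = nn.Embed(self.block_size, self.n_embd)(jnp.arange(T))
        x = nn.Dropout(self.dropout)(tok + pos, deterministic=deterministic)
        for _ in range(self.n_layer):
            x = Block(self.n_head, self.n_embd, self.dropout)(x, deterministic)
        x = nn.LayerNorm()(x)
        # Language-modeling head projecting back to vocabulary logits.
        return nn.Dense(self.vocab_size, use_bias=False)(x)
```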
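Likewise, the training utilities (a TrainState, a learning-rate schedule, and state creation) typically look something like the following sketch using optax and flax.training. The warmup/cosine shape and the weight-decay value are assumptions, not necessarily what jax_layers/nanogpt/train.py does.

```python
import optax
from flax.training import train_state


def create_learning_rate_schedule(peak_lr: float, warmup_steps: int, total_steps: int):
    # Linear warmup followed by cosine decay (a common choice; assumed here).
    return optax.warmup_cosine_decay_schedule(
        init_value=0.0,
        peak_value=peak_lr,
        warmup_steps=warmup_steps,
        decay_steps=total_steps,
    )


def create_train_state(model, params, schedule):
    # Bundle the apply function, parameters, and optimizer into one state object.
    tx = optax.adamw(learning_rate=schedule, weight_decay=0.1)
    return train_state.TrainState.create(apply_fn=model.apply, params=params, tx=tx)
```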
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in issue comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


Trivia time!

What is the origin of the name 'JAX'?

Answer: JAX is Autograd and XLA (Accelerated Linear Algebra) brought together for high-performance numerical computing.
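
In other words, JAX pairs Autograd-style automatic differentiation with XLA compilation; a tiny illustration (not from this PR):

```python
import jax
import jax.numpy as jnp

# Differentiate a function (the "Autograd" part) and compile it with XLA (jit).
loss = lambda w, x, y: jnp.mean((x @ w - y) ** 2)
grad_fn = jax.jit(jax.grad(loss))  # gradient w.r.t. the first argument, XLA-compiled
```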

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist (bot) left a comment


Code Review

The pull request introduces a NanoGPT model implementation using JAX and Flax, along with training utilities and data processing. The code appears well-structured and follows common practices for JAX/Flax projects. However, there are a few areas that could be improved for clarity, efficiency, and maintainability.

Summary of Findings

  • Random Seed Management: The example script initializes the random seed using jax.random.PRNGKey(config.train.seed). It's crucial to ensure that this seed is properly used throughout the training process to guarantee reproducibility. Consider using jax.random.split to generate different PRNGKeys for different parts of the training loop (e.g., data shuffling, model initialization, dropout).
  • Data Shuffling: The create_dataset function shuffles the data using np.random.permutation. While this works, it might be beneficial to explore JAX's random number generation capabilities for shuffling, especially when scaling to larger datasets or distributed training scenarios. This ensures consistency and avoids potential issues with NumPy's random state.
  • Evaluation RNG: In eval_step, a fixed RNG key jax.random.PRNGKey(0) is used for dropout. While this ensures deterministic evaluation, it might be more appropriate to disable dropout entirely during evaluation by setting deterministic=True in the dropout layer.

Merge Readiness

The pull request introduces a significant feature and appears to be well-structured. However, addressing the comments regarding random seed management, data shuffling, and evaluation RNG would improve the code's robustness and maintainability. I am unable to directly approve the pull request, and other reviewers should review and approve this code before merging. I recommend addressing the identified issues before merging.

carrycooldude and others added 3 commits March 31, 2025 20:20
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@carrycooldude (Author)

Made the changes suggested by the Gemini Code Assist review, @monatis.


Thanks for the update, @carrycooldude! It's great to see the NanoGPT implementation integrated. I've reviewed the changes and provided feedback in the review comments.

@carrycooldude (Author)

Thanks for the update, @carrycooldude! It's great to see the NanoGPT implementation integrated. I've reviewed the changes and provided feedback in the review comments.

That I have done ✅

@monatis (Member) left a comment


After raising that issue, the scope of this project changed a little bit --we're now aiming at a "model garden"-like library, where SotA models are implemented with the unique value propositions of JAX in mind. There are some models selected for the first development phase --please check out the existing issues. Thus, the priority of a simpler NanoGPT implementation is lower now. With that said, it might be a good addition if it makes use of the newer NNX API, integrates the existing attention implementations, documents the training workflow in detail, etc. In that case, it can serve as good benchmarking code while implementing optimizations in the future. So I left some comments.
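
For readers unfamiliar with it, the "newer NNX API" referenced above is Flax's object-oriented module system; a minimal, hypothetical example (not from this PR) looks like this:

```python
from flax import nnx


class MLP(nnx.Module):
    """Tiny illustrative NNX module; names and sizes are arbitrary."""

    def __init__(self, din: int, dhidden: int, dout: int, *, rngs: nnx.Rngs):
        self.fc1 = nnx.Linear(din, dhidden, rngs=rngs)
        self.fc2 = nnx.Linear(dhidden, dout, rngs=rngs)

    def __call__(self, x):
        return self.fc2(nnx.gelu(self.fc1(x)))


model = MLP(64, 256, 64, rngs=nnx.Rngs(0))  # parameters live on the instance
```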

@monatis (Member) commented Mar 31, 2025

Linting and type checking are also failing.
