
Conversation

joelpaulkoch
Contributor

Hey, this is the SmolLM3 model from Hugging Face. It's smol, fully open, and supports reasoning, so I figured it would be a nice addition to Bumblebee.

I didn't implement YaRN extrapolation.

Member

jonatanklosko left a comment

Hey @joelpaulkoch, this looks great! I dropped a few small comments and it's good to go :)

Comment on lines +588 to +602
question_answering_mapping = %{
  "output_norm" => "transformer.norm",
  "embedder.token_embedding" => "transformer.embed_tokens",
  "decoder.blocks.0.output_norm" => "transformer.layers.0.post_attention_layernorm",
  "decoder.blocks.0.self_attention.key" => "transformer.layers.0.self_attn.k_proj",
  "decoder.blocks.0.self_attention.query" => "transformer.layers.0.self_attn.q_proj",
  "decoder.blocks.0.self_attention.value" => "transformer.layers.0.self_attn.v_proj",
  "decoder.blocks.0.self_attention_norm" => "transformer.layers.0.input_layernorm",
  "decoder.blocks.0.self_attention.output" => "transformer.layers.0.self_attn.o_proj",
  "decoder.blocks.0.ffn.output" => "transformer.layers.0.mlp.down_proj",
  "decoder.blocks.0.ffn.intermediate" => "transformer.layers.0.mlp.up_proj",
  "decoder.blocks.0.ffn.gate" => "transformer.layers.0.mlp.gate_proj"
}

Map.merge(mapping, question_answering_mapping)
Member

They use a different prefix for all layers, so we can probably just do this:

Suggested change
- question_answering_mapping = %{
-   "output_norm" => "transformer.norm",
-   "embedder.token_embedding" => "transformer.embed_tokens",
-   "decoder.blocks.0.output_norm" => "transformer.layers.0.post_attention_layernorm",
-   "decoder.blocks.0.self_attention.key" => "transformer.layers.0.self_attn.k_proj",
-   "decoder.blocks.0.self_attention.query" => "transformer.layers.0.self_attn.q_proj",
-   "decoder.blocks.0.self_attention.value" => "transformer.layers.0.self_attn.v_proj",
-   "decoder.blocks.0.self_attention_norm" => "transformer.layers.0.input_layernorm",
-   "decoder.blocks.0.self_attention.output" => "transformer.layers.0.self_attn.o_proj",
-   "decoder.blocks.0.ffn.output" => "transformer.layers.0.mlp.down_proj",
-   "decoder.blocks.0.ffn.intermediate" => "transformer.layers.0.mlp.up_proj",
-   "decoder.blocks.0.ffn.gate" => "transformer.layers.0.mlp.gate_proj"
- }
- Map.merge(mapping, question_answering_mapping)
+ for {key, value} <- mapping, into: %{} do
+   {key, String.replace_leading(value, "model.", "transformer.")}
+ end
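
For illustration, this is roughly what the comprehension does, assuming the base mapping points at the usual "model." prefix (the two entries below are a hypothetical, abbreviated excerpt, not the real mapping):

mapping = %{
  "output_norm" => "model.norm",
  "decoder.blocks.0.ffn.gate" => "model.layers.0.mlp.gate_proj"
}

for {key, value} <- mapping, into: %{} do
  {key, String.replace_leading(value, "model.", "transformer.")}
end
#=> %{
#     "output_norm" => "transformer.norm",
#     "decoder.blocks.0.ffn.gate" => "transformer.layers.0.mlp.gate_proj"
#   }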

Comment on lines +25 to +27
Nx.tensor([
  [[-0.4167, -0.0137, 0.7160], [-0.2624, -1.1185, -0.3098], [-0.0383, -0.8390, -0.0039]]
])
Member

Just double-checking, these values come from Python, right?

Contributor Author

Yes, coming from Python :) Although the repo config is so tiny, it's not even hitting the no-RoPE layer case.

As a sidenote, I think next time I'll try to set up a simple validation script with pythonx so that it can be reused for contributing model implementations.
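
A rough sketch of what such a script could look like (not part of this PR; the tiny repo id is a placeholder, and the Pythonx calls reflect my understanding of its uv_init/eval/decode API):

# Install the Python dependencies in-process via uv.
Pythonx.uv_init("""
[project]
name = "validation"
version = "0.0.0"
requires-python = "==3.12.*"
dependencies = ["torch", "transformers"]
""")

# Reference logits computed with transformers.
{result, _globals} =
  Pythonx.eval(
    """
    import torch
    from transformers import AutoModelForCausalLM

    # placeholder repo id
    model = AutoModelForCausalLM.from_pretrained("bumblebee-testing/tiny-random-SmolLM3ForCausalLM")
    input_ids = torch.tensor([[10, 20, 30]])
    model(input_ids).logits.tolist()
    """,
    %{}
  )

reference = Pythonx.decode(result)

# Logits from the Bumblebee implementation, to compare against `reference` within a tolerance.
{:ok, %{model: model, params: params}} =
  Bumblebee.load_model({:hf, "bumblebee-testing/tiny-random-SmolLM3ForCausalLM"})

outputs = Axon.predict(model, params, %{"input_ids" => Nx.tensor([[10, 20, 30]])})
IO.inspect(Nx.to_list(outputs.logits), label: "bumblebee")
IO.inspect(reference, label: "python")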

For more details see https://www.reddit.com/r/LocalLLaMA/comments/14mrgpr/dynamically_scaled_rope_further_increases
"""
],
no_rope_layers: [
Member

This naming is very confusing; initially I thought it means not-RoPE, but 1 (true) actually enables RoPE. So I guess it rather means No- and Ro-PE.

One alternative configuration I can think of would be :rotary_embedding_enabled, with a list of booleans true/false (and if omitted, defaults to true). We can easily convert the representation when loading the config. What do you think?

On a sidenote, we generally use "block" wherever they say "layer" (because it is a group of whole layers).
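
For illustration, the conversion could look roughly like this when loading the config (module and function names here are hypothetical, not the actual implementation):

defmodule RotaryConfigSketch do
  # HF stores no_rope_layers as a list of 0/1 flags where 1 enables RoPE.
  # Convert it to a list of booleans for :rotary_embedding_enabled,
  # defaulting to RoPE enabled in every block when the key is absent.
  def rotary_embedding_enabled(data, num_blocks) do
    case data["no_rope_layers"] do
      nil -> List.duplicate(true, num_blocks)
      flags -> Enum.map(flags, &(&1 == 1))
    end
  end
end

RotaryConfigSketch.rotary_embedding_enabled(%{"no_rope_layers" => [1, 1, 1, 0]}, 4)
#=> [true, true, true, false]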

Contributor Author

I agree the naming is very confusing; I took it directly from Hugging Face to see what you would suggest, sorry. It's also very confusing that they have both no_rope_layers and no_rope_layer_interval.

:rotary_embedding_enabled sounds good to me 👍

Comment on lines +214 to +219
smollm3: %{
  special_tokens: %{
    eos: "<|im_end|>",
    pad: "<|im_end|>"
  }
},
Member

joelpaulkoch
Contributor Author

The implementation is basically llama + NoPE support (in the transformer block) + architectures that are supported but missing in llama (i.e. :for_question_answering and :for_token_classification). So, would you prefer to add the optional NoPE support and the extra architectures to the llama implementation and map smollm3 to llama?
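
To make the NoPE part concrete, it boils down to a per-block switch around the rotary embedding; a rough sketch (helper and argument names are just for illustration, not the actual code in this PR):

defmodule NopeSketch do
  # Apply rotary embeddings to query/key only in blocks where they are enabled;
  # NoPE blocks pass the projections through unchanged.
  def maybe_apply_rotary({query, key}, rotary_fun, rotary_embedding_enabled, block_idx) do
    if Enum.at(rotary_embedding_enabled, block_idx, true) do
      rotary_fun.(query, key)
    else
      {query, key}
    end
  end
end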
