Support for Plamo2ForCausalLM architecture #13874
Replies: 4 comments
-
+1
-
plamo-2-translate is an extremely good model for this task (way better than Gemma 3 or Granite), and it's damn small too.
-
I'm currently working on this implementation here: #13930, but so far the output is completely incorrect. I'm trying to fix it, but since I lack experience with llama.cpp development, I don't know how to examine intermediate outputs to identify where the issue might be occurring. Could someone please offer some advice?
-
To investigate the intermediate values, I implemented a callback in llm_build_plamo2 while constructing the graph, then printed some values from the decode function in llama-context.cpp (a sketch of the approach is below). This let me pinpoint exactly where the implementation diverges from the expected behavior: the conversion of token IDs to embeddings is correct, but the output already differs immediately before the first Mamba layer, right after the RMS norm is applied. I'll continue debugging from there.
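For reference, llama.cpp exposes a hook for exactly this kind of inspection: the `cb_eval` field of `llama_context_params` (the same mechanism the `examples/eval-callback` tool uses) is invoked for every graph node during evaluation, so you can dump intermediate activations without patching the decode loop. Below is a minimal sketch; the `"attn_norm"` name filter is only a placeholder, since the actual node names depend on how `llm_build_plamo2` labels its tensors.

```cpp
#include "llama.h"
#include "ggml.h"
#include "ggml-backend.h"

#include <cstdio>
#include <cstring>
#include <vector>

// The scheduler calls this twice per graph node: first with ask == true
// (return true to request the node's data), then with ask == false once
// the computed data is available for reading.
static bool debug_cb(struct ggml_tensor * t, bool ask, void * /*user_data*/) {
    if (ask) {
        // observe only the nodes of interest ("attn_norm" is a placeholder)
        return strncmp(t->name, "attn_norm", 9) == 0;
    }
    if (t->type != GGML_TYPE_F32) {
        return true; // this sketch only prints F32 tensors
    }
    // copy to host memory in case the tensor lives in a GPU buffer
    std::vector<float> data(ggml_nelements(t));
    ggml_backend_tensor_get(t, data.data(), 0, ggml_nbytes(t));

    printf("%s [%lld x %lld]:", t->name,
           (long long) t->ne[0], (long long) t->ne[1]);
    for (size_t i = 0; i < data.size() && i < 8; ++i) {
        printf(" %.6f", data[i]);
    }
    printf("\n");
    return true; // continue graph evaluation
}

// When creating the context, register the callback:
//   llama_context_params cparams = llama_context_default_params();
//   cparams.cb_eval           = debug_cb;
//   cparams.cb_eval_user_data = nullptr;
```

Filtering in the `ask` phase keeps overhead low, since tensor data is only copied back from the device for the nodes you actually request.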
-
The Plamo 2 Translate model was released recently, and it uses a new architecture (Plamo2ForCausalLM). It'd be neat to see support for this model added to llama.cpp. Not sure what more info is needed to open this as an issue and get it moving.