I have been playing around with KV caches lately, but I am struggling to understand how the KV cache works in this new architecture.
As far as I understand, the KV cache cells are tracked by the class `llama_kv_cells_unified`, which holds vectors for `pos`, `shift`, and `seq`.
Of these, I am fairly sure that `pos` holds the position of the cell where the value is stored, but I am unsure about the real purpose of `shift` and `seq`. `shift` is presumably used to apply some kind of shift, but what sort of shift? `seq` presumably stands for sequence, but could you please elaborate?
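To make my current mental model concrete, here is a toy sketch. It is not the actual llama.cpp code: `toy_kv_cells`, `TOY_MAX_SEQ`, and all of its methods are hypothetical names. It rests on assumptions I would like confirmed or corrected: that `pos` is the token position recorded for a cell (-1 meaning empty), that `shift` accumulates a position delta that still has to be applied to the cached keys, and that `seq` is the set of sequence ids that share the cell.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical limit on the number of sequences; not a llama.cpp constant.
constexpr std::size_t TOY_MAX_SEQ = 8;

// Toy per-cell metadata (assumption: one entry per cache cell).
struct toy_kv_cells {
    std::vector<int32_t> pos;                       // token position, -1 = empty cell
    std::vector<int32_t> shift;                     // accumulated, not yet applied position delta
    std::vector<std::bitset<TOY_MAX_SEQ>> seq;      // sequence ids that use this cell

    explicit toy_kv_cells(std::size_t n) : pos(n, -1), shift(n, 0), seq(n) {}

    // Record that cell i stores the token at position p of sequence seq_id.
    void set(std::size_t i, int32_t p, std::size_t seq_id) {
        pos[i] = p;
        seq[i].set(seq_id);
    }

    // Remove positions [p0, p1) of seq_id; a cell with no owners becomes empty.
    void seq_rm(std::size_t seq_id, int32_t p0, int32_t p1) {
        for (std::size_t i = 0; i < pos.size(); ++i) {
            if (pos[i] >= p0 && pos[i] < p1 && seq[i].test(seq_id)) {
                seq[i].reset(seq_id);
                if (seq[i].none()) {
                    pos[i]   = -1;
                    shift[i] = 0;
                }
            }
        }
    }

    // Move every cell of seq_id with pos >= p0 by delta and remember the
    // delta in shift so that it could be applied to the cached data later.
    void pos_add(std::size_t seq_id, int32_t p0, int32_t delta) {
        for (std::size_t i = 0; i < pos.size(); ++i) {
            if (pos[i] >= p0 && seq[i].test(seq_id)) {
                pos[i]   += delta;
                shift[i] += delta;
            }
        }
    }
};
```

In this toy model, dropping the first token of a sequence and sliding the rest back would be `cells.seq_rm(0, 0, 1)` followed by `cells.pos_add(0, 1, -1)`. Is that roughly what `shift` is for in the real implementation?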
The KV cache is populated while processing the ubatch inside the `process_ubatch` function.
Please correct me if I am wrong at any point above.
Even knowing this much, I am not confident about how this works internally. Additionally, how are the cells linked to the graph built from the LLM architecture, or are they not linked at all?
What type of data is stored in the KV cache? Is it of type `uint32_t`? And how is the memory (`llama_memory_i`) managed so that I can save and load the state easily?
The state itself holds output ids, embeddings, and logits, of which I suspect the logits are the most important.
I would appreciate it if someone could take the time to explain the internal workings in depth.