Memory Management and how graph and KV_caches work. #15422

J4e6eR · 2025-08-19T14:32:26Z

I have been experimenting with KV caches lately, but I am struggling to understand how they work in the new architecture.
As far as I understand, the KV caches are stored using the class llama_kv_cells_unified, which holds vectors for pos, shift, and seq.

Of these, I am fairly sure that pos holds the position of the cell where the value is stored, but I am unsure about the real purpose of shift and seq. shift is presumably used to apply some kind of shift, but what sort of shift? seq should stand for sequence, but could you please elaborate?
The KV caches are stored while the ubatch is processed inside the process_ubatch function.
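
To make my current mental model concrete, here is a minimal sketch of how I picture the per-cell bookkeeping. This is only my guess, not the actual llama.cpp code: ToyCells, TOY_MAX_SEQ, and the set/add/rm helpers are hypothetical names I made up, and the sketch assumes that shift is a pending position delta and that seq marks which sequences use a cell. It tracks metadata only and does not model the K/V tensors themselves.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model, NOT the real llama_kv_cells_unified.
// It only tracks per-cell metadata; the actual K/V data is not modeled.
constexpr int TOY_MAX_SEQ = 8; // assumed limit, for illustration only

struct ToyCells {
    std::vector<int32_t> pos;                   // token position held by each cell, -1 if empty
    std::vector<int32_t> shift;                 // pending position delta per cell (my assumption)
    std::vector<std::bitset<TOY_MAX_SEQ>> seq;  // which sequence ids use each cell (my assumption)

    explicit ToyCells(std::size_t n) : pos(n, -1), shift(n, 0), seq(n) {}

    // Record that cell i holds the token at position p for sequence s.
    void set(std::size_t i, int32_t p, int s) {
        pos[i] = p;
        seq[i].set(s);
    }

    // Move every cell of sequence s whose position is in [p0, p1) by delta,
    // and accumulate the delta so it could be applied to the stored keys later.
    void add(int s, int32_t p0, int32_t p1, int32_t delta) {
        for (std::size_t i = 0; i < pos.size(); ++i) {
            if (pos[i] >= p0 && pos[i] < p1 && seq[i].test(s)) {
                pos[i]   += delta;
                shift[i] += delta;
            }
        }
    }

    // Detach sequence s from cell i; the cell becomes empty once no sequence uses it.
    void rm(std::size_t i, int s) {
        seq[i].reset(s);
        if (seq[i].none()) {
            pos[i]   = -1;
            shift[i] = 0;
        }
    }
};
```

If this picture is roughly right, then a cell shared by two sequences would only be freed after both are removed, and shift would record how far a cell's position has moved since its keys were written. Please tell me where this differs from the real implementation.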

Please correct me if I am wrong at any point above.
Even knowing this much, I am not confident about how it works internally. Also, how are the cells linked to the graph built from the LLM architecture? Are they linked at all?

What type of data is stored in the KV caches? Is it uint32_t? And how is the memory (llama_memory_i) managed, so that I can save and load the state easily?
The state itself holds output ids, embeddings, and logits, of which I suspect the logits are the most important.

I would appreciate it if someone could take the time to explain the internal workings in depth.
