
When generating embeddings, I get multiple embedding vectors for a single string. #14957

Answered by blargg
blargg asked this question in Q&A

Alright, I found a bit more. This is affected by the --pooling flag. With pooling turned off, it just generates one vector for every token.

Using --pooling cls generates a single vector, the one intended for classification. If I understand correctly, it picks a single representative token, which doesn't sound right to me. I expected the embedding model to create an embedding over the whole string, not to look at one specific token.

Maybe I misunderstand what is going on though.
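
To make the difference between the pooling modes concrete, here is a toy sketch. It is not llama.cpp code: the function names and the numbers are made up for illustration, and it assumes that cls pooling keeps the first token's output vector while mean pooling averages over all tokens. In a bidirectional encoder model each token's output vector has already attended to the whole input, so the vector at the CLS position can still summarize the full string even though it is read from a single position.

```python
# Toy illustration only: hypothetical helpers and made-up values,
# not the llama.cpp implementation.
from typing import List

Vector = List[float]


def pool_none(token_vecs: List[Vector]) -> List[Vector]:
    """No pooling: return one vector per token, unchanged."""
    return [list(v) for v in token_vecs]


def pool_cls(token_vecs: List[Vector]) -> Vector:
    """CLS pooling (assumed here to mean: keep the first token's vector)."""
    return list(token_vecs[0])


def pool_mean(token_vecs: List[Vector]) -> Vector:
    """Mean pooling: average each dimension over all tokens."""
    n = len(token_vecs)
    dim = len(token_vecs[0])
    return [sum(v[i] for v in token_vecs) / n for i in range(dim)]


# Pretend the model produced these outputs for a 3-token input.
token_vecs = [
    [1.0, 0.0, 2.0],  # hypothetical output at the CLS position
    [3.0, 4.0, 0.0],
    [2.0, 2.0, 4.0],
]

print(len(pool_none(token_vecs)))  # 3 vectors, one per token
print(pool_cls(token_vecs))        # [1.0, 0.0, 2.0]
print(pool_mean(token_vecs))       # [2.0, 2.0, 2.0]
```

Which mode gives useful sentence embeddings depends on how the model was trained, so it is worth checking the model card for the pooling it expects.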

Replies: 1 comment 3 replies

@ggerganov

@blargg

@iamlemec

iamlemec Aug 1, 2025
Collaborator

Answer selected by blargg