Description
Hello Amazon Science Team,
First, thank you for your work on LC-PLM and for sharing it with the community. I've run into what appears to be a bug affecting the model's sensitivity to its input.
When processing two protein sequences that are nearly identical (differing by only one or a few amino acids), LcPlmForMaskedLM produces bit-for-bit identical embeddings. Only for two very different sequences does it produce (slightly) different embeddings.
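For reference, here is a minimal sketch of how I trigger this. It assumes the model is loaded from a local ./LC-PLM checkpoint with trust_remote_code=True and that per-residue representations can be read via output_hidden_states; the two sequences and the mean-pooling step are illustrative, not part of any official API:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Illustrative reproduction; "./LC-PLM" is my local checkpoint directory.
tokenizer = AutoTokenizer.from_pretrained("./LC-PLM", trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained("./LC-PLM", trust_remote_code=True)
model.eval()

# Two example sequences differing only in the final residue.
seq_a = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
seq_b = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVA"

def embed(seq):
    inputs = tokenizer(seq, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # Mean-pool the last hidden layer into a single embedding vector.
    return out.hidden_states[-1].mean(dim=1)

# On my machine this prints True: the embeddings are bit-for-bit identical
# even though the input sequences differ.
print(torch.equal(embed(seq_a), embed(seq_b)))
```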
Furthermore, the following warning was displayed when loading the model. It indicates that a significant portion of the weights (the mamba_rev in_proj/out_proj matrices in every bimamba.backbone layer) were not found in the checkpoint and were randomly initialized instead, which would explain the observed lack of sensitivity:
Some weights of LcPlmForMaskedLM were not initialized from the model checkpoint at ./LC-PLM and are newly initialized: ['bimamba.backbone.layers.0.mixer.mamba_rev.in_proj.weight', 'bimamba.backbone.layers.0.mixer.mamba_rev.out_proj.weight', 'bimamba.backbone.layers.1.mixer.mamba_rev.in_proj.weight', 'bimamba.backbone.layers.1.mixer.mamba_rev.out_proj.weight',
....................................
'bimamba.backbone.layers.47.mixer.mamba_rev.in_proj.weight', 'bimamba.backbone.layers.47.mixer.mamba_rev.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
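To narrow this down, I compared the tensor names stored in the checkpoint file against the keys the model expects. Below is a sketch of that check, assuming the weights live in model.safetensors under ./LC-PLM (adjust for sharded or .bin checkpoints) and reusing the model object from the snippet above:

```python
from safetensors.torch import load_file

# Load the checkpoint's tensors to inspect which keys it actually contains.
ckpt = load_file("./LC-PLM/model.safetensors")

expected = set(model.state_dict())
stored = set(ckpt)

missing = sorted(expected - stored)  # expected by the model, absent from the file
unused = sorted(stored - expected)   # present in the file, never loaded

print(f"{len(missing)} missing keys, e.g. {missing[:2]}")
print(f"{len(unused)} unused keys, e.g. {unused[:2]}")
```

If the mamba_rev weights appear under differently named keys in the unused list, that would suggest a key-name mismatch in the loading code (e.g. a prefix difference) rather than a genuinely incomplete checkpoint.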
My Environment
PyTorch Version: 2.4.1+cu118
Transformers Version: 4.56.2
PyTorch CUDA Version: 11.8