README.md (2 additions, 0 deletions)
@@ -14,6 +14,8 @@ Update: The choice of the norm or gating (still need to ablate to figure out whi
Update: Nevermind, MLP attention seems to be working, but about the same as dot product attention.

+Update: By using the negative of the euclidean distance in place of the dot product for the higher types in dot product attention, I now see results that are far better than before, as well as better than MLP attention. My conclusion is that the choice of norm and gating is contributing far more to the results in the paper than MLP attention.
+
<a href="https://wandb.ai/lucidrains/equiformer/reports/equiformer-and-mlp-attention---VmlldzozMDQwMTY3?accessToken=xmj0a1c80m8hehylrmbr0hndka8kk1vxmdrmvtmy7r1qgphtnuhq1643cb76zgfo">Running experiment, denoising residue positions in protein sequence</a>
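The update above swaps the dot-product similarity for the negative euclidean distance when comparing features. Below is a minimal sketch of that idea, not the repository's actual implementation; all shapes and names are illustrative assumptions.

```python
# Minimal sketch: negative euclidean distance as the attention similarity,
# compared against plain dot product. Shapes and names are assumed for illustration.
import torch

b, n, d = 2, 32, 64                          # batch, sequence length, feature dim (assumed)
q = torch.randn(b, n, d)
k = torch.randn(b, n, d)

# standard dot product similarity
dot_sim = torch.einsum('b i d, b j d -> b i j', q, k)

# negative euclidean distance as the similarity instead (the "-cdist" variant)
# note: -||q - k||^2 = 2 q.k - ||q||^2 - ||k||^2, so this is dot product
# similarity penalized by the squared norms of the queries and keys
neg_dist_sim = -torch.cdist(q, k)            # (b, n, n)

# either set of logits can then be normalized into attention weights
attn = neg_dist_sim.softmax(dim = -1)
```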
single_headed_kv = False,                 # whether to use single-headed key / values for dot product attention, to save on memory and compute
ff_include_htype_norms = False,           # whether the type0 projection should also involve the norms of all higher types in the feedforward's first projection; this allows all higher types to be gated by the other type norms
-dot_product_attention = True,
-dot_product_attention_use_cdist_sim = True,
+dot_product_attention = True,             # set to False to use MLP attention as proposed in the paper, but dot product attention with -cdist similarity is still far better, and I haven't even rotated distances (rotary embeddings) into the type 0 features yet
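For context, a hedged usage sketch of where these flags might be passed when constructing the model. Only single_headed_kv, ff_include_htype_norms, and dot_product_attention come from this diff; every other hyperparameter and the forward call are illustrative assumptions, not the library's documented API.

```python
# Hypothetical usage sketch - only the three commented flags are taken from this diff;
# all other hyperparameters and the forward call are assumptions.
import torch
from equiformer_pytorch import Equiformer

model = Equiformer(
    num_tokens = 24,                     # assumed
    dim = (32, 16, 8),                   # assumed per-degree feature dimensions
    depth = 2,                           # assumed
    single_headed_kv = False,            # single-headed key / values for dot product attention (saves memory and compute)
    ff_include_htype_norms = False,      # let higher-type norms participate in the feedforward's type0 projection
    dot_product_attention = True         # False would switch to MLP attention as proposed in the paper
)

feats = torch.randint(0, 24, (1, 32))    # assumed token ids
coors = torch.randn(1, 32, 3)            # 3d coordinates
mask = torch.ones(1, 32).bool()

out = model(feats, coors, mask = mask)   # assumed forward signature
```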