When I train with KTO, the KL value quickly drops to 0. Is this normal?
{'loss': 0.4173, 'grad_norm': 1.4672807732482507, 'learning_rate': 4.765488274413721e-06, 'rewards/chosen': 1.194046974182129, 'logps/chosen': -18.560531616210938, 'rewards/rejected': 0.43546485900878906, 'logps/rejected': -29.158364868164064, 'rewards/margins': 0.7585821151733398, 'kl': 0.10797347873449326, 'logits/chosen': -159737504.0, 'logits/rejected': -125256448.0, 'epoch': 0.08}
{'loss': 0.4038, 'grad_norm': 15.43012523262249, 'learning_rate': 4.7611130556527825e-06, 'rewards/chosen': 1.2586393356323242, 'logps/chosen': -25.3940673828125, 'rewards/rejected': 0.3017548084259033, 'logps/rejected': -41.6545654296875, 'rewards/margins': 0.9568845272064209, 'kl': 0.0654844269156456, 'logits/chosen': -185916384.0, 'logits/rejected': -143640992.0, 'epoch': 0.08}
{'loss': 0.4329, 'grad_norm': 3.9429698141756444, 'learning_rate': 4.7567378368918445e-06, 'rewards/chosen': 1.1291874647140503, 'logps/chosen': -30.11488151550293, 'rewards/rejected': 0.19891568024953207, 'logps/rejected': -38.57758585611979, 'rewards/margins': 0.9302717844645182, 'kl': 0.0, 'logits/chosen': -149832224.0, 'logits/rejected': -177144672.0, 'epoch': 0.08}
{'loss': 0.347, 'grad_norm': 2.2398680774090054, 'learning_rate': 4.7523626181309066e-06, 'rewards/chosen': 1.2491761666757089, 'logps/chosen': -24.273625126591437, 'rewards/rejected': 0.33067967341496396, 'logps/rejected': -24.57554978590745, 'rewards/margins': 0.9184964932607449, 'kl': 0.0, 'logits/chosen': -204615376.0, 'logits/rejected': -111233384.0, 'epoch': 0.08}
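For context on the exact 0.0 values: the KTO formulation estimates the KL term as a batch mean of policy-minus-reference log-probabilities and clamps it at zero, so any batch whose mean log-ratio is negative would be reported as exactly 0.0. The sketch below only illustrates that clamp. It assumes the trainer in use applies it, and the function name and sample numbers are hypothetical, not taken from any trainer's code or from the logs above.

```python
def clamped_kl_estimate(policy_logps, reference_logps):
    """Batch-mean log-ratio, clamped at zero (assumed trainer behavior)."""
    # Per-sample log-ratio: log pi_policy(y|x) - log pi_ref(y|x)
    diffs = [p - r for p, r in zip(policy_logps, reference_logps)]
    mean_diff = sum(diffs) / len(diffs)
    # A negative batch mean is clamped, which shows up as 'kl': 0.0 in logs
    return max(0.0, mean_diff)


# Hypothetical log-probabilities, for illustration only.
# Policy slightly above the reference: small positive estimate.
positive_case = clamped_kl_estimate([-20.0, -25.0], [-20.2, -25.1])

# Policy below the reference on this batch: the estimate is clamped.
clamped_case = clamped_kl_estimate([-21.0, -26.0], [-20.2, -25.1])

print(positive_case)
print(clamped_case)  # 0.0
```

If that assumption holds, a logged 'kl' of 0.0 would mean the raw batch estimate went non-positive, not necessarily that the policy and reference distributions are identical.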