Transformer bridge layer norm folding #1071

bryce13950 · 2025-09-27T17:37:15Z

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist:

I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have not rewritten tests relating to key interfaces which would affect backward compatibility
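The PR title refers to layer norm folding, and commits in this PR (for example #1066, which sets up the `normalization_type` and `layer_norm_folding` attributes) touch that logic. As a rough illustration of what "folding" means, here is a minimal, dependency-free Python sketch. It assumes a LayerNorm that feeds directly into a linear layer `y = W @ LN(x) + b`, with `W` stored row-major as `[d_out][d_in]`. All names (`normalize`, `layer_norm`, `linear`, `fold_layer_norm`) are hypothetical helpers for illustration, not the TransformerLens or TransformerBridge API. The actual weight processing in this PR works on model state dicts and may use different shapes and additional steps.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape [d_out][d_in], so y = W @ x + b


def normalize(x: Vector, eps: float = 1e-5) -> Vector:
    """LayerNorm without the learned affine part: zero mean, unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    scale = math.sqrt(var + eps)
    return [(v - mean) / scale for v in x]


def layer_norm(x: Vector, ln_w: Vector, ln_b: Vector, eps: float = 1e-5) -> Vector:
    """Full LayerNorm: normalize, then apply per-feature gain ln_w and bias ln_b."""
    return [g * n + beta for g, n, beta in zip(ln_w, normalize(x, eps), ln_b)]


def linear(W: Matrix, b: Vector, x: Vector) -> Vector:
    """Compute W @ x + b."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i for row, b_i in zip(W, b)]


def fold_layer_norm(ln_w: Vector, ln_b: Vector, W: Matrix, b: Vector) -> Tuple[Matrix, Vector]:
    """Fold a LayerNorm's gain and bias into the linear layer that reads from it.

    Hypothetical helper for illustration only:
      W_folded[i][j] = W[i][j] * ln_w[j]
      b_folded[i]    = b[i] + sum_j W[i][j] * ln_b[j]
    """
    W_folded = [[w_ij * g_j for w_ij, g_j in zip(row, ln_w)] for row in W]
    b_folded = [
        b_i + sum(w_ij * beta_j for w_ij, beta_j in zip(row, ln_b))
        for row, b_i in zip(W, b)
    ]
    return W_folded, b_folded


# Check on made-up numbers that the unfolded and folded paths agree.
x = [0.5, -1.0, 2.0, 0.25]
ln_w = [1.5, 0.5, -2.0, 1.0]
ln_b = [0.1, -0.2, 0.3, 0.0]
W = [[0.2, -0.1, 0.4, 0.3], [1.0, 0.5, -0.5, 0.0]]
b = [0.05, -0.05]

unfolded_out = linear(W, b, layer_norm(x, ln_w, ln_b))
W_f, b_f = fold_layer_norm(ln_w, ln_b, W, b)
folded_out = linear(W_f, b_f, normalize(x))
assert all(
    math.isclose(u, f, rel_tol=1e-9, abs_tol=1e-12)
    for u, f in zip(unfolded_out, folded_out)
)
```

The identity behind the sketch: with n = normalize(x), y_i = Σ_j W_ij (g_j n_j + β_j) + b_i = Σ_j (W_ij g_j) n_j + (b_i + Σ_j W_ij β_j). Each LayerNorm gain scales one column of `W`, and the LayerNorm bias is pushed through `W` into the output bias, so after folding the normalization only needs to center and scale. In HookedTransformer this weight-free variant is reported as `normalization_type="LNPre"`; whether the bridge mirrors that exactly is not stated in this PR.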

* created individual processing functions
* extracted state dict and inserted back into instance after processing
* created weight processing shared class
* added test coverage for new functions
* updated hooked transformer to use new shared functions
* created test
* moved over weight processing
* replaced keys
* used the correct function
* created test for making sure path translation works correctly
* fixed weight processing
* added additional tests
* formatted tests a bit
* cleaned up
* fixed unit test
* fixed indentation
* fixed doc string
* fixed unit test
* fixed type
* fixed some tests
* fixed test
* fixed setup of tests

* created individual processing functions
* extracted state dict and inserted back into instance after processing
* created weight processing shared class
* added test coverage for new functions
* updated hooked transformer to use new shared functions
* created test
* moved over weight processing
* replaced keys
* used the correct function
* created test for making sure path translation works correctly
* fixed weight processing
* added additional tests
* formatted tests a bit
* cleaned up
* fixed unit test
* fixed indentation
* fixed doc string
* fixed unit test
* fixed type
* fixed some tests
* fixed test
* fixed setup of tests
* cleaned up test
* started working through individual matches
* added test coverage
* tested function a bit
* integrated weight conversion into weight processing
* simplified functions
* identified individual problem lines
* identified divergences more clearly
* brought back error lines

…for already initialized components (#1066)

* improved accuracy a bit
* got models to match
* removed forward pass stuff
* cleaned up weight processing a bit
* removed working attention
* restored files

* improved accuracy a bit
* got models to match
* removed forward pass stuff
* cleaned up weight processing a bit
* removed working attention
* restored files
* created loop to verify weight conversion
* finished compatibility layer
* finished testing hugging face weights
* setup correct init
* added some tests
* removed separate component
* fixed some integration tests

* improved accuracy a bit
* got models to match
* removed forward pass stuff
* cleaned up weight processing a bit
* removed working attention
* restored files
* created loop to verify weight conversion
* finished compatibility layer
* finished testing hugging face weights
* setup correct init
* added some tests
* removed separate component
* fixed some integration tests
* fixed typing issue
* fixed typing and format issues
* fixed ci issues
* ran format
* fixed mypy issues
* removed extra file
* removed old scripts
* tested format
* fixed some tests
* ran format
* fixed tests
* fixed acceptance tests
* fixed some more tests
* synced functionality completely
* reduced old references
* removed remaining references
* moved forward functions
* removed forward
* tested various forwards
* worked on getting original forwards back into place
* added more coverage
* cleaned up model
* git status
* Fix automatic weight extraction to use reference HookedTransformer

  This restores the working weight extraction mechanism that creates a reference HookedTransformer internally and extracts exact processed weights for perfect compatibility with ablation studies.

  🤖 Generated with [Claude Code](https://claude.ai/code)

  Co-Authored-By: Claude <noreply@anthropic.com>
* moved embed stuff from bridge
* moved MLP stuff
* cleaned up a bit
* cleaned up a bit
* removed extra block
* created pos embed bridge
* fixed unembed

---------

Co-authored-by: Claude <noreply@anthropic.com>

* moved final layer norm
* moved layer norm forward
* cleaned up more things
* updated attention weight loading
* fixed function names

* fixed some ci issues
* fixed type issues
* ran format
* fixed test
* fixed type issues
* fixed type issue
* fixed type issue
* fixed test
* fixed test
* fixed issues
* ran format
* fixed typing
* fixed tests
* fixed tests
* simplified test
* sped up tests
* added check for kv cache
* ran format
* skipped some tests
* marked a couple tests to skip
* ran some more optimizations
* ran poetry lock
* regenerated lock
* fixed commands
* set random seed
* updated parallelism prop
* updated command
* reverted some changes
* updated notebook settings
* updated verbosity
* removed extra test
* cleaned up tests some more
* marked test as skipped
* fixed more tests
* sped up CI
* reverted CI changes
* reverted actions changes
* improved cache
* sped up some tests
* optimized more tests
* sped up some more tests
* made more speed improvements
* fixed error
* fixed typing

* cleaned up some debug points
* fixed attention hooks
* enabled hooks in test

* split out some tasks into their own jobs
* removed bad file
* updated name

* fixed batch dimension
* removed log point
* fixed potential error
* sped up load
* ran format
* improved hf cache handling
* fixed bridge
* fixed cache again
* added more checks
* removed parallel execution

* fixed cache hooks
* fixed test and typing
* fixed test

* fixed bias displaying
* fixed ablation issue
* fixed type issue

* setup new hooks properly
* fixed type checks

* fixed alias hook props
* ran format

* made all hooks show properly
* ran format
* fixed type checks

* updated loading in main demo to use transformers bridge
* updated model name
* updated imports
* updated some cells
* reran demo
* updated some cells
* reran some cells
* reran demo
* ran demo again
* finished generating new cells

* Update README.md (#957)

  Update link to Streamlit tutorial and guide.

  Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
* improve model properties table in docs (#769)
  * add static to gitignore
  * making a meaningless change to see if tests pass at all
  * making a meaningless change to see if tests pass at all
  * add interactive table static html
    only adding things one at a time to see what causes things to break
  * run poetry update with no changes to deps
  * revert lockfile change
  * add tiktoken >=0.7.0 to group docs
  * add dep muutils >=0.6.15 to group docs
  * add improved interactive table generation
    we still generate a plain markdown table
    code is from the old PR: https://github.com/mivanit/TransformerLens/blob/add-better-model-properties-table/docs/make_docs.py
    which is in turn a modified version of https://github.com/mivanit/transformerlens-model-table
  * fix format -- missing trailing newline
  * fix type hints for compatibility
  * fix torch device meta in make docs script, also improved hot reload
  * TEMPORARY: allow_except when getting models to deal with mixtral HF_TOKEN issue
  * added simple test for get_model_info
  * context manager for controlling device, tests were breaking due to default device meta
  * formatted with wrong version of black, oops
  * fix path to generated model_properties_table
  * fix md table header, add title in yaml frontmatter
  * add line to frontmatter yaml, re-run tests bc huggingface down?
  * do not allow exceptions when getting models
  * re-run poetry lock
  * attempt fix lockfile
  * re-run poetry lock

  ---------

  Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
* switch pyproject from toml to uv, generate lockfile
  also update tiktoken dep for 3.13 compatibility
* update makefile to use uv
* update actions
* hack to get version to work
* wip
* make dep
* update contributing.md to reflect switch from poetry to uv
* add type hints to supported_models
* fix paths in make_docs.py
* docs group not in default, update install instructions for docs
* POETRY_PYPI_TOKEN_PYPI -> PYPI_TOKEN_PYPI
* make format
* fix default groups, re-add docs
* add some deps needed in notebooks
* removed use of torchtyping in othello_GPT.ipynb and deps
  - torchtyping causes various issues if it's imported
  - presumably jaxtyping should be used instead??
  - othello GPT notebook doesn't actually use the imported TT
  - shouldn't a linter/formatter catch this sort of unused import?
* fix: add pythonpath "." to pytest config for test imports
  Configure pytest to include project root in Python path, enabling `from tests.foo import bar` style imports, which were broken by switching to uv
* attempt jupyter issue fix
* issue ref explaining ipython version restriction
* updated ci commands after recent work
* fixed more setup items
* added tabulate dependency
* updated make docs command
* updated dependencies
* fixed docs

---------

Co-authored-by: jmole <jmoeller@gmail.com>
Co-authored-by: Bryce Meyer <bryce13950@gmail.com>

* setup tests for hooks
* ran format
* merged legacy hooks tests
* ran format
* enabled compatibility mode
* added remaining hooks
* fixed type issue
* added main demo cached output
* removed debug items
* reran notebook
* marked cell for skipping
* reran notebook
* regenerated demo
* regenerated notebook

* updated loading in arena content demo to use transformer bridge
* updated install reference
* removed extra params
* ran some cells
* updated arena notebook

---------

Co-authored-by: Bryce Meyer <bryce13950@gmail.com>

* regenerated with new hooks
* ran first cell

* added test coverage for ensuring compatibility
* ran format
* fixed unit tests
* resolved type issue
* added init files
* added init file
* fixed tokenize function
* fixed attention mask issues
* reverted invalid change to test

This reverts commit 674f3a4.

* created test that asserts hook shapes for various models
* created initial doc for explaining transformer bridge model structure
* ran format
* cleaned up docs and enabled test
* ran format
* improved hook test
* ran format
* did some memory cleanup
* added more hook shape coverage
* made more optimizations
* fixed type checking

bryce13950 and others added 8 commits September 21, 2025 17:57

Add missing configuration parameters

6a00384

Add missing configuration parameters

7e691b3

Properly set up normalization_type and layer_norm_folding attributes for already initialized components (#1066)

8b434d7

Process accuracy (#1067)

c0255e8

bryce13950 changed the title ~~Dev 3.x folding~~ Transformer bridge layer norm folding Sep 27, 2025

bryce13950 and others added 21 commits September 29, 2025 22:33

Revision extra forwards (#1073)

ddcb4f5

Merge branch 'dev-3.x' into dev-3.x-folding

fece8c9

Attention hooks full coverage for folding (#1078)

caf4c3d

Ci job splitting (#1079)

ed23558

fixed batch dimension (#1082)

ebea4f4

fixed cache hooks (#1083)

445f747

fixed bias displaying (#1084)

f60e8d3

fixed return type none (#1085)

251b5ab

Create pass through for hooks in compatibility mode (#1086)

f053201

fixed alias hook props (#1087)

817be64

made all hooks show properly (#1088)

b6477a0

added full kv cache (#1089)

92585df

fixed first two tests

674f3a4

regenerated with new hooks (#1091)

8b259a2

added test coverage for ensuring compatibility (#989)

d6934cb

bryce13950 added 2 commits October 17, 2025 11:23

Revert "fixed first two tests"

3e64a48

3 participants
