Transformer bridge layer norm folding #1071
Open
bryce13950 wants to merge 31 commits into dev-3.x from dev-3.x-folding
* created individual processing functions * extracted state dict and inserted back into instance after processing * created weight processing shared class * added test coverage for new functions * updated hooked transformer to use new shared functions * created test * moved over weight processing * replaced keys * used the correct function * created test for making sure path translation works correctly * fixed weight processing * added additional tests * formatted tests a bit * cleaned up * fixed unit test * fixed indentation * fixed doc string * fixed unit test * fixed type * fixed some tests * fixed test * fixed setup of tests
* created individual processing functions * extracted state dict and inserted back into instance after processing * created weight processing shared class * added test coverage for new functions * updated hooked transformer to use new shared functions * created test * moved over weight processing * replaced keys * used the correct function * created test for making sure path translation works correctly * fixed weight processing * added additional tests * formatted tests a bit * cleaned up * fixed unit test * fixed indentation * fixed doc string * fixed unit test * fixed type * fixed some tests * fixed test * fixed setup of tests * cleaned up test * started working through individual matches * added test coverage * tested function a bit * integrated weight conversion into weight processing * simplified functions * identified individual problem lines * identified divergences more clearly * brought back error lines
…for already initialized components (#1066)
* improved accuracy a bit * got models to match * removed forward pass stuff * cleaned up weight processing a bit * removed working attention * restored files
* improved accuracy a bit * got models to match * removed forward pass stuff * cleaned up weight processing a bit * removed working attention * restored files * created loop to verify weight conversion * finished compatibility layer * finished testing hugging face weights * setup correct init * added some tests * removed separate component * fixed some integration tests
* improved accuracy a bit * got models to match * removed forward pass stuff * cleaned up weight processing a bit * removed working attention * restored files * created loop to verify weight conversion * finished compatibility layer * finished testing hugging face weights * setup correct init * added some tests * removed separate component * fixed some integration tests * fixed typing issue * fixed typing and format issues * fixed ci issues * ran format * fixed mypy issues * removed extra file * removed old scripts * tested format * fixed some tests * ran format * fixed tests * fixed acceptance tests * fixed some more tests * synced functionality completely * reduced old references * removed remaining references * moved forward functions * removed forward * tested various forwards * worked on getting original forwards back into place * added more coverage * cleaned up model * git status * Fix automatic weight extraction to use reference HookedTransformer This restores the working weight extraction mechanism that creates a reference HookedTransformer internally and extracts exact processed weights for perfect compatibility with ablation studies. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * moved embed stuff from bridge * moved MLP stuff * cleaned up a bit * cleaned up a bit * removed extra block * created pos embed bridge * fixed unembed --------- Co-authored-by: Claude <noreply@anthropic.com>
* moved final layer norm * moved layer norm forward * cleaned up more things * updated attention weight loading * fixed function names
* fixed some ci issues * fixed type issues * ran format * fixed test * fixed type issues * fixed type issue * fixed type issue * fixed test * fixed test * fixed issues * ran format * fixed typing * fixed tests * fixed tests * simplified test * sped up tests * added check for kv cache * ran format * skipped some tests * marked a couple tests to skip * ran some more optimizations * ran poetry lock * regenerated lock * fixed commands * set random seed * updated parallelism prop * updated command * reverted some changes * updated notebook settings * updated verbosity * removed extra test * cleaned up tests some more * marked test as skipped * fixed more tests * sped up CI * reverted CI changes * reverted actions changes * improved cache * sped up some tests * optimized more tests * sped up some more tests * made more speed improvements * fixed error * fixed typing
* cleaned up some debug points * fixed attention hooks * enabled hooks in test
* split out some tasks into their own jobs * removed bad file * updated name
* fixed batch dimension * removed log point * fixed potential error * sped up load * ran format * improved hf cache handling * fixed bridge * fixed cache again * added more checks * removed parallel execution
* fixed cache hooks * fixed test and typing * fixed test
* fixed bias displaying * fixed ablation issue * fixed type issue
* setup new hooks properly * fixed type checks
* fixed alias hook props * ran format
* made all hooks show properly * ran format * fixed type checks
* updated loading in main demo to use transformers bridge * updated model name * updated imports * updated some cells * reran demo * updated some cells * reran some cells * reran demo * ran demo again * finished generating new cells
* Update README.md (#957) Update link to Streamlit tutorial and guide. Co-authored-by: Bryce Meyer <bryce13950@gmail.com> * improve model properties table in docs (#769) * add static to gitignore * making a meaningless change to see if tests pass at all * making a meaningless change to see if tests pass at all * add interactive table static html only adding things one at a time to see what causes things to break * run poetry update with no changes to deps * revert lockfile change * add tiktoken >=0.7.0 to group docs * add dep muutils >=0.6.15 to group docs * add improved interactive table generation we still generate a plain markdown table code is from the old PR: https://github.com/mivanit/TransformerLens/blob/add-better-model-properties-table/docs/make_docs.py which is in turn a modified version of https://github.com/mivanit/transformerlens-model-table * fix format -- missing trailing newline * fix type hints for compatibility * fix torch device meta in make docs script, also improved hot reload * TEMPORARY: allow_except when getting models to deal with mixtral HF_TOKEN issue * added simple test for get_model_info * context manager for controlling device, tests were breaking due to default device meta * formatted with wrong version of black, oops * fix path to generated model_properties_table * fix md table header, add title in yaml frontmatter * add line to frontmatter yaml, re-run tests bc huggingface down? 
* do not allow exceptions when getting models * re-run poetry lock * attempt fix lockfile * re-run poetry lock --------- Co-authored-by: Bryce Meyer <bryce13950@gmail.com> * switch pyproject from toml to uv, generate lockfile also update tiktoken dep for 3.13 compatibility * update makefile to use uv * update actions * hack to get version to work * wip * make dep * update contributing.md to reflect switch from poetry to uv * add type hints to supported_models * fix paths in make_docs.py * docs group not in default, update install instructions for docs * POETRY_PYPI_TOKEN_PYPI -> PYPI_TOKEN_PYPI * make format * fix default groups, re-add docs * add some deps needed in notebooks * removed use of torchtyping in othello_GPT.ipynb and deps - torchtyping causes various issues if it's imported - presumably jaxtyping should be used instead?? - othello GPT notebook doesn't actually use the imported TT - shouldn't a linter/formatter catch this sort of unused import? * fix: add pythonpath "." to pytest config for test imports Configure pytest to include project root in Python path, enabling `from tests.foo import bar` style imports, which were broken by switching to uv * attempt jupyter issue fix * issue ref explaining ipython version restriction * updated ci commands after recent work * fixed more setup items * added tabulate dependency * updated make docs command * updated dependencies * fixed docs --------- Co-authored-by: jmole <jmoeller@gmail.com> Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
* setup tests for hooks * ran format * merged legacy hooks tests * ran format * enabled compatibility mode * added remaining hooks * fixed type issue * added main demo cached output * removed debug items * reran notebook * marked cell for skipping * reran notebook * regenerated demo * regenerated notebook
* updated loading in arena content demo to use transformer bridge * updated install reference * removed extra params * ran some cells * updated arena notebook --------- Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
* regenerated with new hooks * ran first cell
* added test coverage for ensuring compatibility * ran format * fixed unit tests * resolved type issue * added init files * added init file * fixed tokenize function * fixed attention mask issues * reverted invalid change to test
This reverts commit 674f3a4.
* created test that asserts hook shapes for various models * created initial doc for explaining transformer bridge model structure * ran format * cleaned up docs and enabled test * ran format * improved hook test * ran format * did some memory cleanup * added more hook shape coverage * made more optimizations * fixed type checking
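For readers new to the term in the PR title, here is a minimal sketch of what layer norm folding generally means, written in plain PyTorch. It is not taken from this branch: the helper name `fold_layer_norm` and the toy dimensions are assumptions for illustration, and the weight processing in the PR may differ in detail. The idea is that a LayerNorm's learnable scale and bias can be absorbed into the linear layer that follows it, leaving only the parameter-free normalization.

```python
import torch
import torch.nn as nn


def fold_layer_norm(ln: nn.LayerNorm, linear: nn.Linear) -> nn.Linear:
    """Hypothetical helper: return a Linear that absorbs ln's scale and bias.

    The returned layer expects inputs that were already centered and
    normalized (LayerNorm without its learnable parameters).
    """
    folded = nn.Linear(linear.in_features, linear.out_features, bias=True)
    with torch.no_grad():
        # nn.Linear stores weight as [out_features, in_features], so the
        # LayerNorm scale multiplies each input column.
        folded.weight.copy_(linear.weight * ln.weight)
        # The LayerNorm bias passes through the linear map and joins its bias.
        bias = linear.weight @ ln.bias
        if linear.bias is not None:
            bias = bias + linear.bias
        folded.bias.copy_(bias)
    return folded


torch.manual_seed(0)
d_model, d_out = 16, 8  # toy sizes, chosen only for this sketch
ln = nn.LayerNorm(d_model)
linear = nn.Linear(d_model, d_out)
with torch.no_grad():
    # Default LayerNorm parameters (ones, zeros) would make folding a no-op.
    ln.weight.normal_()
    ln.bias.normal_()

x = torch.randn(4, d_model)
with torch.no_grad():
    reference = linear(ln(x))
    # Parameter-free normalization: what remains once the LayerNorm is folded.
    normalized = nn.functional.layer_norm(x, (d_model,), eps=ln.eps)
    folded_out = fold_layer_norm(ln, linear)(normalized)

max_diff = (reference - folded_out).abs().max().item()
```

Comparing `reference` against `folded_out` with `torch.allclose` and a small tolerance is the same kind of check the commit messages describe as "got models to match", applied here to a single layer.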
Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
Fixes # (issue)
Type of change
Please delete options that are not relevant.
Screenshots
Please attach before and after screenshots of the change if applicable.
Checklist: