Updated the Technical Note for WY of DPLR #562
Walkthrough
The README for Generalized Delta Rule replaces an external-note reference in "Efficient Chunkwise Implementation" with a self-contained derivation of WY representations for DPLR (with diagonal D_t), including definitions, base cases, induction steps, and final WY expressions. No code, API, or runtime changes.
Summary of Changes
Hello @phnazari, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request updates a technical note related to the WY representation of products of DPLR matrices. The update addresses minor theoretical analysis mistakes, such as index-mismatches and mix-ups in matrix multiplications, within the note itself. It's important to note that these corrections are specific to the theoretical document and do not imply any issues with the existing implementation, which is believed to be correct.
Highlights
- Documentation Update: The primary change involves updating the link to the technical note in the README.md file, pointing to a revised version that corrects theoretical analysis errors.
Code Review
This pull request updates the link to a technical note. The change itself is correct. My review includes a suggestion to improve the long-term maintainability of the project's documentation by hosting it within the repository itself, rather than relying on an external Google Drive link which may become inaccessible in the future.
@@ -34,4 +34,4 @@ Here, $\mathbf{I}$ is replaced by a diagonal matrix $\mathbf{D}_t$. This transit

 ## Efficient Chunkwise Implementation

-For detailed information about efficient chunkwise implementation, please refer to our [technical note](https://drive.google.com/file/d/1rJbO3dU4fe7OKG3w7Yg058z_BNIuavNF/view?usp=sharing).
+For detailed information about efficient chunkwise implementation, please refer to our [technical note](https://drive.google.com/file/d/1qqc6THTRc2bw-LtwsbGNxNDw00sNzi5M/view?usp=sharing).
Hosting technical documentation on Google Drive can be fragile for a public project. Links can break, access permissions might change, or the file could be deleted, making it inaccessible to future users and contributors.
For better long-term stability and to keep documentation versioned alongside the code, consider committing the technical note directly into the repository, for instance, within a docs/ directory.
@phnazari Thank you for your contribution. Would you mind including the original author’s name and a link to the source in your PDF? Please also describe any additional contributions you have made. If you could provide the original Markdown file or any supplementary derivations, I would be immensely grateful.
Actionable comments posted: 3
🔭 Outside diff range comments (1)
fla/ops/generalized_delta_rule/README.md (1)
1-1: Remove trailing whitespace in documentation to fix CI
There is a trailing space on line 74 of fla/ops/generalized_delta_rule/README.md causing the pre-commit hook to fail:
- fla/ops/generalized_delta_rule/README.md:74
-where we used $\mathbf \Gamma_{t+1}^t = \mathbf I$ in the last step. 
+where we used $\mathbf \Gamma_{t+1}^t = \mathbf I$ in the last step.
After applying this change, run:
pre-commit install
pre-commit run --all-files
to update your commit and restore a passing CI build.
♻️ Duplicate comments (1)
fla/ops/generalized_delta_rule/README.md (1)
35-39: Self-contained derivation is a big improvement; consider fully migrating away from external Drive dependencies.
This change already mitigates the previous fragility concern. To fully close the loop, commit any original note/derivations to docs/ so everything is versioned with the code.
🧹 Nitpick comments (6)
fla/ops/generalized_delta_rule/README.md (6)
63-64: Use consistent boldface for matrices in the base case.
D_1 should be bold to match notation elsewhere.
-We proceed by induction. The base case is quickly established for $t=1$, considering that $\mathbf \Gamma_1^1 = D_1$ and $\mathbf \Gamma_2^1 = \mathbf I$.
+We proceed by induction. The base case is quickly established for $t=1$, considering that $\mathbf \Gamma_1^1 = \mathbf D_1$ and $\mathbf \Gamma_2^1 = \mathbf I$.
43-44: Correct matrix dimension notation.
Use d×d rather than d, d.
-for vectors $\mathbf a_t, \mathbf b_t, \mathbf v_t, \mathbf k_t \in \mathbb R^d$ and matrices $\mathbf D_t \in \mathbb R^{d, d}$.
+for vectors $\mathbf a_t, \mathbf b_t, \mathbf v_t, \mathbf k_t \in \mathbb R^d$ and matrices $\mathbf D_t \in \mathbb R^{d \times d}$.
50-53: Avoid nesting equation environments inside display math.
Using \begin{equation*} inside $$…$$ is redundant and can break rendering. Keep the $$ block only.
-\begin{equation*}
 \mathbf P_t = \mathbf \Gamma_1^t + \left( \sum_{i=1}^t \mathbf w_i \mathbf b_i^\top \mathbf \Gamma_{i+1}^{t} \right)
-\end{equation*}
36-36: Style nit: “re-do” → “redo”.
Minor wording cleanup.
-The original [technical note](https://drive.google.com/file/d/1qqc6THTRc2bw-LtwsbGNxNDw00sNzi5M/view?usp=sharing) on chunking DPLR contains minor mathematical inconsistencies. Below, we re-do the computations.
+The original [technical note](https://drive.google.com/file/d/1qqc6THTRc2bw-LtwsbGNxNDw00sNzi5M/view?usp=sharing) on chunking DPLR contains minor mathematical inconsistencies. Below, we redo the computations.
36-46: Add attribution and references per maintainer request.
Maintainer requested: include original author’s name and source, list your additional contributions, and provide original Markdown/supplementary derivations. Suggest adding a short section at the end of this subsection.
Proposed insertion after Line 99:
+### References and Attribution
+
+- Original technical note: <ADD AUTHOR NAME(S)>, “<ADD TITLE>,” <ADD YEAR>. Link: https://drive.google.com/file/d/1qqc6THTRc2bw-LtwsbGNxNDw00sNzi5M/view
+- Additional contributions in this PR (by <ADD YOUR NAME>): corrected index mismatches, fixed matrix multiplication order, and provided a self-contained WY derivation for DPLR with diagonal $\mathbf D_t$.
+- Supplementary materials: please include the original Markdown or a derivation appendix under `docs/` (e.g., `docs/dplr_wy_derivation.md`) for versioned, in-repo access.
If you provide the author details and preferred filenames, I can prepare a follow-up commit.
66-73: Optional: standardize math environments throughout.
You mix fenced math blocks (earlier) and $$…$$ (here), sometimes wrapping LaTeX environments (align*) inside $$…$$. Standardize to a single approach to improve rendering across viewers (e.g., use fenced math consistently and avoid nesting environments inside $$).
🧰 Additional context used
🪛 GitHub Actions: lint
fla/ops/generalized_delta_rule/README.md
[error] 1-1: Trailing whitespace detected by pre-commit hook 'trailing-whitespace' during 'pre-commit run --files fla/ops/generalized_delta_rule/README.md'. The file was auto-fixed; please re-run pre-commit to confirm all checks pass.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test H100 (PyTorch 2.7) / test-models
- GitHub Check: check-pt-python-compatibility
🔇 Additional comments (1)
fla/ops/generalized_delta_rule/README.md (1)
47-61: Mathematical content LGTM.
Beyond the noted nits and index fixes, the WY forms for P_t and S_t are clear and align with the intended DPLR recurrence, including base cases and induction structure.
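For readers who want to sanity-check the WY form of $\mathbf P_t$ numerically, the following is a minimal NumPy sketch, not code from the repository. It assumes $\mathbf P_t = \prod_{i=1}^t (\mathbf D_i + \mathbf a_i \mathbf b_i^\top)$ with diagonal $\mathbf D_i$, $\mathbf \Gamma_m^n = \prod_{j=m}^n \mathbf D_j$ (the identity when $m > n$), and $\mathbf w_1 = \mathbf a_1$, $\mathbf w_i = \mathbf P_{i-1} \mathbf a_i$ for $i \geq 2$, which is what the induction step suggests. The function names `direct_product` and `wy_product` are hypothetical.

```python
import numpy as np


def direct_product(D, A, B):
    """Reference: multiply the factors (diag(D[i]) + a_i b_i^T) left to right."""
    t, d = D.shape
    P = np.eye(d)
    for i in range(t):
        P = P @ (np.diag(D[i]) + np.outer(A[i], B[i]))
    return P


def wy_product(D, A, B):
    """WY form (assumed): P_t = Gamma_1^t + sum_i w_i b_i^T Gamma_{i+1}^t.

    D, A, B have shape (t, d). Row i of D holds the diagonal of D_{i+1}
    (the code is 0-based, the math is 1-based).
    """
    t, d = D.shape
    W = np.zeros((t, d))
    for i in range(t):
        # First term of w: (D_1 ... D_i) a_{i+1}; an empty product is all ones.
        w = np.prod(D[:i], axis=0) * A[i]
        for j in range(i):
            # Diagonal of the product of rows D[j+1], ..., D[i-1].
            gamma = np.prod(D[j + 1:i], axis=0)
            w = w + W[j] * (B[j] @ (gamma * A[i]))
        W[i] = w
    P = np.diag(np.prod(D, axis=0))
    for i in range(t):
        gamma = np.prod(D[i + 1:], axis=0)  # all ones for the last index
        P = P + np.outer(W[i], B[i] * gamma)
    return P, W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.uniform(0.5, 1.0, size=(5, 4))
    A = rng.normal(size=(5, 4))
    B = rng.normal(size=(5, 4))
    P_wy, _ = wy_product(D, A, B)
    print(np.allclose(P_wy, direct_product(D, A, B)))
```

The double loop is deliberately naive so that each line maps onto one term of the formula; it is a correctness check only and says nothing about how the chunkwise kernels are implemented.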
Actionable comments posted: 0
♻️ Duplicate comments (3)
fla/ops/generalized_delta_rule/README.md (3)
36-39: Add provenance/attribution + avoid bare email; consider de-relying on Google Drive.
- Add original author attribution, link to source, and a short note of your contributions as requested in review.
- Replace bare email with a mailto link to satisfy markdownlint MD034.
- Long-term, avoid external Drive links for critical docs; mirror or move into repo under docs/. (Echoing earlier feedback.)
Proposed edits:
-If you have questions about or comments on the below derivations, feel free to reach out: philipp.nazari@tuebingen.mpg.de.
+If you have questions about or comments on the below derivations, feel free to reach out: [philipp.nazari@tuebingen.mpg.de](mailto:philipp.nazari@tuebingen.mpg.de).
Optionally insert an attribution block (fill in names/links):
+#### Provenance and acknowledgments
+
+This section builds on an earlier technical note by <ORIGINAL AUTHOR(S)>, available at
+[link](https://drive.google.com/file/d/1qqc6THTRc2bw-LtwsbGNxNDw00sNzi5M/view?usp=sharing).
+In this PR, we:
+- Correct minor inconsistencies (indexing, transpose, and grouping).
+- Integrate a self-contained derivation in-repo.
+- Clarify base cases and induction steps.
+
+If permissible, consider mirroring the original note (PDF/Markdown) under docs/ for archival and versioning.
Would you like me to open a follow-up PR to add a docs/ note and populate the attribution from your sources?
70-76: Indices and identity boundary case are now correct.
The induction step for $\mathbf P_{t+1}$ uses $\mathbf \Gamma_{i+1}^{t+1}$ and the boundary $\mathbf \Gamma_{t+2}^{t+1}=\mathbf I$, addressing the prior mismatch.
78-101: Derivation for S_t looks correct; fixes to base case, transpose, and grouping are in place.
- Base case uses $\mathbf \Gamma_2^1=\mathbf I$.
- Transpose on $\mathbf k_{t+1}$ present.
- Parentheses around $(\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top)$ are balanced.
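The same kind of numerical check can be sketched for the state $\mathbf S_t$. This is an illustration under stated assumptions rather than the repository's implementation: it takes $\mathbf S_0 = \mathbf 0$, the recurrence $\mathbf S_t = \mathbf S_{t-1}(\mathbf D_t + \mathbf a_t \mathbf b_t^\top) + \mathbf v_t \mathbf k_t^\top$, and the WY form $\mathbf S_t = \sum_{i=1}^t (\mathbf v_i \mathbf k_i^\top + \mathbf u_i \mathbf b_i^\top)\, \mathbf \Gamma_{i+1}^t$ with $\mathbf u_1 = \mathbf 0$ and $\mathbf u_i = \mathbf S_{i-1} \mathbf a_i$. The names `recurrent_state` and `wy_state` are hypothetical.

```python
import numpy as np


def recurrent_state(D, A, B, V, K):
    """Reference: run S_t = S_{t-1}(D_t + a_t b_t^T) + v_t k_t^T from S_0 = 0."""
    t, d = D.shape
    S = np.zeros((d, d))
    for i in range(t):
        S = S @ (np.diag(D[i]) + np.outer(A[i], B[i])) + np.outer(V[i], K[i])
    return S


def wy_state(D, A, B, V, K):
    """WY form (assumed): S_t = sum_i (v_i k_i^T + u_i b_i^T) Gamma_{i+1}^t.

    All inputs have shape (t, d); row i of D holds the diagonal of D_{i+1}.
    """
    t, d = D.shape
    U = np.zeros((t, d))
    for i in range(t):
        for j in range(i):
            # Diagonal of the product of rows D[j+1], ..., D[i-1],
            # already applied to a_{i+1}.
            ga = np.prod(D[j + 1:i], axis=0) * A[i]
            # Contribution (v_j k_j^T + u_j b_j^T) Gamma a to the vector u.
            U[i] += V[j] * (K[j] @ ga) + U[j] * (B[j] @ ga)
    S = np.zeros((d, d))
    for i in range(t):
        gamma = np.prod(D[i + 1:], axis=0)  # all ones for the last index
        # Multiplying columns by gamma is right-multiplication by diag(gamma).
        S += (np.outer(V[i], K[i]) + np.outer(U[i], B[i])) * gamma
    return S, U


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.uniform(0.5, 1.0, size=(5, 4))
    A, B, V, K = (rng.normal(size=(5, 4)) for _ in range(4))
    S_wy, _ = wy_state(D, A, B, V, K)
    print(np.allclose(S_wy, recurrent_state(D, A, B, V, K)))
```

If the README uses a nonzero initial state, an extra term $\mathbf S_0 \mathbf P_t$ would be needed; the sketch omits it on purpose.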
🧹 Nitpick comments (5)
fla/ops/generalized_delta_rule/README.md (5)
5-7: Unify transpose notation to use \top consistently.
Current sections use both T and \top. Prefer \top throughout for consistency with later derivations.
-\mathbf{S}_t = \mathbf{S}_{t-1}(\mathbf{I}-\beta_t \mathbf{k}_t\mathbf{k}_t^T) + \beta_t \mathbf{v}_t\mathbf{k}_t^T
+\mathbf{S}_t = \mathbf{S}_{t-1}(\mathbf{I}-\beta_t \mathbf{k}_t\mathbf{k}_t^\top) + \beta_t \mathbf{v}_t\mathbf{k}_t^\top

-\mathbf{S}_t = \mathbf{S}_{t-1}(\mathbf{I}+\mathbf{a}_t\mathbf{b}_t^T) + \mathbf{v}_t\mathbf{k}_t^T
+\mathbf{S}_t = \mathbf{S}_{t-1}(\mathbf{I}+\mathbf{a}_t\mathbf{b}_t^\top) + \mathbf{v}_t\mathbf{k}_t^\top

-\mathbf{S}_t = \mathbf{S}_{t-1}(\mathbf{D}_t+\mathbf{a}_t\mathbf{b}_t^T) + \mathbf{v}_t\mathbf{k}_t^T
+\mathbf{S}_t = \mathbf{S}_{t-1}(\mathbf{D}_t+\mathbf{a}_t\mathbf{b}_t^\top) + \mathbf{v}_t\mathbf{k}_t^\top
Also applies to: 15-17, 29-31
41-46: Typo/notation: use d × d instead of d, d for matrix shape.
Minor LaTeX/notation nit.
-for vectors $\mathbf a_t, \mathbf b_t, \mathbf v_t, \mathbf k_t \in \mathbb R^d$ and matrices $\mathbf D_t \in \mathbb R^{d, d}$.
+for vectors $\mathbf a_t, \mathbf b_t, \mathbf v_t, \mathbf k_t \in \mathbb R^d$ and matrices $\mathbf D_t \in \mathbb R^{d \times d}$.
49-56: Define P_t explicitly before giving its WY form.
Add the product definition to make the section self-contained.
-### $WY$ Representation for $P_t$
+### $WY$ Representation for $P_t$
+Let $\displaystyle \mathbf P_t \coloneqq \prod_{i=1}^t \left(\mathbf D_i + \mathbf a_i \mathbf b_i^\top\right)$. Let $\mathbf \Gamma_i^t \coloneqq \prod_{j=i}^t \mathbf D_j$. Then
65-66: Notation consistency: boldface D_1.
Keep symbols bold throughout.
-We proceed by induction. The base case is quickly established for $t=1$, considering that $\mathbf \Gamma_1^1 = D_1$ and $\mathbf \Gamma_2^1 = \mathbf I$.
+We proceed by induction. The base case is quickly established for $t=1$, considering that $\mathbf \Gamma_1^1 = \mathbf D_1$ and $\mathbf \Gamma_2^1 = \mathbf I$.
9-10: Wording nit: clarify “I is not necessarily an identity matrix.”
“I” by definition denotes the identity. Suggest rephrase to “the transition matrix is not necessarily the identity; …”
-This repository implements a delta rule variant where $\mathbf{I}$ is not necessarily an identity matrix; $\mathbf{k}_t$ in $\mathbf{I} - \beta_t \mathbf{k}_t\mathbf{k}_t^T$ might be different from input $\mathbf{k}_t$ in $\mathbf{v}_t\mathbf{k}_t^T$.
+This repository implements a delta rule variant where the transition matrix is not necessarily the identity; $\mathbf{k}_t$ in $\mathbf{I} - \beta_t \mathbf{k}_t\mathbf{k}_t^\top$ might be different from the input $\mathbf{k}_t$ in $\mathbf{v}_t\mathbf{k}_t^\top$.
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
fla/ops/generalized_delta_rule/README.md
38-38: Bare URL used (MD034, no-bare-urls)
🪛 GitHub Actions: lint
fla/ops/generalized_delta_rule/README.md
[error] 1-1: Trailing whitespace detected by pre-commit hook 'trailing-whitespace'; the file was updated to fix trailing spaces. Please review and commit the changes.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: check-pt-python-compatibility
🔇 Additional comments (1)
fla/ops/generalized_delta_rule/README.md (1)
1-1: Note on CI: trailing whitespace was auto-fixed.
Pre-commit updated the file to remove trailing spaces. Make sure to pull/merge those changes so CI stays green.
I have now updated my pull request to include the computations in the README.md instead of the link to the Drive file (which is not really future-proof). It also contains a link to the original derivations by Songlin. I believe the mistakes in the previous derivations are:
Please let me know if you find any mistakes in my derivations though, I would be happy to fix them! :)
I have updated the technical note for the WY representation of products of DPLR matrices. I believe there were some minor mistakes in the existing theoretical analysis, like index-mismatches and mix-ups of matrix multiplications.
The mistakes do not translate to mistakes in the implementation, which might very well be correct!!!