-
Notifications
You must be signed in to change notification settings - Fork 4
fix: prevent synced invoices from getting updated by outdated webhooks #68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
9bce8bf
to
8387069
Compare
fe7dce9
to
04d51a1
Compare
.map((x) => `"${x}" = EXCLUDED."${x}"`) | ||
.join(',')}, | ||
last_synced_at = :last_synced_at | ||
WHERE "${table}"."last_synced_at" IS NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will upsert if ANY entry in the table matches this, but not necessarily this id, we can do something like
WHERE NOT EXISTS (select 1 from "${table}" where id = :id AND "${table}"."last_synced_at" > :last_synced_at;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The integration tests likely did not catch this (if true), given you test for a single entry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch! You're right - I'll add a test to cover this too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to do a deep dive into the Postgres docs, but it seems that Postgres actually scopes the WHERE
clause in ON CONFLICT DO UPDATE
very narrowly.
Specifically:
- The WHERE clause only runs after a conflict is detected (i.e. once a row with matching constraint is found).
- At that point, it only has access to two rows:
- The conflicting row from the table (via
${table}
or alias) - The proposed insert row (via
excluded
)
- The conflicting row from the table (via
So even though it looks like ${table}."last_synced_at"
could match any row, it actually only refers to the row that caused the conflict.
Source:
- PostgreSQL Docs on INSERT
“The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the existing row using the table’s name (or an alias), and to the row proposed for insertion using the special excluded table.” - This comment on StackOverflow also says the same thing and has a clearer example: https://stackoverflow.com/a/44579804
That said, I find this to be quite non-intuitive (even a bit misleading 😅). I've added a test to verify that this works and reviewed it thoroughly, and I believe it addresses this case. I've added a comment too.
Let me know what you think :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sgtm, also tested this locally and it makes sense
@@ -43,6 +43,7 @@ export const invoiceSchema: JsonSchema = { | |||
sync_failed_at: { type: 'string' }, | |||
voided_at: { type: 'string' }, | |||
will_auto_issue: { type: 'boolean' }, | |||
last_synced_at: { type: 'string' }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
last_synced_at
might be a little confusing wrt to the name, as this field is either
- timestamp when a manual sync was triggered (matches the name)
- Timestamp of the entity change, not when it was synced - i.e. entity changed at 21:00:00, webhook came in at 21:15:00 - so we synced it at 21:15:00, but we will be using 21:00:00 as date
I'm okay with the name as-is, but maybe we can come up with a more fitting name, just a nit though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for flagging this!
Timestamp of the entity change, not when it was synced - i.e. entity changed at 21:00:00, webhook came in at 21:15:00 - so we synced it at 21:15:00, but we will be using 21:00:00 as date
We will actually update last_synced_at
to the time of the webhook creation, not the time when it arrived. We're using the webhook's created_at
timestamp.
For example, taken from a recent Orb webhook:
{
"id": "<invoice_id>",
"created_at": "2025-06-27T17:54:17.144536+00:00",
"type": "invoice.issued",
"invoice": {
"created_at": "2025-06-27T17:54:15+00:00",
...
}
}
We'll use the first one as the last_synced_at
, which is two seconds after the Invoice entity was created/updated.
In the webhook handler:
await syncInvoices(this.postgresClient, [webhook.invoice], webhook.created_at);
Notice that we're using webhook.created_at
and not webhook.invoice.created_at
So last_synced_at
= “we synced this entity because of an event that occurred at this time”
I think that last_synced_at
still makes sense in this case, what do you think?
Happy to make the change if it doesn't make sense! :D
entries: T[], | ||
table: string, | ||
tableSchema: JsonSchema, | ||
syncTimestamp: string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When doing a manual sync, when can always pass new Date()
as syncTimestamp
, given we sync at that time specifically - in that case we may end up ALWAYS having a syncTimestamp, so this separate method would not be necessary - or are there any cases I am missing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A separate method is not strictly necessary, but my idea was to keep the new functionality contained to the invoices first, especially given that we don't have tests for the other events yet. In case something goes wrong, syncing up only the invoices would be easier.
But yeah, eventually, I don't see a reason why we wouldn't do the same for every other entity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine by me to get this out 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm!
This PR introduces a fix for out-of-order webhooks causing erroneous states in the invoices.
What is the current behavior?
Out-of-order webhook events from Orb can cause incorrect invoice states in the database. Since the system upserts invoices without checking the recency of the state, an issued status may overwrite a previously stored paid status.
What is the new behavior?
This PR introduces a general safeguard for upserting - the
upsertManyWithTimestampProtection
function. This function uses a SQL statement with aWHERE
clause that only updates records when the incominglast_synced_at
timestamp is newer than the existing one. This prevents older webhook events from overwriting newer data, ensuring data consistency. This pattern can easily be applied to other entities by using the same function in their respective sync functions.It also:
invoices
,customers
,subscriptions
,credit_notes
,plans
, andbillable_metrics
syncTimestamp
parameter to thesyncInvoices
method allows webhook handlers to pass the webhook'screated_at
timestamp, which is then used as thelast_synced_at
value for timestamp-based protection