Added new message modelling blog post #342
base: gh-pages
Coarser messages can be simpler for a consumer because all the information it needs arrives in one message and can be stored immediately; there is no joining of messages. There are also costs, though, and we'll explore these ideas soon.
Before we do that, let's briefly consider the different decision points where specific choices around granularity have been made:
1. Should each endpoint send out an event matching the endpoint payload, or divide it into smaller messages for particular field sets?
This list isn't rendering on new lines; perhaps separate it out so each bullet is on a new line.
Good spot, it worked fine in my local markdown.
### Endpoint to message mapping
Let's say there is a single update-profile REST endpoint, like `PUT /profile`, where the XML/JSON payload includes an email, postal address, phone number, etc. (assume no separate email endpoint for now).
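A minimal Python sketch of the two options for the `PUT /profile` payload described above: one coarse message mirroring the whole payload, or several smaller messages, one per field set, together with the consumer-side join that the fine-grained option pushes onto consumers. The message type names (`PROFILE_UPDATED`, `PROFILE_CONTACT_UPDATED`, `PROFILE_ADDRESS_UPDATED`), the field names and the `FIELD_SETS` grouping are hypothetical illustrations, not part of the post.

```python
from typing import Any

# Hypothetical grouping of profile fields into smaller "field set" messages.
FIELD_SETS: dict[str, tuple[str, ...]] = {
    "PROFILE_CONTACT_UPDATED": ("email", "phone"),
    "PROFILE_ADDRESS_UPDATED": ("postalAddress",),
}


def coarse_message(user_id: str, payload: dict[str, Any]) -> dict[str, Any]:
    """Option 1: one message mirroring the whole endpoint payload."""
    return {"type": "PROFILE_UPDATED", "userId": user_id, "data": dict(payload)}


def fine_messages(user_id: str, payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Option 2: one message per field set present in the payload."""
    messages = []
    for msg_type, fields in FIELD_SETS.items():
        data = {f: payload[f] for f in fields if f in payload}
        if data:  # skip field sets this request did not touch
            messages.append({"type": msg_type, "userId": user_id, "data": data})
    return messages


def join_fine_messages(messages: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Consumer side of option 2: merge fine-grained messages into one view per user."""
    view: dict[str, dict[str, Any]] = {}
    for msg in messages:
        view.setdefault(msg["userId"], {}).update(msg["data"])
    return view
```

With a payload containing all three fields, `coarse_message` yields a single message, `fine_messages` yields two, and `join_fine_messages` rebuilds the same per-user view from them, which is exactly the joining work a coarse message spares the consumer.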
When generating a message there is a choice between
as above
Next, let's think about the scenario shown in the diagram, with a separate endpoint for changing the email (we'll conveniently ignore the complexities of changing an email, which may actually be multi-step).
We could send an EMAIL_CHANGED event or EMAIL state message for the email endpoint and a PROFILE/PROFILE_UPDATED one (without the email) when the profile endpoint is hit. But wouldn't a consumer expect to find an email in a profile message? If we find that persuasive, we might instead send a single "PROFILE" message, including the email, when either endpoint is hit, so the consumer has one simple message to listen for regardless of how the change occurred. In this case we are relating our events to the entities within our system rather than coupling them to the REST endpoints.
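The entity-centric option could look roughly like the following sketch, in which both handlers publish the same full `PROFILE` state message. The in-memory `profiles` store, the `publish` stand-in, the handler names and the `/profile/email` path are hypothetical, and the sketch deliberately ignores the write-then-publish consistency risks the post goes on to discuss.

```python
from typing import Any

profiles: dict[str, dict[str, Any]] = {}  # hypothetical stand-in for the profile table
published: list[dict[str, Any]] = []      # hypothetical stand-in for a topic


def publish(message: dict[str, Any]) -> None:
    published.append(message)


def emit_profile_state(user_id: str) -> None:
    """Send the full current profile, including the email, whatever changed."""
    publish({"type": "PROFILE", "userId": user_id, "data": dict(profiles[user_id])})


def handle_put_profile(user_id: str, body: dict[str, Any]) -> None:
    """PUT /profile: updates everything except the email (it has its own endpoint)."""
    body = {k: v for k, v in body.items() if k != "email"}
    profiles.setdefault(user_id, {}).update(body)
    emit_profile_state(user_id)


def handle_put_email(user_id: str, email: str) -> None:
    """PUT /profile/email (hypothetical path): updates only the email."""
    profiles.setdefault(user_id, {})["email"] = email
    emit_profile_state(user_id)
```

Calling `handle_put_profile` and then `handle_put_email` for the same user produces two `PROFILE` messages, the second containing both the phone number and the new email, so a consumer needs only one message handler.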
Such an approach makes good sense but brings consistency risks that'll we'll discuss shortly
that'll typo
"ID": "0f504d3b-d76a-4aaa-b628-5e9eeaa10bdc",
"datetime": "10:00, 23/04/1983 UTC",
"name": "Premier League",
"shortName": "Man City",
EPL?
"location": "Manchester",
Maybe location doesn't make sense here.
Yeah, copy-and-paste mistake.
Many tools, such as Kafka, Kinesis and Event Hubs, can guarantee ordering within a shard/partition (it's up to you to pick a sensible key, e.g. user ID, to select the shard). This simplifies consumers, which don't have to worry about receiving and stashing out-of-order events. If you don't have this guarantee, you'll have to rely on timestamps to enforce order, which adds some consumer complexity.
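A rough sketch of both halves of that paragraph, under stated assumptions: `partition_for` shows why keying by user ID keeps one user's events together (CRC32 is a simple stand-in, not the hash Kafka, Kinesis or Event Hubs actually use), and `LatestStateStore` shows one simple timestamp-based strategy for a consumer without ordering guarantees: keep only the newest state message per key. Both names are hypothetical, and timestamps are assumed to be comparable across producers.

```python
import zlib
from dataclasses import dataclass, field
from typing import Any


def partition_for(key: str, num_partitions: int) -> int:
    """Same key -> same partition, so per-key order can be preserved.

    CRC32 is an illustrative stand-in, not any broker's real partitioner.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


@dataclass
class LatestStateStore:
    """Without per-key ordering, keep only the newest state message per key."""

    state: dict[str, tuple[int, dict[str, Any]]] = field(default_factory=dict)

    def apply(self, key: str, timestamp: int, data: dict[str, Any]) -> bool:
        current = self.state.get(key)
        # Older (or same-timestamp) messages arrived out of order or are
        # duplicates; ties are treated as duplicates and ignored.
        if current is not None and timestamp <= current[0]:
            return False
        self.state[key] = (timestamp, data)
        return True
```

Discarding stale messages like this only works for full-state messages. With change events, dropping an older event loses information, so a consumer would instead have to buffer and reorder them, which is the extra complexity the post refers to.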
If you are sending events from application code after a database write, without ACID guarantees covering both the write and the send, reasoning about your messages will be difficult, not just in terms of change lists but also for overall system consistency and avoiding conflicts.
I'd go further than that and just say this is a total no-no. Unless you are using distributed transactions to get an atomic event + DB write (which I've never seen be worthwhile, given the complexity and lack of sound guarantees), you have no guarantee of consistency in your events, and they're almost guaranteed to drift from the true state.
To a point, but with many NoSQL DBs there has been no way to guarantee message + DB atomicity, as many DBs have had neither transactions nor a CDC mechanism with proper guarantees.
### Security in aggregated messages and accidental coupling
Moving on to a totally separate topic, let's briefly consider security.
With REST and other APIs it is normal to have access controls saying which endpoints an be accessed by who.
typo - an
What this means is that going down the state transfer route, as opposed to events, can limit your ability to enforce access controls when messages aggregate multiple REST endpoint payloads. So never aggregate data that has different access requirements.
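One way to enforce that rule mechanically is to tag each source field with the access level its originating endpoint requires and refuse to build an aggregated message that mixes levels. This is a minimal sketch; the `FIELD_ACCESS` labels, field names and `build_aggregate` function are hypothetical.

```python
from typing import Any

# Hypothetical: the access level each source field requires, e.g. derived
# from the access-control rules of the REST endpoint that owns it.
FIELD_ACCESS: dict[str, str] = {
    "displayName": "public",
    "email": "personal",
    "postalAddress": "personal",
    "creditLimit": "finance",
}


def build_aggregate(fields: dict[str, Any]) -> dict[str, Any]:
    """Combine fields into one message only if they share a single access level.

    An unclassified field raises KeyError, which is arguably the safe default.
    """
    levels = {FIELD_ACCESS[name] for name in fields}
    if len(levels) > 1:
        raise ValueError(f"refusing to aggregate mixed access levels: {sorted(levels)}")
    return {"accessLevel": levels.pop() if levels else None, "data": dict(fields)}
```

Consumers can then be authorised per access level, much as they would be per endpoint, instead of gaining every field by subscribing to one convenient aggregate.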
Related to this, there's also a risk of accidental coupling. Imagine you add a new field to some message where that field is only really intended for one consumer. You think that if you ever need to change or remove the field, it'll be a quick conversation with that one consumer's dev team. However, perhaps this field is also added to an existing aggregated message, because the consumer already receives that message and it's easier than integrating with a new topic and message. Unfortunately, 20+ other consumers also use the aggregated message and, over time, developers in those teams choose to use (or misuse) this field without your knowledge. All you know is that 20+ consumers of the aggregated message may or may not be using a given field. Suddenly you can't make a change without risking breaking lots of services, not just the originally intended one.
I'm not sure I agree with these next two paragraphs. The producer is in control of the event and, if it's published for consumption outside the domain, should treat it as a public contract that could be used in any way a consumer decides. There needs to be an agreed pattern for how schemas are managed, published and evolved within the ecosystem.
Events and APIs are both contracts the producer is giving guarantees over. If, as a producer, you are too concerned with specific consumers of those contracts, then you possibly have your domains modelled incorrectly.
I might reword slightly or simplify the point. Ultimately I am trying to say that if you aggregate for convenience, you can lose control over who is accessing what. The same is less true with GraphQL, where you can control which parts of the schema each consumer can see. And yes, agreed: if you don't model your domains well, you end up with messages crossing domains, where half the fields are irrelevant to many consumers who nonetheless need the message for the other half.
## Enrichment pipelines
To finish, consider a slightly different pattern I am calling the enrichment or decoration pipeline:
This feels like a bit of an aside, and I'm not sure it adds a huge amount to the overall blog.
You can get round the cost issue to some degree by mandating that, where a service enriches data, it passes the existing data through unchanged. To put it another way, you treat earlier data as a blob and don't map it into internal models on input and output. However, you still need to think about schemas and how you keep them up to date. If consumer A reads from Enricher N-1 at the end of the chain, it wants an AsyncAPI schema from Enricher N-1, and that schema should include all the data added by previous stages.
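A minimal sketch of the pass-through idea, assuming JSON messages: each stage reads only the part of the original event it needs, writes its additions under its own key, and leaves everything else exactly as received rather than mapping it into an internal model. The `enrichments` envelope, the stage names and the lookup functions are hypothetical. It does nothing about the schema problem: the schema published for the last stage still has to describe every section added upstream.

```python
import json
from typing import Any, Callable


def make_enricher(
    stage: str, lookup: Callable[[dict[str, Any]], dict[str, Any]]
) -> Callable[[str], str]:
    """Build a hypothetical enrichment stage that passes upstream data through."""

    def enrich(raw_message: str) -> str:
        # Parsed into a generic dict only; no typed internal model, so fields
        # this stage knows nothing about survive unchanged.
        message = json.loads(raw_message)
        added = lookup(message["event"])  # read only the original event
        message.setdefault("enrichments", {})[stage] = added
        return json.dumps(message)

    return enrich
```

Chaining two stages, e.g. `risk(geo(raw))`, yields a message whose `enrichments` section holds both additions, while unrelated top-level fields such as a trace ID pass through untouched.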
## Final thoughts
Overall the content is great, but I think it could flow a bit more clearly. Some suggestions:
- You draw on a few examples throughout (trading, sports, videos, user prefs, etc.). Perhaps set up the prototypical system at the start and then use it as a single example throughout?
- It feels like the last 3 sections in granularity could go into a separate top-level section on aggregation, so you would have granularity, normalisation, aggregation and trade-offs, which could be laid out in the intro.
- Maybe in the final thoughts, separate out the key recommendations as bullet points for readability.
Will have a look at that, thanks
Please add a direct link to your post here:
https://dhope-scottlogic.github.io/blog/2025/07/29/message_types_part2.html
Have you (please tick each box to show completion):
`npm install` followed by `npx mdspell "**/{FILE_NAME}.md" --en-gb -a -n -x -t` if that's your thing)
Posts are reviewed / approved by your Regional Tech Lead.