Commit ebf527b

fix: Non-random KSUIDs from hashes in folder, dashboard, and alert migrations (#5726)
### Summary

This PR updates the logic used to generate KSUID primary keys for the `folders`, `dashboards`, and `alerts` tables in the existing migrations that transform data from the `meta` table into those tables. The new logic generates 160-bit "KSUIDs" that are actually 160-bit SHA-1 hashes of the records being transformed rather than truly random KSUIDs.

### Motivation

In the existing implementation, the migrations generate random KSUIDs for the records that are transformed from the `meta` table into the `folders`, `dashboards`, and `alerts` tables. When running the application in a single cluster this is acceptable behavior. However, when running the application in a multi-cluster environment this causes problems.

In a multi-cluster environment, each cluster has its own copy of the application database, and each database should contain exactly the same data in the `meta` table. When we run the migration scripts in each cluster, the data from the `meta` table is transformed into the `folders`, `dashboards`, and `alerts` tables. However, if the migration generates a random KSUID for each new record inserted into those tables, then different clusters will generate different KSUID `id`s for rows in the `folders`, `dashboards`, and `alerts` tables that should actually have the same `id`s.

Our solution is that during the migration from the `meta` table to the `folders`, `dashboards`, and `alerts` tables, rather than generating random 160-bit KSUID `id`s for the new records, we generate a 160-bit SHA-1 hash for each record and use that hash as the `id` of the new record. This ensures that each record transformed from the `meta` table has an `id` in the `folders`, `alerts`, or `dashboards` table that is consistent across all clusters.

Newly generated records (i.e., records generated AFTER the migration) in the `folders`, `alerts`, and `dashboards` tables will continue to be generated with random KSUID `id`s. These `id`s can be shared with other clusters via NATS super-cluster events, and records with those `id`s can then be inserted into the databases in other clusters, ensuring that records have consistent `id`s across clusters.

### Notes about KSUID generation

To generate a KSUID for an alert, the migration computes the 160-bit SHA-1 hash of the alert's `org`, `stream_type`, `stream_name`, and `name` and interprets that 160-bit hash as a 160-bit KSUID. Therefore two KSUIDs generated in this manner will always be equal if the alerts have the same `org`, `stream_type`, `stream_name`, and `name`.

⚠️ Although values generated in this manner can be parsed as KSUIDs because they are 160 bits long, their timestamp bits are effectively random, so the timestamp in any KSUID generated this way is meaningless. This renders the timestamp-sortability property of these KSUIDs useless. This probably isn't a big deal since we can always add and sort by a `created_at` column if we want to, but we should be aware of this limitation.

## Summary by CodeRabbit

- **New Features**
  - Added SHA-1 hashing dependency to generate unique identifiers (KSUIDs) for alerts, folders, and dashboards.
  - Implemented KSUID generation based on specific entity attributes.
- **Refactor**
  - Transitioned database tables from integer-based to KSUID-based primary keys.
  - Updated `StreamType` enum to support more flexible usage.
- **Chores**
  - Updated project dependencies to include SHA-1 hashing library.
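To make the timestamp caveat concrete, here is a minimal sketch. It assumes the standard KSUID layout: a 4-byte big-endian timestamp (seconds since the KSUID epoch, which is 1400000000 in Unix time) followed by a 16-byte payload. The helper name `ksuid_unix_seconds` is hypothetical, and it works on raw bytes rather than on the `svix_ksuid` type used in the migrations.

```rust
/// Offset of the KSUID epoch from the Unix epoch, in seconds.
/// Assumption: the standard KSUID layout applies.
const KSUID_EPOCH: u64 = 1_400_000_000;

/// Hypothetical helper: returns the Unix timestamp (in seconds) that a
/// KSUID parser would read from a 20-byte value.
fn ksuid_unix_seconds(bytes: &[u8; 20]) -> u64 {
    // The first four bytes are the big-endian timestamp field.
    let raw = u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    u64::from(raw) + KSUID_EPOCH
}
```

When the 20 bytes are a SHA-1 digest, the first four bytes are simply the leading bytes of the hash, so the decoded value says nothing about when the record was created.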
1 parent 4396756 commit ebf527b

5 files changed: 73 additions, 5 deletions

Cargo.lock

Lines changed: 2 additions & 1 deletion
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -305,6 +305,7 @@ sea-orm-migration = { version = "1.1.0", features = [
 segment = "~0.2.4"
 serde = { version = "1", features = ["derive"] }
 serde_json = "1"
+sha1 = "0.10.6"
 sha256 = "1.4.0"
 snafu = "0.7.5"
 snap = "1"

src/infra/Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ once_cell.workspace = true
 parking_lot.workspace = true
 serde.workspace = true
 serde_json.workspace = true
+sha1.workspace = true
 sqlx.workspace = true
 svix-ksuid.workspace = true
 thiserror.workspace = true

src/infra/src/table/migration/m20241217_155000_populate_alerts_table.rs

Lines changed: 29 additions & 2 deletions
@@ -675,7 +675,7 @@ mod meta_table_alerts {
         DerivedStream,
     }

-    #[derive(Default, Deserialize, Serialize)]
+    #[derive(Default, Clone, Copy, Deserialize, Serialize)]
     #[serde(rename_all = "lowercase")]
     pub enum StreamType {
         #[default]
@@ -706,7 +706,7 @@ impl TryFrom<MetaAlertWithFolder> for alerts_table::ActiveModel {
             .ok_or("Alert in meta table references folder that does not exist")?;
         let meta_alert: meta_table_alerts::Alert = serde_json::from_str(m.alert_json())
             .map_err(|_| "Alert in meta table could not be deserialized")?;
-        let id = svix_ksuid::Ksuid::new(None, None).to_string();
+        let id = ksuid_from_hash(&meta_alert).to_string();

         // Transform the parsed stream type from the meta table and serialize
         // into string for storage in DB.
@@ -847,6 +847,33 @@ impl TryFrom<MetaAlertWithFolder> for alerts_table::ActiveModel {
     }
 }

+/// Generates a KSUID from a hash of the alert's `org`, `stream_type`,
+/// `stream_name`, and `name`.
+///
+/// To generate a KSUID this function generates the 160-bit SHA-1 hash of
+/// the alert's `org`, `stream_type`, `stream_name`, and `name` and interprets
+/// that 160-bit hash as a 160-bit KSUID. Therefore two KSUIDs generated in this
+/// manner will always be equal if the alerts have the same `org`,
+/// `stream_type`, `stream_name`, and `name`.
+///
+/// It is important to note that KSUIDs generated in this manner will have
+/// timestamp bits which are effectively random, meaning that the timestamp
+/// in any KSUID generated with this function will be random.
+fn ksuid_from_hash(alert: &meta_table_alerts::Alert) -> svix_ksuid::Ksuid {
+    use sha1::{Digest, Sha1};
+
+    let stream_type: alerts_table_ser::StreamType = alert.stream_type.into();
+    let stream_type = stream_type.to_string();
+
+    let mut hasher = Sha1::new();
+    hasher.update(alert.org_id.clone());
+    hasher.update(stream_type);
+    hasher.update(alert.stream_name.clone());
+    hasher.update(alert.name.clone());
+    let hash = hasher.finalize();
+    svix_ksuid::Ksuid::from_bytes(hash.into())
+}
+
 impl From<meta_table_alerts::CompareHistoricData> for alerts_table_ser::CompareHistoricData {
     fn from(value: meta_table_alerts::CompareHistoricData) -> Self {
         Self {
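
One property of `ksuid_from_hash` is worth illustrating. Each `hasher.update` call appends bytes to a single stream, so the digest depends only on the concatenation of the fields, not on where one field ends and the next begins. The sketch below uses only the standard library and hypothetical helper names. It shows that two different field tuples can produce identical hash input, and one possible way (length-prefixing) to keep them distinct. It is an illustration rather than a proposed change, since altering the hash input in the migration would change the generated `id`s.

```rust
/// Hash input as built by consecutive `update` calls: plain concatenation.
fn concat_fields(fields: &[&str]) -> Vec<u8> {
    let mut input = Vec::new();
    for field in fields {
        input.extend_from_slice(field.as_bytes());
    }
    input
}

/// Hypothetical alternative: prefix each field with its byte length
/// (big-endian u64) so that field boundaries become part of the input.
fn length_prefixed_fields(fields: &[&str]) -> Vec<u8> {
    let mut input = Vec::new();
    for field in fields {
        input.extend_from_slice(&(field.len() as u64).to_be_bytes());
        input.extend_from_slice(field.as_bytes());
    }
    input
}
```

For example, `concat_fields(&["org", "logs", "app1", "cpu"])` and `concat_fields(&["org", "logs", "app", "1cpu"])` return the same bytes, while the length-prefixed variants differ.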

src/infra/src/table/migration/m20250109_092400_recreate_tables_with_ksuids.rs

Lines changed: 40 additions & 2 deletions
@@ -175,8 +175,8 @@ mod legacy_folders {

         while let Some(folders) = pages.fetch_and_next().await? {
             for folder in folders {
+                let ksuid = ksuid_from_hash(&folder).to_string();
                 let mut am = folder.into_active_model();
-                let ksuid = svix_ksuid::Ksuid::new(None, None).to_string();
                 println!("folder ksuid: {}", ksuid);
                 am.ksuid = Set(Some(ksuid));
                 am.update(conn).await?;
@@ -193,6 +193,26 @@ mod legacy_folders {
     pub fn drop_table() -> TableDropStatement {
         Table::drop().table(Alias::new(NEW_TABLE_NAME)).to_owned()
     }
+
+    /// Generates a KSUID from a hash of the folder's `org`, `type`, and `folder_id`.
+    ///
+    /// To generate a KSUID this function computes the 160-bit SHA-1 hash of the
+    /// folder's `org`, `type`, and `folder_id` and interprets that hash as a
+    /// 160-bit KSUID. Therefore two KSUIDs generated in this manner will always be
+    /// equal if the folders have the same `org`, `type`, and `folder_id`.
+    ///
+    /// It is important to note that KSUIDs generated in this manner will have
+    /// timestamp bits which are effectively random, meaning that the timestamp
+    /// in any KSUID generated with this function will be random.
+    fn ksuid_from_hash(folder: &legacy_entities::legacy_folders::Model) -> svix_ksuid::Ksuid {
+        use sha1::{Digest, Sha1};
+        let mut hasher = Sha1::new();
+        hasher.update(folder.org.clone());
+        hasher.update(folder.r#type.to_string());
+        hasher.update(folder.folder_id.clone());
+        let hash = hasher.finalize();
+        svix_ksuid::Ksuid::from_bytes(hash.into())
+    }
 }

 /// Data structures and migration statements for the legacy dashboards table.
@@ -240,8 +260,8 @@ mod legacy_dashboards {

         while let Some(dashboards) = pages.fetch_and_next().await? {
             for dashboard in dashboards {
+                let ksuid = ksuid_from_hash(&dashboard).to_string();
                 let mut am = dashboard.into_active_model();
-                let ksuid = svix_ksuid::Ksuid::new(None, None).to_string();
                 am.ksuid = Set(Some(ksuid));
                 am.update(conn).await?;
             }
@@ -257,6 +277,24 @@ mod legacy_dashboards {
     pub fn drop_table() -> TableDropStatement {
         Table::drop().table(Alias::new(NEW_TABLE_NAME)).to_owned()
     }
+
+    /// Generates a KSUID from a hash of the dashboard's `dashboard_id`.
+    ///
+    /// To generate a KSUID this function generates the 160-bit SHA-1 hash of
+    /// the dashboard's `dashboard_id` and interprets that 160-bit hash as a
+    /// 160-bit KSUID. Therefore two KSUIDs generated in this manner will always
+    /// be equal if the dashboards have the same `dashboard_id`.
+    ///
+    /// It is important to note that KSUIDs generated in this manner will have
+    /// timestamp bits which are effectively random, meaning that the timestamp
+    /// in any KSUID generated with this function will be random.
+    fn ksuid_from_hash(dashboard: &legacy_entities::legacy_dashboards::Model) -> svix_ksuid::Ksuid {
+        use sha1::{Digest, Sha1};
+        let mut hasher = Sha1::new();
+        hasher.update(dashboard.dashboard_id.clone());
+        let hash = hasher.finalize();
+        svix_ksuid::Ksuid::from_bytes(hash.into())
+    }
 }

 /// Data structures and migration statements for the legacy alerts table.
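
The determinism these migrations rely on can be checked without any crates. The sketch below is a from-scratch SHA-1 written only for illustration; the commit itself uses the `sha1` crate and `svix_ksuid::Ksuid::from_bytes`. The names `sha1_digest` and `id_bytes_from_fields` are hypothetical. It shows that the same fields always yield the same 20 bytes, which is what lets every cluster arrive at the same `id`.

```rust
/// Illustrative SHA-1 (FIPS 180-4) using only the standard library.
fn sha1_digest(message: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0];

    // Padding: 0x80, zeros up to 56 mod 64, then the bit length (big-endian u64).
    let mut data = message.to_vec();
    data.push(0x80);
    while data.len() % 64 != 56 {
        data.push(0);
    }
    data.extend_from_slice(&((message.len() as u64) * 8).to_be_bytes());

    for chunk in data.chunks_exact(64) {
        // Message schedule: 16 words from the chunk, expanded to 80.
        let mut w = [0u32; 80];
        for i in 0..16 {
            let j = 4 * i;
            w[i] = u32::from_be_bytes([chunk[j], chunk[j + 1], chunk[j + 2], chunk[j + 3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }

        let [mut a, mut b, mut c, mut d, mut e] = h;
        for i in 0..80 {
            let (f, k) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A827999u32),
                20..=39 => (b ^ c ^ d, 0x6ED9EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1BBCDC),
                _ => (b ^ c ^ d, 0xCA62C1D6),
            };
            let temp = a
                .rotate_left(5)
                .wrapping_add(f)
                .wrapping_add(e)
                .wrapping_add(k)
                .wrapping_add(w[i]);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = temp;
        }

        h[0] = h[0].wrapping_add(a);
        h[1] = h[1].wrapping_add(b);
        h[2] = h[2].wrapping_add(c);
        h[3] = h[3].wrapping_add(d);
        h[4] = h[4].wrapping_add(e);
    }

    // Serialize the five state words big-endian into the 20-byte digest.
    let mut out = [0u8; 20];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// Hypothetical stand-in for `ksuid_from_hash`: hashes the fields in order,
/// as successive `update` calls would, and returns the raw 20 id bytes.
fn id_bytes_from_fields(fields: &[&str]) -> [u8; 20] {
    let mut input = Vec::new();
    for field in fields {
        input.extend_from_slice(field.as_bytes());
    }
    sha1_digest(&input)
}
```

Calling `id_bytes_from_fields(&["default", "folders", "abc123"])` twice, whether in one process or in two separate clusters, returns the same bytes, whereas a random KSUID would differ on every call.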
