HIP-1193 - Records to block streams cutover #1193

Mark-Swirlds · 2025-05-08T22:09:17Z

Description:

Related issue(s):

Fixes #

Notes for reviewer:

Checklist

Documented (Code comments, README, etc.)
Tested (unit, integration, etc.)

Mark-Swirlds · 2025-05-08T22:28:54Z

Updated HIP w/ PR number and proper DCO.

This HIP defines the requirements and implementation details for transitioning the Hedera network from record and event streams to block streams. This transition is a critical step toward enhancing blockchain compatibility, improving data integrity, and enabling future network optimizations such as state proofs. The proposal outlines the coordinated changes required across consensus nodes, mirror nodes, and supporting infrastructure to ensure a clean cutover without service disruption. It specifies new storage paths, file formats, and mechanisms to handle the transition state, including a marker file approach to indicate the final record stream entry before blocks begin.

Signed-off-by: Mark Blackman <mark@hashgraph.com>

Neurone · 2025-05-14T14:54:17Z

HIP/hip-1193.md

+The new block stream files will follow a revised path structure that includes network, shard, realm, and node ID components:
+
+```
+block/{network}-{YYYY-MM-DDTHH:mm}/{realm}/{shard}/{nodeID}/0000000000000000000000000000000000000.blk.gz


I suggest the following:

Include some information in the file name as well as using a subfolder structure to store the files. This will make it easier to support multiple folder structures in the future, including a completely flat structure, while maintaining coherence and avoiding collisions.

Use an ID (e.g. chainID = 295) instead of a label (e.g. 'mainnet') to identify the network.

Use the block date further down the folder structure.

Move the node ID to the first dynamic folder.

Remove leading zeros in the block number.

In particular, I propose the following structure:

blocks/{node ID}/{network ID}/{realm ID}/{shard ID}/YYYYMMDD/{network ID}_{realm ID}_{shard ID}_{YYYYMMDD}_{block number}.blk.zstd

For example:

blocks/10/295/0/0/20250901/295_0_0_20250901_83406349.blk.zstd

It should be noted here that this file structure is still very much temporary.
The long-term approach is that Block Nodes will store blocks, and access via buckets will end. This temporary structure is intended to require minimal changes to existing clients during this interim period (so that clients can focus more resources on switching fully to block nodes as the data source).

Block Nodes (at least the Hiero implementation) use a more detailed and complex structure for local storage that makes automated access much easier; but that's an internal detail. Users of block streams will use the API to request individual blocks or a stream of blocks (with several associated options).

One small suggestion: we can remove the extra characters in the date/time format to make it a little shorter and more obviously intended for automation:

Suggested change

block/{network}-{YYYY-MM-DDTHH:mm}/{realm}/{shard}/{nodeID}/0000000000000000000000000000000000000.blk.gz

block/{network}-{YYYYMMDD_HHmm}/{realm}/{shard}/{nodeID}/0000000000000000000000000000000000000.blk.gz

Neurone · 2025-05-14T14:56:05Z

HIP/hip-1193.md

+- The shard and realm values will be included in the directory structure  
+- The nodeID will identify the consensus node that produced the block  
+- Block files will use a 36-digit zero-padded block number format  
+- Files will use gzip compression and the `.blk.gz` extension


We mentioned above we are going to use zstd (.blk.zstd), rather than gzip.

Good catch.
The bucket files will still be gzip because that's what the existing uploader and consensus node support.

Block Nodes may use ZStandard internally, but that is an internal implementation detail that might change arbitrarily. The use of ZStandard should not be included in this HIP.
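
For anyone consuming the bucket files, decompressing gzip needs only the standard library. Below is a minimal Python sketch, assuming the payload is treated as an opaque byte string. The helper name `read_block_bytes` is hypothetical, the sample bytes are placeholders, and parsing the serialized block itself is out of scope here.

```python
import gzip


def read_block_bytes(compressed: bytes) -> bytes:
    """Decompress the contents of a `.blk.gz` object (hypothetical helper)."""
    return gzip.decompress(compressed)


# Round-trip with placeholder data standing in for a serialized block.
payload = b"placeholder block bytes"
compressed = gzip.compress(payload)

# Every gzip stream starts with the two magic bytes 0x1f 0x8b.
assert compressed[:2] == b"\x1f\x8b"
assert read_block_bytes(compressed) == payload
```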

Neurone · 2025-05-14T14:57:46Z

HIP/hip-1193.md

+Consensus nodes will implement the new path structure, for example:
+
+```
+block/mainnet/0/0/10/000000000000000000000000000083406349.blk.gz


I suggest changes to this here.

As a side note, in case we don't want to proceed with the suggested changes, this example is missing the date/time folder.

The date and time are optional and used for resettable networks (which mainnet is not).

Neurone · 2025-05-14T15:12:55Z

HIP/hip-1193.md

+- The network value will identify the network (mainnet, testnet, etc.)  
+- The shard and realm values will be included in the directory structure  
+- The nodeID will identify the consensus node that produced the block  
+- Block files will use a 36-digit zero-padded block number format  


What are the advantages of a 36-digit zero-padded number instead of the simple plain number?

The 36-digit length is an implementation choice in the consensus node (in practice the block number cannot have more than 19 significant digits, and should have 9 or fewer).
The reason to choose a fixed-length value is that sorting works better, and the software can reliably pre-calculate file names, so we do not need to list directories (which is abnormally expensive in cloud buckets).
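
To make both points concrete, here is a small Python sketch. The helper names `block_file_name` and `block_file_path` are hypothetical, and the layout follows the mainnet example in the HIP text (no date/time component); treat it as an illustration, not the consensus node's implementation.

```python
def block_file_name(block_number: int, width: int = 36) -> str:
    """Return the zero-padded file name for a block number (hypothetical helper)."""
    if block_number < 0:
        raise ValueError("block number must be non-negative")
    return f"{block_number:0{width}d}.blk.gz"


def block_file_path(network: str, realm: int, shard: int, node_id: int,
                    block_number: int) -> str:
    """Compute the object key directly, so no bucket listing is required."""
    name = block_file_name(block_number)
    return f"block/{network}/{realm}/{shard}/{node_id}/{name}"


numbers = [9, 10, 83406349, 100]

# Fixed-width names sort lexicographically in numeric order.
padded = sorted(block_file_name(n) for n in numbers)

# Unpadded names do not: "10..." and "100..." sort before "9...".
plain = sorted(f"{n}.blk.gz" for n in numbers)
```

A reader that knows the next expected block number can call `block_file_path` and request that key directly instead of listing the directory.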

jsync-swirlds

A few mild suggestions.

jsync-swirlds · 2025-05-15T00:06:58Z

HIP/hip-1193.md

+
+- Stop producing record and event streams at the designated cutover point  
+- Generate marker files to signal the end of record streams  
+- Begin producing block streams with proper running hash continuity  


As it's not technically a running hash on both sides (it's more complex than that), perhaps describe a from-to pair?

Suggested change

- Begin producing block streams with proper running hash continuity

- Begin producing block streams with proper continuity from record stream hash to block hash.

jsync-swirlds · 2025-05-15T00:08:42Z

HIP/hip-1193.md

+7. As a consumer of mirror node data and associated metrics, I want the transition to be seamless with no impact on how I consume mirror node data.
+
+## Specification
+i### 1\. Cutover Point Definition


typo

Suggested change

i### 1\. Cutover Point Definition

### 1\. Cutover Point Definition

jsync-swirlds · 2025-05-15T00:11:39Z

HIP/hip-1193.md

+- The first block will contain zero transactions to ensure proper formatting without impacting network usage  
+- Block continuity will be maintained by carrying forward:  
+  - The correct block number (incremented by one from the last record)  
+  - The running hash values \- cryptographic continuity of the Hedera blockchain.


Hash description might be a little clearer as follows:

Suggested change

- The running hash values \- cryptographic continuity of the Hedera blockchain.

- The last running hash value — cryptographic continuity of the Hedera blockchain.
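
One way to read the continuity requirement is as a set of checks a consumer could run at the cutover. The Python sketch below is illustrative only: the `LastRecord` and `FirstBlock` types and their field names are hypothetical, and the hashes are treated as opaque bytes rather than reproducing how either stream computes them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LastRecord:
    """Hypothetical summary of the final record stream entry."""
    block_number: int
    running_hash: bytes


@dataclass(frozen=True)
class FirstBlock:
    """Hypothetical summary of the first block stream entry."""
    block_number: int
    previous_hash: bytes
    transaction_count: int


def is_clean_cutover(last: LastRecord, first: FirstBlock) -> bool:
    """Check the three properties the HIP text lists for the first block."""
    return (
        first.block_number == last.block_number + 1   # number incremented by one
        and first.previous_hash == last.running_hash  # last running hash carried forward
        and first.transaction_count == 0              # first block has zero transactions
    )
```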

jsync-swirlds · 2025-05-15T00:13:51Z

HIP/hip-1193.md

+
+Consensus nodes will:
+
+- Begin construction of the first block with zero transactions  


Small suggestion

Suggested change

- Begin construction of the first block with zero transactions

- Construct the first block with zero transactions

jsync-swirlds · 2025-05-15T00:14:56Z

HIP/hip-1193.md

+- Apply the correct running hash and block number from the final record  
+- Upload to new block path structure for uploaders
+
+NOTE:  It is required all event stream format changes are completed prior to the cutover


Clarity suggestion

Suggested change

NOTE: It is required all event stream format changes are completed prior to the cutover

NOTE: All event stream format changes must be completed and active prior to the cutover

jsync-swirlds · 2025-05-15T00:20:01Z

HIP/hip-1193.md

+The new block stream files will follow a revised path structure that includes network, shard, realm, and node ID components:
+
+```
+block/{network}-{YYYY-MM-DDTHH:mm}/{realm}/{shard}/{nodeID}/0000000000000000000000000000000000000.blk.gz


One small suggestion: we can remove the extra characters in the date/time format to make it a little shorter and more obviously intended for automation:

Suggested change

block/{network}-{YYYY-MM-DDTHH:mm}/{realm}/{shard}/{nodeID}/0000000000000000000000000000000000000.blk.gz

block/{network}-{YYYYMMDD_HHmm}/{realm}/{shard}/{nodeID}/0000000000000000000000000000000000000.blk.gz
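
As a rough illustration of the two spellings, the following Python sketch formats the same reset time both ways with `datetime.strftime`. The reset time and the `testnet` label are made-up values for the example.

```python
from datetime import datetime, timezone

# Hypothetical last-reset time for a resettable network.
reset = datetime(2025, 9, 1, 14, 30, tzinfo=timezone.utc)

iso_style = reset.strftime("%Y-%m-%dT%H:%M")   # YYYY-MM-DDTHH:mm
compact_style = reset.strftime("%Y%m%d_%H%M")  # YYYYMMDD_HHmm

print(f"block/testnet-{iso_style}/")      # block/testnet-2025-09-01T14:30/
print(f"block/testnet-{compact_style}/")  # block/testnet-20250901_1430/
```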

jsync-swirlds · 2025-05-15T00:22:11Z

HIP/hip-1193.md

+- Block number-based naming for sequential processing  
+- ISO timestamp format of last network reset (optional field for resettable networks)
+
+Block files will use the `.blk.zstd` extension and Zstandard (Zstd) compression to optimize storage while maintaining reasonable processing performance.


Shouldn't refer to ZStandard here, as the transitional files will still be gzip.

Suggested change

Block files will use the `.blk.zstd` extension and Zstandard (Zstd) compression to optimize storage while maintaining reasonable processing performance.

Block files will use the `.blk.gz` extension and GZip compression to maintain better compatibility with current record stream processing.

jsync-swirlds · 2025-05-15T00:23:28Z

HIP/hip-1193.md

+- Support for sharding  
+- Node identification to identify source of block  
+- Block number-based naming for sequential processing  
+- ISO timestamp format of last network reset (optional field for resettable networks)


Minor clarification

Suggested change

- ISO timestamp format of last network reset (optional field for resettable networks)

- ISO 8601 date and time format values of last network reset (optional field for resettable networks)

jsync-swirlds · 2025-05-15T00:24:21Z

HIP/hip-1193.md

+Consensus nodes will implement the new path structure, for example:
+
+```
+block/mainnet/0/0/10/000000000000000000000000000083406349.blk.gz


The date and time are optional and used for resettable networks (which mainnet is not).

jsync-swirlds · 2025-05-15T00:59:12Z

HIP/hip-1193.md

+
+Timestamp to Block Number Mapping: A solution is needed for mirror nodes to map between timestamps and block numbers, particularly for startup and historical data access.
+## References
+A collection of URLs used as references throughout the HIP.


Might want to remove this boilerplate text.

nadineloepfe · 2025-05-20T09:25:08Z

HIP/hip-1193.md

+
+2. As a mirror node operator, I want to understand the new block file format and paths so that I can adjust my infrastructure to process block streams efficiently.  
+
+3. As a developer building on Hedera, I want to understand how the transition affects data availability and processing so that my applications continue to function correctly.  


Does this user story take into account developers or projects that currently ingest data via S3 and record files, rather than using the mirror nodes?

If so, they will likely need to rework their architecture and it would be helpful to provide guidance or migration resources to support the transition.

Thank you for the comment, Nadine. Happy to adjust the user story and document the impact appropriately. Do you know of devs or projects directly ingesting the files? If so, can you share their use case and why they are using records directly?

Mark-Swirlds requested a review from a team as a code owner May 8, 2025 22:09

Mark-Swirlds changed the title ~~Added file hip-0000.md - Records to block streams cutover~~ HIP-1193.md - Records to block streams cutover May 8, 2025

Mark-Swirlds force-pushed the hip_blockstreams_cutover branch from 285854b to b6a950e Compare May 8, 2025 22:59

Neurone reviewed May 14, 2025

jsync-swirlds reviewed May 15, 2025

Neurone changed the title ~~HIP-1193.md - Records to block streams cutover~~ HIP-1193 - Records to block streams cutover May 15, 2025

nadineloepfe reviewed May 20, 2025
