Skip to content

Conversation

dai1975
Copy link
Contributor

@dai1975 dai1975 commented Jul 28, 2025

This is second part of chain query parallelization and responsible for chain upgrading.

See first pullreq for more details: #169

  • first PR
    • handshake connection
    • handshake channel
  • this PR
    • channel upgrade
  • later PR
    • relay packets

@dai1975 dai1975 changed the base branch from main to features/lo7181 July 28, 2025 05:15
@dai1975 dai1975 force-pushed the lo7181-long-update-and-lost-state-2 branch 3 times, most recently from 963ea7a to dbea4aa Compare August 4, 2025 05:52
@dai1975 dai1975 requested a review from Copilot August 4, 2025 06:28
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors chain query operations for channel upgrade by parallelizing queries to both chains. The change addresses an issue where one chain taking a long time to update could cause the other chain to lose its old state during synchronous operations.

  • Refactors query functions to operate on single chains instead of pairs
  • Implements parallel query execution using goroutines for channel upgrade operations
  • Adds debug prover functionality for testing chain state scenarios

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
core/query.go Refactors query functions from pair-based to single-chain operations
core/connection.go Implements parallel querying for connection handshake operations
core/channel.go Implements parallel querying for channel handshake operations
core/channel-upgrade.go Implements parallel querying for channel upgrade operations
provers/debug/*.go Adds debug prover for testing chain state scenarios
log/slog.go Adds relative logging functions for chain pairs
coreutil/unwrap.go Adds support for unwrapping debug prover
chains/tendermint/query*.go Separates debug query implementation from production
Comments suppressed due to low confidence (1)

core/connection.go:245

  • [nitpick] The error message 'srcStream channel closed unexpectedly' could be more descriptive by indicating what caused the channel to close or suggesting recovery actions.
		if !ok {

@dai1975 dai1975 force-pushed the lo7181-long-update-and-lost-state-2 branch 2 times, most recently from 8feda1d to c86f1b7 Compare August 7, 2025 03:11
@dai1975 dai1975 changed the title fix upgrade-channel: in case of one chain takes long time for updating and the other chain loses old state part2 upgrade-channel: in case of one chain takes long time for updating and the other chain loses old state Aug 12, 2025
@dai1975 dai1975 requested review from siburu and Copilot August 12, 2025 11:36
Copilot

This comment was marked as outdated.

@dai1975 dai1975 force-pushed the lo7181-long-update-and-lost-state-2 branch from ad5f827 to e7fd602 Compare August 13, 2025 05:45
Daisuke Kanda added 10 commits August 14, 2025 07:16
Signed-off-by: Daisuke Kanda <daisuke.kanda@datachain.jp>
Signed-off-by: Daisuke Kanda <daisuke.kanda@datachain.jp>
Signed-off-by: Daisuke Kanda <daisuke.kanda@datachain.jp>
Signed-off-by: Daisuke Kanda <daisuke.kanda@datachain.jp>
Signed-off-by: Daisuke Kanda <daisuke.kanda@datachain.jp>
Signed-off-by: Daisuke Kanda <daisuke.kanda@datachain.jp>
Signed-off-by: Daisuke Kanda <daisuke.kanda@datachain.jp>
Signed-off-by: Daisuke Kanda <daisuke.kanda@datachain.jp>
Signed-off-by: Daisuke Kanda <daisuke.kanda@datachain.jp>
Signed-off-by: Daisuke Kanda <daisuke.kanda@datachain.jp>
@dai1975 dai1975 force-pushed the lo7181-long-update-and-lost-state-2 branch from e7fd602 to 07f68dc Compare August 14, 2025 10:52
@dai1975 dai1975 requested a review from Copilot August 14, 2025 12:12
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request implements parallelized querying for channel upgrade operations in the IBC relayer to handle scenarios where one chain takes significantly longer to update than the other, potentially causing the counterparty chain to lose old state. This is part 2 of the chain query parallelization effort, focusing specifically on channel upgrades.

  • Refactored channel upgrade querying to support parallel execution using goroutines
  • Introduced environment variables for debugging relayer pruning and wait behavior
  • Extended timeout configurations to accommodate longer upgrade processes

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
core/query.go Refactored QueryChannelUpgradePair to use parallel goroutines and extracted QueryChannelUpgrade helper function
core/channel-upgrade.go Restructured upgrade logic with new queryUpgradeChannelState function and parallel query execution
tests/cases/tm2tm/scripts/test-channel-upgrade Added debug environment variables and increased timeout values for testing
tests/cases/docker-compose-test.yaml Increased IBC channel upgrade timeout from 20 to 100 seconds

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

}
return nil
}, rtyAtt, rtyDel, rtyErr, retry.Context(ctx), retry.OnRetry(func(n uint, err error) {
// logRetryUpdateHeaders(src, dst, n, err)
Copy link
Preview

Copilot AI Aug 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The commented out logRetryUpdateHeaders function call should either be implemented or removed to avoid confusion and maintain code clarity.

Suggested change
// logRetryUpdateHeaders(src, dst, n, err)

Copilot uses AI. Check for mistakes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commented out logRetryUpdateHeaders function call should be removed ultimately, and so this issue is not resolved yet.

Signed-off-by: Daisuke Kanda <daisuke.kanda@datachain.jp>
@dai1975 dai1975 marked this pull request as ready for review August 19, 2025 00:55
@dai1975 dai1975 requested a review from a team as a code owner August 19, 2025 00:55
Copy link
Contributor

@siburu siburu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dai1975 I have reviewed the PR. Please check my comments, thank you!

}
return nil
}, rtyAtt, rtyDel, rtyErr, retry.Context(ctx), retry.OnRetry(func(n uint, err error) {
// logRetryUpdateHeaders(src, dst, n, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commented out logRetryUpdateHeaders function call should be removed ultimately, and so this issue is not resolved yet.

Comment on lines 79 to 81
export DEBUG_RELAYER_PRUNE_AFTER_BLOCKS_CHAIN_ibc0="10"
export DEBUG_RELAYER_PRUNE_AFTER_BLOCKS_PROVER_ibc0="10"
export DEBUG_RELAYER_SHFU_WAIT_ibc1="20"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to halve each number? It would shorten testing time much.

var err error
ret.updateHeaders, err = sh.SetupHeadersForUpdate(ctx, prover, counterparty)
if err != nil {
logger.ErrorContext(ctx, "failed to set up headers for LC update on both chains", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logger.ErrorContext(ctx, "failed to set up headers for LC update on both chains", err)
logger.ErrorContext(ctx, "failed to set up headers for LC update", err)


ret.chanUpg, ret.settled, err = querySettledChannelUpgrade(queryCtx, logger, prover, true)
if err != nil {
logger.ErrorContext(ctx, "failed to query the channel upgrade pair with proofs", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logger.ErrorContext(ctx, "failed to query the channel upgrade pair with proofs", err)
logger.ErrorContext(ctx, "failed to query the channel upgrade with proof", err)


ret.channel, ret.settled, err = querySettledChannel(queryCtx, logger, prover, true)
if err != nil {
logger.ErrorContext(ctx, "failed to query the channel with proofs", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logger.ErrorContext(ctx, "failed to query the channel with proofs", err)
logger.ErrorContext(ctx, "failed to query the channel with proof", err)

Comment on lines 363 to 373
/**
* upgErr is used only if
* - counterparty's actions is UPGRADE_ACTION_CANCEL or UPGRADE_ACTION_CANCEL_FLUSHCOMPLETE
* - my upgradeState is UPGRADE_STATE_UNINIT or UPGRADE_STATE_INIT
*/
if ret.upgradeState == UPGRADE_STATE_UNINIT || ret.upgradeState == UPGRADE_STATE_INIT {
ret.upgErr, err = QueryChannelUpgradeError(queryCtx, prover, true)
if err != nil {
return nil, err
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If QueryChannelUpgradeError (for Ethereum) can't find an error event, it traverses events backwards ... up to the genesis block!
Therefore we must not call this function under overly broad conditions.

Fortunately, QueryChannelUpgradeError only accesses blockchain events, not blockchain states, and so we need not call this in queryUpgradeChannelState.

"src_height", srcCtx.Height().String(),
"dst_height", dstCtx.Height().String(),
) (*chantypes.QueryUpgradeResponse, bool, error) {
logger := &log.RelayLogger{Logger: logger_.With(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better use logger as the function parameter name rather than logger_, because we don't want to accidentally access logger_ after this point.

Daisuke Kanda added 4 commits August 20, 2025 07:36
Signed-off-by: Daisuke Kanda <daisuke.kanda@datachain.jp>
Signed-off-by: Daisuke Kanda <daisuke.kanda@datachain.jp>
Signed-off-by: Daisuke Kanda <daisuke.kanda@datachain.jp>
…it may take long time

Signed-off-by: Daisuke Kanda <daisuke.kanda@datachain.jp>
@dai1975 dai1975 marked this pull request as draft August 22, 2025 07:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants