IGNORE ME: debug iscp restart #22492
Draft
jiangxinmeng1 wants to merge 295 commits into matrixorigin:main from jiangxinmeng1:iscp_merge
…ne into cdc_sqlexecutor_cleanup
be08cb3 to cc29c71
Labels
kind/bug: Something isn't working
Possible security concern
Review effort 5/5
size/XXL: Denotes a PR that changes 2000+ lines
User description
What type of PR is this?
Which issue(s) this PR fixes:
issue #21835
What this PR does / why we need it:
debug iscp restart
PR Type
Enhancement, Tests, Bug fix
Description
• Major Enhancement: Implemented comprehensive asynchronous index support for HNSW, IVF, and fulltext indexes with CDC (Change Data Capture) synchronization
• HNSW Vector Index: Added complete HNSW model implementation with CRUD operations, file-based persistence, memory management, and CDC synchronization capabilities
• Index SQL Writer: Implemented comprehensive SQL writer for different index types supporting Insert, Upsert, Delete operations with CDC integration
• ASYNC Keyword Support: Added parser support for the ASYNC keyword in index creation across MySQL grammar and enhanced index parameter handling
• CDC Integration: Added CDC task management utilities, consumer implementations, and watermark handling for index operations
• Bug Fixes: Fixed null check bugs in watermark updater and enhanced error handling across multiple components
• Test Coverage: Added extensive test suites for HNSW operations, index consumers, CDC functionality, and async index creation
• DDL Integration: Enhanced ALTER TABLE and CREATE INDEX operations to support ISCP job management and CDC task lifecycle
• Type System: Enhanced vector type descriptions and array casting with dimension validation
File Walkthrough
3 files
index_sqlwriter.go
Implement Index SQL Writer for CDC Operations
pkg/iscp/index_sqlwriter.go
• Added comprehensive SQL writer implementation for different index types (HNSW, IVFFLAT, Fulltext)
• Implemented IndexSqlWriter interface with methods for Insert, Upsert, Delete operations
• Added SQL generation logic for CDC operations on vector and fulltext indexes
• Included type checking and validation for different data types and index algorithms
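The writer interface described above can be pictured with a minimal sketch. Everything here is hypothetical: the method signatures, the `simpleWriter` type, the `pk`/`val` column names, and the quoting helper are illustrative assumptions, not the actual code in pkg/iscp/index_sqlwriter.go.

```go
package main

import (
	"fmt"
	"strings"
)

// IndexSqlWriter is a hypothetical stand-in for the interface the PR describes.
// The real method names and signatures may differ.
type IndexSqlWriter interface {
	Insert(pk int64, val string)
	Upsert(pk int64, val string)
	Delete(pk int64)
	// ToSql returns the buffered statements in call order and clears the buffer.
	ToSql() []string
}

// simpleWriter emits one statement per change for a single index table.
type simpleWriter struct {
	table string
	stmts []string
}

// quote doubles single quotes. Illustrative only; real code needs
// dialect-aware escaping.
func quote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

func (w *simpleWriter) Insert(pk int64, val string) {
	w.stmts = append(w.stmts,
		fmt.Sprintf("INSERT INTO %s (pk, val) VALUES (%d, %s)", w.table, pk, quote(val)))
}

func (w *simpleWriter) Delete(pk int64) {
	w.stmts = append(w.stmts,
		fmt.Sprintf("DELETE FROM %s WHERE pk = %d", w.table, pk))
}

// Upsert is modeled here as a delete followed by an insert.
func (w *simpleWriter) Upsert(pk int64, val string) {
	w.Delete(pk)
	w.Insert(pk, val)
}

func (w *simpleWriter) ToSql() []string {
	out := w.stmts
	w.stmts = nil
	return out
}

func main() {
	var w IndexSqlWriter = &simpleWriter{table: "idx_tbl"}
	w.Insert(1, "a")
	w.Upsert(1, "b")
	w.Delete(2)
	for _, s := range w.ToSql() {
		fmt.Println(s)
	}
}
```

Emitting statements in call order keeps the sketch trivially correct; a production writer would more likely batch rows per operation type.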
sync.go
Add HNSW Vector Index CDC Synchronization
pkg/vectorindex/hnsw/sync.go
• Added HNSW index CDC synchronization functionality with CdcSync function
• Implemented parallel processing for index updates with thread management
• Added model management for HNSW indexes including loading, unloading, and capacity handling
• Included SQL generation for metadata and index table updates
model.go
Implement HNSW Vector Index Model Management
pkg/vectorindex/hnsw/model.go
• Added complete HNSW model implementation with CRUD operations (Add, Remove, Contains, Search)
• Implemented file-based persistence with chunked loading/saving and checksum validation
• Added memory management with load/unload capabilities and capacity tracking
• Included SQL generation for database synchronization of index data
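As a rough illustration of chunked persistence with checksum validation, the sketch below splits a serialized blob into fixed-size chunks and verifies the reassembled bytes. The use of SHA-256, the chunk size, and all function names are assumptions made for the example; the PR text does not say which checksum or chunking scheme HnswModel uses.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// splitChunks cuts data into pieces of at most chunkSize bytes.
func splitChunks(data []byte, chunkSize int) [][]byte {
	if chunkSize <= 0 {
		chunkSize = len(data) // fall back to a single chunk
	}
	var chunks [][]byte
	for len(data) > 0 {
		n := chunkSize
		if len(data) < n {
			n = len(data)
		}
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

// checksum returns the hex-encoded SHA-256 digest (an assumed algorithm).
func checksum(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// joinAndVerify reassembles the chunks and rejects the result if the
// checksum does not match the one recorded at save time.
func joinAndVerify(chunks [][]byte, want string) ([]byte, error) {
	data := bytes.Join(chunks, nil)
	if checksum(data) != want {
		return nil, errors.New("checksum mismatch")
	}
	return data, nil
}

func main() {
	blob := []byte("serialized hnsw index bytes")
	sum := checksum(blob)
	chunks := splitChunks(blob, 8)
	restored, err := joinAndVerify(chunks, sum)
	fmt.Println(len(chunks), err == nil, bytes.Equal(restored, blob))
}
```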
26 files
util.go
Enable Additional Data Type Support in ISCP Utils
pkg/iscp/util.go
• Uncommented and enabled support for multiple data types (json, bit, array_float32/64, date, time, decimal, uuid, etc.)
• Added appendHex function for binary data conversion
• Enhanced convertColIntoSql to handle NULL values with proper type casting
• Expanded type support in extractRowFromVector for comprehensive data extraction
alter.go
Integrate ISCP Management in ALTER TABLE Operations
pkg/sql/compile/alter.go
• Added ISCP job cleanup during ALTER TABLE operations to prevent conflicts with temporary tables
• Implemented fulltext index handling in the alter table copy process
• Added proper sequencing of operations to drop ISCP tasks before data copying
• Enhanced error handling and logging for index-related operations
index_consumer.go
Add IndexConsumer for ISCP index synchronization
pkg/iscp/index_consumer.go
• Added new IndexConsumer struct implementing Consumer interface for index synchronization
• Implemented data processing for both snapshot and tail data types with SQL generation and execution
• Added support for insert, delete, and upsert operations with proper SQL batching and channel communication
• Integrated with IndexSqlWriter for generating SQL statements and managing transaction execution
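The combination of SQL batching and channel communication mentioned above might look roughly like the following sketch. The function names, the `idx` table, and the batch layout are hypothetical; this is only one way a producer could hand batched statements to an executing consumer, not the IndexConsumer implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// produceBatches groups rows into multi-row INSERT statements and sends
// each statement on out, closing the channel when done.
func produceBatches(rows []string, batchSize int, out chan<- string) {
	defer close(out)
	if batchSize <= 0 {
		batchSize = 1
	}
	for start := 0; start < len(rows); start += batchSize {
		end := start + batchSize
		if end > len(rows) {
			end = len(rows)
		}
		out <- "INSERT INTO idx VALUES " + strings.Join(rows[start:end], ", ")
	}
}

// consume executes each statement it receives. On the first error it
// drains the channel so the producer goroutine can finish, then returns.
func consume(in <-chan string, exec func(string) error) error {
	for sql := range in {
		if err := exec(sql); err != nil {
			for range in {
			}
			return err
		}
	}
	return nil
}

func main() {
	ch := make(chan string)
	go produceBatches([]string{"(1)", "(2)", "(3)"}, 2, ch)
	err := consume(ch, func(sql string) error {
		fmt.Println(sql)
		return nil
	})
	fmt.Println("error:", err)
}
```

Draining on error matters with an unbuffered channel: without it the producer would block forever on its next send.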
ddl.go
Integrate ISCP CDC tasks for index operations
pkg/sql/compile/ddl.go
• Integrated ISCP job management for database and table operations
• Added CDC task creation and deletion for vector and fulltext indexes
• Implemented PITR (Point-in-Time Recovery) management for index operations
• Added async index support with CDC task registration during table creation and index operations
cdc_util.go
Add CDC utilities for index task management
pkg/sql/compile/cdc_util.go
• Added comprehensive CDC task management utilities for index operations
• Implemented PITR creation and deletion for index tables
• Added validation logic for determining which indexes require CDC tasks
• Provided functions for bulk CDC task creation and deletion based on table definitions
fulltext.go
Enhance fulltext tokenization with composite key support
pkg/sql/plan/fulltext.go
• Enhanced fulltext index tokenization to support both table scan and values scan
• Added support for composite primary keys with proper type handling
• Implemented primary key type extraction from function parameters for values scan
• Updated function signature handling to accommodate different scan types
build_dml_util.go
Add async index support in DML operations
pkg/sql/plan/build_dml_util.go
• Added async index support by checking IndexAlgoParams for async flag
• Modified index processing to skip async indexes in pre-insert operations
• Updated MultiTableIndex struct to include IndexAlgoParams field
• Enhanced index validation to support async processing workflows
types.go
Add CDC types and structures for vector indexes
pkg/vectorindex/types.go
• Added VectorIndexCdc struct for CDC operations with insert, upsert, delete support
• Implemented CDC entry management with JSON serialization capabilities
• Added HnswCdcParam struct for CDC parameter configuration
• Enhanced type definitions with async parameter support for HNSW and IVF indexes
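To make the idea of a CDC container with insert, upsert, and delete entries and JSON serialization concrete, here is a minimal sketch. The field names, JSON keys, operation codes, and the `parseCdc` helper are all assumptions for illustration and do not reflect the real layout of VectorIndexCdc.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical operation codes; the real constants may differ.
const (
	opInsert = "I"
	opUpsert = "U"
	opDelete = "D"
)

// cdcEntry is one change record. Deletes carry no vector.
type cdcEntry struct {
	Type string    `json:"t"`
	PKey int64     `json:"pk"`
	Vec  []float32 `json:"v,omitempty"`
}

// vectorIndexCdc collects entries until they are serialized and shipped.
type vectorIndexCdc struct {
	Data []cdcEntry `json:"cdc"`
}

func (c *vectorIndexCdc) Insert(pk int64, v []float32) {
	c.Data = append(c.Data, cdcEntry{Type: opInsert, PKey: pk, Vec: v})
}

func (c *vectorIndexCdc) Upsert(pk int64, v []float32) {
	c.Data = append(c.Data, cdcEntry{Type: opUpsert, PKey: pk, Vec: v})
}

func (c *vectorIndexCdc) Delete(pk int64) {
	c.Data = append(c.Data, cdcEntry{Type: opDelete, PKey: pk})
}

// Reset empties the buffer but keeps its capacity for reuse.
func (c *vectorIndexCdc) Reset() { c.Data = c.Data[:0] }

func (c *vectorIndexCdc) ToJson() (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// parseCdc is the inverse of ToJson, as a receiving function might use it.
func parseCdc(s string) (*vectorIndexCdc, error) {
	var c vectorIndexCdc
	if err := json.Unmarshal([]byte(s), &c); err != nil {
		return nil, err
	}
	return &c, nil
}

func main() {
	var c vectorIndexCdc
	c.Insert(1, []float32{0.5, 1})
	c.Upsert(2, []float32{2, 3})
	c.Delete(3)
	s, err := c.ToJson()
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(s)
}
```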
ddl_index_algo.go
Add async support for fulltext and IVF indexes
pkg/sql/compile/ddl_index_algo.go
• Added async index support for fulltext and IVF indexes
• Integrated CDC task creation for async indexes during index creation
• Enhanced index creation workflow to support asynchronous processing
• Added logging for async index operations
func_hnsw.go
Implement HNSW CDC update function
pkg/sql/plan/function/func_hnsw.go
• Implemented hnswCdcUpdate function for processing CDC updates
• Added JSON deserialization for CDC data and parameter validation
• Integrated with HNSW CdcSync functionality for index updates
• Added comprehensive error handling and logging for CDC operations
func_cast.go
Enhance array casting with dimension validation
pkg/sql/plan/function/func_cast.go
• Enhanced array casting with dimension validation for vector types
• Added bypass for maximum dimension arrays to skip validation
• Improved error reporting for array dimension mismatches
• Refactored string to array conversion with proper type checking
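A minimal sketch of string-to-vector casting with a dimension check and a bypass follows. The `parseVector` function, the `maxArrayDimension` value, and the error wording are hypothetical placeholders chosen for the example, not the code or messages in func_cast.go.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// maxArrayDimension is a placeholder value for the "maximum dimension"
// sentinel; the real constant is an assumption here.
const maxArrayDimension = 65535

// parseVector converts text such as "[1, 2, 3]" into a []float32 and checks
// the element count against wantDim. When wantDim equals maxArrayDimension
// the check is skipped, mirroring the bypass described above.
func parseVector(s string, wantDim int) ([]float32, error) {
	s = strings.TrimSpace(s)
	if len(s) < 2 || s[0] != '[' || s[len(s)-1] != ']' {
		return nil, fmt.Errorf("malformed vector literal %q", s)
	}
	body := strings.TrimSpace(s[1 : len(s)-1])
	if body == "" {
		return nil, fmt.Errorf("empty vector literal")
	}
	parts := strings.Split(body, ",")
	vec := make([]float32, 0, len(parts))
	for _, p := range parts {
		f, err := strconv.ParseFloat(strings.TrimSpace(p), 32)
		if err != nil {
			return nil, fmt.Errorf("bad element %q: %w", p, err)
		}
		vec = append(vec, float32(f))
	}
	if wantDim != maxArrayDimension && len(vec) != wantDim {
		return nil, fmt.Errorf("expected dimension %d, got %d", wantDim, len(vec))
	}
	return vec, nil
}

func main() {
	v, err := parseVector("[1, 2, 3]", 3)
	fmt.Println(v, err)
	_, err = parseVector("[1, 2]", 3)
	fmt.Println(err)
}
```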
sqlexec.go
Add transaction support for SQL execution
pkg/vectorindex/sqlexec/sqlexec.go
• Added RunTxn function for executing SQL operations within transactions
• Implemented proper context and account ID handling for transactional operations
• Enhanced SQL execution capabilities with transaction support
create.go
Add async option support to index creation
pkg/sql/parsers/tree/create.go
• Added Async boolean field to IndexOption struct
• Implemented formatting support for async keyword in index creation
• Enhanced index option parsing to support asynchronous index creation
list_builtIn.go
Register HNSW CDC update as built-in function
pkg/sql/plan/function/list_builtIn.go
• Added HNSW_CDC_UPDATE function definition to built-in functions list
• Implemented function signature with varchar and int32 parameters
• Added overload configuration for HNSW CDC update functionality
function_id.go
Register HNSW CDC update function ID
pkg/sql/plan/function/function_id.go
• Added HNSW_CDC_UPDATE function ID constant
• Updated function end number and registered function mapping
• Enhanced function registry with new HNSW CDC functionality
iscp.go
Add logging for ISCP task operations
pkg/frontend/iscp.go
• Added logging for ISCP task creation and existence checking
• Enhanced visibility into ISCP task management operations
types.go
Add vector type description support
pkg/container/types/types.go
• Added description string support for vector array types (VECF32, VECF64)
• Enhanced type description formatting for vector types with dimensions
keywords.go
Add async keyword to MySQL parser
pkg/sql/parsers/dialect/mysql/keywords.go
• Added "async" keyword to MySQL parser keyword list
• Enhanced parser vocabulary for asynchronous index operations
secondary_index_utils.go
Add async parameter support to index utilities
pkg/catalog/secondary_index_utils.go
• Added Async constant and support for async parameter in index configurations
• Implemented IsIndexAsync function to check if an index is configured for async processing
• Enhanced index parameter parsing and string generation with async support
• Updated fulltext and vector index parameter handling for async operations
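A check like IsIndexAsync could be sketched as below, assuming the index parameters are stored as a flat JSON object of string values and that the async flag lives under an "async" key with the value "true". Both the storage format and the key are assumptions for illustration; the real function in pkg/catalog may work differently.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// asyncKey is a hypothetical parameter name for the async flag.
const asyncKey = "async"

// isIndexAsync reports whether the JSON-encoded index parameters mark the
// index as asynchronous. Empty parameters mean a synchronous index.
func isIndexAsync(params string) (bool, error) {
	if strings.TrimSpace(params) == "" {
		return false, nil
	}
	var m map[string]string
	if err := json.Unmarshal([]byte(params), &m); err != nil {
		return false, fmt.Errorf("invalid index params: %w", err)
	}
	return strings.EqualFold(m[asyncKey], "true"), nil
}

func main() {
	for _, p := range []string{
		`{"op_type":"vector_l2_ops","async":"true"}`,
		`{"lists":"16"}`,
		``,
	} {
		ok, err := isIndexAsync(p)
		fmt.Printf("%q async=%v err=%v\n", p, ok, err)
	}
}
```

A caller that reconstructs the index definition could use the same check to decide whether to append the ASYNC keyword.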
build_show_util.go
Add ASYNC parameter support to index SQL construction
pkg/sql/plan/build_show_util.go
• Added support for ASYNC parameter in index creation SQL construction
• Checks for catalog.Async parameter and appends " ASYNC" to index string when present
consumer.go
Add IndexSync consumer type support
pkg/iscp/consumer.go
• Added new consumer type handling for ConsumerType_IndexSync
• Returns NewIndexConsumer when consumer type matches IndexSync
data_retriever.go
Add account and table ID getter methods
pkg/iscp/data_retriever.go
• Added GetAccountID() method returning r.accountID
• Added GetTableID() method returning r.tableID
types.go
Add index algorithm parameters field to MultiTableIndex
pkg/sql/plan/types.go
• Added IndexAlgoParams string field to MultiTableIndex struct
• Reformatted struct field alignment
types.go
Extend DataRetriever interface with ID getter methods
pkg/iscp/types.go
• Added GetAccountID() uint32 method to DataRetriever interface
• Added GetTableID() uint64 method to DataRetriever interface
types.go
Add async parameter to fulltext parser configuration
pkg/fulltext/types.go
• Added Async string field to FullTextParserParam struct
mysql_sql.y
Add ASYNC keyword and parsing support to MySQL grammar
pkg/sql/parsers/dialect/mysql/mysql_sql.y
• Added ASYNC token definition and parsing rules
• Added ASYNC as non-reserved keyword
• Added index option handling for async parameter
16 files
index_consumer_test.go
Add Index Consumer Test Suite
pkg/iscp/index_consumer_test.go
• Added comprehensive test suite for index consumer functionality
• Implemented mock objects for testing SQL execution and data retrieval
• Added test cases for HNSW snapshot and tail processing scenarios
• Included validation of generated SQL statements for CDC operations
sync_test.go
Add HNSW Synchronization Test Coverage
pkg/vectorindex/hnsw/sync_test.go
• Added extensive test coverage for HNSW synchronization operations
• Implemented test cases for various CDC operations (upsert, delete, insert, update)
• Added tests for multi-model scenarios and shuffled data processing
• Included mock functions for SQL execution and streaming operations
index_sqlwriter_test.go
Add comprehensive tests for IndexSqlWriter
pkg/iscp/index_sqlwriter_test.go
• Added comprehensive test cases for IndexSqlWriter functionality
• Implemented test helpers for creating table definitions for IVF, fulltext, and HNSW indexes
• Added tests for different primary key types including composite keys and binary keys
• Verified SQL generation for insert, upsert, and delete operations across different index types
model_test.go
Add comprehensive tests for HnswModel
pkg/vectorindex/hnsw/model_test.go
• Added comprehensive test suite for HnswModel functionality
• Implemented tests for index loading, searching, adding, and removing vectors
• Added validation for model state management including dirty flag handling
• Tested SQL generation and file operations for model persistence
search_test.go
Enhance HNSW search tests with multi-file support
pkg/vectorindex/hnsw/search_test.go
• Added mock functions for testing with multiple index files
• Implemented test helpers for creating metadata and index batches with multiple files
• Added catalog SQL mock for testing index metadata retrieval
• Enhanced test coverage for multi-file index scenarios
func_hnsw_test.go
Add tests for HNSW CDC update function
pkg/sql/plan/function/func_hnsw_test.go
• Added test cases for hnswCdcUpdate function validation
• Implemented tests for null parameter handling and invalid JSON input
• Added validation tests for required function arguments
• Ensured proper error handling for various edge cases
mysql_sql_test.go
Add async keyword tests for index creation
pkg/sql/parsers/dialect/mysql/mysql_sql_test.go
• Added test cases for async keyword support in index creation
• Updated test expectations for fulltext and vector index creation with async flag
• Enhanced parser test coverage for new async syntax
build_test.go
Update HNSW build tests for unified model
pkg/vectorindex/hnsw/build_test.go
• Updated test code to use HnswModel instead of HnswBuildIndex and HnswSearchIndex
• Modified test function calls to match new unified model interface
• Updated variable declarations and type assertions for new model structure
types_test.go
Add tests for vector index CDC functionality
pkg/vectorindex/types_test.go
• Added comprehensive test suite for VectorIndexCdc functionality
• Implemented tests for insert, delete, and upsert operations
• Added JSON serialization and state management tests
• Validated CDC data structure behavior and edge cases
function_id_test.go
Update function ID tests for HNSW CDC
pkg/sql/plan/function/function_id_test.go
• Updated predefined function IDs to include HNSW_CDC_UPDATE
• Adjusted function end number for new function addition
vector_ivf_async.result
Test results for async IVF vector indexing
test/distributed/cases/vector/vector_ivf_async.result
• Test results for IVF vector index with ASYNC option
• Shows successful creation and querying of async vector indexes
vector_ivf_async.sql
Test cases for async IVF vector indexing
test/distributed/cases/vector/vector_ivf_async.sql
• Test cases for creating IVF vector indexes with ASYNC option
• Tests index creation, data insertion, and vector similarity queries
vector_hnsw_async.result
Test results for async HNSW vector indexing
test/distributed/cases/vector/vector_hnsw_async.result
• Test results for HNSW vector index with ASYNC option
• Shows successful creation and querying of async HNSW indexes
vector_hnsw_async.sql
Test cases for async HNSW vector indexing
test/distributed/cases/vector/vector_hnsw_async.sql
• Test cases for creating HNSW vector indexes with ASYNC option
• Tests various scenarios including empty data, updates, and deletions
fulltext_async.sql
Test cases for async fulltext indexing
test/distributed/cases/fulltext/fulltext_async.sql
• Test cases for creating fulltext indexes with ASYNC option
• Tests fulltext search functionality with async indexing
fulltext_async.result
Test results for async fulltext indexing
test/distributed/cases/fulltext/fulltext_async.result
• Test results for fulltext index with ASYNC option
• Shows successful creation and searching with async fulltext indexes
2 files
search.go
Refactor HNSW search to use unified HnswModel
pkg/vectorindex/hnsw/search.go
• Removed HnswSearchIndex struct and moved functionality to HnswModel
• Refactored LoadMetadata to be a standalone function instead of method
• Simplified search implementation by delegating index loading and management to HnswModel
• Updated error handling and resource management for index operations
build.go
Refactor HNSW build to use unified HnswModel
pkg/vectorindex/hnsw/build.go
• Replaced HnswBuildIndex struct with HnswModel for unified index management
• Removed duplicate index building logic and delegated to HnswModel methods
• Updated function signatures to use HnswModel instead of HnswBuildIndex
• Simplified build process by leveraging shared model functionality
3 files
watermark_updater.go
Fix null check and empty list handling bugs
pkg/iscp/watermark_updater.go
• Fixed null check bug in queryIndexLog function by using correct index
• Added safety check for empty table ID list in unregisterJobsByDBName
• Enhanced error handling and edge case management for job unregistration
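One plausible reason for an empty-list guard is that a statement built from a list of table IDs becomes invalid SQL when the list is empty. The sketch below shows that pattern under stated assumptions: the `job_log` table, the `drop_at` and `table_id` columns, and the function name are hypothetical and are not taken from watermark_updater.go.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// buildUnregisterSQL returns the statement to run and true, or "" and false
// when there is nothing to unregister. Without the guard an empty list
// would produce "IN ()", which is a syntax error.
func buildUnregisterSQL(tableIDs []uint64) (string, bool) {
	if len(tableIDs) == 0 {
		return "", false
	}
	ids := make([]string, len(tableIDs))
	for i, id := range tableIDs {
		ids[i] = strconv.FormatUint(id, 10)
	}
	// Table and column names are placeholders for illustration.
	sql := fmt.Sprintf(
		"UPDATE job_log SET drop_at = now() WHERE table_id IN (%s)",
		strings.Join(ids, ", "))
	return sql, true
}

func main() {
	if sql, ok := buildUnregisterSQL([]uint64{101, 202}); ok {
		fmt.Println(sql)
	}
	if _, ok := buildUnregisterSQL(nil); !ok {
		fmt.Println("no tables in database, skipping")
	}
}
```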
build_ddl.go
Fix fulltext index parameter handling
pkg/sql/plan/build_ddl.go
• Fixed fulltext index parameter handling to support cases without parser name
• Enhanced index parameter JSON generation for fulltext indexes
• Improved error handling in fulltext index creation
mock_consumer.go
Fix tenant ID handling in mock consumer
pkg/iscp/mock_consumer.go
• Updated context setup to use catalog.System_Account instead of hardcoded value
• Enhanced tenant ID handling for internal SQL consumer operations
2 files
util.go
Add error handling to fulltext index SQL generation
pkg/sql/compile/util.go
• Updated genInsertIndexTableSqlForFullTextIndex function signature to return error
• Enhanced error handling for fulltext index SQL generation
iteration.go
Enhance error handling in ISCP iteration
pkg/iscp/iteration.go
• Added error handling for CollectChanges function call
• Enhanced context setup with proper tenant ID for consumer operations
• Improved error propagation in iteration execution
5 files
hnsw.go
Simplify HNSW create function validation
pkg/sql/plan/hnsw.go
• Commented out table scan validation in HNSW create function
• Simplified node type checking for HNSW operations
task_runner.go
Add debug logging for task executor retrieval
pkg/taskservice/task_runner.go
• Added debug logging statement with executor code information
vector_hnsw.result
Update vector dimension error message format
test/distributed/cases/vector/vector_hnsw.result
• Updated error message format for vector dimension mismatch
vector_index.result
Update vector dimension error message format
test/distributed/cases/vector/vector_index.result
• Updated error message format for vector dimension mismatch
array.result
Update vector dimension error message format
test/distributed/cases/array/array.result
• Updated error message format for vector dimension mismatch
1 files