jiangxinmeng1 (Contributor) commented Sep 9, 2025

User description

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #21835

What this PR does / why we need it:

debug iscp restart


PR Type

Enhancement, Tests, Bug fix


Description

Major Enhancement: Implemented comprehensive asynchronous index support for HNSW, IVF, and fulltext indexes with CDC (Change Data Capture) synchronization
HNSW Vector Index: Added complete HNSW model implementation with CRUD operations, file-based persistence, memory management, and CDC synchronization capabilities
Index SQL Writer: Implemented comprehensive SQL writer for different index types supporting Insert, Upsert, Delete operations with CDC integration
ASYNC Keyword Support: Added parser support for ASYNC keyword in index creation across MySQL grammar and enhanced index parameter handling
CDC Integration: Added CDC task management utilities, consumer implementations, and watermark handling for index operations
Bug Fixes: Fixed null check bugs in watermark updater and enhanced error handling across multiple components
Test Coverage: Added extensive test suites for HNSW operations, index consumers, CDC functionality, and async index creation
DDL Integration: Enhanced ALTER TABLE and CREATE INDEX operations to support ISCP job management and CDC task lifecycle
Type System: Enhanced vector type descriptions and array casting with dimension validation


Diagram Walkthrough

flowchart LR
  A["SQL Parser"] -- "ASYNC keyword" --> B["Index Creation"]
  B -- "CDC Task" --> C["ISCP Consumer"]
  C -- "Index Operations" --> D["HNSW Model"]
  C -- "SQL Generation" --> E["Index SQL Writer"]
  D -- "Persistence" --> F["File Storage"]
  E -- "Database Updates" --> G["Index Tables"]
  H["CDC Stream"] --> C
  I["Watermark Updater"] --> C

File Walkthrough

Relevant files
Feature
3 files
index_sqlwriter.go
Implement Index SQL Writer for CDC Operations                       

pkg/iscp/index_sqlwriter.go

• Added comprehensive SQL writer implementation for different index types (HNSW, IVFFLAT, Fulltext)
• Implemented IndexSqlWriter interface with methods for Insert, Upsert, Delete operations
• Added SQL generation logic for CDC operations on vector and fulltext indexes
• Included type checking and validation for different data types and index algorithms

+649/-0 
sync.go
Add HNSW Vector Index CDC Synchronization                               

pkg/vectorindex/hnsw/sync.go

• Added HNSW index CDC synchronization functionality with CdcSync function
• Implemented parallel processing for index updates with thread management
• Added model management for HNSW indexes including loading, unloading, and capacity handling
• Included SQL generation for metadata and index table updates

+676/-0 
model.go
Implement HNSW Vector Index Model Management                         

pkg/vectorindex/hnsw/model.go

• Added complete HNSW model implementation with CRUD operations (Add, Remove, Contains, Search)
• Implemented file-based persistence with chunked loading/saving and checksum validation
• Added memory management with load/unload capabilities and capacity tracking
• Included SQL generation for database synchronization of index data

+573/-0 
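
The chunked persistence with checksum validation described for model.go can be illustrated roughly as follows. This is a minimal sketch, not the PR's code: the helper names (`splitChunks`, `checksumHex`, `joinAndVerify`) are hypothetical, and SHA-256 is an assumed digest since the walkthrough does not say which checksum the model uses.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"errors"
)

// splitChunks cuts data into pieces of at most chunkSize bytes.
// Hypothetical helper; the real chunk size and layout are not stated in the PR.
func splitChunks(data []byte, chunkSize int) [][]byte {
	if chunkSize <= 0 {
		return nil
	}
	var chunks [][]byte
	for len(data) > 0 {
		n := chunkSize
		if len(data) < n {
			n = len(data)
		}
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

// checksumHex returns the hex-encoded SHA-256 digest of data (assumed algorithm).
func checksumHex(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// joinAndVerify reassembles the chunks in order and rejects the result
// when its digest differs from the expected one.
func joinAndVerify(chunks [][]byte, want string) ([]byte, error) {
	data := bytes.Join(chunks, nil)
	if checksumHex(data) != want {
		return nil, errors.New("checksum mismatch")
	}
	return data, nil
}
```

Under these assumptions, a saver would store the digest next to the chunks, and a loader would call `joinAndVerify` before handing the bytes to the index.
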
Enhancement
26 files
util.go
Enable Additional Data Type Support in ISCP Utils               

pkg/iscp/util.go

• Uncommented and enabled support for multiple data types (json, bit, array_float32/64, date, time, decimal, uuid, etc.)
• Added appendHex function for binary data conversion
• Enhanced convertColIntoSql to handle NULL values with proper type casting
• Expanded type support in extractRowFromVector for comprehensive data extraction

+134/-127
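
To make the binary conversion concrete, here is a small sketch of how a byte slice might be rendered as a SQL hex literal. The function names and the `x'..'` output form are assumptions for illustration; the actual `appendHex` signature and the NULL cast used by `convertColIntoSql` are not shown in this PR description.

```go
package main

import "encoding/hex"

// appendHexLiteral appends data to dst as a MySQL-style hex literal, e.g. x'dead'.
// Hypothetical stand-in for the appendHex helper mentioned above.
func appendHexLiteral(dst, data []byte) []byte {
	dst = append(dst, "x'"...)
	dst = append(dst, hex.EncodeToString(data)...)
	return append(dst, '\'')
}

// appendBinaryValue writes NULL for a null column value and a hex literal otherwise.
// The real code is described as adding a type cast for NULLs; that detail is omitted here.
func appendBinaryValue(dst, data []byte, isNull bool) []byte {
	if isNull {
		return append(dst, "NULL"...)
	}
	return appendHexLiteral(dst, data)
}
```

Hex encoding avoids quoting and escaping problems that arise when arbitrary bytes are embedded in a generated SQL string.
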
alter.go
Integrate ISCP Management in ALTER TABLE Operations           

pkg/sql/compile/alter.go

• Added ISCP job cleanup during ALTER TABLE operations to prevent conflicts with temporary tables
• Implemented fulltext index handling in the alter table copy process
• Added proper sequencing of operations to drop ISCP tasks before data copying
• Enhanced error handling and logging for index-related operations

+32/-11 
index_consumer.go
Add IndexConsumer for ISCP index synchronization                 

pkg/iscp/index_consumer.go

• Added new IndexConsumer struct implementing Consumer interface for index synchronization
• Implemented data processing for both snapshot and tail data types with SQL generation and execution
• Added support for insert, delete, and upsert operations with proper SQL batching and channel communication
• Integrated with IndexSqlWriter for generating SQL statements and managing transaction execution

+435/-0 
ddl.go
Integrate ISCP CDC tasks for index operations                       

pkg/sql/compile/ddl.go

• Integrated ISCP job management for database and table operations
• Added CDC task creation and deletion for vector and fulltext indexes
• Implemented PITR (Point-in-Time Recovery) management for index operations
• Added async index support with CDC task registration during table creation and index operations

+80/-3   
cdc_util.go
Add CDC utilities for index task management                           

pkg/sql/compile/cdc_util.go

• Added comprehensive CDC task management utilities for index operations
• Implemented PITR creation and deletion for index tables
• Added validation logic for determining which indexes require CDC tasks
• Provided functions for bulk CDC task creation and deletion based on table definitions

+273/-0 
fulltext.go
Enhance fulltext tokenization with composite key support 

pkg/sql/plan/fulltext.go

• Enhanced fulltext index tokenization to support both table scan and values scan
• Added support for composite primary keys with proper type handling
• Implemented primary key type extraction from function parameters for values scan
• Updated function signature handling to accommodate different scan types

+54/-12 
build_dml_util.go
Add async index support in DML operations                               

pkg/sql/plan/build_dml_util.go

• Added async index support by checking IndexAlgoParams for async flag
• Modified index processing to skip async indexes in pre-insert operations
• Updated MultiTableIndex struct to include IndexAlgoParams field
• Enhanced index validation to support async processing workflows

+53/-4   
types.go
Add CDC types and structures for vector indexes                   

pkg/vectorindex/types.go

• Added VectorIndexCdc struct for CDC operations with insert, upsert, delete support
• Implemented CDC entry management with JSON serialization capabilities
• Added HnswCdcParam struct for CDC parameter configuration
• Enhanced type definitions with async parameter support for HNSW and IVF indexes

+87/-0   
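
As an illustration of a CDC batch that collects insert, upsert, and delete entries and serializes them to JSON, consider the sketch below. The type names, field names, JSON keys, and the single-letter operation codes are all hypothetical; the real `VectorIndexCdc` layout is not shown in this description.

```go
package main

import "encoding/json"

// Operation codes for CDC entries (assumed values, for illustration only).
const (
	cdcInsert = "I"
	cdcUpsert = "U"
	cdcDelete = "D"
)

// cdcEntry is one change: an operation, a primary key, and an optional vector.
type cdcEntry struct {
	Type string    `json:"t"`
	PKey int64     `json:"pk"`
	Vec  []float32 `json:"v,omitempty"`
}

// vectorCdc accumulates changes until they are flushed to the index.
type vectorCdc struct {
	Data []cdcEntry `json:"cdc"`
}

func (c *vectorCdc) Insert(pk int64, v []float32) {
	c.Data = append(c.Data, cdcEntry{Type: cdcInsert, PKey: pk, Vec: v})
}

func (c *vectorCdc) Upsert(pk int64, v []float32) {
	c.Data = append(c.Data, cdcEntry{Type: cdcUpsert, PKey: pk, Vec: v})
}

// Delete records a removal; no vector payload is needed.
func (c *vectorCdc) Delete(pk int64) {
	c.Data = append(c.Data, cdcEntry{Type: cdcDelete, PKey: pk})
}

func (c *vectorCdc) Empty() bool { return len(c.Data) == 0 }

// Reset drops the buffered entries while keeping the allocated capacity.
func (c *vectorCdc) Reset() { c.Data = c.Data[:0] }

// ToJSON serializes the batch so it can be passed as a string argument.
func (c *vectorCdc) ToJSON() (string, error) {
	b, err := json.Marshal(c)
	return string(b), err
}

// cdcFromJSON is the inverse of ToJSON.
func cdcFromJSON(s string) (*vectorCdc, error) {
	var c vectorCdc
	if err := json.Unmarshal([]byte(s), &c); err != nil {
		return nil, err
	}
	return &c, nil
}
```

A string-serialized batch of this kind fits the walkthrough's description of HNSW_CDC_UPDATE, which takes a varchar argument and deserializes CDC data from JSON.
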
ddl_index_algo.go
Add async support for fulltext and IVF indexes                     

pkg/sql/compile/ddl_index_algo.go

• Added async index support for fulltext and IVF indexes
• Integrated CDC task creation for async indexes during index creation
• Enhanced index creation workflow to support asynchronous processing
• Added logging for async index operations

+37/-5   
func_hnsw.go
Implement HNSW CDC update function                                             

pkg/sql/plan/function/func_hnsw.go

• Implemented hnswCdcUpdate function for processing CDC updates
• Added JSON deserialization for CDC data and parameter validation
• Integrated with HNSW CdcSync functionality for index updates
• Added comprehensive error handling and logging for CDC operations

+77/-0   
func_cast.go
Enhance array casting with dimension validation                   

pkg/sql/plan/function/func_cast.go

• Enhanced array casting with dimension validation for vector types
• Added bypass for maximum dimension arrays to skip validation
• Improved error reporting for array dimension mismatches
• Refactored string to array conversion with proper type checking

+11/-2   
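
The dimension check with a maximum-dimension bypass can be pictured along these lines. This is a sketch under assumptions: the constant value, function name, and error text are hypothetical and do not reproduce the message format used in the updated .result files.

```go
package main

import "fmt"

// maxArrayDimension marks a target type with no fixed dimension (assumed value).
const maxArrayDimension = 65535

// checkVectorDimension compares the parsed array length with the declared
// dimension, skipping the check when the target uses the maximum dimension.
func checkVectorDimension(got, declared int) error {
	if declared == maxArrayDimension {
		return nil // bypass: nothing to validate against
	}
	if got != declared {
		return fmt.Errorf("vector dimension mismatch: declared %d, got %d", declared, got)
	}
	return nil
}
```
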
sqlexec.go
Add transaction support for SQL execution                               

pkg/vectorindex/sqlexec/sqlexec.go

• Added RunTxn function for executing SQL operations within transactions
• Implemented proper context and account ID handling for transactional operations
• Enhanced SQL execution capabilities with transaction support

+27/-0   
create.go
Add async option support to index creation                             

pkg/sql/parsers/tree/create.go

• Added Async boolean field to IndexOption struct
• Implemented formatting support for async keyword in index creation
• Enhanced index option parsing to support asynchronous index creation

+4/-0     
list_builtIn.go
Register HNSW CDC update as built-in function                       

pkg/sql/plan/function/list_builtIn.go

• Added HNSW_CDC_UPDATE function definition to built-in functions list
• Implemented function signature with varchar and int32 parameters
• Added overload configuration for HNSW CDC update functionality

+21/-0   
function_id.go
Register HNSW CDC update function ID                                         

pkg/sql/plan/function/function_id.go

• Added HNSW_CDC_UPDATE function ID constant
• Updated function end number and registered function mapping
• Enhanced function registry with new HNSW CDC functionality

+7/-1     
iscp.go
Add logging for ISCP task operations                                         

pkg/frontend/iscp.go

• Added logging for ISCP task creation and existence checking
• Enhanced visibility into ISCP task management operations

+3/-0     
types.go
Add vector type description support                                           

pkg/container/types/types.go

• Added description string support for vector array types (VECF32, VECF64)
• Enhanced type description formatting for vector types with dimensions

+4/-0     
keywords.go
Add async keyword to MySQL parser                                               

pkg/sql/parsers/dialect/mysql/keywords.go

• Added "async" keyword to MySQL parser keyword list
• Enhanced parser vocabulary for asynchronous index operations

+1/-0     
secondary_index_utils.go
Add async parameter support to index utilities                     

pkg/catalog/secondary_index_utils.go

• Added Async constant and support for async parameter in index configurations
• Implemented IsIndexAsync function to check if an index is configured for async processing
• Enhanced index parameter parsing and string generation with async support
• Updated fulltext and vector index parameter handling for async operations

+38/-3   
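
A check such as IsIndexAsync might look roughly like the sketch below. It assumes that index parameters are stored as a JSON object and that the flag lives under an `"async"` key with the string value `"true"`; both the key name and the storage format are assumptions, and `isIndexAsync` is a hypothetical name rather than the function in the catalog package.

```go
package main

import (
	"encoding/json"
	"strings"
)

// asyncKey is the assumed parameter name for the async flag.
const asyncKey = "async"

// isIndexAsync reports whether the JSON-encoded index parameters request
// asynchronous maintenance. Empty parameters mean a synchronous index.
func isIndexAsync(params string) (bool, error) {
	if strings.TrimSpace(params) == "" {
		return false, nil
	}
	var m map[string]any
	if err := json.Unmarshal([]byte(params), &m); err != nil {
		return false, err
	}
	v, ok := m[asyncKey].(string)
	return ok && strings.EqualFold(v, "true"), nil
}
```

A caller in the DML path could use such a predicate to skip async indexes during pre-insert processing, as the build_dml_util.go entry describes.
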
build_show_util.go
Add ASYNC parameter support to index SQL construction       

pkg/sql/plan/build_show_util.go

• Added support for ASYNC parameter in index creation SQL construction
• Checks for catalog.Async parameter and appends " ASYNC" to index string when present

+5/-0     
consumer.go
Add IndexSync consumer type support                                           

pkg/iscp/consumer.go

• Added new consumer type handling for ConsumerType_IndexSync
• Returns NewIndexConsumer when consumer type matches IndexSync

+3/-0     
data_retriever.go
Add account and table ID getter methods                                   

pkg/iscp/data_retriever.go

• Added GetAccountID() method returning r.accountID
• Added GetTableID() method returning r.tableID

+8/-0     
types.go
Add index algorithm parameters field to MultiTableIndex   

pkg/sql/plan/types.go

• Added IndexAlgoParams string field to MultiTableIndex struct
• Reformatted struct field alignment

+3/-2     
types.go
Extend DataRetriever interface with ID getter methods       

pkg/iscp/types.go

• Added GetAccountID() uint32 method to DataRetriever interface
• Added GetTableID() uint64 method to DataRetriever interface

+2/-0     
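
The two getters added to the DataRetriever interface and its implementation amount to something like the following. The method signatures and the `accountID`/`tableID` field names come from the walkthrough; the interface name `idGetter` and struct name `retriever` are placeholders, and the real interface contains other methods not shown here.

```go
package main

// idGetter lists only the two methods this PR adds to DataRetriever.
type idGetter interface {
	GetAccountID() uint32
	GetTableID() uint64
}

// retriever is a placeholder for the concrete data retriever type.
type retriever struct {
	accountID uint32
	tableID   uint64
}

func (r *retriever) GetAccountID() uint32 { return r.accountID }

func (r *retriever) GetTableID() uint64 { return r.tableID }

// Compile-time check that *retriever satisfies the interface.
var _ idGetter = (*retriever)(nil)
```
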
types.go
Add async parameter to fulltext parser configuration         

pkg/fulltext/types.go

• Added Async string field to FullTextParserParam struct

+1/-0     
mysql_sql.y
Add ASYNC keyword and parsing support to MySQL grammar     

pkg/sql/parsers/dialect/mysql/mysql_sql.y

• Added ASYNC token definition and parsing rules
• Added ASYNC as non-reserved keyword
• Added index option handling for async parameter

+10/-1   
Tests
16 files
index_consumer_test.go
Add Index Consumer Test Suite                                                       

pkg/iscp/index_consumer_test.go

• Added comprehensive test suite for index consumer functionality
• Implemented mock objects for testing SQL execution and data retrieval
• Added test cases for HNSW snapshot and tail processing scenarios
• Included validation of generated SQL statements for CDC operations

+381/-0 
sync_test.go
Add HNSW Synchronization Test Coverage                                     

pkg/vectorindex/hnsw/sync_test.go

• Added extensive test coverage for HNSW synchronization operations
• Implemented test cases for various CDC operations (upsert, delete, insert, update)
• Added tests for multi-model scenarios and shuffled data processing
• Included mock functions for SQL execution and streaming operations

+370/-0 
index_sqlwriter_test.go
Add comprehensive tests for IndexSqlWriter                             

pkg/iscp/index_sqlwriter_test.go

• Added comprehensive test cases for IndexSqlWriter functionality
• Implemented test helpers for creating table definitions for IVF, fulltext, and HNSW indexes
• Added tests for different primary key types including composite keys and binary keys
• Verified SQL generation for insert, upsert, and delete operations across different index types

+242/-0 
model_test.go
Add comprehensive tests for HnswModel                                       

pkg/vectorindex/hnsw/model_test.go

• Added comprehensive test suite for HnswModel functionality
• Implemented tests for index loading, searching, adding, and removing vectors
• Added validation for model state management including dirty flag handling
• Tested SQL generation and file operations for model persistence

+206/-0 
search_test.go
Enhance HNSW search tests with multi-file support               

pkg/vectorindex/hnsw/search_test.go

• Added mock functions for testing with multiple index files
• Implemented test helpers for creating metadata and index batches with multiple files
• Added catalog SQL mock for testing index metadata retrieval
• Enhanced test coverage for multi-file index scenarios

+112/-0 
func_hnsw_test.go
Add tests for HNSW CDC update function                                     

pkg/sql/plan/function/func_hnsw_test.go

• Added test cases for hnswCdcUpdate function validation
• Implemented tests for null parameter handling and invalid JSON input
• Added validation tests for required function arguments
• Ensured proper error handling for various edge cases

+129/-0 
mysql_sql_test.go
Add async keyword tests for index creation                             

pkg/sql/parsers/dialect/mysql/mysql_sql_test.go

• Added test cases for async keyword support in index creation
• Updated test expectations for fulltext and vector index creation with async flag
• Enhanced parser test coverage for new async syntax

+9/-1     
build_test.go
Update HNSW build tests for unified model                               

pkg/vectorindex/hnsw/build_test.go

• Updated test code to use HnswModel instead of HnswBuildIndex and HnswSearchIndex
• Modified test function calls to match new unified model interface
• Updated variable declarations and type assertions for new model structure

+5/-5     
types_test.go
Add tests for vector index CDC functionality                         

pkg/vectorindex/types_test.go

• Added comprehensive test suite for VectorIndexCdc functionality
• Implemented tests for insert, delete, and upsert operations
• Added JSON serialization and state management tests
• Validated CDC data structure behavior and edge cases

+63/-0   
function_id_test.go
Update function ID tests for HNSW CDC                                       

pkg/sql/plan/function/function_id_test.go

• Updated predefined function IDs to include HNSW_CDC_UPDATE
• Adjusted function end number for new function addition

+3/-1     
vector_ivf_async.result
Test results for async IVF vector indexing                             

test/distributed/cases/vector/vector_ivf_async.result

• Test results for IVF vector index with ASYNC option
• Shows successful creation and querying of async vector indexes

+58/-0   
vector_ivf_async.sql
Test cases for async IVF vector indexing                                 

test/distributed/cases/vector/vector_ivf_async.sql

• Test cases for creating IVF vector indexes with ASYNC option
• Tests index creation, data insertion, and vector similarity queries

+59/-0   
vector_hnsw_async.result
Test results for async HNSW vector indexing                           

test/distributed/cases/vector/vector_hnsw_async.result

• Test results for HNSW vector index with ASYNC option
• Shows successful creation and querying of async HNSW indexes

+66/-0   
vector_hnsw_async.sql
Test cases for async HNSW vector indexing                               

test/distributed/cases/vector/vector_hnsw_async.sql

• Test cases for creating HNSW vector indexes with ASYNC option
• Tests various scenarios including empty data, updates, and deletions

+96/-0   
fulltext_async.sql
Test cases for async fulltext indexing                                     

test/distributed/cases/fulltext/fulltext_async.sql

• Test cases for creating fulltext indexes with ASYNC option
• Tests fulltext search functionality with async indexing

+21/-0   
fulltext_async.result
Test results for async fulltext indexing                                 

test/distributed/cases/fulltext/fulltext_async.result

• Test results for fulltext index with ASYNC option
• Shows successful creation and searching with async fulltext indexes

+19/-0   
Code refactoring
2 files
search.go
Refactor HNSW search to use unified HnswModel                       

pkg/vectorindex/hnsw/search.go

• Removed HnswSearchIndex struct and moved functionality to HnswModel
• Refactored LoadMetadata to be a standalone function instead of method
• Simplified search implementation by delegating index loading and management to HnswModel
• Updated error handling and resource management for index operations

+23/-199
build.go
Refactor HNSW build to use unified HnswModel                         

pkg/vectorindex/hnsw/build.go

• Replaced HnswBuildIndex struct with HnswModel for unified index management
• Removed duplicate index building logic and delegated to HnswModel methods
• Updated function signatures to use HnswModel instead of HnswBuildIndex
• Simplified build process by leveraging shared model functionality

+11/-184
Bug fix
3 files
watermark_updater.go
Fix null check and empty list handling bugs                           

pkg/iscp/watermark_updater.go

• Fixed null check bug in queryIndexLog function by using correct index
• Added safety check for empty table ID list in unregisterJobsByDBName
• Enhanced error handling and edge case management for job unregistration

+12/-8   
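
The empty-list safety check matters because an `IN ()` clause built from zero IDs is not valid SQL. The sketch below shows one way such a guard could look; the function name and the `table_id` column are hypothetical, and this is not the code in unregisterJobsByDBName.

```go
package main

import (
	"strconv"
	"strings"
)

// buildTableIDFilter renders "table_id IN (1,2,3)" for the given IDs.
// It returns false for an empty list so the caller can skip the statement
// instead of emitting an invalid "IN ()" clause.
func buildTableIDFilter(ids []uint64) (string, bool) {
	if len(ids) == 0 {
		return "", false
	}
	var sb strings.Builder
	sb.WriteString("table_id IN (")
	for i, id := range ids {
		if i > 0 {
			sb.WriteByte(',')
		}
		sb.WriteString(strconv.FormatUint(id, 10))
	}
	sb.WriteByte(')')
	return sb.String(), true
}
```
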
build_ddl.go
Fix fulltext index parameter handling                                       

pkg/sql/plan/build_ddl.go

• Fixed fulltext index parameter handling to support cases without parser name
• Enhanced index parameter JSON generation for fulltext indexes
• Improved error handling in fulltext index creation

+4/-4     
mock_consumer.go
Fix tenant ID handling in mock consumer                                   

pkg/iscp/mock_consumer.go

• Updated context setup to use catalog.System_Account instead of hardcoded value
• Enhanced tenant ID handling for internal SQL consumer operations

+1/-1     
Error handling
2 files
util.go
Add error handling to fulltext index SQL generation           

pkg/sql/compile/util.go

• Updated genInsertIndexTableSqlForFullTextIndex function signature to return error
• Enhanced error handling for fulltext index SQL generation

+2/-2     
iteration.go
Enhance error handling in ISCP iteration                                 

pkg/iscp/iteration.go

• Added error handling for CollectChanges function call
• Enhanced context setup with proper tenant ID for consumer operations
• Improved error propagation in iteration execution

+5/-1     
Miscellaneous
5 files
hnsw.go
Simplify HNSW create function validation                                 

pkg/sql/plan/hnsw.go

• Commented out table scan validation in HNSW create function
• Simplified node type checking for HNSW operations

+6/-4     
task_runner.go
Add debug logging for task executor retrieval                       

pkg/taskservice/task_runner.go

• Added debug logging statement with executor code information

+1/-0     
vector_hnsw.result
Update vector dimension error message format                         

test/distributed/cases/vector/vector_hnsw.result

• Updated error message format for vector dimension mismatch

+1/-1     
vector_index.result
Update vector dimension error message format                         

test/distributed/cases/vector/vector_index.result

• Updated error message format for vector dimension mismatch

+1/-1     
array.result
Update vector dimension error message format                         

test/distributed/cases/array/array.result

• Updated error message format for vector dimension mismatch

+2/-2     
Additional files
1 file
mysql_sql.go +8607/-8632

Labels
kind/bug (Something isn't working), Possible security concern, Review effort 5/5, size/XXL (Denotes a PR that changes 2000+ lines)

4 participants