Copilot AI commented Nov 14, 2025

Implementation: "Did You Mean" Suggestions for Schema Validation

Objective

Implement fuzzy string matching, based on the Levenshtein distance algorithm, to suggest the correct field name when a user mistypes a workflow frontmatter field.

✅ Completed

  • Implement Levenshtein distance algorithm in pkg/parser/schema.go
  • Update findClosestMatches() to use Levenshtein distance instead of heuristics
  • Update generateFieldSuggestions() for better message formatting
  • Fix existing tests to match Levenshtein distance behavior
  • Remove duplicate/unused code from pkg/workflow/ package
  • Fix linting issues (formatting and unused function)
  • All tests pass (make test-unit)
  • All linting passes (make lint)

Implementation Details

Levenshtein Distance Algorithm:

  • Proper dynamic programming implementation in pkg/parser/schema.go
  • Handles empty strings correctly
  • Returns minimum edit distance (insertions, deletions, substitutions)
  • Single implementation used throughout the codebase (no duplication)

Suggestion Logic:

  • Returns suggestions with edit distance ≤ 3, sorted by distance
  • Returns a single best match when one suggestion has a clear distance advantage
  • Returns multiple suggestions when several fields are equally similar
  • Exact matches are skipped (distance 0)
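
The suggestion logic above can be sketched roughly as follows. This is a minimal illustration, not the code merged in pkg/parser/schema.go: the names `levenshtein` and `closestMatches` and the `maxDistance` parameter are assumptions made for the example, and the distance is computed byte-wise, like the snippet quoted later in this review.

```go
package main

import (
	"fmt"
	"sort"
)

// levenshtein returns the byte-wise edit distance between a and b,
// keeping only two rows of the dynamic-programming table in memory.
func levenshtein(a, b string) int {
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			best := prev[j] + 1 // deletion
			if v := curr[j-1] + 1; v < best { // insertion
				best = v
			}
			if v := prev[j-1] + cost; v < best { // substitution
				best = v
			}
			curr[j] = best
		}
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

// closestMatches (hypothetical name) returns the valid fields within
// maxDistance edits of target, nearest first. Exact matches are skipped.
func closestMatches(target string, valid []string, maxDistance int) []string {
	type scored struct {
		name string
		dist int
	}
	var hits []scored
	for _, name := range valid {
		d := levenshtein(target, name)
		if d == 0 || d > maxDistance {
			continue
		}
		hits = append(hits, scored{name, d})
	}
	sort.SliceStable(hits, func(i, j int) bool { return hits[i].dist < hits[j].dist })
	names := make([]string, 0, len(hits))
	for _, h := range hits {
		names = append(names, h.name)
	}
	return names
}

func main() {
	valid := []string{"engine", "permissions", "tools"}
	fmt.Println(closestMatches("permisions", valid, 3)) // [permissions]
	fmt.Println(closestMatches("engnie", valid, 3))     // [engine]
}
```

Note that plain Levenshtein distance has no transposition operation, so a swapped pair such as engnie versus engine costs two edits rather than one.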

Integration:

  • Integrated directly into pkg/parser/schema.go
  • Replaced old heuristic-based matching with Levenshtein distance
  • Improved suggestion messages for better user experience
  • No code duplication
  • Clean code with no linting issues

Examples

Single typo:

Unknown property: permisions. Did you mean 'permissions'?

Multiple typos:

Unknown properties: permisions, engnie. Did you mean: permissions, engine
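
For illustration, the two message shapes above could be produced by a small formatter like the one below. It is a sketch only: `formatSuggestion` is a hypothetical name (the PR's own function is generateFieldSuggestions(), whose signature is not shown here), and the wording for the no-suggestion case is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// formatSuggestion (hypothetical) builds a message in the style of the
// examples above. unknown holds the misspelled fields and suggestions
// holds the proposed corrections.
func formatSuggestion(unknown, suggestions []string) string {
	label := "Unknown property"
	if len(unknown) > 1 {
		label = "Unknown properties"
	}
	msg := label + ": " + strings.Join(unknown, ", ")
	switch len(suggestions) {
	case 0:
		// Assumption: with nothing to suggest, report only the unknown fields.
		return msg
	case 1:
		return fmt.Sprintf("%s. Did you mean '%s'?", msg, suggestions[0])
	default:
		return fmt.Sprintf("%s. Did you mean: %s", msg, strings.Join(suggestions, ", "))
	}
}

func main() {
	fmt.Println(formatSuggestion([]string{"permisions"}, []string{"permissions"}))
	fmt.Println(formatSuggestion(
		[]string{"permisions", "engnie"},
		[]string{"permissions", "engine"},
	))
}
```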

Test Coverage

  • ✅ Updated parser tests to reflect Levenshtein behavior
  • ✅ All schema validation tests passing
  • ✅ No duplicate code or unused files
  • ✅ Code passes all linting checks
Original prompt

This section details the original issue you should resolve

<issue_title>[task] Implement "Did You Mean" Suggestions for Schema Validation</issue_title>
<issue_description>## Objective
Implement fuzzy string matching to suggest correct field names when users make typos in workflow frontmatter fields (e.g., permisions → permissions, engnie → engine).

Context

Part of Discussion #3956 - Workflow Validation and Error Feedback Quality improvements.

Users frequently make typos in frontmatter fields. The compiler should help by suggesting the correct field name using Levenshtein distance-based matching.

Implementation Approach

1. Create pkg/workflow/schema_fuzzy_match.go

Implement fuzzy matching logic:

// Function signature
func suggestFieldName(invalidField string, validFields []string) []string

// Function signature  
func enhanceSchemaValidationError(err error, schema map[string]interface{}) error

Requirements:

  • Use Levenshtein distance algorithm
  • Return suggestions with edit distance ≤ 3, sorted by distance
  • Suggest single best match when distance ≤ 2
  • List multiple suggestions when distance = 3
  • Integrate with existing console.FormatErrorMessage infrastructure
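
One possible reading of the selection policy in these requirements is sketched below. The `candidate` type and the `selectSuggestions` function are hypothetical names, the distances are assumed to be computed beforehand, and the sample distances in `main` are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate (hypothetical) pairs a valid field name with its precomputed
// edit distance from the misspelled field.
type candidate struct {
	Name     string
	Distance int
}

// selectSuggestions applies one reading of the requirements: ignore exact
// matches and anything beyond distance 3, return only the best match when
// it is within distance 2, and otherwise list every distance-3 candidate.
func selectSuggestions(cands []candidate) []string {
	var kept []candidate
	for _, c := range cands {
		if c.Distance >= 1 && c.Distance <= 3 {
			kept = append(kept, c)
		}
	}
	if len(kept) == 0 {
		return nil
	}
	sort.SliceStable(kept, func(i, j int) bool { return kept[i].Distance < kept[j].Distance })
	if kept[0].Distance <= 2 {
		return []string{kept[0].Name}
	}
	// The best distance is 3, so every remaining candidate is at distance 3.
	names := make([]string, len(kept))
	for i, c := range kept {
		names[i] = c.Name
	}
	return names
}

func main() {
	// Illustrative distances only.
	fmt.Println(selectSuggestions([]candidate{{"tools", 1}, {"toolsets", 3}}))
	fmt.Println(selectSuggestions([]candidate{{"alpha", 3}, {"beta", 3}}))
}
```

The requirements do not say what to do when two candidates tie at distance 1 or 2; this sketch keeps whichever comes first in the input.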

2. Update pkg/workflow/schema_validation.go

Integration points:

  • Extract valid field names from JSON schema before validation
  • Wrap schema validation errors with enhanceSchemaValidationError
  • Format enhanced errors using console package
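
As a rough sketch of the first integration point, valid field names can be read from the `properties` object of a decoded JSON schema. The helper name `fieldNamesAt` is an assumption, and the sketch ignores schema features such as `$ref`, `oneOf`, and `additionalProperties`, which a real implementation would probably need to handle.

```go
package main

import (
	"fmt"
	"sort"
)

// fieldNamesAt (hypothetical) walks the schema along path, following each
// object's "properties" map, and returns the sorted property names found
// at that level. It returns nil if any step of the path is missing.
func fieldNamesAt(schema map[string]interface{}, path ...string) []string {
	node := schema
	for _, key := range path {
		props, ok := node["properties"].(map[string]interface{})
		if !ok {
			return nil
		}
		child, ok := props[key].(map[string]interface{})
		if !ok {
			return nil
		}
		node = child
	}
	props, ok := node["properties"].(map[string]interface{})
	if !ok {
		return nil
	}
	names := make([]string, 0, len(props))
	for name := range props {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random, so sort for stable output
	return names
}

func main() {
	schema := map[string]interface{}{
		"type": "object",
		"properties": map[string]interface{}{
			"permissions": map[string]interface{}{"type": "object"},
			"engine": map[string]interface{}{
				"type": "object",
				"properties": map[string]interface{}{
					"id":    map[string]interface{}{"type": "string"},
					"model": map[string]interface{}{"type": "string"},
				},
			},
		},
	}
	fmt.Println(fieldNamesAt(schema))           // [engine permissions]
	fmt.Println(fieldNamesAt(schema, "engine")) // [id model]
}
```

Passing a path also covers nested fields such as engine.model: the candidates for a typo under `engine` come from `fieldNamesAt(schema, "engine")`.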

3. Create pkg/workflow/schema_fuzzy_match_test.go

Test coverage required:

  • Common typos return correct suggestions (permisions, engnie, toolz, timeout_minute, runs_on)
  • Exact matches don't trigger suggestions
  • Very different strings don't suggest anything
  • Multiple similar fields return multiple suggestions
  • Nested field suggestions work (e.g., engine.model)

Files to Create/Modify

  • Create: pkg/workflow/schema_fuzzy_match.go
  • Create: pkg/workflow/schema_fuzzy_match_test.go
  • Update: pkg/workflow/schema_validation.go

Acceptance Criteria

  • Detects typos with edit distance ≤ 3 from valid field names
  • Suggests single best match when similarity is high (distance ≤ 2)
  • Lists multiple suggestions when multiple fields are similar (distance = 3)
  • Integrates with existing console.FormatErrorMessage infrastructure
  • Unit tests cover common typos: permisions, engnie, toolz, timeout_minute, runs_on
  • Handles both top-level frontmatter fields and nested fields
  • All tests pass (make test-unit)

Priority

High - This directly impacts user experience when they make simple typos
Related to #3956

AI generated by Plan Command for discussion #3956</issue_description>

Comments on the Issue (you are @copilot in this section)


Copilot AI self-assigned this Nov 14, 2025
Copilot AI and others added 2 commits November 14, 2025 22:59
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@pelikhan

@copilot Work

Update test cases to reflect actual Levenshtein distance behavior:
- Exact matches are skipped (distance 0)
- Tests now use typos that are actually close to valid field names
- Updated expected messages to match new suggestion format

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Nov 15, 2025

@copilot Work

Completed! The fuzzy matching implementation is now fully working with all tests passing.

Changes made (commit 6451e75):

  • Fixed test expectations to match Levenshtein distance behavior
  • All unit tests now pass

Key features:

  • Single typo: permisions → "Did you mean 'permissions'?"
  • Multiple typos: permisions, toolz, engnie → "Did you mean: permissions, tools, engine"
  • Exact matches are skipped
  • Suggestions only shown for edit distance ≤ 3

Fuzzy matching demo

Example output:

✗ Unknown property: permisions. Did you mean 'permissions'?
✗ Unknown properties: permisions, toolz, engnie. Did you mean: permissions, tools, engine

Copilot finished work on behalf of pelikhan November 15, 2025 00:14
@pelikhan pelikhan marked this pull request as ready for review November 15, 2025 00:30
Copilot AI review requested due to automatic review settings November 15, 2025 00:30
Copilot finished reviewing on behalf of pelikhan November 15, 2025 00:33

Copilot AI left a comment

Pull Request Overview

This PR implements "Did You Mean" suggestions for schema validation errors using the Levenshtein distance algorithm. When users make typos in workflow frontmatter fields (e.g., permisions → permissions), the compiler now suggests the correct field names.

Key Changes:

  • Replaced heuristic-based string matching with proper Levenshtein distance algorithm in pkg/parser/schema.go
  • Updated suggestion messages to be more helpful (single vs multiple suggestions)
  • Added comprehensive test coverage for fuzzy matching logic

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| pkg/workflow/schema_fuzzy_match.go | New file with Levenshtein distance implementation and helper functions (note: appears to contain unused code not integrated into production) |
| pkg/workflow/schema_fuzzy_match_test.go | Comprehensive tests for fuzzy matching logic with 39+ test cases |
| pkg/parser/schema.go | Integration of Levenshtein distance into existing schema validation with updated findClosestMatches() and generateFieldSuggestions() |
| pkg/parser/schema_suggestions_test.go | Updated tests to reflect Levenshtein distance behavior instead of heuristic matching |

Comment on lines 12 to 61
func levenshteinDistance(a, b string) int {
	aLen := len(a)
	bLen := len(b)

	// Early exit for empty strings
	if aLen == 0 {
		return bLen
	}
	if bLen == 0 {
		return aLen
	}

	// Create a 2D matrix for dynamic programming
	// We only need the previous row, so we can optimize space
	previousRow := make([]int, bLen+1)
	currentRow := make([]int, bLen+1)

	// Initialize the first row (distance from empty string)
	for i := 0; i <= bLen; i++ {
		previousRow[i] = i
	}

	// Calculate distances for each character in string a
	for i := 1; i <= aLen; i++ {
		currentRow[0] = i // Distance from empty string

		for j := 1; j <= bLen; j++ {
			// Cost of substitution (0 if characters match, 1 otherwise)
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}

			// Minimum of:
			// - Deletion: previousRow[j] + 1
			// - Insertion: currentRow[j-1] + 1
			// - Substitution: previousRow[j-1] + cost
			currentRow[j] = min3(
				previousRow[j]+1,      // deletion
				currentRow[j-1]+1,     // insertion
				previousRow[j-1]+cost, // substitution
			)
		}

		// Swap rows for next iteration
		previousRow, currentRow = currentRow, previousRow
	}

	return previousRow[bLen]
}

Copilot AI Nov 15, 2025

The levenshteinDistance function is duplicated between pkg/workflow/schema_fuzzy_match.go and pkg/parser/schema.go. The implementations are nearly identical except for the minimum calculation (lines 49-53 use min3() while the parser version uses nested min() calls).

According to the coding guidelines (CodingGuidelineID 1000002), code should avoid duplication. Consider extracting this function to a shared utility package (e.g., pkg/stringutil or similar) that both packages can import.
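
A shared helper along the lines suggested here might look like the sketch below. Everything in it is an assumption for illustration: the exported name `Distance`, its placement in a package such as pkg/stringutil, and the choice to compare Unicode code points instead of bytes (the quoted implementation indexes the strings byte-wise). It is written as `package main` only so that it runs on its own.

```go
package main

import "fmt"

// Distance (hypothetical) returns the Levenshtein distance between a and b,
// counted in Unicode code points. In a shared package this would be the
// single implementation that both pkg/parser and pkg/workflow import.
func Distance(a, b string) int {
	ra, rb := []rune(a), []rune(b)

	// row[j] holds the distance between the processed prefix of ra and rb[:j].
	row := make([]int, len(rb)+1)
	for j := range row {
		row[j] = j
	}

	for i := 1; i <= len(ra); i++ {
		diag := row[0] // table value at (i-1, j-1)
		row[0] = i
		for j := 1; j <= len(rb); j++ {
			above := row[j] // table value at (i-1, j), not yet overwritten
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			row[j] = min3(above+1, row[j-1]+1, diag+cost)
			diag = above
		}
	}
	return row[len(rb)]
}

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

func main() {
	fmt.Println(Distance("permisions", "permissions"))
	fmt.Println(Distance("café", "cafe"))
}
```

Frontmatter field names are ASCII in the examples on this page, so byte-wise and rune-wise distances agree for them; the rune conversion only matters if non-ASCII input reaches the helper.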

pelikhan and others added 2 commits November 14, 2025 16:39
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@pelikhan

@copilot remove duplicate lev distance algo

Delete pkg/workflow/schema_fuzzy_match.go and associated tests as they contain
unused code that duplicates the production implementation in pkg/parser/schema.go.

The actual fuzzy matching logic is integrated directly into pkg/parser/schema.go
and is working correctly. All tests pass.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Nov 15, 2025

@copilot remove duplicate lev distance algo

Removed duplicate Levenshtein distance algorithm (commit a8a2a34).

Deleted pkg/workflow/schema_fuzzy_match.go and its tests since they contained unused code that duplicated the production implementation in pkg/parser/schema.go.

The fuzzy matching feature continues to work correctly with the single implementation in the parser package. All tests pass.

Copilot finished work on behalf of pelikhan November 15, 2025 00:47
@pelikhan

@copilot Lint

- Run go fmt to fix formatting (removed trailing whitespace)
- Remove unused abs() function that was left from old heuristic-based matching
- All linting checks now pass

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Nov 15, 2025

@copilot Lint

Linting completed (commit 3d541ce).

Fixed issues:

  • Formatted code with go fmt (removed trailing whitespace)
  • Removed unused abs() function that was left over from old heuristic-based matching

All linting checks now pass (make lint).

Copilot finished work on behalf of pelikhan November 15, 2025 04:33
@pelikhan pelikhan merged commit c7902d7 into main Nov 15, 2025
41 of 47 checks passed
@pelikhan pelikhan deleted the copilot/implement-fuzzy-suggestions-schema-validation branch November 15, 2025 04:56
github-actions bot added a commit that referenced this pull request Nov 15, 2025
This commit updates the documentation based on features merged in the last 24 hours.

## Features Documented

### 1. Cross-repository --repo flag (PR #4007)
- Added --repo option to enable, disable, and logs commands
- Enables workflow management operations across different repositories
- Updated CLI reference with examples and option descriptions

### 2. Workflow description extraction (PR #4008)
- Documents that add and trial commands now display workflow descriptions
- Descriptions are extracted from frontmatter description field
- Provides better context about workflow purpose when adding or testing

### 3. Import cache for offline compilation (PR #3981)
- Remote imports are automatically cached in .github/aw/imports/
- Cache stores imports by commit SHA for efficient reuse
- Enables offline compilation once imports have been downloaded
- Updated CLI, imports reference, and packaging guides

### 4. "Did You Mean" schema validation suggestions (PR #3999)
- Compiler suggests correct field names for typos using fuzzy matching
- Based on Levenshtein distance algorithm
- Added tip callout and new error documentation section
- Includes examples of common typos detected

## Files Modified

- docs/src/content/docs/setup/cli.md
- docs/src/content/docs/reference/imports.md
- docs/src/content/docs/guides/packaging-imports.md
- docs/src/content/docs/troubleshooting/errors.md

## Related PRs

- #4007 - Add --repo options to more commands
- #4008 - Add workflow description extraction feature
- #3981 - Add import cache for offline workflow compilation
- #3999 - Implement 'Did You Mean' suggestions for schema validation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>