Skip to content

Conversation

matifali
Copy link
Member

@matifali matifali commented Sep 1, 2025

Would prevents issues like the one I fixed in #404

This change should have caught that error as

2025-09-01 13:12:45.132 [erro]  ...
    msg= Error during "README parsing" phase of README validation:
         - "registry/umair/modules/digitalocean-region/README.md": incorrect source URL format: found "registry.coder.com/coder/digitalocean-region/coder", expected "registry.coder.com/umair/digitalocean-region/coder"
error: script "validate-readme" exited with code 1

diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml index
53b912b..eb3cf8b 100644 --- a/.github/workflows/ci.yaml +++
b/.github/workflows/ci.yaml @@ -63,8 +63,8 @@ jobs: - name: Set up Go
uses: actions/setup-go@v5 with: - go-version: "1.23.2" - - name:
Validate contributors + go-version: "1.25.0" + - name: Validate Reademde
run: go build ./cmd/readmevalidation && ./readmevalidation - name:
Remove build file artifact run: rm ./readmevalidation

Signed-off-by: Muhammad Atif Ali <me@matifali.dev>
@matifali matifali requested a review from Parkreiner September 1, 2025 13:21
@matifali matifali self-assigned this Sep 1, 2025
@matifali matifali force-pushed the atif/validate-readme-source branch 3 times, most recently from 98efa49 to 898e773 Compare September 1, 2025 14:50
- Downgrade Go version in CI to 1.24 for consistency.
- Fix naming and path issues in `readmevalidation` code.
- Improve regex validation for module and namespace names.
- Correct typos and improve comments for clarity.
@matifali matifali force-pushed the atif/validate-readme-source branch from 898e773 to 3b6b1ba Compare September 1, 2025 14:53
with:
go-version: "1.23.2"
- name: Validate contributors
go-version: "1.24"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was having weird issues with running Golang CI locally, and had to match the Go version.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests generated by gemini CLI

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming we still want to process all the blocks in a file, I think it'd be really good to have some test cases that include 2+ blocks

@matifali matifali changed the title chore: Aadd validation for Terraform module source URLs chore: add validation for Terraform module source URLs Sep 1, 2025
@matifali matifali changed the title chore: add validation for Terraform module source URLs chore: add validation for module source URLs Sep 1, 2025
Copy link
Member

@Parkreiner Parkreiner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is close. I think there's a couple of changes we can make the code a bit better, though

return namespace, moduleName, nil
}

func validateModuleSourceURL(body string, filePath string) []error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish I knew more about Coder templates. Do templates have a source field (or anything equivalent) that might need to be validated, too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no its not applicable to templates

- Make regex more specific for registry.coder.com patterns only
- Refactor to add namespace and resourceName fields to coderResourceReadme struct
- Inline path parsing logic into parseCoderResourceReadme
- Update validateModuleSourceURL to use struct fields instead of filePath parameter
- Simplify Terraform block detection logic
- Reduce nesting with early continue statements
- Add comment explaining regex pattern
- Extract registry.coder.com into a constant
- Improve test readability with extracted variables
- Remove redundant checks in tests
- Replace custom contains function with strings.Contains

Co-authored-by: matifali <matifali@users.noreply.github.com>
@matifali matifali requested a review from Parkreiner September 4, 2025 14:23
@matifali matifali removed the request for review from 35C4n0r September 13, 2025 03:13
Copy link
Member

@Parkreiner Parkreiner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're looking better than before, but I'm still a confused by where the code is at now, and whether the current logic is what we want

var (
// Matches Terraform source lines with registry.coder.com URLs
// Pattern: source = "registry.coder.com/namespace/module/coder"
terraformSourceRe = regexp.MustCompile(`^\s*source\s*=\s*"` + registryDomain + `/([^/]+)/([^/]+)/coder"`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to swap the [^/]+ patterns for something even more specific? Right now, they allow any non-slash character, but don't we want to make sure the URLs don't have certain characters like underscores?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming we still want to process all the blocks in a file, I think it'd be really good to have some test cases that include 2+ blocks

- Use more specific regex pattern [a-zA-Z0-9-]+ instead of [^/]+ for namespace/module names
- Process all Terraform blocks instead of just the first one
- Report correct source if found in any block, only report incorrect sources if no correct source exists
- Add comprehensive test cases for multiple Terraform blocks

Co-authored-by: matifali <10648092+matifali@users.noreply.github.com>
@matifali matifali requested a review from Parkreiner September 19, 2025 06:55
@DevelopmentCats
Copy link
Contributor

@Parkreiner I just want to poke you on this, to see if there is anything I can help unblock here.

@Parkreiner
Copy link
Member

@DevelopmentCats Sorry, I had just gotten back from vacation last week, so this got a bit lost in the shuffle. I'll do an extra pass, and let you know

@DevelopmentCats
Copy link
Contributor

@DevelopmentCats Sorry, I had just gotten back from vacation last week, so this got a bit lost in the shuffle. I'll do an extra pass, and let you know

It's all good. I know things have been crazy busy 😃

Copy link
Member

@Parkreiner Parkreiner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DevelopmentCats I think I'm still wrapping my head around the code a little bit, but I feel pretty sure the validation logic is still off, since we're silently removing errors in some cases

Comment on lines +38 to +41
if strings.HasPrefix(nextLine, "```tf") {
isInsideTerraform = true
continue
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an edge case, but I feel like we want to handle cases where someone accidentally nests Terraform snippets inside each other like this:

```tf
```tf

expectedSource := registryDomain + "/" + rm.namespace + "/" + rm.resourceName + "/coder"

trimmed := strings.TrimSpace(rm.body)
foundCorrectSource := false
Copy link
Member

@Parkreiner Parkreiner Oct 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: I think there's something we can salvage from this comment, but I realized in the second comment in the chain that it doesn't do everything we want

I don't know if this variable makes sense. With how it's set up right now, if we have at least one correct source in the README file, we'll automatically ignore all other incorrect sources. And from a state modeling perspective, the variable also feels redundant, when a single slice should be able to give us all the info we need

What makes more sense to me is to:

  1. Have incorrectSources as the main validation state
  2. Instead of having the actualSource == expectedSource check, only have a actualSource != expectedSource check. If that triggers, push the incorrect source to the incorrectSources slice. If things match, we'll do nothing, and let the code fall through to the rest of the loop
  3. Once the main loop is done and we're done processing the lines, check if the slice is not empty. If it's not, return an error with a list of all sources that are incorrect

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, now that I'm thinking over the code more, I think I understand what the old code was trying to do, and why my suggestion doesn't actually work

I'm not super up to speed on our Terraform conventions, but if a module has a name like module "blah", would it be expected that the source URL should always have "blah" in it, too? If so, I think we'd need to check the name of each module block, and compare that against the source URL line

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but if a module has a name like module "blah", would it be expected that the source URL should always have "blah" in it, too?

No it can be different. We just use the same name for convenience.

})
}

func TestValidateModuleSourceURL(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the test setup is pretty good. Just to account for the other comment I added, though, I feel like we should have one more test that validates what happens if you have a block with an incorrect body, and 2+ other blocks with correct bodies

var (
validModuleBody = `# Test Module
` + "```tf\n" + `module "test-module" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'm just now realizing how the sample MD contents are formatted, and I think I'd prefer for each one to be a single raw string. All the + signs are a bit hard to read in the GitHub UI, and this is a case where all the spacing and lines do matter

Comment on lines +73 to +77
// If we found incorrect sources but no correct one, report the first incorrect source
if len(incorrectSources) > 0 {
errs = append(errs, xerrors.Errorf("incorrect source URL format: found %q, expected %q", incorrectSources[0], expectedSource))
return errs
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we wouldn't want to report all incorrect sources?

return []error{xerrors.Errorf("invalid module path format: %s", rm.filePath)}
}

expectedSource := registryDomain + "/" + rm.namespace + "/" + rm.resourceName + "/coder"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: since this variable isn't used for a while, we can move it down, closer to where it's used

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants