Skip to content

Conversation

Venefilyn
Copy link
Member

@Venefilyn Venefilyn commented Jul 28, 2025

Verifies links through Lychee within the project are alive and doesn't go to
outdated or otherwise dead links.


- name: Link Checker
id: lychee
uses: lycheeverse/lychee-action@v2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if anyone watched scheduled jobs, for Cockpit we make issues. Owners will spot these issues but not sure if anyone else would.

Did you try running lychee locally? Because without any changes it generates this:

🔍 4734 Total (in 1m 3s) ✅ 4317 OK 🚫 410 Errors 👻 6 Excluded ⏳ 1 Timeouts

It also will require a GITHUB_TOKEN otherwise we run into rate limits.

@Venefilyn Venefilyn force-pushed the workflow/verify-links branch 14 times, most recently from 903c337 to 2c79ee8 Compare August 7, 2025 14:24
@Venefilyn Venefilyn requested a review from tomasmatus August 12, 2025 11:27
token: ${{ secrets.GITHUB_TOKEN }}

- name: Find the last open report issue
if: steps.lychee.outputs.exit_code != 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The summary page shows that it found some errors - can you please explain when will lychee fail and make an issue to notify us and what kind of problems?

(Maybe I'm just missing the point of link checkers)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Cockpit we have tools/urls-check which creates an issue if the link check fails. Thats what this PR attempts to also do.

We generally haven't used much external actions so I am wondering if we want too.

id: lychee
uses: lycheeverse/lychee-action@v2
with:
fail: false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I understand right this does not affect the following step and its check if: steps.lychee.outputs.exit_code != 0

@mvollmer mvollmer moved this to improvement in Pilot tasks Aug 14, 2025
--base .
--skip-missing
--exclude-all-private
--exclude '^https://linux.die.net'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are those left out, anubis?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't. Anubis doesn't block curl or other user-agents, only the ones that masquerade themselves as "human" browsers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC it was 403 forbidden. Likely due to detecting it's from GitHub and just flatout declining it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yaml allows comments so would be good to add a reason

@jelly
Copy link
Member

jelly commented Sep 1, 2025

Alright, how do we want to proceed here? There are still tons of issues with urls, I think merging this without the issue create/update is fine for now but then urls need to be fixed.

Verifies that links within the project are alive and doesn't go to
outdated or otherwise dead links.

Signed-off-by: Freya Gustavsson <freya@venefilyn.se>
@Venefilyn Venefilyn force-pushed the workflow/verify-links branch 6 times, most recently from 94a943a to 4ac9a18 Compare September 25, 2025 13:22
@Venefilyn Venefilyn force-pushed the workflow/verify-links branch 10 times, most recently from e9685c3 to f695b34 Compare September 25, 2025 14:54
Signed-off-by: Freya Gustavsson <freya@venefilyn.se>
@Venefilyn Venefilyn force-pushed the workflow/verify-links branch 2 times, most recently from 4d690ce to 29b0609 Compare September 25, 2025 17:34
Signed-off-by: Freya Gustavsson <freya@venefilyn.se>
@Venefilyn Venefilyn force-pushed the workflow/verify-links branch from 29b0609 to 9ab36d0 Compare September 25, 2025 17:35
@Venefilyn
Copy link
Member Author

@jelly I updated the code to be a bit better. PTAL at the output and let me know if there is anything specific that stands out that needs to be changed with config 🙏

https://github.com/cockpit-project/cockpit-project.github.io/actions/runs/18015821705?pr=819

@Venefilyn Venefilyn requested a review from jelly September 26, 2025 12:36
- name: Find the last open report issue
if: steps.lychee.outputs.exit_code != 0
id: last-issue
uses: micalevisk/last-issue-action@v2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one seems to be dead for 2 years without releases and I am slightly concerned about supply chain attacks here.

It essentially does one query which seems to be trivially rewritten in JavaScript in the workflow like we do for dependabot :

            try {
              await github.rest.issues.removeLabel({
                owner: context.repo.owner,
                repo: context.repo.repo,
                issue_number: context.issue.number,
                name: 'node_modules'
              });
            } catch (e) {
              if (e.name == 'HttpError' && e.status == 404) {
                /* expected: 404 if label is unset */
              } else {
                throw e;
              }
            }

But then github.rest.issues using actions/github-script

Getting the issue might be as simple as octokit.rest.issues.listForRepo({ labels: ["link-checker"] }) and we always create an urls-check issue with a fixed label.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use the hash for taking the release and update it with Renovate when there are new versions (if ever). Would at least ensure that the release itself is the one we expect.

Suggested change
uses: micalevisk/last-issue-action@v2
uses: micalevisk/last-issue-action@044e1cb7e9a4dde20e22969cb67818bfca0797be # v2

But yeah might be better to use something like that if it works

with:
title: Link Checker Report
content-filepath: ./lychee/out.md
issue-number: ${{ steps.last-issue.outputs.issue-number }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is no issue is found? Does this accept null?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is an optional field so yeah it would parse as not having provided any.

In this repo I see that they were successful with this approach:
tldr-pages/tldr-maintenance#129
Given their workflow:
https://github.com/tldr-pages/tldr-maintenance/blob/main/.github/workflows/check-links.yml

So it should work for us too. Their approach is to keep the issue open and update it regardless if the link checker failed or not. Whereas in this PR I've set it so it opens an issue or updates an existing one if something fails. So there is a slight difference.

They also have a step to close the issue which we could adopt (though theirs doesn't work atm due to their if-statement it seems).

@jelly
Copy link
Member

jelly commented Oct 8, 2025

@jelly I updated the code to be a bit better. PTAL at the output and let me know if there is anything specific that stands out that needs to be changed with config 🙏

https://github.com/cockpit-project/cockpit-project.github.io/actions/runs/18015821705?pr=819

I really like the idea of link checking but I see that this PR now also pushes an artifact of the build website and then in another action check the links? Why can't we do this in a normal git checkout?

@Venefilyn
Copy link
Member Author

@jelly I updated the code to be a bit better. PTAL at the output and let me know if there is anything specific that stands out that needs to be changed with config 🙏
cockpit-project/cockpit-project.github.io/actions/runs/18015821705?pr=819

I really like the idea of link checking but I see that this PR now also pushes an artifact of the build website and then in another action check the links? Why can't we do this in a normal git checkout?

Splitting it out makes it cleaner as we just have to build once. The artifact should be in GitHub runner's cache too IIRC so basically takes no time for it to "download" the artifact.

Otherwise we'd have to combine build and publish into one which isn't as clean, and we have to add the same build step for link checker as well.

We can't check our source as-is because we construct URLs and pages on build-time. Like for categories, authors, anything internal with site_url()/.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: improvement
Development

Successfully merging this pull request may close these issues.

4 participants