Skip to content
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
69 changes: 69 additions & 0 deletions .github/workflows/verify-links.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
name: Verify links

on:
push:
branches:
- main
- workflow/verify-links # TODO REMOVE
repository_dispatch:
workflow_dispatch:
schedule:
- cron: "08 08 * * 1"

jobs:
linkChecker:
runs-on: ubuntu-latest
permissions:
issues: write # required for peter-evans/create-issue-from-file
steps:
- uses: actions/checkout@v4

- name: Restore lychee cache
uses: actions/cache@v4
with:
path: .lycheecache
key: cache-lychee-${{ github.sha }}
restore-keys: cache-lychee-

- name: Link Checker
id: lychee
uses: lycheeverse/lychee-action@v2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if anyone watched scheduled jobs, for Cockpit we make issues. Owners will spot these issues but not sure if anyone else would.

Did you try running lychee locally? Because without any changes it generates this:

🔍 4734 Total (in 1m 3s) ✅ 4317 OK 🚫 410 Errors 👻 6 Excluded ⏳ 1 Timeouts

It also will require a GITHUB_TOKEN otherwise we run into rate limits.

with:
fail: false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I understand right this does not affect the following step and its check if: steps.lychee.outputs.exit_code != 0

# Exclude all private and local addresses in the check
# Also exclude domains that actively block GitHub to send requests
# Any Cockpit URLs it finds that have variables will be ignored too
args: |
--max-concurrency 1
--retry-wait-time 60
--base .
--skip-missing
--exclude-all-private
--exclude '^https://linux.die.net'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are those left out, anubis?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't. Anubis doesn't block curl or other user-agents, only the ones that masquerade themselves as "human" browsers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC it was 403 forbidden. Likely due to detecting it's from GitHub and just flatout declining it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yaml allows comments so would be good to add a reason

--exclude 'file:///'
--exclude 'domain.tld'
--exclude '^.*\{\{'
--exclude 'https://bodhi.fedoraproject.org/updates/cockpit-*'
--cache
--cache-exclude-status 400..=599
--max-cache-age 1d
-v
.
token: ${{ secrets.GITHUB_TOKEN }}

- name: Find the last open report issue
if: steps.lychee.outputs.exit_code != 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The summary page shows that it found some errors - can you please explain when will lychee fail and make an issue to notify us and what kind of problems?

(Maybe I'm just missing the point of link checkers)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Cockpit we have tools/urls-check which creates an issue if the link check fails. Thats what this PR attempts to also do.

We generally haven't used much external actions so I am wondering if we want too.

id: last-issue
uses: micalevisk/last-issue-action@v2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one seems to be dead for 2 years without releases and I am slightly concerned about supply chain attacks here.

It essentially does one query which seems to be trivially rewritten in JavaScript in the workflow like we do for dependabot :

            try {
              await github.rest.issues.removeLabel({
                owner: context.repo.owner,
                repo: context.repo.repo,
                issue_number: context.issue.number,
                name: 'node_modules'
              });
            } catch (e) {
              if (e.name == 'HttpError' && e.status == 404) {
                /* expected: 404 if label is unset */
              } else {
                throw e;
              }
            }

But then github.rest.issues using actions/github-script

Getting the issue might be as simple as octokit.rest.issues.listForRepo({ labels: ["link-checker"] }) and we always create an urls-check issue with a fixed label.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use the hash for taking the release and update it with Renovate when there are new versions (if ever). Would at least ensure that the release itself is the one we expect.

Suggested change
uses: micalevisk/last-issue-action@v2
uses: micalevisk/last-issue-action@044e1cb7e9a4dde20e22969cb67818bfca0797be # v2

But yeah might be better to use something like that if it works

with:
state: open
labels: link-checker

- name: Update or create issue report
if: steps.lychee.outputs.exit_code != 0 && steps.last-issue.outputs.has-found == 'true'
uses: peter-evans/create-issue-from-file@v5
with:
title: Link Checker Report
content-filepath: ./lychee/out.md
issue-number: ${{ steps.last-issue.outputs.issue-number }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is no issue is found? Does this accept null?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is an optional field so yeah it would parse as not having provided any.

In this repo I see that they were successful with this approach:
tldr-pages/tldr-maintenance#129
Given their workflow:
https://github.com/tldr-pages/tldr-maintenance/blob/main/.github/workflows/check-links.yml

So it should work for us too. Their approach is to keep the issue open and update it regardless if the link checker failed or not. Whereas in this PR I've set it so it opens an issue or updates an existing one if something fails. So there is a slight difference.

They also have a step to close the issue which we could adopt (though theirs doesn't work atm due to their if-statement it seems).

labels: link-checker