Add GitHub Actions workflow #3492

bdach · 2020-04-25T13:38:25Z

This PR contains an alternative method of automatically reviewing Markdown style, intended to replace the existing setup with Codacy/remark initially introduced as part of #3310.

Rationale

I've already described some of the issues with Codacy once in this comment, but to reiterate:

Codacy's main downfall for our use case is the fact that it seems to mis-use remark, leading to the set of active rules being different to what is actually intended by the configuration. While I can only hazard a guess as to what is really wrong in their setup, it seems it wrongly globally installs both the remark-cli package and remark linting rules, which due to npm local/global foibles leads to our overrides of rule presets not working.

The primary example of this behaviour is the 80-character line length inspection still being active globally, despite being disabled both in the global .remarkrc.js and the news subconfig. It is likely that other overrides also completely fail to work due to the same reason.
Another concern is that for some reason, some of the time Codacy will mis-report the actual PR status, seemingly at random. This can be seen for example in this PR, wherein Codacy claims that the pull request is "positive" despite once having reported 11 issues that have not actually been addressed (not to mention, all of which were actually bogus due to the previous point) and just 1 issue resolved.

Therefore, as I cannot fix Codacy's issues and I don't want to try and fix remark so that it can work with its rules installed globally, the solution that I'm proposing is to do this ourselves by leveraging GitHub Actions.

Technical details

The workflow pipeline can be seen in the included .yml file. As far as I can tell this is the most optimised the pipeline can be. The steps are as follows:

First, the repository is cloned using read-only credentials provided by GitHub Actions. To save time and bandwidth, the clone is partial - git sparse-checkout lets me only fetch the actual markdown files necessary, plus other required files (package.json, and all remark configuration files).

The reason for blanket-including all .json files and not only package.json is to enable rapidly switching to markdownlint in the future if deemed necessary - all of the pipeline will remain the same, save for the final command.
Then, the pipeline sets up a node install and runs npm install to get remark dependencies. This is why this PR also includes a package.json - the added benefit of which is being able to relatively quickly and painlessly install the required prerequisites in case someone wants to run linting locally. I had a major WTF moment when I had to fetch the odd twentysomething packages manually.

Note that package.json specifies version: latest and has no associated lockfile. This was intentional on my part, because we're using remark as a tool and not an actual dependency, and I assume we don't want to be tracking/bumping version numbers every time. I don't know if this is how things are done in JS land; if this is a grand transgression, then I'll include it.

The reason why the install always runs instead of just skipping if there are no changed files is sort of aesthetic - I could not find a good way to early-exit out of a workflow pipeline, and if I did that just as a normal shell script, then the actual linting output would get mixed with npm install logs, and I don't want to either mix or silence that.
Finally, remark is ran through npx only on the changed files. As for the flags:
- -q makes sure only warnings and errors are output,
- -f ensures that the exit code is nonzero even with warnings, therefore failing the workflow,
- --no-stdout is specified because remark will spew the linted file out there, which we don't want,
- --report and --color are used so that the output is somewhat nice-looking and accessible (see examples lower down).

Experimental results

I have run a few test PRs on my fork to see how the new pipeline behaves, borrowing some of the outstanding upstream PRs (sorry @cl8n @Pennek!). Here's a couple of example comparisons:

PR	Codacy	GH Actions
no changes to markdown files	✔️ pass	✔️ pass
changes to image files only	✔️ pass	✔️ pass
#3468	❓ pass/fail depending on who knows what, still shows wrong warnings	✔️ pass
#3488	❌ incorrect fail (some warnings wrong)	❌ correct fail, then ✔️ pass after addressing

Of course this is far from a comprehensive summary, but I think it's a convincing enough start.

As a side note, when it comes to execution time the GitHub actions workflow seems to blow Codacy out of the water - while it may take up to 3 minutes for codacy to be done, the workflow is usually done in ~30 secs.

I encourage you to check the error output of both tools - GH Actions is unfortunately worse, but I did manage to get it to show context lines for easier understanding of where the actual error lies.

Tradeoffs and summary

To sum up, with this PR we gain the following benefits:

actually working CI that we can configure ourselves,
which is faster than the existing solution,
which we can switch from remark to markdownlint on a dime,
and which should be enabled on users' forks automatically,

at the cost of:

having to maintain the pipeline in the future,
somewhat less accessible output to non-technical users.

A possible future gain of doing this through GH Actions ourselves is that possibly we could get better integration in the way of automated review comments or even auto-committing fixes to user changes. While that is somewhat feasible in some use cases even now (there are pre-existing actions that commit and push fixes) they do not currently work with the fork/PR workflow due to the worry that a malicious actor could leak secrets if they had access to a non-read-only access token.

Since the decision is not trivial and there are potentially a few kinks to iron out before this is ready, I've decided to open this as draft and put it up to discussion. If it's deemed necessary, I can also run a "parallel repository" and evaluate the quality of the pipeline on my fork by rebasing PRs that come into upstream on top of the changes here, so that more evidence can be collected as to whether this is a feasible way forward.

.gitignore

package.json

cl8n · 2020-05-03T23:57:48Z

the sparse checkout is neat but I'm not sure it'll be the best option forever, there are CI checks I could imagine that involve images at least (making sure links to images from markdown point to valid files, checking that no files are >1MB, ..)

for now it's cool though 👍 didn't know git has options for this

bdach · 2020-05-04T06:22:54Z

I mostly went for the sparse checkout as a normal clone ends up fetching ~300 MB of data which seems super wasteful to do every CI run. It takes time too, but not that much (half a minute-ish, github seems to give good bandwidth)

When we get to wanting to run inspections on the blobs then I'll evaluate possibilities to hack past actually needing to fetch them (first thought is inspecting the list of files somehow without actually getting them, which feels like it should be possible).

sr229

Not pinning versions is very unsafe. I vote against merging this until deps are pinned.

bdach · 2020-05-04T16:38:02Z

I'm aware of potential consequences of not locking versions. As I explained in the PR description I didn't really want to introduce a 1.8k line file and the need to track dependency versions for what should essentially be a CLI tool for a text wiki that runs mostly on a github VM, but I agree you're correct in that someone might for some reason actually try to install npm on their machine and run linting locally, so I have a responsibility in limiting funny business with dependencies to as much as npm allows me to.

To that end I've added the lockfile, hard-pinned all direct dependencies to latest version (and latest version only, no semver carets - although npm still put carets and even wildcards (!!!!) into the lockfile but I assume that is sort of expected), and also added an explicit audit step before installing (I'm aware that audit runs as part of install too, I wasn't sure if it was going to fail the install though, so I did that as I'd want it to definitely always fail).

sr229

LGTM

sr229 · 2020-05-04T16:43:27Z

As for dependency updates, we already have a lot of integrations like WhiteSource Renovate to handle automatic dependency upgrades. So let's not worry so much about the integrations.

cl8n · 2020-05-04T19:55:16Z

looks like this is easy to extend to better ui (via github api) thanks to --reporter so let's just see if it works for now. did some similar tests to yours and I got good results

🙏

bdach added 3 commits April 25, 2020 13:50

Add package.json to setup easy remark runs

3900e4f

Ignore leftover files from local remark runs

6b6435b

Add CI GitHub workflow for remark

0c436ea

pull-request-size bot added the size/M label Apr 25, 2020

TPGPL added area:meta non-article files high priority labels Apr 25, 2020

cl8n reviewed May 3, 2020

View reviewed changes

.gitignore Outdated Show resolved Hide resolved

package.json Outdated Show resolved Hide resolved

sr229 suggested changes May 4, 2020

View reviewed changes

bdach added 2 commits May 4, 2020 18:12

Hard-lock npm package versions to current latest

7ccad84

Explicitly run npm audit before installing

f1da818

bdach marked this pull request as ready for review May 4, 2020 16:38

sr229 previously approved these changes May 4, 2020

View reviewed changes

Make package private

adefb0a

cl8n dismissed sr229’s stale review via adefb0a May 4, 2020 19:42

cl8n approved these changes May 4, 2020

View reviewed changes

cl8n merged commit 081cdfb into ppy:master May 4, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add GitHub Actions workflow #3492

Add GitHub Actions workflow #3492

bdach commented Apr 25, 2020

cl8n commented May 3, 2020

bdach commented May 4, 2020

sr229 left a comment

bdach commented May 4, 2020 •

edited

Loading

sr229 left a comment

sr229 commented May 4, 2020

cl8n commented May 4, 2020

Add GitHub Actions workflow #3492

Add GitHub Actions workflow #3492

Conversation

bdach commented Apr 25, 2020

Rationale

Technical details

Experimental results

Tradeoffs and summary

cl8n commented May 3, 2020

bdach commented May 4, 2020

sr229 left a comment

Choose a reason for hiding this comment

bdach commented May 4, 2020 • edited Loading

sr229 left a comment

Choose a reason for hiding this comment

sr229 commented May 4, 2020

cl8n commented May 4, 2020

bdach commented May 4, 2020 •

edited

Loading