Commit 284283d

chore: wip changes - squash before merging!
1 parent 2884800 commit 284283d

1 file changed

+49
-37
lines changed

site/content/docs/rfds/0012-report-caching.md

Lines changed: 49 additions & 37 deletions
Original file line number | Diff line number | Diff line change
@@ -48,16 +48,8 @@ config files that are referenced by the policy file ought also to invalidate a
4848
cached repo. For example, a policy file using the `mitre/binary` plugin might
4949
configure that plugin to use `Binary.kdl`. If the user changes the content of
5050
`Binary.kdl`, the analysis configuration is technically different, but the
51-
policy file path and hash have not changed. We do not currently have a strategy
52-
to address this because the policy file by design does not have contextual
53-
understanding of the configuration key/value pairs sent to each plugin. We
54-
therefore cannot know which config keys represent files that need to be checked
55-
as part of the policy keying process.
56-
57-
Thus, it will be important for us to document clearly to users this limitation,
58-
and provide them with CLI tools for invalidating cache entries based on
59-
particular elements of the key tuple `(policy file path, policy file hash, hc
60-
binary hash, repository, commit/tag)`.
51+
policy file path and hash have not changed. We address this limitation in a
52+
subsequent [RFD][rfd-13].
6153

6254
## Inserting Report Caching into the Analysis Workflow
6355

@@ -99,8 +91,29 @@ to store a table containing all the key elements plus the filename of the
9991
report. We would likely store all the reports directly in `report/` as
10092
compressed files, plus the one database file. We also propose a simple unique ID
10193
column for the table so that entries can be referenced without specifying every
102-
key element. This unique ID would be auto-generated and assigned by the SQLite
103-
database as each entry is created.
94+
key element. We discuss the generation strategy and format of the unique ID
95+
below.
96+
97+
### Report Cache Entry Unique ID
98+
99+
Since the multi-key that is used to cache elements is burdensome to specify when
100+
manipulating entries, we propose to add a unique ID scheme to simplify referring
101+
to reports.
102+
103+
To generate this unique ID, we will do the following (all hash operations
104+
performed using SHA256):
105+
106+
1. Generate the combined hash of the policy file and plugins, as described in
107+
[RFD13][rfd-13].
108+
2. Generate the Hipcheck binary hash.
109+
3. Hash the concatenation of the repository name and specific commit ID being
110+
analyzed.
111+
4. Concatenate the hashes of 1-3 in a consistent order and hash the result,
112+
which will be the unique ID.
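
The four steps above can be sketched as follows. This is a minimal illustration, not the Hipcheck implementation: the function names `sha256_hex` and `report_unique_id` are hypothetical, the hashes from steps 1 and 2 are assumed to arrive as hex strings computed elsewhere, and the ordering chosen in step 4 (policy, binary, target) is just one possible "consistent order".

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def report_unique_id(
    policy_and_plugins_hash: str,  # step 1, assumed to be computed per RFD13
    hc_binary_hash: str,           # step 2, assumed to be computed elsewhere
    repo_name: str,
    commit_id: str,
) -> str:
    # Step 3: hash the concatenation of the repository name and commit ID.
    target_hash = sha256_hex((repo_name + commit_id).encode("utf-8"))
    # Step 4: concatenate the three hashes in a fixed order (an assumption
    # of this sketch) and hash the result to obtain the unique ID.
    combined = policy_and_plugins_hash + hc_binary_hash + target_hash
    return sha256_hex(combined.encode("utf-8"))
```

Because every input is hashed deterministically and combined in a fixed order, the same key tuple always yields the same ID, and changing any one element yields a different ID.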
113+
114+
As a hash shortcode, we can allow `<REPO_NAME>-<SHORT_HASH>`, where
115+
`<SHORT_HASH>` is the shortest hash prefix necessary to distinguish the report
116+
from other reports on the same repository.
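
One way to compute such a shortcode is to grow the prefix until no other report on the same repository shares it. The sketch below assumes unique IDs are hex strings; the function name `report_shortcode` and its parameters are hypothetical.

```python
def report_shortcode(repo_name: str, unique_id: str, other_ids: list[str]) -> str:
    """Build `<REPO_NAME>-<SHORT_HASH>` using the shortest distinguishing prefix.

    `other_ids` holds the unique IDs of the other cached reports for the
    same repository (an assumption of this sketch).
    """
    others = [other for other in other_ids if other != unique_id]
    for length in range(1, len(unique_id) + 1):
        prefix = unique_id[:length]
        # Stop at the first prefix that no other report ID starts with.
        if not any(other.startswith(prefix) for other in others):
            return f"{repo_name}-{prefix}"
    # Only reached if another ID begins with the whole of this one;
    # fall back to the complete ID.
    return f"{repo_name}-{unique_id}"
```

Note that under this scheme a shortcode can grow longer as more reports for the same repository are cached, so a shortcode recorded earlier may later become ambiguous.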
104117

105118
### Report Cache Synchronization
106119

@@ -219,32 +232,31 @@ We propose Hipcheck to act as a GitHub CI action that fails if any provided
219232
targets need investigation. This means that, provided subsequent workflow steps
220233
aren't explicitly configured to run in spite of a previous failed step, the
221234
entire job will also fail. The failure message will report to users what
222-
dependencies/targets need investigation, and each dependency's unique ID.
223-
224-
Once the user decides the dependency is acceptable, they will use a separate
225-
manual CI action to manipulate the cache. This Hipcheck review action will take
226-
as input a comma-separated list of dependencies as a string. The review action
227-
will then split the dependency list into separate calls to `hc cache report
228-
reviewed --id <ID>` to mark the specific dependencies as reviewed.
229-
230-
The key element to make this approach work is that the main check workflow and
231-
the manual review workflow need to pull from the same `HC_CACHE` so that the
232-
unique IDs match and "reviewed" markings in the report database from the manual
233-
review workflow are reflected in the next `hc check` run. According to the
234-
GitHub [actions documentation][workspace-docs], it appears that the path
235-
specified by the `GITHUB_WORKSPACE` environment variable is consistent between
236-
runs, in which case we can point all Hipcheck actions to the same location in
237-
this directory with the `HC_CACHE` variable.
238-
239-
If this strategy does not work, we propose both actions to use the
240-
`actions/cache@v3` [action][cache-action] with a common key that saves and loads
241-
the contents of the designated cache directory, and ensures that `HC_CACHE` is
242-
set to that cache when `hc` is run in both actions.
243-
244-
If users are not interested in this more strict workflow, they can mark the
245-
check action as `continue-on-error` so that an "investigate" determination on
246-
any number of dependencies does not cause the whole job to fail.
235+
dependencies/targets need investigation, and each dependency's unique ID. As a
236+
side note: if users are not interested in this more strict workflow, they can
237+
mark the check action as `continue-on-error` so that an "investigate"
238+
determination on any number of dependencies does not cause the whole job to
239+
fail.
240+
241+
Once the user decides the dependency is acceptable, they will update a file they
242+
keep in their repository called `reviewed.txt`, which will be a newline-
243+
separated list of unique IDs or report shortcodes. This list will be read in by
244+
Hipcheck at runtime during an `hc check` and will be referenced when reports are
245+
emitted to determine whether to return a failure status code when Hipcheck
246+
encounters a report in need of investigation.
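
A minimal sketch of how the `reviewed.txt` check might work is shown below. The function names, the tuple layout used for reports, and the choice of `1` as the failure status code are all assumptions made for illustration; the RFD does not specify them.

```python
def load_reviewed(text: str) -> set[str]:
    """Parse the newline-separated contents of `reviewed.txt`.

    Blank lines and surrounding whitespace are ignored (an assumption).
    """
    return {line.strip() for line in text.splitlines() if line.strip()}


def check_exit_code(reports: list[tuple[str, str, bool]], reviewed: set[str]) -> int:
    """Return a hypothetical exit status for a set of emitted reports.

    Each report is `(unique_id, shortcode, needs_investigation)`.
    """
    for unique_id, shortcode, needs_investigation in reports:
        already_reviewed = unique_id in reviewed or shortcode in reviewed
        if needs_investigation and not already_reviewed:
            return 1  # at least one unreviewed report needs investigation
    return 0
```

For example, a report flagged for investigation whose shortcode appears in `reviewed.txt` would no longer cause the failure status, while any other flagged report still would.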
247+
248+
This addresses how to let Hipcheck pass CI, but not necessarily how to actually
249+
cache report generation in a GitHub runner. For this, we propose to use the
250+
`actions/cache` GitHub action to cache the `HC_CACHE` directory between runs.
251+
Keys in the cache created by this action are deleted after 7 days without use,
252+
so as a workaround for people who don't run Hipcheck consistently in that
253+
timeframe, we could offer a hack GitHub action that simply loads the key. Users
254+
can set up this hack action as a cron job to ensure the key is loaded
255+
consistently. This strategy has the limitation that when two Hipcheck actions
256+
run simultaneously, the second instance to complete will trample the writes to
257+
the report cache directory and SQL file that were made by the first.
247258

248259
[sqlite-blobs]: https://www.sqlite.org/intern-v-extern-blob.html
249260
[workspace-docs]: https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/store-information-in-variables#default-environment-variables
250261
[cache-action]: https://github.com/marketplace/actions/cache
262+
[rfd-13]: @docs/rfds/0013-plugin-config-hash.md
