@@ -48,16 +48,8 @@ config files that are referenced by the policy file ought also to invalidate a
cached repo. For example, a policy file using the `mitre/binary` plugin might
configure that plugin to use `Binary.kdl`. If the user changes the content of
`Binary.kdl`, the analysis configuration is technically different, but the
- policy file path and hash have not changed. We do not currently have a strategy
- to address this because the policy file by design does not have contextual
- understanding of the configuration key/value pairs sent to each plugin. We
- therefore cannot know which config keys represent files that need to be checked
- as part of the policy keying process.
-
- Thus, it will be important for us to document clearly to users this limitation,
- and provide them with CLI tools for invalidating cache entries based on
- particular elements of the key tuple `(policy file path, policy file hash, hc
- binary hash, repository, commit/tag)`.
+ policy file path and hash have not changed. We address this limitation in a
+ subsequent [RFD][rfd-13].
## Inserting Report Caching into the Analysis Workflow
@@ -99,8 +91,29 @@ to store a table containing all the key elements plus the filename of the
report. We would likely store all the reports directly in `report/` as
compressed files, plus the one database file. We also propose a simple unique ID
column for the table so that entries can be referenced without specifying every
- key element. This unique ID would be auto-generated and assigned by the SQLite
- database as each entry is created.
+ key element. We discuss the generation strategy and format of the unique ID
+ below.
+
+ ### Report Cache Entry Unique ID
+
+ Since the multi-part key used for cache entries is burdensome to specify when
+ manipulating entries, we propose a unique ID scheme to simplify referring to
+ reports.
+
+ To generate the unique ID, we will do the following (all hash operations are
+ performed using SHA-256):
+
+ 1. Generate the combined hash of the policy file and plugins, as described in
+    [RFD13][rfd-13].
+ 2. Generate the Hipcheck binary hash.
+ 3. Hash the concatenation of the repository name and specific commit ID being
+    analyzed.
+ 4. Concatenate the hashes of 1-3 in a consistent order and hash the result,
+    which will be the unique ID.
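The four steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Hipcheck implementation: the function names are hypothetical, the policy/plugin hash (step 1) and binary hash (step 2) are taken as precomputed hex strings, and the concatenation order and UTF-8 encoding are choices made for the example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def report_unique_id(
    policy_plugin_hash: str,  # step 1: assumed precomputed (see RFD13)
    hc_binary_hash: str,      # step 2: assumed precomputed
    repo_name: str,
    commit_id: str,
) -> str:
    """Derive a report's unique ID from its key elements (hypothetical helper)."""
    # Step 3: hash the repository name concatenated with the commit ID.
    target_hash = sha256_hex((repo_name + commit_id).encode("utf-8"))

    # Step 4: concatenate the three hashes in a fixed order and hash the result.
    combined = policy_plugin_hash + hc_binary_hash + target_hash
    return sha256_hex(combined.encode("utf-8"))
```

Because every input is hashed deterministically and combined in a fixed order, the same key elements always yield the same ID, and a change to any one of them yields a different ID.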
+
+ As a hash shortcode, we can allow `<REPO_NAME>-<SHORT_HASH>`, where
+ `<SHORT_HASH>` is the shortest hash prefix necessary to distinguish the report
+ from other reports on the same repository.
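One way to compute such a shortcode is to grow the prefix until no other report on the same repository shares it. The Python sketch below assumes the unique IDs are equal-length hex strings; the function name and arguments are hypothetical and not part of the proposal.

```python
def report_shortcode(repo_name: str, unique_id: str, other_ids: list[str]) -> str:
    """Return `<REPO_NAME>-<SHORT_HASH>` for `unique_id` (hypothetical helper).

    `other_ids` holds the unique IDs of the other cached reports for the
    same repository.
    """
    # Ignore the report's own ID if the caller passed the full list.
    others = [other for other in other_ids if other != unique_id]

    # Try prefixes of increasing length until one is unambiguous.
    for length in range(1, len(unique_id) + 1):
        prefix = unique_id[:length]
        if not any(other.startswith(prefix) for other in others):
            return f"{repo_name}-{prefix}"

    # Fall back to the full ID if no shorter prefix is unique.
    return f"{repo_name}-{unique_id}"
```

Note that a shortcode computed this way can become ambiguous once new reports for the same repository are added, so it would need to be resolved against the current set of reports each time it is used.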
### Report Cache Synchronization
@@ -219,32 +232,31 @@ We propose Hipcheck to act as a GitHub CI action that fails if any provided
targets need investigation. This means that, provided subsequent workflow steps
aren't explicitly configured to run in spite of a previous failed step, the
entire job will also fail. The failure message will report to users what
- dependencies/targets need investigation, and each dependency's unique ID.
-
- Once the user decides the dependency is acceptable, they will use a separate
- manual CI action to manipulate the cache. This Hipcheck review action will take
- as input a comma-separated list of dependencies as a string. The review action
- will then split the dependency list into separate calls to `hc cache report
- reviewed --id <ID>` to mark the specific dependencies as reviewed.
-
- The key element to make this approach work is that the main check workflow and
- the manual review workflow need to pull from the same `HC_CACHE` so that the
- unique IDs match and "reviewed" markings in the report database from the manual
- review workflow are reflected in the next `hc check` run. According to the
- GitHub [actions documentation][workspace-docs], it appears that the path
- specified by the `GITHUB_WORKSPACE` environment variable is consistent between
- runs, in which case we can point all Hipcheck actions to the same location in
- this directory with the `HC_CACHE` variable.
-
- If this strategy does not work, we propose both actions to use the
- `actions/cache@v3` [action][cache-action] with a common key that saves and loads
- the contents of the designated cache directory, and ensures that `HC_CACHE` is
- set to that cache when `hc` is run in both actions.
-
- If users are not interested in this more strict workflow, they can mark the
- check action as `continue-on-error` so that an "investigate" determination on
- any number of dependencies does not cause the whole job to fail.
+ dependencies/targets need investigation, and each dependency's unique ID. As a
+ side note, users who are not interested in this stricter workflow can mark the
+ check action as `continue-on-error` so that an "investigate" determination on
+ any number of dependencies does not cause the whole job to fail.
+
+ Once the user decides the dependency is acceptable, they will update a file
+ kept in their repository called `reviewed.txt`, which will be a
+ newline-separated list of unique IDs or report shortcodes. Hipcheck will read
+ this list at runtime during an `hc check` and consult it when reports are
+ emitted to determine whether to return a failure status code when it
+ encounters a report in need of investigation.
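As a rough illustration of the `reviewed.txt` check described above, the Python sketch below parses the newline-separated list and decides on a status code. The report dictionary keys (`id`, `shortcode`, `investigate`) and the 0/1 exit codes are assumptions made for the example, not Hipcheck's actual data model or CLI behavior.

```python
def load_reviewed(text: str) -> set[str]:
    """Parse the contents of `reviewed.txt`: one unique ID or shortcode per line."""
    # Blank lines and surrounding whitespace are ignored.
    return {line.strip() for line in text.splitlines() if line.strip()}


def status_code(reports: list[dict], reviewed: set[str]) -> int:
    """Return 1 if any report needing investigation is unreviewed, else 0.

    Each report is a dict with the hypothetical keys "id", "shortcode",
    and "investigate".
    """
    for report in reports:
        if not report["investigate"]:
            continue  # passing reports never cause a failure
        if report["id"] in reviewed or report["shortcode"] in reviewed:
            continue  # the user has already accepted this report
        return 1
    return 0
```

With this shape, accepting a dependency is an ordinary commit to `reviewed.txt`, so the review decision is versioned alongside the rest of the repository.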
+
+ This addresses how to let Hipcheck pass CI, but not necessarily how to actually
+ cache report generation in a GitHub runner. For this, we propose to use the
+ `actions/cache` GitHub action to cache the `HC_CACHE` directory between runs.
+ Keys in the cache created by this action are deleted after 7 days without use,
+ so as a workaround for people who don't run Hipcheck within that timeframe, we
+ could offer a hack GitHub action that simply loads the key. Users can set up
+ this hack action as a cron job to ensure the key gets loaded consistently. This
+ strategy has the limitation that when two Hipcheck actions run simultaneously,
+ the second instance to complete will trample the writes to the report cache
+ directory and SQLite database file that were made by the first.
[sqlite-blobs]: https://www.sqlite.org/intern-v-extern-blob.html
[workspace-docs]: https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/store-information-in-variables#default-environment-variables
[cache-action]: https://github.com/marketplace/actions/cache
+ [rfd-13]: @docs/rfds/0013-plugin-config-hash.md