tests/robustness/README.md (28 additions, 27 deletions)
@@ -39,7 +39,7 @@ The purpose of these tests is to rigorously validate that etcd maintains its [KV
## How Robustness Tests Work

-Robustness tests compare etcd cluster behavior against a simplified model of its expected behavior.
+Robustness tests compare the etcd cluster behavior against a simplified model of its expected behavior.
These tests cover various scenarios, including:

* **Different etcd cluster setups:** Cluster sizes, configurations, and deployment topologies.
@@ -52,8 +52,8 @@ These tests cover various scenarios, including:
2. **Traffic and Failures:** Client traffic is generated and sent to the cluster while failures are injected.
3. **History Collection:** All client operations and their results are recorded.
4. **Validation:** The collected history is validated against the etcd model and a set of validators to ensure consistency and correctness.
-5. **Report Generation:** If a failure is detected and a detailed report is generated to help diagnose the issue.
-This report includes information about the client operations, etcd data directories.
+5. **Report Generation:** If a failure is detected then a detailed report is generated to help diagnose the issue.
+This report includes information about the client operations and etcd data directories.

## Key Concepts
@@ -96,26 +96,25 @@ Etcd provides strict serializability for KV operations and eventual consistency
make gofail-disable
```
2. Run the tests
-
```bash
make test-robustness
```

-Optionally you can pass environment variables:
+Optionally, you can pass environment variables:
* `GO_TEST_FLAGS` - to pass additional arguments to `go test`.
It is recommended to run tests multiple times with failfast enabled. This can be done by setting `GO_TEST_FLAGS='--count=100 --failfast'`.
* `EXPECT_DEBUG=true` - to get logs from the cluster.
-* `RESULTS_DIR` - to change location where results report will be saved.
+* `RESULTS_DIR` - to change the location where the results report will be saved.
* `PERSIST_RESULTS` - to persist the results report of the test. By default this will not be persisted in the case of a successful run.

## Re-evaluate existing report

Robustness test validation is constantly changing and improving.
-Errors in etcd model could be causing false positives, which makes the ability to re-evaluate the reports after we fix the issue important.
+Errors in the etcd model could be causing false positives, which makes the ability to re-evaluate the reports after we fix the issue important.

> Note: Robustness test report format is not stable, and it's expected that not all old reports can be re-evaluated using the newest version.

-1. Identify location of the robustness test report.
+1. Identify the location of the robustness test report.

> Note: By default, the robustness test report is only generated for failed tests.
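The environment variables above can be combined in one invocation. Below is a minimal sketch, assuming a hypothetical shell function named `run_robustness_tests` that runs `make test-robustness` with the recommended `GO_TEST_FLAGS`, cluster logs enabled, a custom `RESULTS_DIR`, and `PERSIST_RESULTS` set so the report is kept even for a successful run. The default results path and the value `true` for `PERSIST_RESULTS` are assumptions for illustration, not documented values.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around `make test-robustness` using the environment
# variables described in the README. Call it from the directory where you
# would normally run `make test-robustness`.
run_robustness_tests() {
  # Where the report is saved; /tmp/robustness-results is an arbitrary example.
  local results_dir="${1:-/tmp/robustness-results}"

  # Repeat the tests up to 100 times and stop at the first failure,
  # collect logs from the cluster, and keep the report even on success.
  # PERSIST_RESULTS=true is an assumed value, not taken from the README.
  GO_TEST_FLAGS='--count=100 --failfast' \
  EXPECT_DEBUG=true \
  RESULTS_DIR="$results_dir" \
  PERSIST_RESULTS=true \
    make test-robustness
}
```

For example, `run_robustness_tests /tmp/my-results` would save reports under `/tmp/my-results`. Keeping the variables in one function keeps the recommended flags in one place; setting the same variables directly in front of `make test-robustness` works just as well.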
@@ -124,7 +123,7 @@ Errors in etcd model could be causing false positives, which makes the ability t
logger.go:146: 2024-04-08T09:45:27.734+0200 INFO Saving robustness test report {"path": "/tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize1"}
```

-* **For remote runs on CI:** you need to go to the [Prow Dashboard](https://prow.k8s.io/job-history/gs/kubernetes-jenkins/logs/ci-etcd-robustness-amd64), go to a build, download one of the Artifacts (`artifacts/results.zip`), and extract it locally.
+* **For remote runs on CI:** you need to go to the [Prow Dashboard](https://testgrid.k8s.io/sig-etcd-robustness#Summary), go to a build, download one of the Artifacts (`artifacts/results.zip`), and extract it locally.
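After locating a report, whether locally or by extracting `artifacts/results.zip` from CI, it has to be placed in the `testdata` directory under a unique name before `make test-robustness-reports` can re-evaluate it, as the `testdata` paragraphs that follow describe. Below is a minimal sketch of that copy step, assuming a hypothetical `import_report` shell function and a `REPO_ROOT` variable pointing at your etcd checkout. It refuses to overwrite an existing directory, which avoids the name clash the README warns about.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a robustness test report into
# $REPO_ROOT/tests/robustness/testdata/<name> so that
# `make test-robustness-reports` can re-evaluate it.
# Assumes REPO_ROOT is set to the root of an etcd checkout.
import_report() {
  local report_dir="$1"  # e.g. the "path" from the "Saving robustness test report" log
  local name="$2"        # any name not already used in testdata
  local testdata="$REPO_ROOT/tests/robustness/testdata"
  local dest="$testdata/$name"

  # Refuse to clash with a report that is already present.
  if [ -e "$dest" ]; then
    echo "import_report: $dest already exists" >&2
    return 1
  fi

  mkdir -p "$testdata"
  # dest does not exist yet, so cp creates it as a copy of report_dir.
  cp -r "$report_dir" "$dest"
}
```

For example, `import_report /tmp/TestRobustnessExploratory_Etcd_HighTraffic_ClusterOfSize1 v3.5_failure_24_April` followed by `make test-robustness-reports` would re-evaluate that report alongside any others in `testdata`.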
@@ -144,14 +143,14 @@ Errors in etcd model could be causing false positives, which makes the ability t
The `testdata` directory can contain multiple robustness test reports.
The name of the report directory doesn't matter, as long as it's unique to prevent clashing with reports already present in the `testdata` directory.
-For example path for `history.html` file could look like `$REPO_ROOT/tests/robustness/testdata/v3.5_failure_24_April/history.html`.
+For example, the path for the `history.html` file could look like `$REPO_ROOT/tests/robustness/testdata/v3.5_failure_24_April/history.html`.

3. Run `make test-robustness-reports` to validate all reports in the `testdata` directory.

## Analysing failure

-If robustness tests fails we want to analyse the report to confirm if the issue is on etcd side. Location of the directory with the report
-is mentioned `Saving robustness test report` log. Logs from report generation should look like:
+If robustness tests fail, we want to analyse the report to confirm if the issue is on the etcd side. The location of the directory with the report
+is mentioned in the `Saving robustness test report` log. Logs from report generation should look like:
```
logger.go:146: 2024-05-08T10:42:54.429+0200 INFO Saving robustness test report {"path": "/tmp/TestRobustnessRegression_Issue14370/1715157774429416550"}
logger.go:146: 2024-05-08T10:42:54.429+0200 INFO Saving member data dir {"member": "TestRobustnessRegressionIssue14370-test-0", "path": "/tmp/TestRobustnessRegression_Issue14370/1715157774429416550/server-TestRobustnessRegressionIssue14370-test-0"}
@@ -178,21 +177,21 @@ is mentioned `Saving robustness test report` log. Logs from report generation sh
logger.go:146: 2024-05-08T10:42:54.441+0200 INFO Saving visualization {"path": "/tmp/TestRobustnessRegression_Issue14370/1715157774429416550/history.html"}
```

-Report follows the hierarchy:
+The report follows the hierarchy:
* `server-*` - etcd server data directories, can be used to verify disk/memory corruption.
  * `member`
    * `wal` - Write Ahead Log (WAL) directory, that can be analysed using the `etcd-dump-logs` command line tool available in the `tools` directory.
    * `snap` - Snapshot directory, includes the bbolt database file `db`, that can be analysed using the `etcd-dump-db` command line tool available in the `tools` directory.
* `client-*` - Client request and response dumps in json format.
-  * `watch.jon` - Watch requests and responses, can be used to validate [watch API guarantees].
+  * `watch.json` - Watch requests and responses, can be used to validate [watch API guarantees].
  * `operations.json` - KV operation history
* `history.html` - Visualization of KV operation history, can be used to validate [KV API guarantees].

-### Example analysis of linearization issue
+### Example analysis of a linearization issue

Let's reproduce and analyse the robustness test report for issue [#14370].
To reproduce the issue by yourself, run `make test-robustness-issue14370`.
-After a couple of tries robustness tests should fail with a log `Linearization failed` and save report locally.
+After a couple of tries, robustness tests should fail with a log `Linearization failed` and save the report locally.

Example:
```
@@ -211,14 +210,14 @@ Jump to the error in linearization by clicking `[ jump to first error ]` on the
You should see a graph similar to the one in the image below.

-Last correct request (connected with grey line) is a `Put` request that succeeded and got revision `168`.
+The last correct request (connected with the grey line) is a `Put` request that succeeded and got revision `168`.
All following requests are invalid (connected with a red line) as they have revision `167`.
-Etcd guarantee that revision is non-decreasing, so this shows a bug in etcd as there is no way revision should decrease.
-This is consistent with the root cause of [#14370] as it was issue with process crash causing last write to be lost.
+Etcd guarantees that the revision is non-decreasing, so this shows a bug in etcd, as there is no way the revision should decrease.
+This is consistent with the root cause of [#14370] as it was an issue with the process crash causing the last write to be lost.
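The invariant used in this analysis, that the revision observed by clients never decreases, can be illustrated with a small check. Below is a minimal sketch, assuming a hypothetical `check_revisions` shell function that reads one revision per line, listed in the real-time order of non-overlapping requests, and reports the first decrease, such as the drop from `168` to `167` above. It only illustrates the invariant: the robustness tests themselves validate the full, concurrent history against the etcd model, and this plain one-number-per-line input is an assumption, not the format of `operations.json`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one revision per line on stdin (assumed to be in
# real-time order of non-overlapping requests) and report the first decrease.
# Equal revisions are allowed, since revisions only need to be non-decreasing.
# Returns 1 if a decrease is found, 0 otherwise.
check_revisions() {
  awk '
    # "+ 0" forces numeric comparison, so 99 -> 100 is not treated as a drop.
    NR > 1 && $1 + 0 < prev {
      printf "revision decreased at line %d: %d -> %d\n", NR, prev, $1
      exit 1
    }
    { prev = $1 + 0 }
  '
}
```

For example, `printf '166\n167\n168\n167\n' | check_revisions` prints `revision decreased at line 4: 168 -> 167` and returns 1.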