@@ -10,18 +10,19 @@ them before sending through the rendezvous server to the peer).
10
10
11
11
## Application version
12
12
13
-
The main key in the `app_version` object is called `abilities`, which is an array of strings. The known values are: `["transfer-v1", "transfer-v2"]`. Unknown values and keys have to be accepted by every client. An ability may specify additional hints to store in the object as well. If the value is empty (`{}`), `{abilities = ["transfer-v1"];}` must be assumed for backwards compatibility. `transfer-v1`SHOULD always be supported.
13
+
The main key in the `app_version` object is called `abilities`, which is an array of strings. The known values are: `["transfer-v1", "transfer-v2"]`. Unknown values and keys have to be accepted by every client. An ability may specify additional hints to store in the object as well. If the value is empty (`{}`), `{abilities = ["transfer-v1"];}` must be assumed for backwards compatibility. `transfer-v1` should always be supported.
14
14
15
-
The sender gets to pick a protocol version and capabilities based on the version information of the peer. The receiver distinguishes which protocol is used on the first incoming message.
15
+
The sender gets to pick a protocol version and capabilities based on the version information of the peer. The receiver distinguishes which protocol is used on the first incoming message. (Therefore, different protocol versions must be distinguishable on the first message.)
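The fallback and selection rules above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and the preference order (version 2 before version 1) is an assumption, since the text only says that the sender gets to pick.

```python
# Abilities this hypothetical client understands, most preferred first.
# The ordering is an assumption; the specification leaves the choice to the sender.
PREFERENCE = ["transfer-v2", "transfer-v1"]


def peer_abilities(app_version):
    """Return the known abilities announced in a peer's `app_version` object."""
    if not app_version:
        # An empty object (`{}`) must be read as "supports transfer-v1 only".
        return ["transfer-v1"]
    abilities = app_version.get("abilities", [])
    # Unknown values (and unknown keys) are accepted by silently ignoring them.
    return [ability for ability in abilities if ability in PREFERENCE]


def pick_protocol(app_version, own=PREFERENCE):
    """Pick the most preferred ability both sides share, or None if there is none."""
    theirs = peer_abilities(app_version)
    for ability in own:
        if ability in theirs:
            return ability
    return None
```

For example, `pick_protocol({})` yields `"transfer-v1"`, while a peer announcing both abilities plus an unknown one is matched on `"transfer-v2"`.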
@@ -112,103 +113,205 @@ the Transit connection. The final ack of the received data is sent through
112
113
the Transit object, as a UTF-8-encoded JSON-encoded dictionary with `ack: ok`
113
114
and `sha256: HEXHEX` containing the hash of the received data.
114
115
115
-
## Transfer v2 (proposal)
116
+
## Transfer v2
116
117
117
-
A v2 of the file transfer protocol got invented to add the following features:
118
+
Version 2 of the file transfer protocol was designed to add the following features:
118
119
119
120
- Resumable transfers after a connection interruption
120
121
- No need to build a temporary zip file; for both speed and space efficiency reasons. Also zip has a lot of other subtle limitations.
122
+
<!-- - Allow for multiple transfer from both sides using a single connection -->
121
123
122
-
The feature of sending text messages (without a transit connection), on the other hand, got removed.
123
-
124
-
### Basic protocol
124
+
The feature of sending text messages (without a transit connection), on the other hand, was removed (version 1 serves us well for that purpose).
125
+
All transfers may contain multiple files: This covers both the "single file" use
126
+
case as well as the "folder" use case.
125
127
126
-
The sender sends an offer, which contains a list of all the files, their size, modification time, and a transfer identifier that can be used to resume connections. The attempt to send the same files twice should use with the same identifier. How it is generated is an implementation detail, the suggested method is to either store it locally or to use the hash of the absolute path of the folder being sent.
128
+
### Application version
127
129
128
-
The receiver responses either with either a `"transfer rejected"` error of with an acknowledgement. The acknowledgement may contain a list of byte offsets, one for each file, which will tell the sender from where to resume the transfer.
130
+
Setting the `transfer-v2` ability also requires providing a `transfer-v2` dictionary with the following values:
131
+
`supported-formats` (see below) and `transit-abilities`, which is the same as `abilities-v1` in the version 1 specification. The transit abilities are exchanged earlier than in version 1 so that the `transit` message may
132
+
only contain the hints for abilities both sides support, which avoids wasting effort.
129
133
130
-
Both do the negotiation to open a transit relay. The process to doing so is slightly different from the one in the first version. The set of supported abilities is already delivered during the file offer/ack. Thus, the `transit` message only contains the hints for methods both sides support. Both side try to connect to every hint of the other side, the sender will then confirm the first one that succeeded.
134
+
#### Supported formats
131
135
132
-
The sender then sends the requested bytes over the relay using one of the supported formats. Afterwards, it sends a message with checksums. The receiver then closes the connections, optionally with sending an error message on a checksum mismatch.
136
+
Known formats are `plain` and `zst`. The former indicates uncompressed data and
137
+
must be supported by all clients; all other formats are optional. TODO
138
+
At the moment, the only supported format is `zst`. The details are up to the sender; a low compression level is recommended.
133
139
134
-
#### Supported formats
140
+
### Overview
135
141
136
-
At the moment, the only supported format is `tar.zst`. The files are sent bundled as a tar ball, compressed with zstd. The details are up to the sender; a low compression level is recommended. Only the files requested by the sender must be sent, and only the bytes starting from the requested offset must be contained.
142
+
Both sides immediately negotiate a transit connection. Once established, they start communicating over it and close
143
+
the rendezvous connection. All messages over the relay connection are encoded using [msgpack](https://msgpack.org/) instead of JSON
144
+
to allow binary payloads. (All protocol examples in this document will use JSON for readability.)
137
145
138
-
### The structs in detail
146
+
- The sender starts by sending an offer. The receiver accepts it and receives the bytes.
147
+
- The receiver rejects the offer by closing the connection with an error.
148
+
- The connection is closed once all accepted files have been transferred (and checked).
139
149
140
-
#### Send offer
150
+
### Transit hints
141
151
142
-
File paths must be normalized and relative to the root of the sent folder. If the sender's file system does not support modification times, `mtime` must be constant (preferably `0`). Sending a file is the same as sending a directory with a single file. `directory-name` is the name of the directory being sent. It must be present unless `files` contains exactly one item. `files` must not be empty.
152
+
This is the first and (usually) also last message sent over the Wormhole connection.
153
+
As the first message, it is the distinguisher for version 2 file transfer. As the last message, all following communication uses the transit connection, encoded using `msgpack`.
154
+
Both sides then close their Wormhole connection as soon as transit is established.
155
+
The message type is `transit-v2` and it is equivalent to the v1 `transit` message,
156
+
except that it only contains the hints (the abilities have already been sent earlier).
143
157
144
158
```json
145
159
{
146
-
"offer-v2": {
147
-
"directory-name": "<string, optional>",
148
-
"files": [
149
-
{
150
-
"path": "<string>",
151
-
"size": "<integer>",
152
-
"mtime": "<integer>"
153
-
}
154
-
],
155
-
"transit-abilities": "<list, subset of ['direct-tcp-v1', 'relay-v1', 'tor-tcp-v1']>"
156
-
};
160
+
"transit-v2": {
161
+
"hints-v1": [ … ]
162
+
}
157
163
}
158
164
```
159
165
160
-
#### Receive ack
166
+
### Send offer
167
+
168
+
A send offer has only one entry, which may contain a recursive directory
169
+
structure. If the top level entry is not a file, receiving clients may display
170
+
the offer either as single folder or as a list of files.
171
+
172
+
File names may be *arbitrary* (but UTF-8 encoded); it is up to the receiver to
173
+
sanitize them. Handling of unsupported file names is implementation specific,
174
+
but could for example be realized through escaping or rejection of the offer.
161
175
162
-
`files` contains a mapping from file (index) to offset (bytes). If omitted, all files must be sent.
176
+
If the sender's file system does not support modification times, `mtime` must be constant (preferably `0`).
177
+
`files` must not be empty. If there are multiple files, `directory-name` may be set to mark
178
+
this transfer as directory instead of a loose collection of files. If it is not present, `path`
179
+
must have a depth of one, i.e. only contain the file name.
180
+
The `format` must be one that both sides support.
181
+
182
+
`type` must be one of `"regular-file"`, `"directory"` and `"symlink"`. Regular
183
+
files have an additional `size` field (in bytes) and a transfer `id`. Directories have a
184
+
`content` field, which contains a list of direct children. Symlinks have a
185
+
`target` path.
163
186
164
187
```json
165
188
{
166
-
"answer-v2": {
167
-
"files": {
168
-
"<integer>": "<integer>"
169
-
},
170
-
"transit-abilities": "<list of ability strings>"
171
-
}
189
+
"offer-v2": {
190
+
//"transfer-name": "<string, optional>",
191
+
"content": {
192
+
"type": "<string>",
193
+
"name": "<string>",
194
+
"mtime": "<integer>",
195
+
"format": "<string>",
196
+
…
197
+
},
198
+
}
172
199
}
173
200
```
174
201
175
-
#### Transit hints
202
+
If a transfer fails mid way, we don't want to re-transmit unnecessary data when
203
+
a second attempt is made. The idea is that when a transfer fails, the receiver
204
+
stores the IDs along with the partially transferred data. On the second attempt,
205
+
the sender should reuse the transfer IDs so that the receiver can tell it already
206
+
has part of the data, therefore only requesting what it does not yet have.
207
+
208
+
Transfer IDs are opaque strings to the receiver; how they are generated is an
209
+
implementation detail of the sender. However the following points should be taken
210
+
into consideration:
211
+
212
+
- Sending the same files or folder twice results in the same identifiers
213
+
- When making transfer IDs content addressed, they should not leak any information
214
+
about the data to anybody except the receiver.
215
+
- All hashes in use should be salted, the salt should be kept private by the
216
+
sender and rotate regularly.
217
+
- The transfer ID should have sufficiently high entropy to avoid collisions.
218
+
- At least 256 bits are recommended
220
+
- Since the goal is to facilitate retransfers after a failure, no further
221
+
information needs to be stored on success.
222
+
- Retransfers after failure are expected to happen more or less immediately. The
223
+
data need not be kept around longer than a few hours, at most days.
224
+
- False negatives lead to additional retransfer of data, while false positives
225
+
result in a transfer failure due to hash mismatch. Therefore, try to keep the
226
+
ID generation as conservative as possible.
227
+
- Simply using fresh random IDs for everything is an acceptable strategy.
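One way to follow these recommendations is a keyed hash over metadata that only the sender knows. The sketch below is an assumption-laden illustration: the function names, the choice of HMAC-SHA256, and the inputs (absolute path, size, and modification time) are not prescribed by this document. Including size and `mtime` keeps the scheme conservative, because a modified file gets a fresh ID instead of a false positive.

```python
import hashlib
import hmac
import secrets


def new_salt() -> bytes:
    """Create a private salt; the sender keeps it secret and rotates it regularly."""
    return secrets.token_bytes(32)


def derived_id(salt: bytes, absolute_path: str, size: int, mtime: int) -> str:
    """Derive a stable 256-bit transfer ID (hypothetical scheme).

    The same file yields the same ID as long as the salt is unchanged,
    and the ID reveals nothing about the file without the salt.
    """
    message = f"{absolute_path}\0{size}\0{mtime}".encode("utf-8")
    return hmac.new(salt, message, hashlib.sha256).hexdigest()


def random_id() -> str:
    """The simple alternative: a fresh random 256-bit ID for every file."""
    return secrets.token_hex(32)
```

With `derived_id`, rotating the salt invalidates all earlier IDs at once, which fits the expectation that resumption data is only kept for hours or days.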
228
+
229
+
### Receive ack
230
+
231
+
`files` contains a mapping from transfer ID to offset (bytes).
232
+
An offer may be rejected using an `error` message.
176
233
177
-
Note that the hints for abilities added in the future might follow a different schema. The discriminant is `type`.
234
+
```json
235
+
{
236
+
"answer-v2": {
237
+
"files": {
238
+
"<string>": "<integer>"
239
+
},
240
+
}
241
+
}
242
+
```
243
+
244
+
### Payload transfer
245
+
246
+
After receiving the ack, the sender transfers the payload according to the `format`. For each file, the data stream
247
+
must start at the offset requested by the receiver. A `payload-v2` message contains only the (compressed) bytes as value.
178
248
179
249
```json
180
250
{
181
-
"transit-v2": [
182
-
{
183
-
"type": "<ability string>",
184
-
"hostname": "<string>",
185
-
"port": "<tcp port>",
186
-
"priority": "<number, usually [0..1], optional, default 0.5>"
187
-
},
188
-
]
251
+
"payload-v2": {
252
+
"id": "<string>",
253
+
"payload": "<bytes>",
254
+
}
189
255
}
190
256
```
191
257
192
-
#### Checksums
258
+
The payload must not exceed 64 KiB per message. The receiver keeps track of the received bytes (after
259
+
decompression according to the format), and errors out if the sender exceeds the announced amount by more than 5%. Note that due to
260
+
file system smear, sending a different amount of bytes than announced is rather common (hence
261
+
the 5%). Errors will be caught using checksums later on.
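The size limit and the 5% tolerance can be sketched like this. The sketch assumes the uncompressed `plain` format, so payload bytes equal file bytes; the names `payload_messages` and `ReceivedCounter` are hypothetical, and starting the count at the resumption offset is an interpretation rather than something the text states.

```python
MAX_PAYLOAD = 64 * 1024  # 64 KiB limit per `payload-v2` message


def payload_messages(transfer_id: str, data: bytes, offset: int = 0):
    """Yield `payload-v2` messages for one file, starting at the requested offset."""
    for start in range(offset, len(data), MAX_PAYLOAD):
        chunk = data[start:start + MAX_PAYLOAD]
        yield {"payload-v2": {"id": transfer_id, "payload": chunk}}


class ReceivedCounter:
    """Receiving side: count bytes (after decompression) against the announced size."""

    def __init__(self, announced_size: int, offset: int = 0):
        self.announced = announced_size
        self.received = offset  # bytes already held from an earlier attempt

    def add(self, byte_count: int) -> None:
        self.received += byte_count
        # Integer form of "exceeds the announced amount by more than 5%".
        if self.received * 100 > self.announced * 105:
            raise ValueError("peer sent more than 105% of the announced size")
```

A size mismatch within the tolerance is not treated as an error at this stage; it is left to the checksum exchange to detect.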
262
+
263
+
### Checksums
193
264
194
-
`tar-file-sha256` is the lowerhex-encoded sha256sum of all transferred bytes of the tar file.
265
+
At the end of the transfer, *both* sides send their checksums. That way, they do not need to communicate any further
266
+
to exchange their opinion: they can both calculate themselves whether things went wrong or not and only need to notify
267
+
the user. Once the checksums are exchanged, the transfer is complete and the connection is closed.
195
268
196
-
TODO maybe some per file integrity check?
269
+
There is a per-file integrity check. `wire-sha256` is the (binary) sha256sum of all transferred payload bytes (i.e. before decompression). `sha256` is the sha256sum of the *entire* file, including bytes before the resumption offset.
197
270
198
271
```json
199
272
{
200
-
"transfer-ack-v2": {
201
-
"tar-file-sha256": "<string>"
202
-
}
273
+
"transfer-ack-v2": {
274
+
"wire-sha256": "<bytes>",
275
+
"files": [
276
+
{
277
+
"id": "<string>",
278
+
"size": "<integer>",
279
+
"sha256": "<bytes>",
280
+
}
281
+
],
282
+
}
203
283
}
204
284
```
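As a rough illustration of how the two kinds of checksum differ, the following sketch builds a `transfer-ack-v2` message. It assumes the `plain` format, so the payload bytes can be fed into both hashes unchanged; with a compressed format, the per-file hash would have to be fed the decompressed bytes instead. The function name and the input shape are hypothetical.

```python
import hashlib


def build_transfer_ack(files):
    """Build a `transfer-ack-v2` message.

    `files` is a list of (transfer_id, bytes_before_offset, payload_chunks)
    tuples, where `bytes_before_offset` is the data kept from an earlier attempt.
    """
    wire = hashlib.sha256()  # covers only the bytes sent in this attempt
    entries = []
    for transfer_id, prefix, chunks in files:
        whole = hashlib.sha256(prefix)  # covers the entire file
        size = len(prefix)
        for chunk in chunks:
            wire.update(chunk)
            whole.update(chunk)
            size += len(chunk)
        entries.append({"id": transfer_id, "size": size, "sha256": whole.digest()})
    return {"transfer-ack-v2": {"wire-sha256": wire.digest(), "files": entries}}
```

Both sides can build this message independently and compare it with the one received from the peer, so no further round trip is needed to agree on the outcome.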
205
285
286
+
### A note about file system handling
287
+
288
+
File systems are hard. To achieve consistent and sane behavior across implementations and
289
+
systems, applications should pay attention to the following details:
290
+
291
+
- Symlinks are preserved by default when sending directories
292
+
- Hardlinks and reflinks may be resolved/duplicated at any point
293
+
- Permissions are not preserved by default (use rsync for that instead).
294
+
- The sender's mtime should be preserved, unless it is zero
295
+
- Extended file attributes (xattrs) are not preserved
296
+
- Files may have been modified between transfers. Checking the modification time
297
+
is necessary, but not sufficient.
298
+
- To avoid file system hacking: The receiver must check for malicious file paths
299
+
and invalid/unsupported character sequences. Symlinks *must not* be followed.
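A receiver-side name check could look like the sketch below. It is deliberately strict and only an assumption about what "malicious" means here: every `name` in the offer must be a single path component, so separators, `.`, `..` and NUL bytes are rejected. Real implementations may additionally need platform-specific rules (reserved names, length limits), which are not covered.

```python
def is_safe_entry_name(name: str) -> bool:
    """Accept only names that are a single, harmless path component."""
    if name in ("", ".", ".."):
        return False
    # Reject both separator styles and NUL, regardless of the local platform.
    return not any(ch in name for ch in ("/", "\\", "\0"))


def offer_is_safe(entry: dict) -> bool:
    """Recursively validate an offer entry and, for directories, its children."""
    if not is_safe_entry_name(entry.get("name", "")):
        return False
    if entry.get("type") == "directory":
        return all(offer_is_safe(child) for child in entry.get("content", []))
    return True
```

Whether an unsafe offer is rejected outright or its names are escaped is left to the implementation, as noted in the section on send offers.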
300
+
301
+
### When to resume
302
+
303
+
On a failed attempt, the receiver may decide to keep the partially transferred data in
304
+
anticipation of the transfer being tried again soon. The receiver can use the `answer` message
305
+
to exert some control over which bytes the sender will send again. It is also free to decide
306
+
when a transfer should be resumed instead of being started anew. However, not every failure
307
+
may be recovered from, forcing a full retransfer:
308
+
309
+
-
310
+
311
+
### Random notes
312
+
206
313
## Future Extensions
207
314
208
-
* "command mode": establish the connection, *then* figure out what we want to
209
-
use it for, allowing multiple files to be exchanged, in either direction.
210
-
This is to support a GUI that lets you open the wormhole, then drop files
211
-
into it on either end.
212
315
* some Transit messages being sent early, so ports and Onion services can be
213
316
spun up earlier, to reduce overall waiting time
214
317
* transit messages being sent in multiple phases: maybe the transit