Description
The following is a draft of a protocol extension.
Versions:
- 2013-04-25: initial draft
File-Transfer Protocol (info-dict extension)
The Magic Wormhole File-Transfer Protocol involves two stages. In the first, a Wormhole connection is mediated between the sender and the receiver by a third party Rendezvous Server. The connection is established by a PAKE which results in encrypted communications not readable by a third party, including the server.
In the first stage, at present, the sender provides an offer to the receiver. This offer is currently one of three types:
- message: for a text message
- file: for sending a single file
- directory: for sending a directory of files compressed into a single archive file
If the receiver accepts the offer, the protocol moves into the second stage. The transit protocol involves the transfer and validation of the message, file, or archived directory (as appropriate) over a different connection, which is created using connection hints sent over the Wormhole. Once this transit connection is created, the Wormhole is typically closed.
This extension to the File-Transfer protocol involves two components:
- The addition of an info key (and associated values) to the offer message
- A specification for transit messages for managing transfers related to info offers
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
The info-dict offer
The info dictionary offer is based on the info dictionary from the BitTorrent protocol, and it is intended that info dictionaries from compliant BitTorrent v1 torrents also be valid under this specification, when encoded as JSON.
An info-dict offer enables three features that are not available in other modes:
- Sending whole directories directly between sender and receiver without requiring archiving or compression
- Partial downloads for the case where the receiver only wants some of the files that the sender is offering
- Trivial resumption of partially complete downloads by comparing existing piece hashes to those made available by the sender
To offer an info-dict for download, the sender SHALL include an info key in the offer dictionary. This info key SHALL include the following top level keys:
- name is the suggested file name (if the offer contains a single file) or directory name (if the offer contains multiple files).
- piece length is an integer and is the base 2 logarithm of the piece size (as defined below) in bytes.
- pieces is a string containing the concatenated hashes of every piece in the offer. The order of the hashes is the order of the files (if the offer contains multiple files), and then in-order through each file from beginning to end. The pieces are the offered file set split into chunks (which can span across files) of size equal to the piece size, which is 2 ^ piece length.
The info key SHALL also contain exactly one of the following keys:
- length is the exact number of bytes in the file, if the offer contains only one file (with the name given by the name key).
- files specifies a list of files provided by the offer, if the offer contains multiple files (with the top level directory given by the name key). The ordering of the files is significant as it specifies the relationship between the pieces and the files. Each item in the list is a dictionary that SHALL contain the following two keys:
  - length is the number of bytes in the file
  - path is a list of strings providing the path to the file under the top level directory. In this list, the last string is the suggested name of the file. Each string before this final string (if any) is the name of a directory contained by the directory immediately preceding it, and the first directory in the list (if any) is contained by the top level directory.
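Because pieces can span files, a receiver that wants only some of the offered files has to work out which piece indices cover them. The following is a minimal sketch of that mapping, assuming piece length is the base 2 logarithm defined above. The helper name `pieces_for_files` and the idea of selecting files by their position in the files list are illustrative assumptions, not part of this specification.

```python
def pieces_for_files(info, wanted):
    """Return the sorted piece indices that cover the selected files.

    `info` is an info dictionary with a "files" list; `wanted` is a set of
    positions in that list. (Hypothetical helper, not defined by the spec.)
    """
    piece_size = 1 << info["piece length"]  # piece length is log2 of the size
    needed = set()
    offset = 0  # byte offset of the current file within the whole offer
    for position, entry in enumerate(info["files"]):
        length = entry["length"]
        if position in wanted and length > 0:
            first = offset // piece_size
            last = (offset + length - 1) // piece_size
            needed.update(range(first, last + 1))
        offset += length
    return sorted(needed)
```

Note that a piece shared with a neighbouring file is still requested whole, so the receiver may receive some bytes belonging to files it did not ask for.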
Receivers SHALL either ignore or replace characters in the path that are invalid on their operating system or file system, or reject the offer if it contains these characters. In addition, receivers SHALL NOT interpret any path component in a way that would cause directory traversal (such as a ".." component on some systems) or place files outside the top level directory.
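One way a receiver might enforce the traversal rule is sketched below. It takes the strictest of the permitted options and rejects suspicious components instead of replacing them. The function name `safe_target` and the exact set of rejected components are assumptions made for illustration; a real client would adapt them to its platform.

```python
import os


def safe_target(top_dir, components):
    """Join offered path components under top_dir, refusing directory traversal.

    Raises ValueError if the path is empty, contains a suspicious component,
    or resolves outside top_dir. (Hypothetical helper, not defined by the spec.)
    """
    if not components:
        raise ValueError("empty path")
    for part in components:
        # Reject separators and special names instead of trying to repair them.
        if part in ("", ".", "..") or "/" in part or "\\" in part or "\0" in part:
            raise ValueError(f"unsafe path component: {part!r}")
    root = os.path.realpath(top_dir)
    target = os.path.realpath(os.path.join(root, *components))
    # Final check after resolving symlinks already present on disk.
    if os.path.commonpath([root, target]) != root:
        raise ValueError("path escapes the top level directory")
    return target
```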
The info key SHOULD also contain the following key:
- hashtype specifies the hash that is used for the piece hashes provided in the pieces string.
Receivers compliant with this specification SHALL support the following values for hashtype: "sha256", "blake", "blake160", and "sha1". Receivers MAY support other hash types. The "sha256", "blake", and "sha1" values indicate that the corresponding hashes are standard SHA-256, BLAKE2b, and SHA-1 hashes (respectively) with the default digest size. The value "blake160" is the BLAKE2b hash function with a digest size of 20 bytes. This was chosen to correspond to the hash size of the SHA-1 function (which is used by the BitTorrent protocol), while retaining excellent resistance to collision attacks.
An info-dict offer with no provided hashtype SHALL be interpreted to have a hash type of SHA-1 for historical compatibility. However, senders SHOULD provide a hashtype value and SHOULD NOT use the SHA-1 hash.
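The required hashtype values map directly onto Python's standard hashlib module. The sketch below shows one possible lookup table, including the SHA-1 fallback for offers without a hashtype. The names `HASHERS`, `hasher_for`, and `digest_size` are illustrative assumptions.

```python
import hashlib

# Constructors for the hash types this draft requires receivers to support.
HASHERS = {
    "sha256": hashlib.sha256,
    "blake": hashlib.blake2b,  # BLAKE2b with its default 64-byte digest
    "blake160": lambda data=b"": hashlib.blake2b(data, digest_size=20),
    "sha1": hashlib.sha1,
}


def hasher_for(info):
    """Return a hash constructor for an info dictionary (hypothetical helper)."""
    name = info.get("hashtype", "sha1")  # a missing hashtype means SHA-1
    try:
        return HASHERS[name]
    except KeyError:
        raise ValueError(f"unsupported hashtype: {name!r}") from None


def digest_size(info):
    """Number of bytes each piece hash occupies for this offer."""
    return hasher_for(info)().digest_size
```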
Senders and receivers SHOULD NOT limit the piece size beyond the expected limitations of the hardware they run on. It is RECOMMENDED that senders default to a 64 MiB piece size (2^26 bytes). Where the final piece of the last offered file does not coincide with the exact piece size boundary, the hash for the piece SHALL be the hash of the actual data, with no padding.
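To illustrate how piece hashes could be produced, the sketch below treats the offered files as one concatenated byte stream and hashes it in piece-sized chunks, leaving a short final piece unpadded. The function name `piece_hashes`, the use of already-open binary streams, and the 1 MiB read size are assumptions chosen for the example. How the raw digests are then encoded into the JSON pieces string is not specified here and is left out.

```python
import hashlib


def piece_hashes(streams, piece_length, new_hash=hashlib.sha256):
    """Yield the raw digest of each piece across the concatenated streams.

    `streams` are binary file objects in offer order; `piece_length` is the
    base 2 logarithm of the piece size. (Hypothetical helper.)
    """
    piece_size = 1 << piece_length
    hasher = new_hash()
    filled = 0  # bytes hashed into the current piece so far
    for stream in streams:
        while True:
            # Never read past the end of the current piece.
            chunk = stream.read(min(piece_size - filled, 1 << 20))
            if not chunk:
                break  # end of this file; the piece may continue in the next
            hasher.update(chunk)
            filled += len(chunk)
            if filled == piece_size:
                yield hasher.digest()
                hasher = new_hash()
                filled = 0
    if filled:
        yield hasher.digest()  # short final piece, hashed without padding
```

Reading in bounded chunks keeps memory use flat even with the recommended 64 MiB piece size.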
The receiver SHALL indicate acceptance of an info-dict offer in the same way as for other offers under the File-Transfer Protocol.
Transit protocol extensions for info-dict support
This specification is intentionally opaque about the nature of the transit protocol. The only requirement is that the protocol support both transfer of binary data as well as JSON-encoded control messages, and that both the sender and receiver be able to distinguish the two.
In particular, it is not specified whether the connection happens directly or through a relay server, whether the connection is TCP or UDP, or whether a single stream or multiple simultaneous streams are used.
However, typical connections will be established using connection hints as specified in the File-Transfer Protocol specification, and they are expected to be encrypted using secrets exchanged through the rendezvous connection. See the Wormhole Transit Protocol specification for more information.
Clients compatible with this specification add support for several message types over the transit protocol.
Receiver size hints
Immediately upon establishing a transit connection, a receiver SHOULD send a message containing a wants key. If provided, this key MUST contain a value indicating the exact number of bytes from the offer that the receiver expects to request. A client on the sending side SHOULD use this information to provide an accurate indication of progress, if the client provides progress indicators.
If for any reason (except for checksum validation errors) the number of bytes the receiver expects to download changes, the receiver SHOULD send an updated wants message. These messages MUST contain the total number of bytes the receiver expects from the entire transfer, including from pieces already downloaded. Receivers that send this message MUST NOT double-count bytes from pieces that fail checksum validation or are otherwise downloaded multiple times.
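A receiver could derive the wants value from the set of pieces it intends to request, as sketched below. The sketch assumes that the count covers whole pieces (since pieces are the unit of request) and that the message is a JSON object whose wants value is a plain integer. Both points, along with the helper names, are assumptions and not settled by this draft.

```python
def total_length(info):
    """Total number of bytes in the offer, for single- or multi-file offers."""
    if "length" in info:
        return info["length"]
    return sum(entry["length"] for entry in info["files"])


def wants_message(info, piece_indices):
    """Build a wants message for the given piece indices (hypothetical helper)."""
    piece_size = 1 << info["piece length"]
    total = total_length(info)
    wanted = 0
    for index in set(piece_indices):  # a set, so no piece is counted twice
        start = index * piece_size
        if not 0 <= start < total:
            raise ValueError(f"piece index out of range: {index}")
        wanted += min(piece_size, total - start)  # the last piece may be short
    return {"wants": wanted}
```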
Receiver requested pieces
Pieces are sent by the sender only when they are requested by the receiver. Receivers queue up pieces to be sent with a request message. Receivers SHOULD keep enough requests queued up that they are not left waiting for data between downloading pieces. Request messages SHALL take the following form:
{
  "req": [
    0,
    1
  ]
}
Here, the numbers are the zero-based indices of the requested pieces in the offer. Note that the sender and receiver can determine both the byte offset (in the set of offer files) and the hash offset (in the pieces string), because both the piece size and hash digest size are defined in the offer.
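The arithmetic for both offsets is short enough to spell out. The sketch below assumes the hash offset is measured in raw digest bytes; if the pieces string is encoded for JSON (for example as hexadecimal), the offset scales accordingly. The function name `piece_location` is illustrative.

```python
def piece_location(index, piece_length, digest_size, total_length):
    """Return ((byte_start, byte_end), (hash_start, hash_end)) for a piece.

    Byte offsets are relative to the concatenated offer files and hash
    offsets to the concatenated raw digests; both ranges are half-open.
    (Hypothetical helper, not defined by the spec.)
    """
    piece_size = 1 << piece_length
    byte_start = index * piece_size
    if index < 0 or byte_start >= total_length:
        raise IndexError(f"piece index out of range: {index}")
    byte_end = min(byte_start + piece_size, total_length)  # final piece may be short
    hash_start = index * digest_size
    return (byte_start, byte_end), (hash_start, hash_start + digest_size)
```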
Receivers SHOULD always request the pieces they want in numerical order. Requesting data sequentially through the files allows for more efficient, predictable I/O on many systems.
Upon receiving a request for a piece, the sender SHALL send it through the transit protocol in the appropriate manner for binary data.
Accepting / rejecting / re-sending pieces
The receiver SHALL check each piece against the hash digest given in the offer as it comes in, and SHALL reject any piece that does not match, unless strong mitigating circumstances prevail. An example of such circumstances is that the sender has an incorrect or incomplete copy of the file and the user / operator of the receiver has actively requested to accept data that fails checksum validation. If such circumstances are expected to occur, receiver software MAY choose to implement support for ignoring checksum failures, with an appropriate warning.
When a piece fails a check, a receiver MAY choose to request the same piece again. Senders are RECOMMENDED to provide a piece again if requested. Either side MAY choose to hang up the connection if a request repeatedly fails.
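The check-then-retry behaviour described above might look like the following on the receiving side. The sketch assumes the pieces value has already been decoded to raw digest bytes, and the retry limit of three is an arbitrary illustrative choice. The names `verify_piece` and `next_message` are hypothetical.

```python
import hashlib
import hmac


def verify_piece(data, index, pieces, new_hash=hashlib.sha256):
    """Return True if `data` matches the digest stored for piece `index`."""
    size = new_hash().digest_size
    expected = pieces[index * size:(index + 1) * size]
    if index < 0 or len(expected) != size:
        raise IndexError(f"no hash for piece {index}")
    return hmac.compare_digest(new_hash(data).digest(), expected)


def next_message(data, index, pieces, failures, max_retries=3,
                 new_hash=hashlib.sha256):
    """Decide what to send after receiving a piece.

    Returns an ack message on success, a repeated request on failure, or
    None once the piece has failed more than `max_retries` times, at which
    point the caller may hang up. `failures` maps piece index to failure count.
    """
    if verify_piece(data, index, pieces, new_hash):
        return {"ack": [index]}
    failures[index] = failures.get(index, 0) + 1
    if failures[index] > max_retries:
        return None
    return {"req": [index]}
```

The same `verify_piece` check can be run over data already on disk to decide which pieces to skip when resuming a partial download.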
Acknowledgements
When a piece succeeds, the receiver SHALL send an acknowledgement in the following form:
{
  "ack": [
    0,
    1
  ]
}
Note that the receiver MAY send individual acknowledgements for each piece separately, but if multiple pieces enter the finished state before it sends an acknowledgement, it MAY acknowledge them all at once as shown above.
Ending the connection
At any point, the receiver MAY hang up the connection with a success indication by sending
{
  "ack": "ok"
}
FAQ
- Why include support for legacy hashes like SHA-1?

  The intention is to make adding support for this protocol extension as easy as possible for implementers. A large quantity of software already exists for creating and handling BitTorrent format info dictionaries, and using this software is likely to be the quickest way to implement support in many cases. Furthermore, collision resistance is rarely relevant to file transfer cases. Preimage resistance is far more important, and SHA-1 retains this. Other than the blake160 hash with a non-standard digest length, SHA-1 also has the shortest digest of any hash with REQUIRED support in this specification. Shorter hashes make for more efficient info dictionaries.

- Is this implementing BitTorrent support for Magic Wormhole?

  No. This protocol extension provides a new offer format that allows sending a set of files between a single sender and (usually) one receiver, where the metadata provided for the offer is compatible with that used by the BitTorrent info-dict specification, but the protocols are otherwise unrelated.

- What does this achieve that Magic Wormhole cannot achieve without it? Is using BitTorrent a better choice for this use case?

  As mentioned above, this allows sending multiple files without involving the overhead of an archive format, as well as partial downloads and updates to previously shared data. BitTorrent is not a plausible alternative for this use case. In particular, with this extension, Magic Wormhole implements:

  - a highly secure connection between a sender and receiver. BitTorrent does not support modern, secure forms of encryption between clients.
  - an efficient transport mechanism for one-to-one and one-to-many transfers, thanks to the opacity of the file transfer protocol to the underlying transit protocol. BitTorrent is optimized for the many-to-many case, and only creates a single TCP or UDP connection between pairs of peers.
  - a conversation establishing mechanism for two peers who want to talk to each other, and no one else, via the mailbox protocol. BitTorrent would require two peers to know each other's IP addresses and does not provide any mechanism for authentication.

- Why emphasize the opacity of the transit protocol so much?

  This feature gives Wormhole clients a lot of flexibility and potential for speed. The author of this extension specification is also working on a transit protocol extension that would allow two clients to keep open multiple transit connections between them and use them simultaneously when exchanging binary data (e.g. the pieces in this specification). Parallel transfers frequently offer an enormous speedup over sequential ones. Hopefully, with both extensions in place, Wormhole clients will be capable of multiple-Gbps transfers on commodity hardware.