Proposal: info-dict extension to the File-Transfer Protocol #39

Open
@afontenot

The following is a draft of a protocol extension.

Versions:

  • 2013-04-25: initial draft

File-Transfer Protocol (info-dict extension)

The Magic Wormhole File-Transfer Protocol involves two stages. In the first, a Wormhole connection between the sender and the receiver is mediated by a third-party Rendezvous Server. The connection is established by a PAKE, which results in encrypted communications that no third party, including the server, can read.

In this first stage, the sender currently provides an offer to the receiver. The offer is one of three types:

  • message for a text message
  • file for sending a single file
  • directory for sending a directory of files compressed into a single archive file

If the receiver accepts the offer, the protocol moves into the second stage. In this stage, the transit protocol transfers and validates the message, file, or archived directory (as appropriate) over a different connection, which is created using connection hints sent over the Wormhole. Once this transit connection is created, the Wormhole is typically closed.

This extension to the File-Transfer protocol involves two components:

  • The addition of an info key (and associated values) to the offer message
  • A specification for transit messages for managing transfers related to info offers

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

The info-dict offer

The info dictionary offer is based on the info dictionary from the BitTorrent protocol, and it is intended that info dictionaries from compliant BitTorrent v1 torrents also be valid under this specification, when encoded as JSON.

An info-dict offer enables three features that are not available in other modes:

  • Sending whole directories directly between sender and receiver without requiring archiving or compression
  • Partial downloads for the case where the receiver only wants some of the files that the sender is offering
  • Trivial resumption of partially complete downloads by comparing existing piece hashes to those made available by the sender

To offer an info-dict for download, the sender SHALL include an info key in the offer dictionary. The value of this info key SHALL be a dictionary that includes the following top-level keys:

  • name is the suggested file name (if the offer contains a single file) or directory name (if the offer contains multiple files).
  • piece length is an integer and is the base 2 logarithm of the piece size (as defined below) in bytes.
  • pieces is a string containing the concatenated hashes of every piece in the offer. The order of the hashes is the order of the files (if the offer contains multiple files), and then in-order through each file from beginning to end. The pieces are the offered file set split into chunks (which can span across files) of size equal to the piece size, which is 2 ^ piece length.

The info key SHALL also contain exactly one of the following two keys:

  • length is the exact number of bytes in the file, if the offer contains only one file (with the name given by the name key).
  • files specifies a list of files provided by the offer, if the offer contains multiple files (with the top level directory given by the name key). The ordering of the files is significant as it specifies the relationship between the pieces and the files. Each item in the list is a dictionary that SHALL contain the following two keys:
    • length is the number of bytes in the file
    • path is a list of strings providing the path to the file under the top level directory. In this list, the last string is the suggested name of the file. Each string before this final string (if any) is the name of a directory contained by the directory immediately preceding it, and the first directory in the list (if any) is contained by the top level directory.
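As a rough illustration, a receiver could check the structural rules above along the following lines. This is a minimal Python sketch, not part of the specification; the function name `validate_info` and its error strings are hypothetical, and the check covers only key presence and basic types.

```python
from typing import Any, Dict, List


def validate_info(info: Dict[str, Any]) -> List[str]:
    """Return a list of structural problems; an empty list means none were found."""
    problems = []
    for key in ("name", "piece length", "pieces"):
        if key not in info:
            problems.append("missing required key: " + key)
    # Exactly one of "length" (single file) or "files" (multiple files) is allowed.
    if ("length" in info) == ("files" in info):
        problems.append("exactly one of 'length' or 'files' must be present")
    files = info.get("files", [])
    if not isinstance(files, list):
        problems.append("'files' must be a list")
        files = []
    for i, entry in enumerate(files):
        if not isinstance(entry, dict):
            problems.append("files[%d] is not a dictionary" % i)
            continue
        if not isinstance(entry.get("length"), int):
            problems.append("files[%d] has no integer 'length'" % i)
        path = entry.get("path")
        if not (isinstance(path, list) and path
                and all(isinstance(p, str) for p in path)):
            problems.append("files[%d] has no valid 'path' list" % i)
    return problems
```

A receiver would presumably run a check like this before showing the offer to the user, and reject the offer if the returned list is not empty.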

Receivers SHALL either ignore or replace characters in the path that are invalid on their operating system or file system, or reject the offer if it contains these characters. In addition, receivers SHALL NOT interpret any path component in a way that would cause directory traversal (such as a ".." component on some systems) or place files outside the top-level directory.
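One possible way to enforce the path rules is sketched below in Python. It takes the "reject the offer" option rather than replacing characters, and the set of rejected components is an assumption chosen for illustration; the helper name `safe_relative_path` is hypothetical. A real client would also need to apply the rules of its own operating system and file system.

```python
from typing import List


def safe_relative_path(components: List[str]) -> str:
    """Join path components, rejecting any that could escape the top-level directory."""
    if not components:
        raise ValueError("path list is empty")
    for part in components:
        # Empty, current-directory and parent-directory components are refused.
        if part in ("", ".", ".."):
            raise ValueError("unsafe path component: %r" % part)
        # A separator or NUL inside a component could smuggle in extra levels.
        if "/" in part or "\\" in part or "\x00" in part:
            raise ValueError("invalid character in component: %r" % part)
    return "/".join(components)
```

The returned string is relative, so the caller is expected to join it onto the chosen top-level directory.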

The info key SHOULD also contain the following key:

  • hashtype specifies the hash that is used for the piece hashes provided in the pieces string.

Receivers compliant with this specification SHALL support the following values for hashtype: "sha256", "blake", "blake160", and "sha1". Receivers MAY support other hash types. The "sha256", "blake", and "sha1" values indicate that the corresponding hashes are standard SHA-256, BLAKE2b, and SHA-1 hashes (respectively) with the default digest size. The value "blake160" is the BLAKE2b hash function with a digest size of 20 bytes. This was chosen to correspond to the hash size of the SHA-1 function (which is used by the BitTorrent protocol), while retaining excellent resistance to collision attacks.

An info-dict offer with no provided hashtype SHALL be interpreted to have a hash type of SHA-1 for historical compatibility. However, senders SHOULD provide a hashtype value and SHOULD NOT use the SHA-1 hash.
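The mapping from hashtype values to hash functions, including the SHA-1 fallback, might look like this in Python. This is a sketch using the standard `hashlib` module; the function name `new_hasher` is hypothetical.

```python
import hashlib
from typing import Any, Dict


def new_hasher(info: Dict[str, Any]):
    """Return a fresh hash object for the hashtype named in an info dictionary."""
    # A missing hashtype is interpreted as SHA-1 for historical compatibility.
    hashtype = info.get("hashtype", "sha1")
    if hashtype == "sha256":
        return hashlib.sha256()
    if hashtype == "blake":
        return hashlib.blake2b()  # default digest size (64 bytes)
    if hashtype == "blake160":
        return hashlib.blake2b(digest_size=20)  # same digest length as SHA-1
    if hashtype == "sha1":
        return hashlib.sha1()
    raise ValueError("unsupported hashtype: %r" % hashtype)
```

The `digest_size` attribute of the returned object gives the length of each entry in the pieces string, which is what a receiver needs to locate an individual piece hash.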

Senders and receivers SHOULD NOT limit the piece size beyond the expected limitations of the hardware they run on. It is RECOMMENDED that senders default to a 64 MiB piece size (2^26 bytes). Where the final piece of the last offered file does not coincide with the exact piece size boundary, the hash for the piece SHALL be the hash of the actual data, with no padding.
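To make the piece rules concrete, here is a small Python sketch of how a sender might compute the concatenated piece hashes. It holds all file contents in memory for brevity (a real sender would stream from disk), it returns raw digest bytes because this draft does not say how the pieces string is encoded in JSON, and the name `hash_pieces` is hypothetical.

```python
import hashlib
from typing import Callable, List


def hash_pieces(files: List[bytes], piece_length: int,
                new_hash: Callable = hashlib.sha256) -> bytes:
    """Hash the file set in file order, split into pieces of 2 ** piece_length bytes."""
    piece_size = 2 ** piece_length
    # Pieces may span file boundaries, so the files are treated as one byte stream.
    data = b"".join(files)
    digests = []
    for start in range(0, len(data), piece_size):
        # The final piece may be short; it is hashed as-is, with no padding.
        digests.append(new_hash(data[start:start + piece_size]).digest())
    return b"".join(digests)
```

With a piece length of 2 (4-byte pieces), the files `b"abc"` and `b"def"` produce the pieces `b"abcd"` and `b"ef"`: the first spans both files and the second is shorter than the piece size.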

The receiver SHALL indicate acceptance of an info-dict offer in the same way as for other offers under the File-Transfer Protocol.

Transit protocol extensions for info-dict support

This specification is intentionally opaque about the nature of the transit protocol. The only requirements are that the protocol support both the transfer of binary data and JSON-encoded control messages, and that both the sender and receiver be able to distinguish the two.

In particular, it is not specified whether the connection happens directly or through a relay server, whether the connection is TCP or UDP, or whether a single stream or multiple simultaneous streams are used.

However, typical connections will be established using connection hints as specified in the File-Transfer Protocol specification, and they are expected to be encrypted using secrets exchanged through the rendezvous connection. See the Wormhole Transit Protocol specification for more information.

Clients compatible with this specification add support for several message types over the transit protocol.

Receiver size hints

Immediately upon establishing a transit connection, a receiver SHOULD send a message containing a wants key. If provided, this key MUST contain a value indicating the exact number of bytes from the offer that the receiver expects to request. A client on the sending side SHOULD use this information to provide an accurate indication of progress, if the client provides progress indicators.

If for any reason (except for checksum validation errors) the number of bytes the receiver expects to download changes, the receiver SHOULD send an updated wants message. These messages MUST contain the total number of bytes the receiver expects from the entire transfer, including from pieces already downloaded. Receivers that send this message MUST NOT double-count bytes from pieces that fail checksum validation or are otherwise downloaded multiple times.
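A receiver could derive the wants value from the set of pieces it intends to request, as in the Python sketch below. The message shape `{"wants": N}` is an assumption, since the text only says that the message contains a wants key; `total_length` stands for the summed length of all offered files, and the function names are hypothetical.

```python
import json
from typing import Iterable


def piece_size_at(index: int, piece_length: int, total_length: int) -> int:
    """Number of bytes in the piece at this index (the last piece may be short)."""
    piece_size = 2 ** piece_length
    return max(0, min(piece_size, total_length - index * piece_size))


def wants_message(indices: Iterable[int], piece_length: int, total_length: int) -> str:
    """Build a wants message covering each requested piece exactly once."""
    # Using a set means a piece that is re-requested is never counted twice.
    total = sum(piece_size_at(i, piece_length, total_length) for i in set(indices))
    return json.dumps({"wants": total})
```

Because the total is always recomputed from the full set of wanted pieces, sending an updated message after the selection changes satisfies the requirement to report the total for the entire transfer.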

Receiver requested pieces

Pieces are sent by the sender only when they are requested by the receiver. Receivers queue up pieces to be sent with a request message. Receivers SHOULD keep enough requests queued up that they are not left waiting for data between downloading pieces. Request messages SHALL take the following form:

{
    "req": [
        0,
        1
    ]
}

Here, the numbers are the (zero-based) indices of the pieces provided in the offer. Note that the sender and receiver can determine both the byte offset (in the set of offer files) and the hash offset (in the pieces string) for each index, because both the piece size and hash digest size are defined in the offer.
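The two offsets can be computed directly from the piece index. The following Python sketch assumes the pieces string has already been decoded to raw bytes (the JSON encoding is not specified here); both function names are hypothetical.

```python
from typing import Tuple


def piece_byte_range(index: int, piece_length: int, total_length: int) -> Tuple[int, int]:
    """Return the (start, end) byte offsets of a piece within the whole file set."""
    piece_size = 2 ** piece_length
    start = index * piece_size
    if start >= total_length:
        raise IndexError("piece index out of range")
    # The end offset is exclusive and is clipped for a short final piece.
    return start, min(start + piece_size, total_length)


def piece_hash(index: int, pieces: bytes, digest_size: int) -> bytes:
    """Return the expected digest for a piece from the concatenated hashes."""
    start = index * digest_size
    if start + digest_size > len(pieces):
        raise IndexError("piece index out of range")
    return pieces[start:start + digest_size]
```

For example, with 4-byte pieces and 10 bytes in total, piece 2 covers bytes 8 up to (but not including) 10.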

Receivers SHOULD always request the pieces they want in numerical order. Requesting data sequentially through the files allows for more efficient, predictable I/O on many systems.
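For a partial download, the receiver first has to work out which pieces overlap the files it wants, and can then request them in ascending order. A minimal Python sketch, assuming `lengths` lists the file lengths in offer order and `selected` holds the positions of the wanted files (both names, like the function names, are hypothetical):

```python
import json
from typing import Iterable, List, Set


def pieces_for_files(lengths: List[int], selected: Set[int], piece_length: int) -> List[int]:
    """Return the sorted piece indices that overlap any selected file."""
    piece_size = 2 ** piece_length
    needed = set()
    offset = 0  # byte offset of the current file within the whole file set
    for i, length in enumerate(lengths):
        if i in selected and length > 0:
            first = offset // piece_size
            last = (offset + length - 1) // piece_size
            needed.update(range(first, last + 1))
        offset += length
    return sorted(needed)


def request_message(indices: Iterable[int]) -> str:
    """Build a request message listing each piece once, in numerical order."""
    return json.dumps({"req": sorted(set(indices))})
```

Because pieces can span files, a piece at a file boundary also carries some bytes of a neighbouring file that the receiver did not select.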

Upon receiving a request for a piece, the sender SHALL send it through the transit protocol in the appropriate manner for binary data.

Accepting / rejecting / re-sending pieces

The receiver SHALL check each piece against the hash digest given in the offer as it arrives, and SHALL reject any piece that does not match, unless strong mitigating circumstances prevail. Examples of such circumstances include that the sender has an incorrect or incomplete copy of the file, and that the user or operator of the receiver has actively requested to accept data that fails checksum validation. If such circumstances are expected to occur, receiver software MAY choose to implement support for ignoring checksum failures, with an appropriate warning.

When a piece fails a check, a receiver MAY choose to request the same piece again. Senders are RECOMMENDED to provide a piece again if requested. Either side MAY choose to hang up the connection if a request repeatedly fails.
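On the receiving side, validation and retry might be sketched as follows in Python. The `fetch` callable stands in for whatever the transit layer uses to request and receive one piece, and the limit of three attempts is an arbitrary assumption; neither comes from this specification. The pieces value is assumed to be raw digest bytes.

```python
import hashlib
from typing import Callable


def verify_piece(data: bytes, index: int, pieces: bytes,
                 new_hash: Callable = hashlib.sha256) -> bool:
    """Check received piece data against the digest given in the offer."""
    digest_size = new_hash().digest_size
    expected = pieces[index * digest_size:(index + 1) * digest_size]
    return new_hash(data).digest() == expected


def fetch_verified(index: int, fetch: Callable[[int], bytes], pieces: bytes,
                   max_attempts: int = 3, new_hash: Callable = hashlib.sha256) -> bytes:
    """Request a piece until it validates, giving up after max_attempts tries."""
    for _ in range(max_attempts):
        data = fetch(index)
        if verify_piece(data, index, pieces, new_hash):
            return data
    # Repeated failure: the caller may choose to hang up the connection.
    raise ConnectionError("piece %d failed validation %d times" % (index, max_attempts))
```

Only data returned by `fetch_verified` would be written to disk and acknowledged.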

Acknowledgements

When a piece succeeds, the receiver SHALL send an acknowledgement in the following form:

{
    "ack": [
        0,
        1
    ]
}

Note that the receiver MAY send a separate acknowledgement for each piece, but if multiple pieces enter the finished state before it sends an acknowledgement, it MAY acknowledge them together in a single message, as shown above.
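One way to batch acknowledgements is to collect finished piece indices and flush them whenever the receiver is ready to send a control message. The class below is a hypothetical Python sketch, not a required design.

```python
import json
from typing import List, Optional


class AckBatcher:
    """Collect validated piece indices and emit them in a single ack message."""

    def __init__(self) -> None:
        self._pending: List[int] = []

    def piece_finished(self, index: int) -> None:
        """Record a piece that has passed validation."""
        self._pending.append(index)

    def flush(self) -> Optional[str]:
        """Return one ack message for all pending pieces, or None if there are none."""
        if not self._pending:
            return None
        message = json.dumps({"ack": sorted(self._pending)})
        self._pending.clear()
        return message
```

Calling `flush()` after every piece gives individual acknowledgements; calling it less often gives batched ones.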

Ending the connection

At any point, the receiver MAY hang up the connection with a success indication by sending

{
    "ack": "ok"
}

FAQ

  1. Why include support for legacy hashes like SHA-1?

    The intention is to make adding support for this protocol extension as easy as possible for implementers. A large quantity of software already exists for creating and handling BitTorrent format info dictionaries, and using this software is likely to be the quickest way to implement support in many cases. Furthermore, collision resistance is rarely relevant to file transfer cases. Preimage resistance is far more important, and SHA-1 retains this. Other than the blake160 hash with a non-standard digest length, SHA-1 also has the shortest digest of any hash with REQUIRED support in this specification. Shorter hashes make for more efficient info dictionaries.

  2. Is this implementing BitTorrent support for Magic Wormhole?

    No. This protocol extension provides a new offer format that allows sending a set of files between a single sender and (usually) one receiver, where the metadata provided for the offer is compatible with that used by the BitTorrent info-dict specification, but the protocols are otherwise unrelated.

  3. What does this achieve that Magic Wormhole cannot achieve without it? Is using BitTorrent a better choice for this use case?

    As mentioned above, this allows sending multiple files without involving the overhead of an archive format, as well as partial downloads and resumption of partially complete downloads. BitTorrent is not a plausible alternative for this use case. In particular, with this extension, Magic Wormhole implements:

    • a highly secure connection between a sender and receiver. BitTorrent does not support modern, secure forms of encryption between clients.

    • an efficient transport mechanism for one-to-one and one-to-many transfers, thanks to the opacity of the file transfer protocol to the underlying transit protocol. BitTorrent is optimized for the many-to-many case, and only creates a single TCP or UDP connection between pairs of peers.

    • a conversation-establishing mechanism for two peers who want to talk to each other, and no one else, via the mailbox protocol. BitTorrent would require two peers to know each other's IP addresses and does not provide any mechanism for authentication.

  4. Why emphasize the opacity of the transit protocol so much?

    This feature gives Wormhole clients a lot of flexibility and potential for speed. The author of this extension specification is also working on a transit protocol extension that would allow two clients to keep open multiple transit connections between them and use them simultaneously when exchanging binary data (e.g. the pieces in this specification). Parallel transfers frequently offer an enormous speedup over sequential ones. Hopefully, with both extensions in place, Wormhole clients will be capable of multiple-Gbps transfers on commodity hardware.
