VIP 17: Enable Unix domain sockets for listen and backend addresses

Synopsis

Allow Unix Domain Sockets (UDS) as listen addresses for Varnish (-a, -T and -M options) and as addresses for backends. Ideally also obtain credentials of the peer process connected on a UDS, such as uid and gid, for use in VCL.

Named listen addresses

This is not directly related to UDS, but this change would solve some of the problems and mitigate some of the complexity raised by the original draft. Because this change has already been accepted, there is no VIP to link to, and no documentation to refer to until it is implemented. For convenience it is described here.

Influence

This feature is similar to how storage backends are exposed in VCL: each has a name that can then be used in VCL, and when a name is omitted, a generic name is assigned (s0, s1, sN etc).

Example: varnishd -s malloc,10G -s video=malloc,100G [...]

You end up with 3 storage backends called s0, video and Transient, and as such have access in VCL to the following symbols and their respective fields:

storage.s0
storage.video
storage.Transient
(and storage.<name>.*, see man vcl)

You can then have this kind of logic in VCL:

sub vcl_backend_response {
    if (beresp.http.content-type ~ "video") {
        set beresp.storage = storage.video;
    } else {
        set beresp.storage = storage.s0;
    }
}

The advantage of beresp.storage over beresp.storage_hint is the strong typing guaranteeing that VCL won't compile if there is a typo in the storage name.

Implementation

Named listen addresses will work like storage backends in that regard (generic names being a0, a1, aN etc).

Example: varnishd -a public_http=:80 -a public_https=:8443,PROXY -a admin=:1234 [...]

You can then use the logical names in your VCL:

sub vcl_recv {
    if (local.address == listen_address.public_http) {
        # do an https redirect for example
    }
    if (req.method == "PURGE") {
        if (local.address != listen_address.admin) {
            return (synth(405));
        }
        return (purge);
    }
}

The actual names of the variables used to access this information in VCL haven't been decided yet.

The benefit is the ability to reuse the same VCL even when the varnishd instances in a cluster cannot all provide consistent listen interfaces or port numbers.
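The naming rule above can be sketched in a few lines of Python. This is only a model of the proposal, not varnishd code: the function name is hypothetical, and it assumes that the generic counter (a0, a1, ...) advances only for arguments given without a name.

```python
from typing import Dict, List, Tuple


def name_listen_addresses(args: List[str]) -> Dict[str, Tuple[str, List[str]]]:
    """Map each -a argument to a logical name.

    Assumption: unnamed addresses get a0, a1, ... and the counter only
    advances for unnamed arguments.
    """
    named: Dict[str, Tuple[str, List[str]]] = {}
    unnamed = 0
    for arg in args:
        # Parameters such as PROXY follow the address, separated by commas.
        spec, *params = arg.split(",")
        name, sep, address = spec.partition("=")
        if not sep:
            # No explicit name: the whole spec is the address.
            name, address = f"a{unnamed}", spec
            unnamed += 1
        if name in named:
            raise ValueError(f"duplicate listen address name: {name}")
        named[name] = (address, params)
    return named
```

Calling it with the arguments "public_http=:80", "public_https=:8443,PROXY" and ":1234" yields the names public_http, public_https and a0.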

String conversion

The string conversion section hasn't been discussed yet.

Objects of type listen_address could be used where strings are expected and be converted to the address part of the -a option (that is, excluding the parameters).

Example: varnishd -a public_http=:80 -a public_https=:8443,PROXY -a admin=:1234 [...]

sub vcl_deliver {
    set resp.http.Address = local.address;
}

A non-synthetic response may contain one of the following headers:

Address: :80
Address: :8443

In the case of Unix domain sockets, automatic conversion to a string could be used, for example, for regular expression matching of the paths:

sub vcl_recv {
    if (req.method == "PURGE") {
        # there may be more than one admin UDS
        if (local.address !~ "admin\.sock$") {
            return (synth(405));
        }
        return (purge);
    }

}
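As a rough model of the conversion discussed in this section, the Python sketch below strips the optional name and the parameters from a -a argument, then applies the same regular expression as the VCL example. Both function names are hypothetical, the admin socket path in the usage note is made up, and the sketch does not handle paths that themselves contain a comma or an equals sign.

```python
import re


def listen_address_to_string(arg: str) -> str:
    """Return the address part of a -a argument, without name or parameters."""
    spec = arg.split(",", 1)[0]            # drop parameters such as PROXY
    _name, sep, address = spec.partition("=")
    return address if sep else spec        # drop the optional name


def is_admin_socket(address: str) -> bool:
    """Mirror the VCL test: local.address ~ "admin\\.sock$"."""
    return re.search(r"admin\.sock$", address) is not None
```

With "public_https=:8443,PROXY" the conversion gives ":8443", and a hypothetical "admin=/run/varnish/admin.sock" converts to a path that matches the admin pattern.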

Security concerns

This is not a security feature despite what all the examples above may suggest. Using this as a security measure implies the assumption that the network is actually secured before traffic hits Varnish on the admin listen address for example (firewalls and all that jazz).

phk: I don't agree entirely, the root@ may want to restrict the paths to backends.

dridi: I'm not sure I understand, this is not about UDS yet, only named listen addresses in general.

Testing

We can expose additional macros for listen addresses. For example with a v1 varnish instance:

v1_addr: the first listen address
v1_port: the first listen port
v1_sock: the first listen address+port
v1_addr_a0: a0's listen address
v1_port_a0: a0's listen port
v1_sock_a0: a0's listen address+port
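To make the macro list concrete, here is a small Python sketch that derives the macro names from an instance name and its listen addresses. The function is hypothetical and is not varnishtest code; the exact format of the sock value (address and port joined by a space) is an assumption.

```python
from typing import Dict, List, Tuple


def listen_macros(instance: str,
                  addresses: List[Tuple[str, str, str]]) -> Dict[str, str]:
    """Build macros from (name, addr, port) tuples in command-line order."""
    macros: Dict[str, str] = {}
    for index, (name, addr, port) in enumerate(addresses):
        # Assumption: "sock" is the address and port separated by a space.
        values = {"addr": addr, "port": port, "sock": f"{addr} {port}"}
        for suffix, value in values.items():
            macros[f"{instance}_{suffix}_{name}"] = value
            if index == 0:
                # The unsuffixed macros describe the first listen address.
                macros[f"{instance}_{suffix}"] = value
    return macros
```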

Benefits

Once again, the benefit is strong typing: port numbers in VCL and on the varnishd command line may get out of sync without being noticed, whereas a typo in a name prevents the VCL from compiling. It's also a transport-independent alternative to ACLs, as shown in the purge example above.

Being transport-independent, it can also accommodate future transports, such as the Unix domain sockets described below.

Why?

The main reason to use a UDS is that it works like TCP sockets (reliable bidirectional byte stream behind a file descriptor) and would likely not be too intrusive in the existing code base.

Other noteworthy reasons:

Eliminate the overhead of TCP/loopback for connections with peers that are co-located on a host with Varnish
The possibility to query the peer process credentials and restrict access using regular filesystem permissions

A common case for co-locating Varnish with a peer is the need for a TLS proxy for HTTPS. On both client and backend sides, a UDS should work seamlessly with the PROXY protocol.

How?

Listen address notation

On the listen side, expecting an absolute path would prevent ambiguity with IP addresses or ports:

varnishd -a /path/to/http.sock -T /path/to/cli.sock [...]

As is common with other varnishd options, we can pass additional parameters:

varnishd -T /path/to/cli.sock,uid=varnish,gid=varnish

However this introduces an ambiguity for PROXY protocol in the -a option. The syntax can be changed to:

varnishd -a /path/to/http.sock,proto=<proto>,uid=varnish,mode=0600 [...]

The -M option being of the connect persuasion, it wouldn't take any parameters beyond the absolute path.
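The proposed notation can be modelled with a short Python parser. This is a sketch under assumptions, not the varnishd option parser: ListenSpec and parse_listen_spec are hypothetical names, every parameter is assumed to be written as key=value, and mode is assumed to be octal as with chmod(1).

```python
from typing import Dict, NamedTuple, Optional


class ListenSpec(NamedTuple):
    address: str
    is_uds: bool
    params: Dict[str, str]
    mode: Optional[int]


def parse_listen_spec(arg: str) -> ListenSpec:
    """Split "address,key=value,..." and detect a UDS by its absolute path."""
    address, *raw_params = arg.split(",")
    params: Dict[str, str] = {}
    for raw in raw_params:
        key, sep, value = raw.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {raw!r}")
        params[key] = value
    # An absolute path cannot be mistaken for an IP address or a port.
    is_uds = address.startswith("/")
    # Permissions are written in octal, as with chmod(1).
    mode = int(params["mode"], 8) if "mode" in params else None
    return ListenSpec(address, is_uds, params, mode)
```

Requiring key=value for every parameter is what removes the ambiguity with a bare PROXY keyword: a value without a key is rejected instead of being guessed at.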

Backend address notation

On the backend side we can avoid ambiguity by introducing a new .path field:

backend local {
    .path = "/path/to/backend.sock";
    # or maybe .unix or .uds instead?
}

The .path field would be enough in itself to declare a backend (like .host) and would be mutually exclusive with .host and .port.

We could also allow relative paths on the backend side by adding a parameter (for example uds_path), akin to vcl_path and vmod_path, that maintains a search path in which to look sockets up.
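The two rules for backends, mutual exclusion and search-path lookup, can be sketched in Python as follows. The function names are hypothetical, the field names are written without their leading dot, and the lookup order (first existing match wins) is an assumption modelled on how a PATH is usually searched.

```python
import os
from typing import Dict, List, Optional


def backend_errors(fields: Dict[str, str]) -> List[str]:
    """Return the declaration errors for a backend given its fields."""
    errors: List[str] = []
    has_path = "path" in fields
    if has_path and ("host" in fields or "port" in fields):
        errors.append(".path is mutually exclusive with .host and .port")
    if not has_path and "host" not in fields:
        errors.append("a backend needs either .path or .host")
    return errors


def resolve_socket(path: str, uds_path: List[str]) -> Optional[str]:
    """Resolve a socket path, searching uds_path for relative paths."""
    if os.path.isabs(path):
        return path
    for directory in uds_path:
        candidate = os.path.join(directory, path)
        if os.path.exists(candidate):
            return candidate  # first match wins (assumption)
    return None
```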

Peer credentials

Getting the peer credentials is not portable, and the least common denominator seems to be the euid and egid. We probably want to extract them both as names and numbers. See Geoff's draft for the technical details.
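As an illustration of what extracting credentials looks like on one platform, the Python sketch below reads them with the Linux-specific SO_PEERCRED socket option and resolves the names through the passwd and group databases. It does not work on other systems, which use different mechanisms, and PeerCred and peer_credentials are hypothetical names rather than anything from Geoff's draft.

```python
import grp
import os
import pwd
import socket
import struct
from typing import NamedTuple, Optional


class PeerCred(NamedTuple):
    uid: int
    gid: int
    user: Optional[str]
    group: Optional[str]


def peer_credentials(sock: socket.socket) -> PeerCred:
    """Linux only: read struct ucred (pid, uid, gid) via SO_PEERCRED."""
    layout = "iII"  # pid_t, uid_t, gid_t
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                          struct.calcsize(layout))
    _pid, uid, gid = struct.unpack(layout, raw)
    try:
        user = pwd.getpwuid(uid).pw_name
    except KeyError:
        user = None  # no passwd entry for this uid
    try:
        group = grp.getgrgid(gid).gr_name
    except KeyError:
        group = None  # no group entry for this gid
    return PeerCred(uid, gid, user, group)
```

On a pair created with socket.socketpair(), the reported uid and gid are those of the calling process, which makes the function easy to try without a separate peer.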

VCL/VRT

The backend notation was already described above, but filed under the "notation" category rather than VCL. This section is more about the VCL changes in the context of a transaction.

IP addresses

The obvious implication of a UDS listen address is the lack of values for the *.ip variables (same on the backend side for beresp.backend.ip).

This could be solved by making all uses of VCL_IP gracefully fail in the presence of a NULL IP address. So an ACL match '~' would always fail and a negative match '!~' would always succeed.
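The intended ACL behaviour can be modelled in Python with the standard ipaddress module, using None in place of a NULL VCL_IP. The function name is hypothetical and the sketch only covers plain addresses and networks, not negated ACL entries.

```python
import ipaddress
from typing import List, Optional


def acl_match(ip: Optional[str], acl: List[str]) -> bool:
    """Model of the '~' operator: a NULL (None) address never matches."""
    if ip is None:
        return False
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(entry) for entry in acl)
```

The negative match '!~' is simply `not acl_match(ip, acl)`, so it always succeeds for a UDS client. This matters for VCL that uses '!~' to deny access, since such a rule would reject every UDS peer.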

What happens when a UDS gets IP addresses from a PROXY header? One solution could be to set server.ip and client.ip accordingly and leave the local.ip and remote.ip variables NULL. It would preserve this pattern:

sub vcl_recv {
    if (local.ip != server.ip) {
        # PROXY protocol detected
        set req.http.X-Forwarded-Proto = "https"; # for instance
    }
}

port, euid, egid

Much like we may access port numbers via *.ip variables, we want to access credentials of a UDS peer. We can do that using the std VMOD.

In the case of std.port, it could fail gracefully like ACLs when a NULL IP address is submitted by returning -1.

The std VMOD could then learn new functions:

std.uid
std.gid
std.uid_name
std.gid_name

Example:

import std;

sub vcl_recv {
    std.log("euid: " + std.uid(local.address));
}

If local.address is not a UDS, numeric variants could also return -1 and name variants could return NULL. The functions could also take fallback parameters, possibly defaulting to the values suggested (-1 and NULL).
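The fallback behaviour described above can be summarized with a Python model in which None stands for NULL. The Address class and the std_* functions are hypothetical stand-ins for the proposed VMOD functions, with the fallbacks defaulting to -1 and None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Address:
    port: Optional[int] = None   # only set for IP endpoints
    uid: Optional[int] = None    # only set for UDS peers
    user: Optional[str] = None   # only set when the uid resolves to a name


def std_port(addr: Address, fallback: int = -1) -> int:
    return addr.port if addr.port is not None else fallback


def std_uid(addr: Address, fallback: int = -1) -> int:
    return addr.uid if addr.uid is not None else fallback


def std_uid_name(addr: Address, fallback: Optional[str] = None) -> Optional[str]:
    return addr.user if addr.user is not None else fallback
```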

The consensus seems to lean towards naming functions by omitting the "effective" e from e[ug]id.

beresp.backend.ip

This variable should obviously be NULL in the case of a UDS backend if we follow the rules described above. However it is already possible to write a backend implementation not based on TCP/IP (see fsbackend for example) and NULL seems to already be the way to go.

The question here is more whether we need something like beresp.backend.path in addition to the ip field. The same question applies to peer credentials: they probably don't make sense for backends (and that would keep the new std functions limited to the listen addresses type).

local.address == listen_address.<name>

For std.uid to provide anything useful, we need a peer that a static listen_address.<name> has no reason to have. To enable strong typing, the == operator should be backed by a VRT function that checks for equivalence except for the peer. The structs behind listen_address.<name> could have a negative file descriptor for the peer for example.

Another possibly useful VRT function would find the listen_address.<name> corresponding to a local.address, for VMODs looking for a safe pointer outliving a transaction.
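A Python model of the two VRT functions discussed here: equality that ignores the peer, and a lookup from a per-transaction address to its static counterpart. All names are hypothetical, and using -1 as the peer file descriptor of a static address follows the suggestion above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class ListenAddress:
    name: str
    endpoint: str
    # Excluded from == so that a connected address equals its static twin.
    peer_fd: int = field(default=-1, compare=False)


def find_static(local: ListenAddress,
                table: Dict[str, ListenAddress]) -> Optional[ListenAddress]:
    """Return the long-lived listen_address.<name> for a local.address."""
    return table.get(local.name)
```

A VMOD could keep the object returned by find_static beyond the transaction, since it does not refer to the peer.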

Needs further discussions

1. phk: What happens to struct suckaddr? We added that to avoid lugging around sockaddr_storage all over the place and it shaves something like 4x96 bytes off the size of a session?

dridi: In the case of a UDS, we can keep track of the sockaddr_un with the rest of the -a parameters and use a pointer to that "pseudo-static" struct in the suckaddr union. That shouldn't increase the overall size.

2. phk: On the VCL side, what happens if in the future a jail performs a chroot? Users would have similar problems with today's std.fileread.

dridi: That would indeed be a problem for backends.

3. phk: During the first planning session for Varnish 6 we agreed that UDS addresses would be kept separate from suckaddr. (How?)

dridi: See question 1, then we can figure what to do in code branching on the suckaddr type.

4. dridi: Is the question of naming from the original draft still relevant?

5. phk: What happens if the VCL asks for remote.ip.port()?

dridi: I'm supposed to answer that in the VCL/VRT section but I haven't yet. I need to browse the planning session logs because I think we agreed that with the lack of IP address, *.ip should be NULL and IP-related facilities (e.g. ACLs, std.port...) should gracefully fail if they encounter NULL.

6. phk: What happens if the VCL asks for remote.ip.uid() on an IPv4/6 socket?

dridi: Same as question 5, although with subtle differences. In both questions the syntax is wrong anyway.

7. dridi: The section on beresp.backend.ip needs further discussion too.
