A pipeline system

I'm going to sketch out a pipeline/middleware system for `quick_xml` to explore whether there is any interest in taking this further. I started to build this but realized that it's difficult to implement without access to `quick_xml` internals, in particular the `NamespaceResolver` infrastructure. Pipelines are still possible without this infrastructure, but they require writing to intermediate XML strings and parsing them again.

Here's what I envision:

## Use cases

- I have a `quick_xml` reader that writes canonicalized XML output. I'd like to layer that over a `quick_xml` Serde serializer without having to write to a string first.

- I want to take just an element and all its descendants in XML and post-process it (for instance applying canonicalization). I'd like to do that without having to serialize that to a separate XML string as an intermediate step.

- The `quick_xml` deserializer isn't namespace aware. You can use prefixes in annotations in Rust types, but what if our input uses other prefixes? We could inject middleware that rewrites the prefixes to the set known by our types.

## Traits

### Reader trait

There's an abstract Reader trait. In reality we may need multiple ones to cover all the bases - a slice reader, a buffered reader, a slice ns reader, and a buffered ns reader.

The idea is that you can use this instead of a concrete reader.

I'm most interested in the most complex scenario, where I have access to namespace prefixes and the like. In the rest of the story I will pretend there's only a single reader, just for convenience (and it might be a place to start anyway).

The ns reader trait should provide `prefixes` and the various `resolve_` methods.
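To make this a bit more concrete, here is a rough sketch of what such a trait could look like. It uses a simplified, owned `Event` enum instead of the real `quick_xml::events::Event`, and every name in it (`NsEventReader`, `resolve_element`, `VecReader`) is hypothetical. It only handles element names; attribute resolution and shadowing rules are left out.

```rust
use std::collections::VecDeque;

/// Simplified, owned stand-in for `quick_xml::events::Event` (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    /// Start tag: qualified name plus the `(prefix, namespace)` pairs it declares.
    Start { name: String, ns_decls: Vec<(String, String)> },
    End { name: String },
    Text(String),
}

/// Hypothetical namespace-aware reader trait; names are illustrative only.
pub trait NsEventReader {
    /// Returns the next event, or `None` at end of input.
    fn read_event(&mut self) -> Option<Event>;

    /// Prefix bindings in scope for the last event read, innermost first.
    fn prefixes(&self) -> Vec<(String, String)>;

    /// Resolves an element name to `(namespace, local name)`.
    fn resolve_element(&self, qname: &str) -> (Option<String>, String) {
        let (prefix, local) = qname.split_once(':').unwrap_or(("", qname));
        let ns = self
            .prefixes()
            .into_iter()
            .find(|(p, _)| p == prefix)
            .map(|(_, ns)| ns);
        (ns, local.to_string())
    }
}

/// Toy in-memory implementation, standing in for an `NsReader`.
pub struct VecReader {
    events: VecDeque<Event>,
    /// One entry per open element: the bindings that element declared.
    scopes: Vec<Vec<(String, String)>>,
    /// Set after an `End` so its scope stays visible until the next read.
    pending_pop: bool,
}

impl VecReader {
    pub fn new(events: Vec<Event>) -> Self {
        Self { events: events.into(), scopes: Vec::new(), pending_pop: false }
    }
}

impl NsEventReader for VecReader {
    fn read_event(&mut self) -> Option<Event> {
        // Leave the scope of the element that was closed by the previous event.
        if self.pending_pop {
            self.scopes.pop();
            self.pending_pop = false;
        }
        let event = self.events.pop_front()?;
        match &event {
            Event::Start { ns_decls, .. } => self.scopes.push(ns_decls.clone()),
            Event::End { .. } => self.pending_pop = true,
            Event::Text(_) => {}
        }
        Some(event)
    }

    fn prefixes(&self) -> Vec<(String, String)> {
        self.scopes.iter().rev().flatten().cloned().collect()
    }
}
```

A pipeline step written against `NsEventReader` would not care whether it is reading from a real parser or from an in-memory queue like `VecReader`.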

### Writer trait

Similarly, there's a Writer trait. This can be much simpler: it takes `write_event`. I think it also needs a way to initialize known namespace prefixes (more about why later), and perhaps it's handy to be able to ask whether writing has already begun, so we know whether to initialize or not.
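As a sketch of that shape, under the assumption of a simplified owned `Event` type: the trait name `EventWriter`, its methods `init_prefixes` and `has_started`, and the `CollectingWriter` and `forward` helpers are all made up for illustration and are not existing `quick_xml` APIs.

```rust
/// Simplified, owned stand-in for `quick_xml::events::Event` (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Start { name: String },
    End { name: String },
    Text(String),
}

/// Hypothetical writer trait; all method names are illustrative.
pub trait EventWriter {
    /// Seeds the prefix bindings already in scope where this writer starts.
    /// Intended to be called before the first `write_event`.
    fn init_prefixes(&mut self, prefixes: &[(String, String)]);

    /// Reports whether any event has been written yet.
    fn has_started(&self) -> bool;

    /// Consumes one event.
    fn write_event(&mut self, event: Event);
}

/// Toy writer that records what it receives.
#[derive(Debug, Default)]
pub struct CollectingWriter {
    pub initial_prefixes: Vec<(String, String)>,
    pub events: Vec<Event>,
}

impl EventWriter for CollectingWriter {
    fn init_prefixes(&mut self, prefixes: &[(String, String)]) {
        // Ignore late initialization: the prefixes describe the starting scope only.
        if !self.has_started() {
            self.initial_prefixes = prefixes.to_vec();
        }
    }

    fn has_started(&self) -> bool {
        !self.events.is_empty()
    }

    fn write_event(&mut self, event: Event) {
        self.events.push(event);
    }
}

/// Forwards an event, initializing the writer first if nothing was written yet.
pub fn forward(writer: &mut impl EventWriter, in_scope: &[(String, String)], event: Event) {
    if !writer.has_started() {
        writer.init_prefixes(in_scope);
    }
    writer.write_event(event);
}
```

The `forward` helper shows why `has_started` is handy: whoever feeds the writer can decide on the first event whether the starting scope still needs to be handed over.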

## Interesting implementations

### (Ns)Reader

(Ns)Reader implements the Reader trait.

### Writer

The concrete `Writer` implements the Writer trait. This is so that our pipeline can end in a `Writer` if we want XML output.

It may have a special feature: take the prefixes it is given and declare them on the outermost element it writes, if that element doesn't already have such declarations. (Or this could live in a small piece of middleware.)
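A minimal sketch of that "declare inherited prefixes on the outermost element" behavior, assuming a simplified `Event` type where namespace declarations are ordinary `xmlns` attributes. `XmlStringWriter` and its methods are hypothetical, and the escaping is deliberately naive.

```rust
/// Simplified stand-in for `quick_xml::events::Event` (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    /// Start tag with its attributes, including any `xmlns` declarations.
    Start { name: String, attrs: Vec<(String, String)> },
    End { name: String },
    Text(String),
}

/// Toy serializer that declares inherited prefixes on the outermost element.
#[derive(Default)]
pub struct XmlStringWriter {
    pub out: String,
    /// `(prefix, namespace)` pairs in scope where writing starts.
    initial_prefixes: Vec<(String, String)>,
    wrote_root: bool,
}

/// Naive escaping, good enough for the sketch.
fn escape(s: &str) -> String {
    s.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
}

impl XmlStringWriter {
    pub fn init_prefixes(&mut self, prefixes: &[(String, String)]) {
        self.initial_prefixes = prefixes.to_vec();
    }

    pub fn write_event(&mut self, event: Event) {
        match event {
            Event::Start { name, mut attrs } => {
                if !self.wrote_root {
                    self.wrote_root = true;
                    for (prefix, ns) in &self.initial_prefixes {
                        // The default namespace is declared as plain `xmlns`.
                        let key = if prefix.is_empty() {
                            "xmlns".to_string()
                        } else {
                            format!("xmlns:{prefix}")
                        };
                        // Keep a declaration the element already carries.
                        if !attrs.iter().any(|(k, _)| *k == key) {
                            attrs.push((key, ns.clone()));
                        }
                    }
                }
                self.out.push_str(&format!("<{name}"));
                for (k, v) in &attrs {
                    self.out.push_str(&format!(" {k}=\"{}\"", escape(v)));
                }
                self.out.push('>');
            }
            Event::End { name } => self.out.push_str(&format!("</{name}>")),
            Event::Text(t) => self.out.push_str(&escape(&t)),
        }
    }
}
```

With this, a subtree extracted from deep inside a document can still be written as standalone XML, because the bindings it inherited are re-declared on its root.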

### Pipeline step

A pipeline step takes a Reader and a Writer (trait implementations), pulls events from the reader, and writes events to the writer. This is very similar to what you'd do now from a concrete reader to a concrete writer.
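For illustration, a step could be as small as a function over the two traits. This sketch assumes minimal, hypothetical `EventReader` and `EventWriter` traits and a simplified `Event` enum; the in-memory `VecDeque` and `Vec` implementations are only there to make it self-contained.

```rust
use std::collections::VecDeque;

/// Simplified, owned stand-in for `quick_xml::events::Event` (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Start { name: String },
    End { name: String },
    Text(String),
}

/// Hypothetical minimal reader and writer traits.
pub trait EventReader {
    fn read_event(&mut self) -> Option<Event>;
}
pub trait EventWriter {
    fn write_event(&mut self, event: Event);
}

// In-memory stand-ins so the sketch runs without a parser or serializer.
impl EventReader for VecDeque<Event> {
    fn read_event(&mut self) -> Option<Event> {
        self.pop_front()
    }
}
impl EventWriter for Vec<Event> {
    fn write_event(&mut self, event: Event) {
        self.push(event);
    }
}

/// Example step: drops whitespace-only text, forwards everything else.
pub fn strip_whitespace(reader: &mut impl EventReader, writer: &mut impl EventWriter) {
    while let Some(event) = reader.read_event() {
        match event {
            Event::Text(t) if t.trim().is_empty() => {}
            other => writer.write_event(other),
        }
    }
}
```

The step knows nothing about where events come from or where they go, which is what lets the same function sit between a parser and a serializer, or between two other steps.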

### Buffer

A pipeline step may take a single event and ignore it, or split it into multiple events. We don't want complicated state management inside of pipeline steps; they should just deal with readers and writers. So we need something that implements both Reader and Writer. This buffers events in a deque.

When a pipeline pulls from a buffer, the buffered events will be returned first, until the buffer is empty. Then the buffer invokes its reader to put more events into its buffer.

It also manages namespaces separately using `NamespaceResolver`, because pipeline steps could do interesting things to namespaces (this is in fact one of my use cases, in canonicalization and prefix rewriting). A buffer can also be initialized with prefixes when writing starts, because a buffer may apply to a subset of the whole document.
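Here is a rough sketch of the deque part of such a buffer. It leaves out namespace tracking entirely (that is the part that needs `NamespaceResolver`), and it models the pipeline step as a per-event callback, which is a simplification I'm assuming for brevity. `Buffer`, `EventReader`, `EventWriter` and `split_commas` are hypothetical names.

```rust
use std::collections::VecDeque;

/// Simplified, owned stand-in for `quick_xml::events::Event` (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Start { name: String },
    End { name: String },
    Text(String),
}

pub trait EventReader {
    fn read_event(&mut self) -> Option<Event>;
}
pub trait EventWriter {
    fn write_event(&mut self, event: Event);
}

impl EventReader for VecDeque<Event> {
    fn read_event(&mut self) -> Option<Event> {
        self.pop_front()
    }
}
impl EventWriter for VecDeque<Event> {
    fn write_event(&mut self, event: Event) {
        self.push_back(event);
    }
}

/// Hypothetical buffer: a deque that a step writes into and a consumer reads from.
pub struct Buffer<R, S> {
    queue: VecDeque<Event>,
    upstream: R,
    step: S,
}

impl<R: EventReader, S: FnMut(Event, &mut dyn EventWriter)> Buffer<R, S> {
    pub fn new(upstream: R, step: S) -> Self {
        Self { queue: VecDeque::new(), upstream, step }
    }
}

impl<R, S> EventWriter for Buffer<R, S> {
    fn write_event(&mut self, event: Event) {
        self.queue.push_back(event);
    }
}

impl<R: EventReader, S: FnMut(Event, &mut dyn EventWriter)> EventReader for Buffer<R, S> {
    fn read_event(&mut self) -> Option<Event> {
        loop {
            // Buffered events are always returned first.
            if let Some(event) = self.queue.pop_front() {
                return Some(event);
            }
            // Queue is empty: pull one upstream event and let the step
            // write zero or more events back into the queue.
            let event = self.upstream.read_event()?;
            let out: &mut dyn EventWriter = &mut self.queue;
            (self.step)(event, out);
        }
    }
}

/// Example step: splits comma-separated text into one event per item
/// and drops text that contains no items.
pub fn split_commas(event: Event, out: &mut dyn EventWriter) {
    match event {
        Event::Text(t) => {
            for part in t.split(',').map(str::trim).filter(|p| !p.is_empty()) {
                out.write_event(Event::Text(part.to_string()));
            }
        }
        other => out.write_event(other),
    }
}
```

The loop in `read_event` is what absorbs the "one in, zero or many out" mismatch, so neither the step nor the consumer has to keep state about it.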

### Splitter

This splits a single stream of events into multiple streams of events, based on some criterion on `BytesStart` (and namespace info). All events up to and including the matching end tag will be streamed to a specific pipeline. This way you can efficiently select one or more parts of the document for further processing.

This takes a hashmap of pipeline names to Writer implementations (a hashmap because how many pipelines should exist can in many cases only be determined at runtime) and a function that, given a `BytesStart` and namespace information, determines which pipeline name the element belongs to (or that it should not be piped through at all). By taking Writer implementations we can put in a Buffer with a pipeline step under it.
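A sketch of the routing logic, with the simplifying assumption that the selector only sees the element name rather than a `BytesStart` plus namespace information. `Splitter`, `EventWriter` and `by_name` are hypothetical names, and events outside any selected subtree are simply dropped here.

```rust
use std::collections::HashMap;

/// Simplified, owned stand-in for `quick_xml::events::Event` (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Start { name: String },
    End { name: String },
    Text(String),
}

pub trait EventWriter {
    fn write_event(&mut self, event: Event);
}
impl EventWriter for Vec<Event> {
    fn write_event(&mut self, event: Event) {
        self.push(event);
    }
}

/// Hypothetical splitter: routes whole subtrees to named pipelines.
pub struct Splitter<W, F> {
    pub pipelines: HashMap<String, W>,
    select: F,
    /// Pipeline currently receiving events, and how many elements are open in it.
    active: Option<(String, usize)>,
}

impl<W: EventWriter, F: Fn(&str) -> Option<String>> Splitter<W, F> {
    pub fn new(pipelines: HashMap<String, W>, select: F) -> Self {
        Self { pipelines, select, active: None }
    }

    pub fn write_event(&mut self, event: Event) {
        // Outside a routed subtree, only a start tag can open a route.
        if self.active.is_none() {
            if let Event::Start { name } = &event {
                if let Some(target) = (self.select)(name.as_str()) {
                    self.active = Some((target, 0));
                }
            }
        }
        let mut finished = false;
        if let Some((target, depth)) = self.active.as_mut() {
            match &event {
                Event::Start { .. } => *depth += 1,
                Event::End { .. } => *depth -= 1,
                Event::Text(_) => {}
            }
            // Depth returns to zero on the end tag matching the routed start tag.
            finished = *depth == 0;
            if let Some(writer) = self.pipelines.get_mut(target.as_str()) {
                writer.write_event(event);
            }
        }
        if finished {
            self.active = None;
        }
    }
}

/// Example selector: routes by element name.
pub fn by_name(name: &str) -> Option<String> {
    match name {
        "order" => Some("orders".to_string()),
        "user" => Some("users".to_string()),
        _ => None,
    }
}
```

Because the map values are just Writer implementations, replacing `Vec<Event>` with a buffer that has a pipeline step behind it would not change the splitter itself.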

### Namespace resolver

Right now `NsReader` is already a bit of a pipeline step on top of `Reader`. If we had a "namespace resolving" pipeline step we could generalize that. You could start a pipeline without namespaces and add them as needed. I'm not entirely sure this is worth it, as I still think you need a reader trait that supports `prefixes` and `resolve_`, since you'd want to write your pipeline steps against those.

## Related topics

This relates to #611 and #881 as those would enable pipeline support for Serde (de)serialization.
 
## Next steps

We need to answer a bunch of questions:

- Do we want to support these use cases with `quick_xml` at all?

- Would it make sense to have this implemented in `quick_xml` or by another crate?

- If by another crate, can we make `quick_xml` open up its APIs sufficiently to support this? The big blocker is `NamespaceResolver`, as without it, it becomes really difficult to implement `Buffer`.

