Description
I'm going to sketch out a pipeline/middleware system for quick_xml to explore whether there is any interest in taking this further. I started to build this but realized that it's difficult to implement without access to quick_xml internals, in particular the NamespaceResolver infrastructure. Pipelines are still possible without this infrastructure, but it requires writing to intermediate XML strings and parsing them again.
Here's what I envision:
Use cases
- I want a quick_xml reader to write canonicalized XML output. I'd like to layer that over a quick_xml Serde serializer without having to write to a string first.
- I want to take just an element and all its descendants in XML and post-process it (for instance applying canonicalization). I'd like to do that without having to serialize that to a separate XML string as an intermediate step.
- The quick_xml deserializer isn't namespace aware. You can use prefixes in annotations in Rust types, but what if our input uses other prefixes? We could inject middleware that rewrites the prefixes to the set known by our types.
Traits
Reader trait
There's an abstract Reader trait. In reality we may need multiple ones to cover all the bases - a slice reader, a buffered reader, a slice ns reader, and a buffered ns reader.
The idea is that you can use this instead of a concrete reader.
I'm most interested in the most complex scenario, where I have access to namespace prefixes and the like. In the rest of the story I will pretend there's only a single reader, just for convenience (and it might be a place to start anyway).
The ns reader trait should implement prefixes and the various resolve_ methods.
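To make the shape of such a trait concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the owned `Event` enum is a simplified stand-in for quick_xml's borrowed event type, and the trait, its method signatures, and `VecReader` are hypothetical names, not existing quick_xml API.

```rust
use std::collections::VecDeque;

/// Simplified stand-in for quick_xml's event type (hypothetical, owned strings only).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Start(String),
    End(String),
    Text(String),
    Eof,
}

/// Hypothetical namespace-aware reader trait that pipeline steps could target.
pub trait Reader {
    /// Pull the next event; returns `Event::Eof` once the input is exhausted.
    fn read_event(&mut self) -> Event;
    /// Prefix-to-namespace bindings currently in scope.
    fn prefixes(&self) -> Vec<(String, String)>;
    /// Resolve a qualified element name to (namespace, local name).
    fn resolve_element(&self, qname: &str) -> (Option<String>, String);
}

/// Toy implementation over pre-parsed events with a fixed set of bindings.
pub struct VecReader {
    events: VecDeque<Event>,
    bindings: Vec<(String, String)>,
}

impl VecReader {
    pub fn new(events: Vec<Event>, bindings: Vec<(String, String)>) -> Self {
        VecReader { events: events.into(), bindings }
    }
}

impl Reader for VecReader {
    fn read_event(&mut self) -> Event {
        self.events.pop_front().unwrap_or(Event::Eof)
    }

    fn prefixes(&self) -> Vec<(String, String)> {
        self.bindings.clone()
    }

    fn resolve_element(&self, qname: &str) -> (Option<String>, String) {
        // Split "prefix:local"; an unprefixed name looks up the "" (default) binding.
        let (prefix, local) = match qname.split_once(':') {
            Some((p, l)) => (p, l),
            None => ("", qname),
        };
        // Later bindings shadow earlier ones, so search from the back.
        let ns = self
            .bindings
            .iter()
            .rev()
            .find(|(p, _)| p.as_str() == prefix)
            .map(|(_, uri)| uri.clone());
        (ns, local.to_string())
    }
}
```

A pipeline step written against `Reader` would not care whether events come from a slice, a buffered source, or another step.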
Writer trait
Similarly, there's a Writer trait. This can be much simpler: it takes write_event. I think it also needs something to initialize known namespace prefixes (more about why later), and perhaps it's handy to be able to ask whether writing has already begun, so we know whether to initialize or not.
Interesting implementations
(Ns)Reader
(Ns)Reader implements the Reader trait.
Writer
The Writer implements the Writer trait. This is so that our pipeline can end in a Writer if we want XML output.
It may have a special feature: it takes the prefixes it is given and declares them on the outermost element it writes, if that element doesn't already have such declarations (or this may live in a little middleware).
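As a rough sketch of the Writer trait and the prefix-declaring behaviour, assuming a simplified owned `Event` type: the trait, `init_prefixes`, `has_started`, and `StringWriter` are all hypothetical names, not quick_xml API. Because the toy `Start` event carries no attributes, the sketch does not check for declarations that already exist.

```rust
/// Simplified stand-in for quick_xml's event type (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Start(String),
    End(String),
    Text(String),
}

/// Hypothetical writer trait; none of these names exist in quick_xml.
pub trait Writer {
    fn write_event(&mut self, event: Event);
    /// Seed prefix bindings that are known before writing begins.
    fn init_prefixes(&mut self, prefixes: &[(String, String)]);
    /// Whether any event has been written yet.
    fn has_started(&self) -> bool;
}

/// Toy writer that serializes into a `String`.
#[derive(Default)]
pub struct StringWriter {
    pub out: String,
    pending: Vec<(String, String)>,
    started: bool,
}

impl Writer for StringWriter {
    fn write_event(&mut self, event: Event) {
        match event {
            Event::Start(name) => {
                self.out.push('<');
                self.out.push_str(&name);
                // Pending bindings are drained here, so only the first
                // start tag written carries the declarations.
                for (prefix, uri) in self.pending.drain(..) {
                    if prefix.is_empty() {
                        self.out.push_str(&format!(" xmlns=\"{}\"", uri));
                    } else {
                        self.out.push_str(&format!(" xmlns:{}=\"{}\"", prefix, uri));
                    }
                }
                self.out.push('>');
            }
            Event::End(name) => self.out.push_str(&format!("</{}>", name)),
            // No escaping in this sketch; a real writer would escape text.
            Event::Text(text) => self.out.push_str(&text),
        }
        self.started = true;
    }

    fn init_prefixes(&mut self, prefixes: &[(String, String)]) {
        // Ignored once writing has begun: it is too late to declare
        // anything on the outermost element.
        if !self.started {
            self.pending.extend_from_slice(prefixes);
        }
    }

    fn has_started(&self) -> bool {
        self.started
    }
}
```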
Pipeline step
A pipeline step takes a Reader and a Writer (trait implementations), pulls events from the reader, and writes to the writer. This is very similar to what you'd do now from a concrete reader to a concrete writer.
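A pipeline step of this kind can be sketched as a plain function that is generic over both traits. The example below rewrites one element prefix to another, in the spirit of the prefix-rewriting use case. The `Event` enum, the two minimal traits, and `rewrite_prefix_step` are hypothetical stand-ins, not quick_xml API, and the step only touches element names (not attributes or namespace declarations).

```rust
use std::collections::VecDeque;

/// Simplified stand-in for quick_xml's event type (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Start(String),
    End(String),
    Text(String),
    Eof,
}

/// Minimal hypothetical traits; a real design would also expose namespaces.
pub trait Reader {
    fn read_event(&mut self) -> Event;
}
pub trait Writer {
    fn write_event(&mut self, event: Event);
}

// In-memory endpoints so the step can be exercised without any parsing.
impl Reader for VecDeque<Event> {
    fn read_event(&mut self) -> Event {
        self.pop_front().unwrap_or(Event::Eof)
    }
}
impl Writer for Vec<Event> {
    fn write_event(&mut self, event: Event) {
        self.push(event);
    }
}

/// Replace the prefix `from` with `to` in a qualified name, if it matches.
fn rewrite(name: &str, from: &str, to: &str) -> String {
    match name.split_once(':') {
        Some((prefix, local)) if prefix == from => format!("{}:{}", to, local),
        _ => name.to_string(),
    }
}

/// Pull events from `reader` until Eof and forward them to `writer`,
/// rewriting element prefixes on the way through.
pub fn rewrite_prefix_step<R: Reader, W: Writer>(
    reader: &mut R,
    writer: &mut W,
    from: &str,
    to: &str,
) {
    loop {
        match reader.read_event() {
            Event::Eof => break,
            Event::Start(n) => writer.write_event(Event::Start(rewrite(&n, from, to))),
            Event::End(n) => writer.write_event(Event::End(rewrite(&n, from, to))),
            other => writer.write_event(other),
        }
    }
}
```

Because the step only sees the traits, the same function works whether the writer is the final XML output or another stage in the pipeline.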
Buffer
A pipeline step may take a single event and ignore it, or split it into multiple events. We don't want complicated state management inside of pipeline steps; they should just deal with readers and writers. So we need something that implements both Reader and Writer. This buffers events in a deque.
When a pipeline pulls from a buffer, the buffered events will be returned first, until the buffer is empty. Then the buffer invokes its reader to put more events into its buffer.
It also manages namespaces separately using NamespaceResolver, because pipeline steps could do interesting things to namespaces (this is in fact one of my use cases in canonicalization and prefix rewriting). A buffer can also be initialized with prefixes when writing starts, because a buffer may apply to a subset of the whole document.
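A minimal sketch of such a buffer, under the assumption of a simplified owned `Event` type and two hypothetical traits (none of this is quick_xml API). Namespace tracking is reduced to a list of prefixes supplied at construction; a real implementation would update bindings as events pass through, which is where NamespaceResolver would come in.

```rust
use std::collections::VecDeque;

/// Simplified stand-in for quick_xml's event type (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Start(String),
    End(String),
    Text(String),
    Eof,
}

pub trait Reader {
    fn read_event(&mut self) -> Event;
}
pub trait Writer {
    fn write_event(&mut self, event: Event);
}

// In-memory upstream so the buffer can be exercised without any parsing.
impl Reader for VecDeque<Event> {
    fn read_event(&mut self) -> Event {
        self.pop_front().unwrap_or(Event::Eof)
    }
}

/// Sits between two pipeline steps: events written into it are queued,
/// and reads drain the queue before falling back to the upstream reader.
pub struct Buffer<R: Reader> {
    queue: VecDeque<Event>,
    upstream: R,
    // Prefix bindings the buffer starts with; useful when the buffer
    // covers only a subtree of the whole document.
    prefixes: Vec<(String, String)>,
}

impl<R: Reader> Buffer<R> {
    pub fn new(upstream: R, prefixes: Vec<(String, String)>) -> Self {
        Buffer { queue: VecDeque::new(), upstream, prefixes }
    }

    pub fn prefixes(&self) -> &[(String, String)] {
        &self.prefixes
    }
}

impl<R: Reader> Writer for Buffer<R> {
    fn write_event(&mut self, event: Event) {
        self.queue.push_back(event);
    }
}

impl<R: Reader> Reader for Buffer<R> {
    fn read_event(&mut self) -> Event {
        // Buffered events come first; only when the queue is empty do we
        // pull one more event from upstream into the queue.
        if self.queue.is_empty() {
            let next = self.upstream.read_event();
            self.queue.push_back(next);
        }
        self.queue.pop_front().unwrap_or(Event::Eof)
    }
}
```

A step that expands one event into several simply calls `write_event` multiple times; the consumer downstream sees them in order through `read_event`.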
Splitter
This splits a single stream of events into multiple streams of events, based on some criterion on BytesStart (and namespace info). All events until the matching end tag will be streamed to a specific pipeline. This way you can efficiently select one or more parts of the document for further processing.
This takes a hashmap of pipeline names to Writer implementations (a hashmap, because how many pipelines should exist can in many cases only be determined at runtime) and a function that, given a BytesStart and namespace information, can determine which pipeline name it belongs to (or that it should not be piped through at all). By taking Writer implementations we can put in a Buffer with a pipeline step under it.
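The routing logic can be sketched with a depth counter that tracks when the selected subtree ends. This is an illustration under assumptions: `Event` is a simplified owned stand-in, the routing function sees only the element name (not a BytesStart or namespace info), empty-element events are not modelled, and `Splitter` and `Writer` are hypothetical names.

```rust
use std::collections::HashMap;

/// Simplified stand-in for quick_xml's event type (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Start(String),
    End(String),
    Text(String),
}

pub trait Writer {
    fn write_event(&mut self, event: Event);
}

// In-memory sink so the routed output can be inspected.
impl Writer for Vec<Event> {
    fn write_event(&mut self, event: Event) {
        self.push(event);
    }
}

/// Routes whole subtrees to named pipelines.
pub struct Splitter<W, F> {
    pub pipelines: HashMap<String, W>,
    route: F,
    // Name of the pipeline receiving the current subtree, and how many
    // elements of that subtree are currently open.
    active: Option<(String, usize)>,
}

impl<W: Writer, F: Fn(&str) -> Option<String>> Splitter<W, F> {
    pub fn new(pipelines: HashMap<String, W>, route: F) -> Self {
        Splitter { pipelines, route, active: None }
    }

    pub fn write_event(&mut self, event: Event) {
        // Outside any selected subtree, ask the routing function whether
        // this start tag opens one.
        if self.active.is_none() {
            if let Event::Start(name) = &event {
                self.active = (self.route)(name.as_str()).map(|target| (target, 0));
            }
        }
        let finished = match self.active.as_mut() {
            None => return, // not selected: the event is dropped
            Some((target, depth)) => {
                match &event {
                    Event::Start(_) => *depth += 1,
                    Event::End(_) => *depth -= 1,
                    Event::Text(_) => {}
                }
                let done = *depth == 0;
                if let Some(writer) = self.pipelines.get_mut(target.as_str()) {
                    writer.write_event(event);
                }
                done
            }
        };
        if finished {
            self.active = None;
        }
    }
}
```

Here the sinks are plain vectors; in the design described above each one would be a Buffer feeding a further pipeline step.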
Namespace resolver
Right now NsReader already is a bit of a pipeline step on top of Reader. If we had a "namespace resolving" pipeline step we could generalize that. You could start a pipeline without namespaces, and add it as needed. I'm not entirely sure this is worth it, as I still think you need a reader trait that supports prefixes and resolve_, as you'd want to write your pipeline steps against those.
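For illustration, the state such a namespace-resolving step would have to carry is essentially a stack of scopes, one per open element. The sketch below is not how quick_xml's NamespaceResolver works internally; `Event`, `ScopeStack`, `observe`, and `resolve` are hypothetical names, and details such as undeclaring a default namespace with `xmlns=""` are ignored.

```rust
/// Simplified stand-in for quick_xml's event type (hypothetical);
/// start tags carry their attributes as (name, value) pairs.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Start { name: String, attrs: Vec<(String, String)> },
    End(String),
}

/// Tracks namespace bindings as events stream past.
#[derive(Default)]
pub struct ScopeStack {
    // One entry per open element: the bindings that element declared.
    scopes: Vec<Vec<(String, String)>>,
}

impl ScopeStack {
    /// Update the scopes for one event. Call this before resolving names
    /// on a start tag, and after resolving names on an end tag.
    pub fn observe(&mut self, event: &Event) {
        match event {
            Event::Start { attrs, .. } => {
                let mut declared = Vec::new();
                for (key, value) in attrs {
                    if key.as_str() == "xmlns" {
                        // Default namespace is stored under the empty prefix.
                        declared.push((String::new(), value.clone()));
                    } else if let Some(prefix) = key.strip_prefix("xmlns:") {
                        declared.push((prefix.to_string(), value.clone()));
                    }
                }
                self.scopes.push(declared);
            }
            Event::End(_) => {
                self.scopes.pop();
            }
        }
    }

    /// Resolve a qualified element name to (namespace, local name),
    /// searching the innermost scope first.
    pub fn resolve(&self, qname: &str) -> (Option<String>, String) {
        let (prefix, local) = match qname.split_once(':') {
            Some((p, l)) => (p, l),
            None => ("", qname),
        };
        let ns = self
            .scopes
            .iter()
            .rev()
            .flatten()
            .find(|(p, _)| p.as_str() == prefix)
            .map(|(_, uri)| uri.clone());
        (ns, local.to_string())
    }
}
```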
Related topics
This relates to #611 and #881 as those would enable pipeline support for Serde (de)serialization.
Next steps
We need to answer a bunch of questions:
- Do we want to support these use cases with quick_xml at all?
- Would it make sense to have this implemented in quick_xml or by another crate?
- If by another crate, can we make quick_xml open up its APIs sufficiently to support this? The big blocker is NamespaceResolver, as without it, it becomes really difficult to implement Buffer.