-
Notifications
You must be signed in to change notification settings - Fork 1.5k
KEP-5295: KYAML #5296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
KEP-5295: KYAML #5296
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: thockin The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@thockin: GitHub didn't allow me to request PR reviews from the following users: sftim. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
typos fixed and small changes to clarify strings |
keps/sig-cli/5295-kyaml/README.md
Outdated
|
||
The simplest thing to do is to write a new implementation of k8s.io/cli-runtime/pkg/printers.ResourcePrinter (in staging/src/k8s.io/cli-runtime/pkg/printers). However, this isn’t the most obviously reusable place for it. | ||
|
||
A more ecosystem-friendly solution is to write a new function or package under sigs.k8s.io/yaml](sigs.k8s.io/yaml) which can be used by other tools. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nicer to provide the encoder in a module with no dependencies on anything outside of the standard library. We discussed using KYAML as pretty-printer for logging and test output. A dependency on the encoder will be easier to justify if it does not pull in other dependencies.
Alternatively, is there a way to avoid the github.com/google/go-cmp and gopkg.in/check.v1 dependencies, for example by moving tests into a separate module? Presumably they are only used for testing, but a consumer of the module cannot be sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, is there a way to avoid the github.com/google/go-cmp and gopkg.in/check.v1 dependencies, for example by moving tests into a separate module? Presumably they are only used for testing, but a consumer of the module cannot be sure.
We already have these in sigs.k8s.io/yaml by way of the go-yaml forks. They are test-only so they are part of your module resolution graph but not part of your actual dependency set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on the decisions around yamlfmt, it's easier or harder to modularize. If I have to write all the logic to encode/decode/encode, I am loathe to do it twice.
That said, maybe I am wrong? The pure encoder for kyaml is only about 600 LOC and some of that could probably go away if we don't need to staisfy the PrintObj interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have these in sigs.k8s.io/yaml by way of the go-yaml forks.
I'm thinking of code which does not currently import sigs.k8s.io/yaml
.
They are test-only so they are part of your module resolution graph but not part of your actual dependency set.
I know that, but dependency analysis tools don't 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking of code which does not currently import sigs.k8s.io/yaml.
If that code wants to import Kubernetes style YAML formatting, I think it's reasonable to have the same requirements as sigs.k8s.io/yaml
I know that, but dependency analysis tools don't 😉
Eh? They only don't for resolving the graph, it shouldn't show up in your own go.mod / vendor.
Do we have a concrete blocker here, other than "it would be nice"? I agree that having none is nice, but it may be impractical, and I don't think the KEP is the right place to determine the dependencies of the implementing code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our own dependency tracking in Kubernetes is graph-based. It doesn't matter what a dependency is used for (test or non-test code), if an unwanted dependency shows up, it gets flagged as a problem. I'm sure other projects do the same.
This is not entirely theoretic. I would like to move k8s.io/kubernetes/test/utils/ktesting
into a staging repo. We talked about that a while ago, I just had to put it on hold. Currently, that ktesting package depends on sigs.k8s.io/yaml as a pretty-printer. I doubt that sigs.k8s.io/yaml and thus github.com/google/go-cmp + gopkg.in/check.v1 would be accepted as new dependency for some of our staging repos. I could switch to KYAML and if the KYAML encoder was in a module with no other dependencies, then it would be easier to argue for allowing it as new dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My other use case is as an on-demand pretty-printer in klog (klog.Dump(obj any)
), with a klog.Diff(old, new any)
as replacement for diff := cmp.Diff(old, new); klog.Info(...., "diff", diff)
.
- Would we be okay to have a simple
sigs.k8s.io/kyaml
with the output code as dependency of klog? Maybe. - Would
sigs.k8s.io/yaml
as it stands today be okay? I doubt it.
/cc @dims
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see some value in having a encoder which is really stand-alone, and I don't think I would object to having 2 implementations as long as we have solid tests?
|
||
Pro: Easy and well-defined | ||
Pro: Can be pasted into most C-like languages to get the original string. | ||
Con: Hard to read and write by humans when it gets long. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this is not an option, in particular with an eye towards using KYAML for logging and test output. The original klog structured text log output used this and we promptly got bug reports and reverts because it made the log output less readable. I then added multi-line output.
The problem is that none of the options below seem particularly appealing either 😞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 2) is tolerable. All multi-line strings are a bit awkward in yaml, the "block" style strings have different problems we didn't discuss here because they can't be used with "flow" style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that option 2 is okayish. That this:
and is:
in example 2 are shifted by one character (the \
before is:
) is a pity.
BTW, you seem to use the opposite definition of "flow" and "block" than the KEP? Option 1 is said to be "not available in YAML flow-style" with the string using block style, so options 2 onward must be flow-style strings. Or am I mixing things up? I hadn't seen this terminology before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flow-style is with {}
and []
block-style is with whitespace
Section 2.1 says:
"""
YAML also has flow styles, using explicit indicators rather than indentation to denote scope. The flow sequence is written as a comma separated list within square brackets. In a similar manner, the flow mapping uses curly braces.
"""
Section 3.2.3.1 says:
"""
There are two groups of styles. Block styles use indentation to denote structure. In contrast, flow styles rely on explicit indicators.
"""
If we got that wrong anywere, please flag it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The KEP has it right, but Ben seems to use it the other way around in his comments - or I am misreading them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[I transposed flow and block in the comment above, I will edit it]
I agree that option 2 is okayish. That this: and is: in example 2 are shifted by one character (the \ before is:) is a pity.
Yeah, though if you're depending on significant leading whitespace inside of a yaml multi-line string there's simply no non-error prone format for that particular aspect.
Some of them require less characters, but they're not any less confusing or error prone.
In general if you want to guarantee leading whitespace you need to escape it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That this: and is: in example 2 are shifted by one character (the \ before is:) is a pity.
Variation B of option 2 addresses this nicely - it's my favorite solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
fc2ba3c
to
0b92cc3
Compare
keps/sig-cli/5295-kyaml/README.md
Outdated
KYAML. That extra conversion is ugly, but this is not generally | ||
performance-sensitive code and is not run on the server. | ||
|
||
[https://github.com/goccy/go-yaml](https://github.com/goccy/go-yaml) offers an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the impact of this dependency? is this going to impact k8s.io/{api, apimachinery or client-go}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a significant decision
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least currently it has no non-stdlib dependencies, but it is another for us.
The go yaml landscape is unfortunately a bit uncertain at the moment.
goccy has other popular encode/decode libraries, this yaml library seems like a nice solution to our problems ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we predicate this KEP on adopting goccy I am afraid it will take longer than a single release :)
The amount of fear and defense-in-depth we have established around go-yaml is significant. @liggitt
keps/sig-cli/5295-kyaml/README.md
Outdated
Structs are rendered with curly-braces, and each field on a line of its own. | ||
Field names are not quoted unless they are not obviously strings (which | ||
includes prefixed label keys) or are one of the known type-ambiguous words | ||
(e.g. `no`). Field values are rendered as per their type. Embedded structs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(e.g. `no`). Field values are rendered as per their type. Embedded structs | |
(e.g. `no`). Field values are rendered as per their type in the order in which they appear there. Embedded structs |
This is a nice improvement over our current YAML encoder: because of the intermediate conversion to a map, the original field order gets lost. Not so with KYAML 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, that makes it sound like kubectl
would handle it properly, but it doesn't. It is converted to Unstructured before we ever see it (it might be a CRD) and the order of fields is not retained.
Strictly speaking, the encoder I have written will retain most of the order (types that implement json.Unmarshall may be routed through a map) but kubectl does not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm aware that it won't work for kubectl. I was thinking of the klog.Dump(<some API type>)
scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be confusing to have a different order for the same data depending on the execution context ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what we have today - if you just json.Marshal
a kube object it will retain field ordering, but if you run it through kubectl it will not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the most reliable way to make sure we perfectly match json typing / omitempty / omitzero / custom marshaler semantics is to use a pipeline like stdlib json.Marshal(obj) --> yaml order-preserving AST unmarshal → kyaml emitter
For multi-line strings I added a variation of option 2 with slightly different indenting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't dig into the multi-line strings section, swept the rest
keps/sig-cli/5295-kyaml/README.md
Outdated
itself does not guarantee this (it converts to a map internally, which loses | ||
the type schema), so the order may be different. | ||
|
||
Fields which have a `json` struct tag will be rendered according to that tag: | ||
* the tag’s field name will be used | ||
* if the tag’s name is “-”, the field will be skipped | ||
* if the tag specifies “omitempty” it will be respected | ||
* if the tag specifies “omitzero” it will be respected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ordering (of order-independent bits) and formatting can differ, but the content and structure of the output must be identical to -o json
and -o yaml
, right?
Rendering as json using the standard json encoder and then transforming into kyaml seems like the most bullet-proof way to accomplish this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kubectl turns everything into unstructured
long before the final Printer
sees the data. Unstructured implements json.Marshaler
, which my POC supports directly, so everything gets bounced thru JSON.
I suppose a different approach would be to just bounce EVERYTHING thru JSON, which means I don't have to implement the tag logic (already done, but still). It means we ALWAYS pessimize the field ordering, no matter where you apply this code. And it would always be as slow as possible.
I could go either way. It's EFFECTIVELY what is happening already in kubectl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the generic kubectl commands all use unstructured, but some of the commands do use typed objects as output. The more we can do to auto-inherit JSON serialization semantics and make this just do yaml reflowing, the better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kubectl turns everything into
unstructured
long before the final Printer sees the data.
Random thought (unrelated to this KEP): what if kubectl used something else than unstructured
which implements the same interfaces (for example, meta data access) but preserves field ordering? I'm tempted to try it out...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I bet you could make unstructured use a list instead of a map
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, but it would be a major breaking change of our Go API. That it's a map is exposed: https://pkg.go.dev/k8s.io/apimachinery/pkg/apis/meta/v1/unstructured#Unstructured
I suppose someone noticed that this might not be a good idea:
TODO: make the serialization part of this type distinct from the field accessors.
This is likely to end up as a change we accept and code up and release. Then, people will use it. Then they will want a reference for this KYAML thing in their own code. I have, and some readers might have guessed what I'm about to say, switched from "this almost needs a KEP, but it's fine" …to a new opinion. I want us to do what we can to make ambiguity unlikely. I also want us to make snags (such as someone implementing "almost-KYAML") easier to resolve. This should be two related KEPs. One to define what KYAML is, and another that covers changing Having a KEP just about what KYAML is gives us a place to point people in the further reading section of docs. It lets us fix specification bugs without getting in the way of work to adopt that specification. I recommend two KEPs for this. |
I think treating the format spec and the details of how kubectl will output it as two facets that are distinct is a good idea. I think two entirely separate KEPs is process overkill and will confuse / dilute / fragment discussion rather than clarify. |
Middle ground? I could (once we resolve comments and UNRESOLVEDs) commit the spec as a distinct file here or even in community somewhere? |
Yeah, I think when we're moving forward we absolutely want to document KYAML-the-format in a way that is distinct from the internals, but I also don't think we should do two KEPs. We should also probably blog about it when it's ready in kubectl, and point to the other reference materials etc then. |
New push is up which detail JSON conversion and comments. Removed a couple UNRESOLVED and added 1 new. |
This KEP proposes to add a new output format for kubectl, tentatively called “KYAML” (as in `kubectl get -o kyaml ...`). This format is a strict subset (aka “dialect”) of standard YAML, and so should be parseable by the existing ecosystem. This dialect seeks to emphasize syntactical choices which avoid many of the most common traps in YAML. For example, unlike standard YAML output, this dialect is not whitespace-sensitive, which makes it vastly easier to patch correctly in things like Helm charts. This KEP further proposes to make KYAML the standard format for all project-owned documentation and examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Despite all this feedback, we should definitely focus on finding a minimal set of changes to let this merge as provisional.
status: provisional #|implementable|implemented|deferred|rejected|withdrawn|replaced | ||
creation-date: 2025-05-09 | ||
reviewers: | ||
- "@sftim" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm now @lmktfy
# The milestone at which this feature was, or is targeted to be, at each stage. | ||
milestone: | ||
beta: "v1.34" | ||
stable: "v1.35" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted multiple KEPs because I think we can have a beta spec but part of the support (eg in kubectl) might go in as alpha.
|
||
More serialization “languages” dilutes knowledge. | ||
|
||
## Alternatives |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could support KYAML on the wire (including generating it). I really don't think we should, but we could.
|
||
May 09, 2025: KEP draft v0 | ||
|
||
## Drawbacks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more standard; countered by people being able to use vanilla YAML everywhere that we accept KYAML (is this true?)
|
||
### Upgrade / Downgrade Strategy | ||
|
||
This is a client-side feature. Users should not use this in automation until |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering explaining why we don't need a phased roll out (ie, that kubectl already accepts KYAML).
|
||
To make this change, we will: | ||
1. Create new code (see details below) which serializes any object into KYAML. | ||
2. Introduce a kubectl output format, `-o=kyaml`, which uses this new package. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My personal opinion: we should start off with a kubectl release that adds this option behind an environment-variable safety catch.
I don't feel comfortable we'll get everything right first time. An initial off-by-default mechanism lets us tweak things more than instantly offering support to everyone.
However, a kubectl plugin for reformatting can go instantly beta.
|
||
YAML is a very large and complex specification. | ||
We expect that we will find places where KYAML may be less readable than conventional YAML. | ||
We should revisit this specification when that happens. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How easy (or not) is it to revise KYAML? I don't think there are big barriers, but let's comment on that.
No affordance is made to ensure that keys of a map or struct are all quoted or | ||
not-quoted similarly. Some keys can be quoted while others are not. | ||
|
||
### Examples: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: omit the colon
startTime: "2025-05-09T21:14:39Z", | ||
}, | ||
}], | ||
kind: "List", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this could appear at the start rather than the end, it'll make the whole thing easier to read.
These would require choosing between them, taking new dependencies, and | ||
learning new grammars. All of them are very similar to KYAML, which has none | ||
of those drawbacks. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something I wouldn't do for beta, but might be nice to bear in mind: we could add support for kubectl (or a plugin) to add field-specific comments.
For example:
spec: {
clusterIP: "10.0.162.160", # IP address in primary address family
clusterIPs: [
"10.0.162.160", # list of IP addresses
],
internalTrafficPolicy: "Cluster", # see documentation for internal traffic policy
ipFamilies: [
"IPv4, # alternative: "IPv6"
],
Assume that these are based on the same source data that backs kubectl explain
.
This KEP proposes to add a new output format for kubectl, tentatively called “KYAML” (as in
kubectl get -o kyaml ...
). This format is a strict subset (aka “dialect”) of standard YAML, and so should be parseable by the existing ecosystem. This dialect seeks to emphasize syntactical choices which avoid many of the most common traps in YAML. For example, unlike standard YAML output, this dialect is not whitespace-sensitive, which makes it vastly easier to patch correctly in things like Helm charts.This KEP further proposes to make KYAML the standard format for all project-owned documentation and examples.
One-line PR description: First draft
Issue link: KYAML #5295
Other comments: