Skip to content

KEP-5295: KYAML #5296

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

KEP-5295: KYAML #5296

wants to merge 1 commit into from

Conversation

thockin
Copy link
Member

@thockin thockin commented May 10, 2025

This KEP proposes to add a new output format for kubectl, tentatively called “KYAML” (as in kubectl get -o kyaml ...). This format is a strict subset (aka “dialect”) of standard YAML, and so should be parseable by the existing ecosystem. This dialect seeks to emphasize syntactical choices which avoid many of the most common traps in YAML. For example, unlike standard YAML output, this dialect is not whitespace-sensitive, which makes it vastly easier to patch correctly in things like Helm charts.

This KEP further proposes to make KYAML the standard format for all project-owned documentation and examples.

  • One-line PR description: First draft

  • Issue link: KYAML #5295

  • Other comments:

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 10, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: thockin
Once this PR has been reviewed and has the lgtm label, please assign soltysh for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/cli Categorizes an issue or PR as relevant to SIG CLI. labels May 10, 2025
@github-project-automation github-project-automation bot moved this to Needs Triage in SIG CLI May 10, 2025
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 10, 2025
@thockin
Copy link
Member Author

thockin commented May 10, 2025

/cc @BenTheElder @sftim @liggitt @soltysh

@k8s-ci-robot
Copy link
Contributor

@thockin: GitHub didn't allow me to request PR reviews from the following users: sftim.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @BenTheElder @sftim @liggitt @soltysh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@thockin thockin mentioned this pull request May 10, 2025
4 tasks
@thockin
Copy link
Member Author

thockin commented May 11, 2025

typos fixed and small changes to clarify strings


The simplest thing to do is to write a new implementation of k8s.io/cli-runtime/pkg/printers.ResourcePrinter (in staging/src/k8s.io/cli-runtime/pkg/printers). However, this isn’t the most obviously reusable place for it.

A more ecosystem-friendly solution is to write a new function or package under sigs.k8s.io/yaml](sigs.k8s.io/yaml) which can be used by other tools.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nicer to provide the encoder in a module with no dependencies on anything outside of the standard library. We discussed using KYAML as pretty-printer for logging and test output. A dependency on the encoder will be easier to justify if it does not pull in other dependencies.

Alternatively, is there a way to avoid the github.com/google/go-cmp and gopkg.in/check.v1 dependencies, for example by moving tests into a separate module? Presumably they are only used for testing, but a consumer of the module cannot be sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, is there a way to avoid the github.com/google/go-cmp and gopkg.in/check.v1 dependencies, for example by moving tests into a separate module? Presumably they are only used for testing, but a consumer of the module cannot be sure.

We already have these in sigs.k8s.io/yaml by way of the go-yaml forks. They are test-only so they are part of your module resolution graph but not part of your actual dependency set.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on the decisions around yamlfmt, it's easier or harder to modularize. If I have to write all the logic to encode/decode/encode, I am loathe to do it twice.

That said, maybe I am wrong? The pure encoder for kyaml is only about 600 LOC and some of that could probably go away if we don't need to staisfy the PrintObj interface.

Copy link
Contributor

@pohly pohly May 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have these in sigs.k8s.io/yaml by way of the go-yaml forks.

I'm thinking of code which does not currently import sigs.k8s.io/yaml.

They are test-only so they are part of your module resolution graph but not part of your actual dependency set.

I know that, but dependency analysis tools don't 😉

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking of code which does not currently import sigs.k8s.io/yaml.

If that code wants to import Kubernetes style YAML formatting, I think it's reasonable to have the same requirements as sigs.k8s.io/yaml

I know that, but dependency analysis tools don't 😉

Eh? They only don't for resolving the graph, it shouldn't show up in your own go.mod / vendor.

Do we have a concrete blocker here, other than "it would be nice"? I agree that having none is nice, but it may be impractical, and I don't think the KEP is the right place to determine the dependencies of the implementing code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our own dependency tracking in Kubernetes is graph-based. It doesn't matter what a dependency is used for (test or non-test code), if an unwanted dependency shows up, it gets flagged as a problem. I'm sure other projects do the same.

This is not entirely theoretic. I would like to move k8s.io/kubernetes/test/utils/ktesting into a staging repo. We talked about that a while ago, I just had to put it on hold. Currently, that ktesting package depends on sigs.k8s.io/yaml as a pretty-printer. I doubt that sigs.k8s.io/yaml and thus github.com/google/go-cmp + gopkg.in/check.v1 would be accepted as new dependency for some of our staging repos. I could switch to KYAML and if the KYAML encoder was in a module with no other dependencies, then it would be easier to argue for allowing it as new dependency.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My other use case is as an on-demand pretty-printer in klog (klog.Dump(obj any)), with a klog.Diff(old, new any) as replacement for diff := cmp.Diff(old, new); klog.Info(...., "diff", diff).

  • Would we be okay to have a simple sigs.k8s.io/kyaml with the output code as dependency of klog? Maybe.
  • Would sigs.k8s.io/yaml as it stands today be okay? I doubt it.

/cc @dims

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see some value in having a encoder which is really stand-alone, and I don't think I would object to having 2 implementations as long as we have solid tests?


Pro: Easy and well-defined
Pro: Can be pasted into most C-like languages to get the original string.
Con: Hard to read and write by humans when it gets long.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is not an option, in particular with an eye towards using KYAML for logging and test output. The original klog structured text log output used this and we promptly got bug reports and reverts because it made the log output less readable. I then added multi-line output.

The problem is that none of the options below seem particularly appealing either 😞

Copy link
Member

@BenTheElder BenTheElder May 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 2) is tolerable. All multi-line strings are a bit awkward in yaml, the "block" style strings have different problems we didn't discuss here because they can't be used with "flow" style.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that option 2 is okayish. That this: and is: in example 2 are shifted by one character (the \ before is:) is a pity.

BTW, you seem to use the opposite definition of "flow" and "block" than the KEP? Option 1 is said to be "not available in YAML flow-style" with the string using block style, so options 2 onward must be flow-style strings. Or am I mixing things up? I hadn't seen this terminology before.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flow-style is with {} and []

block-style is with whitespace

https://yaml.org/spec/1.2.2/

Section 2.1 says:

"""
YAML also has flow styles, using explicit indicators rather than indentation to denote scope. The flow sequence is written as a comma separated list within square brackets. In a similar manner, the flow mapping uses curly braces.
"""

Section 3.2.3.1 says:

"""
There are two groups of styles. Block styles use indentation to denote structure. In contrast, flow styles rely on explicit indicators.
"""

If we got that wrong anywere, please flag it :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The KEP has it right, but Ben seems to use it the other way around in his comments - or I am misreading them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[I transposed flow and block in the comment above, I will edit it]

I agree that option 2 is okayish. That this: and is: in example 2 are shifted by one character (the \ before is:) is a pity.

Yeah, though if you're depending on significant leading whitespace inside of a yaml multi-line string there's simply no non-error prone format for that particular aspect.

Some of them require less characters, but they're not any less confusing or error prone.

In general if you want to guarantee leading whitespace you need to escape it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That this: and is: in example 2 are shifted by one character (the \ before is:) is a pity.

Variation B of option 2 addresses this nicely - it's my favorite solution.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

@github-project-automation github-project-automation bot moved this from Needs Triage to In Progress in SIG CLI May 12, 2025
@thockin thockin force-pushed the kyaml branch 2 times, most recently from fc2ba3c to 0b92cc3 Compare May 12, 2025 19:41
@k8s-ci-robot k8s-ci-robot requested a review from dims May 13, 2025 06:01
KYAML. That extra conversion is ugly, but this is not generally
performance-sensitive code and is not run on the server.

[https://github.com/goccy/go-yaml](https://github.com/goccy/go-yaml) offers an
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the impact of this dependency? is this going to impact k8s.io/{api, apimachinery or client-go}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a significant decision

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least currently it has no non-stdlib dependencies, but it is another for us.
The go yaml landscape is unfortunately a bit uncertain at the moment.

goccy has other popular encode/decode libraries, this yaml library seems like a nice solution to our problems ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we predicate this KEP on adopting goccy I am afraid it will take longer than a single release :)

The amount of fear and defense-in-depth we have established around go-yaml is significant. @liggitt

Structs are rendered with curly-braces, and each field on a line of its own.
Field names are not quoted unless they are not obviously strings (which
includes prefixed label keys) or are one of the known type-ambiguous words
(e.g. `no`). Field values are rendered as per their type. Embedded structs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(e.g. `no`). Field values are rendered as per their type. Embedded structs
(e.g. `no`). Field values are rendered as per their type in the order in which they appear there. Embedded structs

This is a nice improvement over our current YAML encoder: because of the intermediate conversion to a map, the original field order gets lost. Not so with KYAML 😄

Copy link
Member Author

@thockin thockin May 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, that makes it sound like kubectl would handle it properly, but it doesn't. It is converted to Unstructured before we ever see it (it might be a CRD) and the order of fields is not retained.

Strictly speaking, the encoder I have written will retain most of the order (types that implement json.Unmarshall may be routed through a map) but kubectl does not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm aware that it won't work for kubectl. I was thinking of the klog.Dump(<some API type>) scenario.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be confusing to have a different order for the same data depending on the execution context ..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what we have today - if you just json.Marshal a kube object it will retain field ordering, but if you run it through kubectl it will not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the most reliable way to make sure we perfectly match json typing / omitempty / omitzero / custom marshaler semantics is to use a pipeline like stdlib json.Marshal(obj) --> yaml order-preserving AST unmarshal → kyaml emitter

@thockin
Copy link
Member Author

thockin commented May 13, 2025

For multi-line strings I added a variation of option 2 with slightly different indenting.

Copy link
Member

@liggitt liggitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't dig into the multi-line strings section, swept the rest

Comment on lines 360 to 371
itself does not guarantee this (it converts to a map internally, which loses
the type schema), so the order may be different.

Fields which have a `json` struct tag will be rendered according to that tag:
* the tag’s field name will be used
* if the tag’s name is “-”, the field will be skipped
* if the tag specifies “omitempty” it will be respected
* if the tag specifies “omitzero” it will be respected
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ordering (of order-independent bits) and formatting can differ, but the content and structure of the output must be identical to -o json and -o yaml, right?

Rendering as json using the standard json encoder and then transforming into kyaml seems like the most bullet-proof way to accomplish this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kubectl turns everything into unstructured long before the final Printer sees the data. Unstructured implements json.Marshaler, which my POC supports directly, so everything gets bounced thru JSON.

I suppose a different approach would be to just bounce EVERYTHING thru JSON, which means I don't have to implement the tag logic (already done, but still). It means we ALWAYS pessimize the field ordering, no matter where you apply this code. And it would always be as slow as possible.

I could go either way. It's EFFECTIVELY what is happening already in kubectl.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the generic kubectl commands all use unstructured, but some of the commands do use typed objects as output. The more we can do to auto-inherit JSON serialization semantics and make this just do yaml reflowing, the better

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kubectl turns everything into unstructured long before the final Printer sees the data.

Random thought (unrelated to this KEP): what if kubectl used something else than unstructured which implements the same interfaces (for example, meta data access) but preserves field ordering? I'm tempted to try it out...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I bet you could make unstructured use a list instead of a map

Copy link
Contributor

@pohly pohly May 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, but it would be a major breaking change of our Go API. That it's a map is exposed: https://pkg.go.dev/k8s.io/apimachinery/pkg/apis/meta/v1/unstructured#Unstructured

I suppose someone noticed that this might not be a good idea:

TODO: make the serialization part of this type distinct from the field accessors.

@lmktfy
Copy link

lmktfy commented May 13, 2025

This is likely to end up as a change we accept and code up and release.

Then, people will use it. Then they will want a reference for this KYAML thing in their own code.
Docs are good, we should make sure, but people do refer directly to KEPs.

I have, and some readers might have guessed what I'm about to say, switched from "this almost needs a KEP, but it's fine" …to a new opinion.

I want us to do what we can to make ambiguity unlikely. I also want us to make snags (such as someone implementing "almost-KYAML") easier to resolve.


This should be two related KEPs. One to define what KYAML is, and another that covers changing kubectl to emit KYAML and the docs to prefer it. I don't think we need KYAML on the wire, and of course our YAML parser already accepts YAML.

Having a KEP just about what KYAML is gives us a place to point people in the further reading section of docs. It lets us fix specification bugs without getting in the way of work to adopt that specification. I recommend two KEPs for this.

@liggitt
Copy link
Member

liggitt commented May 13, 2025

I think treating the format spec and the details of how kubectl will output it as two facets that are distinct is a good idea. I think two entirely separate KEPs is process overkill and will confuse / dilute / fragment discussion rather than clarify.

@thockin
Copy link
Member Author

thockin commented May 14, 2025

Middle ground? I could (once we resolve comments and UNRESOLVEDs) commit the spec as a distinct file here or even in community somewhere?

@BenTheElder
Copy link
Member

Middle ground? I could (once we resolve comments and UNRESOLVEDs) commit the spec as a distinct file here or even in community somewhere?

Yeah, I think when we're moving forward we absolutely want to document KYAML-the-format in a way that is distinct from the internals, but I also don't think we should do two KEPs.

We should also probably blog about it when it's ready in kubectl, and point to the other reference materials etc then.

@thockin
Copy link
Member Author

thockin commented May 15, 2025

New push is up which detail JSON conversion and comments. Removed a couple UNRESOLVED and added 1 new.

This KEP proposes to add a new output format for kubectl, tentatively
called “KYAML” (as in `kubectl get -o kyaml ...`). This format is a
strict subset (aka “dialect”) of standard YAML, and so should be
parseable by the existing ecosystem. This dialect seeks to emphasize
syntactical choices which avoid many of the most common traps in YAML.
For example, unlike standard YAML output, this dialect is not
whitespace-sensitive, which makes it vastly easier to patch correctly in
things like Helm charts.

This KEP further proposes to make KYAML the standard format for all
project-owned documentation and examples.
Copy link

@lmktfy lmktfy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite all this feedback, we should definitely focus on finding a minimal set of changes to let this merge as provisional.

status: provisional #|implementable|implemented|deferred|rejected|withdrawn|replaced
creation-date: 2025-05-09
reviewers:
- "@sftim"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm now @lmktfy

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
beta: "v1.34"
stable: "v1.35"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted multiple KEPs because I think we can have a beta spec but part of the support (eg in kubectl) might go in as alpha.


More serialization “languages” dilutes knowledge.

## Alternatives
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could support KYAML on the wire (including generating it). I really don't think we should, but we could.


May 09, 2025: KEP draft v0

## Drawbacks
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more standard; countered by people being able to use vanilla YAML everywhere that we accept KYAML (is this true?)


### Upgrade / Downgrade Strategy

This is a client-side feature. Users should not use this in automation until
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering explaining why we don't need a phased roll out (ie, that kubectl already accepts KYAML).


To make this change, we will:
1. Create new code (see details below) which serializes any object into KYAML.
2. Introduce a kubectl output format, `-o=kyaml`, which uses this new package.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My personal opinion: we should start off with a kubectl release that adds this option behind an environment-variable safety catch.

I don't feel comfortable we'll get everything right first time. An initial off-by-default mechanism lets us tweak things more than instantly offering support to everyone.

However, a kubectl plugin for reformatting can go instantly beta.


YAML is a very large and complex specification.
We expect that we will find places where KYAML may be less readable than conventional YAML.
We should revisit this specification when that happens.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How easy (or not) is it to revise KYAML? I don't think there are big barriers, but let's comment on that.

No affordance is made to ensure that keys of a map or struct are all quoted or
not-quoted similarly. Some keys can be quoted while others are not.

### Examples:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: omit the colon

startTime: "2025-05-09T21:14:39Z",
},
}],
kind: "List",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this could appear at the start rather than the end, it'll make the whole thing easier to read.

These would require choosing between them, taking new dependencies, and
learning new grammars. All of them are very similar to KYAML, which has none
of those drawbacks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something I wouldn't do for beta, but might be nice to bear in mind: we could add support for kubectl (or a plugin) to add field-specific comments.

For example:

  spec: {
    clusterIP: "10.0.162.160", # IP address in primary address family
    clusterIPs: [
      "10.0.162.160", # list of IP addresses
    ],
    internalTrafficPolicy: "Cluster", # see documentation for internal traffic policy
    ipFamilies: [
      "IPv4, # alternative: "IPv6"
    ],

Assume that these are based on the same source data that backs kubectl explain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/cli Categorizes an issue or PR as relevant to SIG CLI. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

7 participants