Skip to content

pivot on a single file a-la fold, except into columns. #320

@xxxserxxx

Description

@xxxserxxx

I'd like to be able to merge rows sharing a (possibly complex) key within a single (optionally multiple) file.

Example data:

1,a
1,b
2,x
2,y

This is a common, normalized representation of data. I'd like to be able to run:

$ echo "1,a,A\n1,b,B\n2,x,X\n2,y,Y\n" | csvtk fold -H -f 1 -C -v 2

where -C stands in for the new argument that creates new columns for each row:

1,a,b,A,B
2,x,y,X,Y

or

1,a,A,b,B
2,x,X,y,Y

Whether row columns are appended sequentially, or interleved, is unimportant as transforming column orders after the operation is trivial.

It would be acceptable if, when csvtk found that the result would produce rows of different lengths -- for example, an ID has more or less than the usual number of rows, that it would generate an error.

You can almost get there with fold and sep, except there's no way to include additional columns in fold, so it only works on two columns:

$ echo "1,a,A\n1,b,B\n2,x,X\n2,y,Y\n" | csvtk fold -H -f 1 -s ';' -v 2 | csvtk sep -H -s ';' -f 2 -R
1,a,b
2,x,y

The data in the additional rows is lost.

One IRL use case for this is with ledger csv, which produces output like this:

transaction uuid,date,payee,amount
aaa123,2024-03-01,Bill's Garage,50
aaa123,2024-03-01,Bank account,50
456abc,2024-03-05,Solar Power Monopoly,200
456abc,2024-03-05,Credit card,200

This is a double-entry system, so there's always a credit line and a corresponding debit line. What I want is:

transaction uuid,date,payee,source,amount
aaa123,2024-03-01,Bill's Garage,Bank account,50
456abc,2024-03-05,Solar Power Monopoly,Credit card,200

However, an MVP would produce:

transaction uuid,date,payee 1,payee 2,amount 1,amount 2
aaa123,2024-03-01,Bill's Garage,Bank account,50
456abc,2024-03-05,Solar Power Monopoly,Credit card,200

I tried to not present an X:Y problem, but I think a solution would be allowing fold and sep to operate on multiple -v (and -f, respectively) columns,

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions