-
-
Notifications
You must be signed in to change notification settings - Fork 95
Description
I'd like to be able to merge rows sharing a (possibly complex) key within a single (optionally multiple) file.
Example data:
1,a
1,b
2,x
2,y
This is a common, normalized representation of data. I'd like to be able to run:
$ echo "1,a,A\n1,b,B\n2,x,X\n2,y,Y\n" | csvtk fold -H -f 1 -C -v 2
where -C
stands in for the new argument that creates new columns for each row:
1,a,b,A,B
2,x,y,X,Y
or
1,a,A,b,B
2,x,X,y,Y
Whether row columns are appended sequentially, or interleved, is unimportant as transforming column orders after the operation is trivial.
It would be acceptable if, when csvtk found that the result would produce rows of different lengths -- for example, an ID has more or less than the usual number of rows, that it would generate an error.
You can almost get there with fold
and sep
, except there's no way to include additional columns in fold
, so it only works on two columns:
$ echo "1,a,A\n1,b,B\n2,x,X\n2,y,Y\n" | csvtk fold -H -f 1 -s ';' -v 2 | csvtk sep -H -s ';' -f 2 -R
1,a,b
2,x,y
The data in the additional rows is lost.
One IRL use case for this is with ledger csv
, which produces output like this:
transaction uuid,date,payee,amount
aaa123,2024-03-01,Bill's Garage,50
aaa123,2024-03-01,Bank account,50
456abc,2024-03-05,Solar Power Monopoly,200
456abc,2024-03-05,Credit card,200
This is a double-entry system, so there's always a credit line and a corresponding debit line. What I want is:
transaction uuid,date,payee,source,amount
aaa123,2024-03-01,Bill's Garage,Bank account,50
456abc,2024-03-05,Solar Power Monopoly,Credit card,200
However, an MVP would produce:
transaction uuid,date,payee 1,payee 2,amount 1,amount 2
aaa123,2024-03-01,Bill's Garage,Bank account,50
456abc,2024-03-05,Solar Power Monopoly,Credit card,200
I tried to not present an X:Y problem, but I think a solution would be allowing fold
and sep
to operate on multiple -v
(and -f
, respectively) columns,