Commit 9da19d0

Merge pull request #1274 from Kotlin/column_selectors_types
columns selector type
2 parents 58012fc + 7977ff5 commit 9da19d0

1 file changed

+71
-46
lines changed

docs/StardustDocs/topics/ColumnSelectors.md

Lines changed: 71 additions & 46 deletions
@@ -45,33 +45,34 @@ df.move { name.firstName and name.lastName }.after { city }
4545
`first {}`, `firstCol()`, `last {}`, `lastCol()`, `single {}`, `singleCol()`
4646

4747
Returns the first, last, or single column from the top-level, specified [column group](DataColumn.md#columngroup),
48-
or `ColumnSet` that adheres to the optional given condition. If no column adheres to the given condition,
48+
or [`ColumnSet`](#column-resolvers) that adheres to the optional given condition. If no column adheres to the given condition,
4949
`NoSuchElementException` is thrown.
5050

5151
##### Col {collapsible="true"}
5252
`col(name)`, `col(5)`
5353

54-
Creates a [ColumnAccessor](DataColumn.md) (or `SingleColumn`) for a column with the given
54+
Creates a [`ColumnAccessor`](#column-resolvers) (or [`SingleColumn`](#column-resolvers)) for a column with the given
5555
argument from the top-level or specified [column group](DataColumn.md#columngroup). The argument can be either an
56-
index (`Int`) or a reference to a column (`String`, `ColumnPath`, `KProperty`, or `ColumnAccessor`;
56+
index (`Int`) or a reference to a column (`String`, [`ColumnPath`](#column-resolvers), or
57+
[`ColumnAccessor`](#column-resolvers);
5758
any [AccessApi](apiLevels.md)).
5859

5960
##### Value Col, Frame Col, Col Group {collapsible="true"}
6061
`valueCol(name)`, `valueCol(5)`, `frameCol(name)`, `frameCol(5)`, `colGroup(name)`, `colGroup(5)`
6162

62-
Creates a [ColumnAccessor](DataColumn.md) (or `SingleColumn`) for a
63+
Creates a [`ColumnAccessor`](DataColumn.md) (or `SingleColumn`) for a
6364
[value column](DataColumn.md#valuecolumn) / [frame column](DataColumn.md#framecolumn) /
6465
[column group](DataColumn.md#columngroup) with the given argument from the top-level or
6566
specified [column group](DataColumn.md#columngroup). The argument can be either an index (`Int`) or a reference
66-
to a column (`String`, `ColumnPath`, `KProperty`, or `ColumnAccessor`; any [AccessApi](apiLevels.md)).
67-
The functions can be both typed and untyped (in case you're supplying a column name, -path, or index).
67+
to a column (`String`, [`ColumnPath`](#column-resolvers), or [`ColumnAccessor`](#column-resolvers); any [AccessApi](apiLevels.md)).
68+
The functions can be both typed and untyped (in case you're supplying a column name, path, or index).
6869
These functions throw an `IllegalArgumentException` if the column found is not the right kind.
6970

7071
##### Cols {collapsible="true"}
7172
`cols {}`, `cols()`, `cols(colA, colB)`, `cols(1, 5)`, `cols(1..5)`, `[{}]`, `colSet[1, 3]`
7273

73-
Creates a subset of columns (`ColumnSet`) from the top-level, specified [column group](DataColumn.md#columngroup),
74-
or `ColumnSet`.
74+
Creates a subset of columns ([`ColumnSet`](#column-resolvers)) from the top-level, specified [column group](DataColumn.md#columngroup),
75+
or [`ColumnSet`](#column-resolvers).
7576
You can use either a `ColumnFilter`, or any of the `vararg` overloads for any [AccessApi](apiLevels.md).
7677
The function can be both typed and untyped (in case you're supplying a column name, path, or index (range)).
7778

@@ -80,36 +81,36 @@ Note that you can also use the `[]` operator for most overloads of `cols` to ach
8081
##### Range of Columns {collapsible="true"}
8182
`colA.."colB"`
8283

83-
Creates a `ColumnSet` containing all columns from `colA` to `colB` (inclusive) from the top-level.
84+
Creates a [`ColumnSet`](#column-resolvers) containing all columns from `colA` to `colB` (inclusive) from the top-level.
8485
Columns inside [column groups](DataColumn.md#columngroup) are also supported
8586
(as long as they share the same direct parent), as well as any combination of [AccessApi](apiLevels.md).
8687

8788
##### Value Columns, Frame Columns, Column Groups {collapsible="true"}
8889
`valueCols {}`, `valueCols()`, `frameCols {}`, `frameCols()`, `colGroups {}`, `colGroups()`
8990

90-
Creates a subset of columns (`ColumnSet`) from the top-level, specified [column group](DataColumn.md#columngroup),
91-
or `ColumnSet` containing only [value columns](DataColumn.md#valuecolumn) / [frame columns](DataColumn.md#framecolumn) /
91+
Creates a subset of columns ([`ColumnSet`](#column-resolvers)) from the top-level, specified [column group](DataColumn.md#columngroup),
92+
or [`ColumnSet`](#column-resolvers) containing only [value columns](DataColumn.md#valuecolumn) / [frame columns](DataColumn.md#framecolumn) /
9293
[column groups](DataColumn.md#columngroup) that adhere to the optional condition.
9394

9495
##### Cols of Kind {collapsible="true"}
9596
`colsOfKind(Value, Frame) {}`, `colsOfKind(Group, Frame)`
9697

97-
Creates a subset of columns (`ColumnSet`) from the top-level, specified [column group](DataColumn.md#columngroup),
98-
or `ColumnSet` containing only columns of the specified kind(s) that adhere to the optional condition.
98+
Creates a subset of columns ([`ColumnSet`](#column-resolvers)) from the top-level, specified [column group](DataColumn.md#columngroup),
99+
or [`ColumnSet`](#column-resolvers) containing only columns of the specified kind(s) that adhere to the optional condition.
99100

100101
##### All (Cols) {collapsible="true"}
101102
`all()`, `allCols()`
102103

103-
Creates a `ColumnSet` containing all columns from the top-level, specified [column group](DataColumn.md#columngroup),
104-
or `ColumnSet`. This is the opposite of [`none()`](ColumnSelectors.md#none) and equivalent to
104+
Creates a [`ColumnSet`](#column-resolvers) containing all columns from the top-level, specified [column group](DataColumn.md#columngroup),
105+
or [`ColumnSet`](#column-resolvers). This is the opposite of [`none()`](ColumnSelectors.md#none) and equivalent to
105106
[`cols()`](ColumnSelectors.md#cols) without filter.
106107
Note, on [column groups](DataColumn.md#columngroup), `all` is named `allCols` instead to avoid confusion.
107108

108109
##### All (Cols) After, -Before, -From, -Up To {collapsible="true"}
109110
`allAfter(colA)`, `allBefore(colA)`, `allColsFrom(colA)`, `allColsUpTo(colA)`
110111

111-
Creates a `ColumnSet` containing a subset of columns from the top-level,
112-
specified [column group](DataColumn.md#columngroup), or `ColumnSet`.
112+
Creates a [`ColumnSet`](#column-resolvers) containing a subset of columns from the top-level,
113+
specified [column group](DataColumn.md#columngroup), or [`ColumnSet`](#column-resolvers).
113114
The subset includes:
114115
- `all(Cols)Before(colA)`: All columns before the specified column, excluding that column.
115116
- `all(Cols)After(colA)`: All columns after the specified column, excluding that column.
@@ -123,10 +124,10 @@ On `ColumnSets` they are a `ColumnFilter` instead.
123124
##### Cols at any Depth {collapsible="true"}
124125
`colsAtAnyDepth {}`, `colsAtAnyDepth()`
125126

126-
Creates a `ColumnSet` containing all columns from the top-level, specified [column group](DataColumn.md#columngroup),
127-
or `ColumnSet` at any depth if they satisfy the optional given predicate. This means that columns (of all three kinds!)
127+
Creates a [`ColumnSet`](#column-resolvers) containing all columns from the top-level, specified [column group](DataColumn.md#columngroup),
128+
or [`ColumnSet`](#column-resolvers) at any depth if they satisfy the optional given predicate. This means that columns (of all three kinds!)
128129
nested inside [column groups](DataColumn.md#columngroup) are also included.
129-
This function can also be followed by another `ColumnSet` filter-function like `colsOf<>()`, `single()`,
130+
This function can also be followed by another [`ColumnSet`](#column-resolvers) filter-function like `colsOf<>()`, `single()`,
130131
or `valueCols()`.
131132

132133
**For example:**
@@ -165,8 +166,8 @@ All value columns at any depth nested under a column group named "myColGroup":
165166
##### Cols in Groups {collapsible="true"}
166167
`colsInGroups {}`, `colsInGroups()`
167168

168-
Creates a `ColumnSet` containing all columns that are nested in the [column groups](DataColumn.md#columngroup) at
169-
the top-level, specified [column group](DataColumn.md#columngroup), or `ColumnSet` adhering to an optional predicate.
169+
Creates a [`ColumnSet`](#column-resolvers) containing all columns that are nested in the [column groups](DataColumn.md#columngroup) at
170+
the top-level, specified [column group](DataColumn.md#columngroup), or [`ColumnSet`](#column-resolvers) adhering to an optional predicate.
170171
This is useful if you want to select all columns that are "one level down".
171172

172173
This function used to be called `children()` in the past.
@@ -186,28 +187,28 @@ or with filter:
186187

187188
`df.select { colsInGroups { "user" in it.name } }`
188189

189-
Similarly, you can take the columns inside all [column groups](DataColumn.md#columngroup) in a `ColumnSet`:
190+
Similarly, you can take the columns inside all [column groups](DataColumn.md#columngroup) in a [`ColumnSet`](#column-resolvers):
190191

191192
`df.select { colGroups { "my" in it.name }.colsInGroups() }`
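A minimal sketch of `colsInGroups()` on a small frame, assuming the `kotlinx-dataframe` dependency is on the classpath. The sample data, the column names, and the helper function names are hypothetical and only serve the illustration.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample frame: "firstName" and "lastName" are grouped under "name".
fun groupedFrame(): DataFrame<*> =
    dataFrameOf("firstName", "lastName", "age")(
        "Alice", "Cooper", 15,
        "Bob", "Dylan", 45,
    ).group("firstName", "lastName").into("name")

// Selects the columns that sit one level down, inside the top-level column groups.
fun columnsOneLevelDown(): List<String> =
    groupedFrame().select { colsInGroups() }.columnNames()

fun main() {
    println(columnsOneLevelDown())
}
```

Only the columns nested directly inside a group are selected here; the top-level `age` column is not part of any group and is therefore left out.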
192193

193194
##### Take (Last) (Cols) (While) {collapsible="true"}
194195
`take(5)`, `takeLastCols(2)`, `takeLastWhile {}`, `takeColsWhile {}`
195196

196-
Creates a `ColumnSet` containing the first / last `n` columns from the top-level,
197-
specified [column group](DataColumn.md#columngroup), or `ColumnSet` or those that adhere to the given condition.
197+
Creates a [`ColumnSet`](#column-resolvers) containing the first / last `n` columns from the top-level,
198+
specified [column group](DataColumn.md#columngroup), or [`ColumnSet`](#column-resolvers) or those that adhere to the given condition.
198199
Note, to avoid ambiguity, `take` is called `takeCols` when called on a [column group](DataColumn.md#columngroup).
199200

200201
##### Drop (Last) (Cols) (While) {collapsible="true"}
201202
`drop(5)`, `dropLastCols(2)`, `dropLastWhile {}`, `dropColsWhile {}`
202203

203-
Creates a `ColumnSet` without the first / last `n` columns from the top-level,
204-
specified [column group](DataColumn.md#columngroup), or `ColumnSet` or those that adhere to the given condition.
204+
Creates a [`ColumnSet`](#column-resolvers) without the first / last `n` columns from the top-level,
205+
specified [column group](DataColumn.md#columngroup), or [`ColumnSet`](#column-resolvers) or those that adhere to the given condition.
205206
Note, to avoid ambiguity, `drop` is called `dropCols` when called on a [column group](DataColumn.md#columngroup).
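The take and drop selectors described above can be sketched as follows, assuming the `kotlinx-dataframe` dependency is available. The sample frame and the helper function names are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample frame with three top-level columns.
fun sampleFrame(): DataFrame<*> =
    dataFrameOf("name", "age", "city")(
        "Alice", 15, "London",
        "Bob", 45, "Dubai",
    )

// Keeps the first two top-level columns.
fun firstTwoColumns(): List<String> =
    sampleFrame().select { take(2) }.columnNames()

// Keeps everything except the last top-level column.
fun withoutLastColumn(): List<String> =
    sampleFrame().select { dropLast(1) }.columnNames()

fun main() {
    println(firstTwoColumns())
    println(withoutLastColumn())
}
```

On this frame both selections resolve to the same two columns, which shows that `take(n)` and `dropLast(m)` are two views of the same positional slicing.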
206207

207208
##### Select from [Column Group](DataColumn.md#columngroup) {collapsible="true"}
208209
`colGroupA.select {}`, `"colGroupA" {}`
209210

210-
Creates a `ColumnSet` containing the columns selected by a `ColumnsSelector` relative to the specified
211+
Creates a [`ColumnSet`](#column-resolvers) containing the columns selected by a `ColumnsSelector` relative to the specified
211212
[column group](DataColumn.md#columngroup). In practice, this means you're opening a new selection DSL scope inside a
212213
[column group](DataColumn.md#columngroup) and selecting columns from there.
213214
The selected columns are referenced individually and "unpacked" from their parent
@@ -242,14 +243,14 @@ This function is best explained in parts:
242243

243244
**On Column Sets:** `except {}`
244245

245-
This function can be explained the easiest with a `ColumnSet`.
246+
This function can be explained the easiest with a [`ColumnSet`](#column-resolvers).
246247
Let's say we want all `Int` columns apart from `age` and `height`.
247248

248249
We can do:
249250

250251
`df.select { colsOf<Int>() except (age and height) }`
251252

252-
which will 'subtract' the `ColumnSet` created by `age and height` from the `ColumnSet` created by
253+
which will 'subtract' the [`ColumnSet`](#column-resolvers) created by `age and height` from the [`ColumnSet`](#column-resolvers) created by
253254
[`colsOf<Int>()`](ColumnSelectors.md#cols-of).
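As a rough, self-contained version of the `except` call above, the sketch below uses the String API instead of extension properties so that it compiles without generated accessors. It assumes the `kotlinx-dataframe` dependency; the sample data and function names are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample frame with three Int columns and one String column.
fun sampleFrame(): DataFrame<*> =
    dataFrameOf("name", "age", "height", "weight")(
        "Alice", 15, 160, 54,
        "Bob", 45, 180, 87,
    )

// All Int columns apart from "age" and "height".
fun intColumnsExceptAgeAndHeight(): List<String> =
    sampleFrame()
        .select { colsOf<Int>() except ("age" and "height") }
        .columnNames()

fun main() {
    println(intColumnsExceptAgeAndHeight())
}
```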
254255

255256
This operation can also be used to exclude columns that are originally in [column groups](DataColumn.md#columngroup).
@@ -261,7 +262,7 @@ For instance, excluding `userData.age`:
261262
Note that the selection of columns to exclude from column sets is always done relative to the outer scope.
262263
Use the [Extension Properties API](extensionPropertiesApi.md) to prevent scoping issues if possible.
263264

264-
> Special case: If a column that needs to be removed appears multiple times in the `ColumnSet`,
265+
> Special case: If a column that needs to be removed appears multiple times in the [`ColumnSet`](#column-resolvers),
265266
> it is excepted each time it is encountered (including inside [Column Groups](DataColumn.md#columngroup)).
266267
> You could say the receiver `ColumnSet` is [simplified](ColumnSelectors.md#simplify) before the operation is performed:
267268
>
@@ -319,24 +320,24 @@ or:
319320
##### Column Name Filters {collapsible="true"}
320321
`nameContains()`, `colsNameContains()`, `nameStartsWith()`, `colsNameEndsWith()`
321322

322-
Creates a `ColumnSet` containing columns from the top-level, specified [column group](DataColumn.md#columngroup),
323-
or `ColumnSet` that have names that satisfy the given function. These functions accept a `String` as argument, as
323+
Creates a [`ColumnSet`](#column-resolvers) containing columns from the top-level, specified [column group](DataColumn.md#columngroup),
324+
or [`ColumnSet`](#column-resolvers) that have names that satisfy the given function. These functions accept a `String` as argument, as
324325
well as an optional `ignoreCase` parameter. For the `nameContains` variant, you can also pass a `Regex` as an argument.
325326
Note, on [column groups](DataColumn.md#columngroup), the functions have names starting with `cols` to avoid
326327
ambiguity.
327328

328329
##### (Cols) Without Nulls {collapsible="true"}
329330
`withoutNulls()`, `colsWithoutNulls()`
330331

331-
Creates a `ColumnSet` containing columns from the top-level, specified [column group](DataColumn.md#columngroup),
332-
or `ColumnSet` that have no `null` values. This is a shorthand for `cols { !it.hasNulls() }`.
332+
Creates a [`ColumnSet`](#column-resolvers) containing columns from the top-level, specified [column group](DataColumn.md#columngroup),
333+
or [`ColumnSet`](#column-resolvers) that have no `null` values. This is a shorthand for `cols { !it.hasNulls() }`.
333334
Note, to avoid ambiguity, `withoutNulls` is called `colsWithoutNulls` when called on a
334335
[column group](DataColumn.md#columngroup).
335336

336337
##### Distinct {collapsible="true"}
337338
`colSet.distinct()`
338339

339-
Returns a new `ColumnSet` from the specified `ColumnSet` containing only distinct columns (by path).
340+
Returns a new [`ColumnSet`](#column-resolvers) from the specified [`ColumnSet`](#column-resolvers) containing only distinct columns (by path).
340341
This is useful when you've selected the same column multiple times but only want it once.
341342

342343
This does not cover the case where a column is selected individually and through its enclosing
@@ -348,30 +349,30 @@ For this, you'll need to [rename](ColumnSelectors.md#rename) one of the columns.
348349
##### None {collapsible="true"}
349350
`none()`
350351

351-
Creates an empty `ColumnSet`, essentially selecting no columns at all.
352+
Creates an empty [`ColumnSet`](#column-resolvers), essentially selecting no columns at all.
352353
This is the opposite of [`all()`](ColumnSelectors.md#all-cols).
353354

354355
This function mostly exists for completeness, but can be useful in some very specific cases.
355356

356357
##### Cols Of {collapsible="true"}
357358
`colsOf<T>()`, `colsOf<T> {}`
358359

359-
Creates a `ColumnSet` containing columns from the top-level, specified [column group](DataColumn.md#columngroup),
360-
or `ColumnSet` that are a subtype of the specified type `T` and adhere to the optional condition.
360+
Creates a [`ColumnSet`](#column-resolvers) containing columns from the top-level, specified [column group](DataColumn.md#columngroup),
361+
or [`ColumnSet`](#column-resolvers) that are a subtype of the specified type `T` and adhere to the optional condition.
361362

362363
##### Simplify {collapsible="true"}
363364
`colSet.simplify()`
364365

365-
Returns a new `ColumnSet` from the specified `ColumnSet` in 'simplified' form.
366-
This function simplifies the structure of the `ColumnSet` by removing columns that are already present in
366+
Returns a new [`ColumnSet`](#column-resolvers) from the specified [`ColumnSet`](#column-resolvers) in 'simplified' form.
367+
This function simplifies the structure of the [`ColumnSet`](#column-resolvers) by removing columns that are already present in
367368
[column groups](DataColumn.md#columngroup), returning only these groups,
368369
plus columns not belonging in any of the groups.
369370

370-
In other words, this means that if a column in the `ColumnSet` is inside a [column group](DataColumn.md#columngroup)
371-
in the `ColumnSet`, it will not be included in the result.
371+
In other words, this means that if a column in the [`ColumnSet`](#column-resolvers) is inside a [column group](DataColumn.md#columngroup)
372+
in the [`ColumnSet`](#column-resolvers), it will not be included in the result.
372373

373374
It's useful in combination with [`colsAtAnyDepth {}`](ColumnSelectors.md#cols-at-any-depth), as that function can
374-
create a `ColumnSet` containing both a column and the [column group](DataColumn.md#columngroup) it's in.
375+
create a [`ColumnSet`](#column-resolvers) containing both a column and the [column group](DataColumn.md#columngroup) it's in.
375376

376377
In the past, this function was named `top()` and `roots()`, but these names have been deprecated.
377378

@@ -382,13 +383,13 @@ In the past, was named `top()` and `roots()`, but these names have been deprecat
382383
##### Filter {collapsible="true"}
383384
`colSet.filter {}`
384385

385-
Returns a new `ColumnSet` from the specified `ColumnSet` containing only columns that satisfy the given condition.
386+
Returns a new [`ColumnSet`](#column-resolvers) from the specified [`ColumnSet`](#column-resolvers) containing only columns that satisfy the given condition.
386387
This function behaves the same as [`cols {}` and `[{}]`](ColumnSelectors.md#cols), but only exists on column sets.
387388

388389
##### And {collapsible="true"}
389390
`colSet and colB`
390391

391-
Creates a `ColumnSet` containing the columns from both the left and right side of the function. This allows
392+
Creates a [`ColumnSet`](#column-resolvers) containing the columns from both the left and right side of the function. This allows
392393
you to combine selections or simply select multiple columns at once.
393394

394395
Any combination of [AccessApi](apiLevels.md) can be used on either side of the `and` operator.
@@ -595,3 +596,27 @@ df.select { (colsOf<Int>() and age).distinct() }
595596

596597
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Access.columnSelectorsModifySet.html" width="100%"/>
597598
<!---END-->
599+
600+
### Column Resolvers
601+
602+
`ColumnsResolver` is the base type used to resolve columns within the **Columns Selection DSL**,
603+
as well as the return type of columns selection expressions.
604+
605+
All functions described above for selecting columns in various ways return a `ColumnsResolver` of a specific kind:
606+
607+
- **`SingleColumn`** — resolves to a single [`DataColumn`](DataColumn.md).
608+
- **`ColumnAccessor`** — a specialized `SingleColumn` with a defined path and type argument.
609+
It can also be renamed during selection.
610+
- **`ColumnPath`** — a wrapper for a [`DataColumn`](DataColumn.md) path
611+
in a [`DataFrame`](DataFrame.md) that can also serve as a `ColumnAccessor`.
612+
```kotlin
613+
// Select all columns from the group by path "group2"/"info":
614+
df.select { pathOf("group2", "info").allCols() }
615+
// For each selected column, place it under its ancestor group
616+
// from two levels up in the column path hierarchy:
617+
df.group { colsAtAnyDepth().colsOf<String>() }
618+
.into { it.path.dropLast(2) }
619+
```
620+
- **`ColumnSet`** — resolves to an ordered list of [`DataColumn`s](DataColumn.md).
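
To make the difference between the resolver kinds listed above more concrete, here is a small sketch that assumes the `kotlinx-dataframe` dependency. The sample frame and helper function names are hypothetical; the selectors themselves (`col`, `colsOf`) are the ones described earlier on this page.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample frame.
fun sampleFrame(): DataFrame<*> =
    dataFrameOf("name", "age", "height")(
        "Alice", 15, 160,
        "Bob", 45, 180,
    )

// `col("age")` resolves to exactly one column.
fun singleColumnSelection(): List<String> =
    sampleFrame().select { col("age") }.columnNames()

// `colsOf<Int>()` resolves to an ordered list of columns (a column set).
fun columnSetSelection(): List<String> =
    sampleFrame().select { colsOf<Int>() }.columnNames()

fun main() {
    println(singleColumnSelection())
    println(columnSetSelection())
}
```

Both expressions are accepted by `select` because each one yields a columns resolver; they differ only in how many columns they resolve to.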
621+
622+
