Breaking: Materialize DimArray or DimStack From a Table #739

Merged
merged 64 commits into from
Aug 10, 2025
Commits
60256a0
Table Materializer Methods
JoshuaBillson Jun 18, 2024
3526b96
Merged Main
JoshuaBillson Jun 18, 2024
eab2fa0
Made col Optional for DimArray
JoshuaBillson Jun 18, 2024
d4892df
Apply suggestions from code review
JoshuaBillson Jun 20, 2024
ea6751a
Handle coordinates with different loci
JoshuaBillson Jun 20, 2024
13c80da
Merge branch 'materialize' of github.com:JoshuaBillson/DimensionalDat…
JoshuaBillson Jun 20, 2024
6a9d26e
replaced At() with Contains() in _coords_to_ords
JoshuaBillson Jun 20, 2024
9164c22
Added optional selectors and public methods for table materializer
JoshuaBillson Jun 25, 2024
2ebec1c
Updated table constructors for DimArray and DimStack
JoshuaBillson Jun 25, 2024
8e791bf
Updated DimArray and DimStack docs to include table materializer methods
JoshuaBillson Jul 5, 2024
4cd5f9d
Table materializer test cases
JoshuaBillson Jul 5, 2024
0c1991a
export table materializer methods
JoshuaBillson Jul 5, 2024
8758ba9
Merge branch 'rafaqz:main' into materialize
JoshuaBillson Jul 5, 2024
4534de5
Added Random to tables.jl test cases
JoshuaBillson Jul 5, 2024
119fa30
Merge branch 'rafaqz:main' into materialize
JoshuaBillson Aug 8, 2024
ed395ca
Update src/array/array.jl
JoshuaBillson Aug 8, 2024
00336af
Update src/table_ops.jl
JoshuaBillson Aug 8, 2024
532f887
Removed exports
JoshuaBillson Aug 8, 2024
c98dcb0
Merge branch 'materialize' of github.com:JoshuaBillson/DimensionalDat…
JoshuaBillson Aug 8, 2024
06a2c91
Update src/table_ops.jl
JoshuaBillson Aug 8, 2024
3bacf33
Update src/table_ops.jl
JoshuaBillson Aug 8, 2024
4ced6f7
Update src/table_ops.jl
JoshuaBillson Aug 8, 2024
c846dfd
Update src/table_ops.jl
JoshuaBillson Aug 8, 2024
fe2c871
Update src/table_ops.jl
JoshuaBillson Aug 8, 2024
61f8220
Replaced selector type with instance.
JoshuaBillson Aug 8, 2024
3d28b43
Merge branch 'materialize' of github.com:JoshuaBillson/DimensionalDat…
JoshuaBillson Aug 8, 2024
dbe7b99
Table materializer can now infer dimensions from the coordinates.
JoshuaBillson Aug 12, 2024
f410988
Update src/stack/stack.jl
JoshuaBillson Sep 18, 2024
a17f069
Update src/table_ops.jl
JoshuaBillson Sep 18, 2024
9bdded9
Update src/table_ops.jl
JoshuaBillson Sep 18, 2024
5451087
Update src/table_ops.jl
JoshuaBillson Sep 18, 2024
faf4d76
Update src/table_ops.jl
JoshuaBillson Sep 18, 2024
02f60a3
Update src/table_ops.jl
JoshuaBillson Sep 18, 2024
fafd357
Update src/table_ops.jl
JoshuaBillson Sep 22, 2024
d7f15f5
Update src/array/array.jl
JoshuaBillson Sep 25, 2024
34a0a69
Update src/table_ops.jl
JoshuaBillson Sep 26, 2024
d0b9eb7
Added support for guessing the dimension ordering and span for Dates …
JoshuaBillson Sep 26, 2024
32b0c00
Merge branch 'materialize' of github.com:JoshuaBillson/DimensionalDat…
JoshuaBillson Sep 26, 2024
0ea72a0
Replaced LinRange with StepRangeLen in _build_dim
JoshuaBillson Sep 27, 2024
bc62932
Added Tables.istable check to DimArray constructor
JoshuaBillson Oct 15, 2024
76f8805
Update src/array/array.jl
rafaqz Mar 19, 2025
a223fe8
Merge branch 'main' into materialize
tiemvanderdeure Mar 20, 2025
ae13b26
merge materialize2
tiemvanderdeure May 5, 2025
0aae0c3
Merge branch 'main' into materialize
tiemvanderdeure May 6, 2025
95fe3f6
fix scuffed merge
tiemvanderdeure May 6, 2025
81a32e1
filter instead of indexing in test for clarity
tiemvanderdeure May 7, 2025
f08ba1f
Merge branch 'breaking' into materialize
rafaqz Jun 28, 2025
0523a60
Merge branch 'breaking' into materialize
rafaqz Jun 28, 2025
7b0f5e8
fix DimSlices doc
rafaqz Jun 28, 2025
eea6d07
Merge branch 'breaking' into materialize
rafaqz Jun 28, 2025
5ba06ce
fix ambiguities
rafaqz Jun 28, 2025
b2e99a5
bugfixes
rafaqz Jun 28, 2025
12daf7a
do checks and call Tables.columns before constructing stack from table
tiemvanderdeure Jun 28, 2025
df37668
test dimensions are automatically detected when constructing dimstack
tiemvanderdeure Jun 28, 2025
1573bc2
comments not docstrings for internals
tiemvanderdeure Jun 30, 2025
28a252f
check for columnaccess if dims are passed
tiemvanderdeure Jun 30, 2025
aec86a9
add type argument to dimarray_from_table
tiemvanderdeure Jun 30, 2025
305ab4d
allow passing name to DimStack
tiemvanderdeure Jul 4, 2025
dad3bc8
add a section to the documentation
tiemvanderdeure Jul 4, 2025
056c1e8
use Tables.columnnames instead of keys
tiemvanderdeure Jul 5, 2025
74578a5
make DimArray work with all tables that are abstractarrays
tiemvanderdeure Jul 5, 2025
e32037f
do not treat dimvectors as tables
tiemvanderdeure Jul 5, 2025
a9ebc20
simplify get_column
tiemvanderdeure Jul 5, 2025
55f8017
Merge branch 'breaking' into materialize
rafaqz Aug 10, 2025
72 changes: 70 additions & 2 deletions docs/src/tables.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,12 +2,22 @@

[Tables.jl](https://github.com/JuliaData/Tables.jl) provides an ecosystem-wide interface to tabular data in Julia, ensuring interoperability with [DataFrames.jl](https://dataframes.juliadata.org/stable/), [CSV.jl](https://csv.juliadata.org/stable/), and hundreds of other packages that implement the standard.

## Dimensional data are tables
DimensionalData.jl implements the Tables.jl interface for `AbstractDimArray` and `AbstractDimStack`. `DimStack` layers are unrolled so they are all the same size, and dimensions loop to match the length of the largest layer.

Columns are given the [`name`](@ref) of the array or stack layer, and the result of `DD.name(dimension)` for `Dimension` columns.

Looping of dimensions and stack layers is done _lazily_,
and does not allocate unless collected.
Looping of dimensions and stack layers is done _lazily_, and does not allocate unless collected.

## Materializing tables to DimArray or DimStack
`DimArray` and `DimStack` have fallback methods to materialize any `Tables.jl`-compatible table.

By default, columns such as X, Y, Z, and Band are treated as dimensions, and all other columns as data.
Pass a `name` keyword argument to choose which data column(s) are used.

You have full control over which columns are dimensions, and over what those dimensions look like. If you pass a `Tuple` of `Symbol`s or dimension types (e.g. `X`) as the second argument, those columns are treated as dimensions. Passing a `Tuple` of dimensions preserves those dimensions, with values matched to the corresponding columns.

Materializing tables works even if the table is not ordered, and can handle missing values.
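
The idea behind materialization can be sketched without the package. The following is a minimal illustration in plain Julia, not the code added in this PR: `materialize` is a hypothetical helper that assumes two coordinate columns and one data column. It takes the sorted unique coordinates as axes and places each row by its coordinates, so row order is irrelevant and absent coordinate combinations stay `missing`.

```julia
# Hypothetical helper, not part of DimensionalData.jl: scatter one long-format
# data column into a dense matrix using two coordinate columns.
function materialize(xs, ys, vals; missingval=missing)
    # Sorted unique coordinates become the two axes
    xaxis = sort(unique(xs))
    yaxis = sort(unique(ys))
    # Start from an array filled with `missingval`, so absent rows stay missing
    T = Union{eltype(vals),typeof(missingval)}
    out = Array{T}(undef, length(xaxis), length(yaxis))
    fill!(out, missingval)
    # Row order does not matter: each row is placed by its coordinates
    for (x, y, v) in zip(xs, ys, vals)
        i = searchsortedfirst(xaxis, x)
        j = searchsortedfirst(yaxis, y)
        out[i, j] = v
    end
    return xaxis, yaxis, out
end

# A shuffled table in which the (X=2, Y=20) row is absent
table = (X=[2, 1, 1], Y=[10, 20, 10], data=[3.0, 2.0, 1.0])
xaxis, yaxis, A = materialize(table.X, table.Y, table.data)
```

Here `A` is a 2×2 matrix whose `(2, 2)` entry is `missing`, because no row of the table carries those coordinates.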

## Example

Expand Down Expand Up @@ -89,3 +99,61 @@ using CSV
CSV.write("dimstack.csv", st)
readlines("dimstack.csv")
````

## Converting a DataFrame to a DimArray or DimStack

The DataFrame we use has 5 columns: X, Y, category, data1, and data2.

````@ansi dataframe
df = DataFrame(st)
````

::: tabs

== Create a `DimArray`

Converting this DataFrame to a `DimArray` without other arguments will read the `category` column as data and ignore data1 and data2:

````@ansi dataframe
DimArray(df)
````

Specify dimension names to ensure these get treated as dimensions. Now data1 is read in instead.
````@ansi dataframe
DimArray(df, (X,Y,:category))
````

You can also pass in the actual dimensions.
````@ansi dataframe
DimArray(df, dims(st))
````

Pass in a name argument to read in data2 instead.
````@ansi dataframe
DimArray(df, dims(st); name = :data2)
````

== Create a `DimStack`

Converting the DataFrame to a `DimStack` will by default read category, data1, and data2 as layers:
````@ansi dataframe
DimStack(df)
````


Specify dimension names to ensure these get treated as dimensions. Now data1 and data2 are layers.
````@ansi dataframe
DimStack(df, (X,Y,:category))
````

You can also pass in the actual dimensions.
````@ansi dataframe
DimStack(df, dims(st))
````

Pass in a tuple of column names to control which columns are read.
````@ansi dataframe
DimStack(df, dims(st); name = (:data2,))
````

:::
1 change: 1 addition & 0 deletions src/DimensionalData.jl
Expand Up @@ -92,6 +92,7 @@ const DD = DimensionalData
# Common
include("interface.jl")
include("name.jl")
include("table_ops.jl")

# Arrays
include("array/array.jl")
Expand Down
1 change: 1 addition & 0 deletions src/Dimensions/dimension.jl
Expand Up @@ -178,6 +178,7 @@ lookup(dim::Union{DimType,Val{<:Dimension}}) = NoLookup()
name(dim::Dimension) = name(typeof(dim))
name(dim::Val{D}) where D = name(D)
name(dim::Type{D}) where D<:Dimension = nameof(D)
name(s::Symbol) = s

label(x) = string(name(x))

Expand Down
59 changes: 56 additions & 3 deletions src/array/array.jl
Expand Up @@ -144,7 +144,8 @@ function Base.NamedTuple(A1::AbstractDimArray, As::AbstractDimArray...)
end

# undef constructor for all AbstractDimArray
(::Type{A})(x::UndefInitializer, dims::Dimension...; kw...) where {A<:AbstractDimArray{<:Any}} = A(x, dims; kw...)
(::Type{A})(x::UndefInitializer, dims::Dimension...; kw...) where {A<:AbstractDimArray{T}} where T =
A(x, dims; kw...)
function (::Type{A})(x::UndefInitializer, dims::DimTuple; kw...) where {A<:AbstractDimArray{T}} where T
basetypeof(A)(Array{T}(undef, size(dims)), dims; kw...)
end
Expand Down Expand Up @@ -410,13 +411,14 @@ moves dimensions to reference dimension `refdims` after reducing operations

## Arguments

- `data`: An `AbstractArray`.
- `data`: An `AbstractArray` or a table with coordinate columns corresponding to `dims`.
- `gen`: A generator expression. Where source iterators are `Dimension`s the dim args or kw is not needed.
- `dims`: A `Tuple` of `Dimension`
- `name`: A string name for the array. Shows in plots and tables.
- `refdims`: reference dimensions. Usually set programmatically to track past
slices and reductions of dimension for labelling and reconstruction.
- `metadata`: `Dict` or `Metadata` object, or `NoMetadata()`
- `selector`: The coordinate selector type to use when materializing from a table.

Indexing can be done with all regular indices, or with [`Dimension`](@ref)s
and/or [`Selector`](@ref)s.
Expand Down Expand Up @@ -512,6 +514,57 @@ function DimArray(A::AbstractBasicDimArray;
newdata = collect(data)
DimArray(newdata, format(dims, newdata); refdims, name, metadata)
end
# Tables
# Write a single column from a table with one or more coordinate columns to a DimArray
function DimArray(table, dims; kw...)
    # Confirm that the Tables interface is implemented
    Tables.istable(table) || throw(ArgumentError("`table` must be an `AbstractArray` or satisfy the `Tables.jl` interface."))
    table = Tables.columnaccess(table) ? table : Tables.columns(table)
    dimarray_from_table(DimArray, table, guess_dims(table, dims); kw...)
end
# Same as above, but guess dimension names from scratch
function DimArray(table; kw...)
    # Confirm that the Tables interface is implemented
    Tables.istable(table) || throw(ArgumentError("`table` must satisfy the `Tables.jl` interface."))
    table = Tables.columnaccess(table) ? table : Tables.columns(table)
    # Use default dimension
    return dimarray_from_table(DimArray, table, guess_dims(table; kw...); kw...)
end
# Special-case for AbstractVectors - these might be tables
function DimArray(data::AbstractVector, dims::Tuple;
    refdims=(), name=NoName(), metadata=NoMetadata(), kw...
)
    if !(data isa AbstractBasicDimArray) && Tables.istable(data) &&
        all(map(d -> Dimensions.name(d) in Tables.schema(data).names, dims))
        table = Tables.columns(data)
        dims = guess_dims(table, dims; kw...)
        return dimarray_from_table(DimArray, table, dims; refdims, name, metadata, kw...)
    else
        return DimArray(data, format(dims, data), refdims, name, metadata)
    end
end

function dimarray_from_table(::Type{T}, table, dims;
    name=NoName(),
    selector=nothing,
    precision=6,
    missingval=missing,
    kw...
) where T <: AbstractDimArray
    # Determine row indices based on coordinate values
    indices = coords_to_indices(table, dims; selector, atol=10.0^-precision)

    # Extract the data column corresponding to `name`
    col = name == NoName() ? data_col_names(table, dims) |> first : Symbol(name)
    data = Tables.getcolumn(table, col)

    # Restore array data
    array = restore_array(data, indices, dims, missingval)

    # Return DimArray
    return T(array, dims, name=col; kw...)
end
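
`dimarray_from_table` turns the `precision` keyword into an absolute tolerance, `atol = 10.0^-precision`, before matching coordinates. The body of `coords_to_indices` is not part of this diff, so the following is only a rough sketch of what tolerance-based matching could look like. `approx_index` is a hypothetical name, and the linear search is an assumption made for brevity.

```julia
# Hypothetical helper for illustration only; it is not the PR's `coords_to_indices`.
function approx_index(axis::AbstractVector{<:Real}, coord::Real; precision::Int=6)
    # Same conversion from `precision` to a tolerance as in `dimarray_from_table`
    atol = 10.0^-precision
    # Linear search for the first axis value within `atol` of the coordinate
    i = findfirst(a -> isapprox(a, coord; atol=atol), axis)
    i === nothing && throw(ArgumentError("no axis value within $atol of $coord"))
    return i
end

axis = 0.0:0.1:1.0
# 0.1 + 0.2 carries floating-point noise but still falls within the tolerance of 0.3
idx = approx_index(axis, 0.1 + 0.2)
```

A coordinate that is further than `atol` from every axis value, such as `0.35` here, raises an `ArgumentError` in this sketch rather than being snapped to a neighbour.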

"""
DimArray(f::Function, dim::Dimension; [name])

Expand All @@ -520,7 +573,7 @@ Apply function `f` across the values of the dimension `dim`
the given dimension. Optionally provide a name for the result.
"""
function DimArray(f::Function, dim::Dimension; name=Symbol(nameof(f), "(", name(dim), ")"))
DimArray(f.(val(dim)), (dim,); name)
DimArray(f.(val(dim)), (dim,); name)
end

DimArray(itr::Base.Generator; kwargs...) = rebuild(collect(itr); kwargs...)
Expand Down
64 changes: 61 additions & 3 deletions src/stack/stack.jl
Expand Up @@ -30,6 +30,11 @@ const AbstractVectorDimStack = AbstractDimStack{K,T,1} where {K,T}
const AbstractMatrixDimStack = AbstractDimStack{K,T,2} where {K,T}

(::Type{T})(st::AbstractDimStack; kw...) where T<:AbstractDimArray =
dimarray_from_dimstack(T, st; kw...)
# For ambiguity
DimArray(st::AbstractDimStack; kw...) = dimarray_from_dimstack(DimArray, st; kw...)

dimarray_from_dimstack(T, st; kw...) =
T([st[D] for D in DimIndices(st)]; dims=dims(st), metadata=metadata(st), kw...)

data(s::AbstractDimStack) = getfield(s, :data)
Expand Down Expand Up @@ -101,14 +106,16 @@ and an existing stack.

# Keywords

Keywords are simply the fields of the stack object:
Keywords are simply the common fields of an `AbstractDimStack` object:

- `data`
- `dims`
- `refdims`
- `metadata`
- `layerdims`
- `layermetadata`

There is no promise that these keywords will be used in all cases.
"""
function rebuild_from_arrays(
s::AbstractDimStack{Keys}, das::Tuple{Vararg{AbstractBasicDimArray}}; kw...
Expand Down Expand Up @@ -340,6 +347,7 @@ end
"""
DimStack <: AbstractDimStack

DimStack(table, [dims]; kw...)
DimStack(data::AbstractDimArray...; kw...)
DimStack(data::Union{AbstractArray,Tuple,NamedTuple}, [dims::DimTuple]; kw...)
DimStack(data::AbstractDimArray; layersfrom, kw...)
Expand Down Expand Up @@ -512,14 +520,17 @@ function DimStack(das::NamedTuple{<:Any,<:Tuple{Vararg{AbstractDimArray}}};
end
DimStack(data::Union{Tuple,AbstractArray,NamedTuple}, dim::Dimension; name=uniquekeys(data), kw...) =
DimStack(NamedTuple{Tuple(name)}(data), (dim,); kw...)
DimStack(data::Union{Tuple,AbstractArray}, dims::Tuple; name=uniquekeys(data), kw...) =
DimStack(data::Union{Tuple,AbstractArray{<:AbstractArray}}, dims::Tuple; name=uniquekeys(data), kw...) =
DimStack(NamedTuple{Tuple(name)}(data), dims; kw...)
function DimStack(data::NamedTuple{K}, dims::Tuple;
refdims=(),
metadata=NoMetadata(),
layermetadata=nothing,
layerdims=nothing
) where K
if length(data) > 0 && Tables.istable(data) && all(d -> name(d) in keys(data), dims)
return dimstack_from_table(DimStack, data, dims; refdims, metadata)
end
layerdims = if isnothing(layerdims)
all(map(d -> axes(d) == axes(first(data)), data)) || _stack_size_mismatch()
map(_ -> basedims(dims), data)
Expand All @@ -546,6 +557,53 @@ function DimStack(st::AbstractDimStack;
DimStack(data, dims, refdims, layerdims, metadata, layermetadata)
end

# Write each column from a table with one or more coordinate columns to a layer in a DimStack
function DimStack(data, dims::Tuple; kw...)
    if Tables.istable(data)
        table = Tables.columns(data)
        all(map(d -> Dimensions.name(d) in Tables.columnnames(table), dims)) || throw(ArgumentError(
            "All dimensions in dims must be in the table columns."
        ))
        dims = guess_dims(table, dims; kw...)
        return dimstack_from_table(DimStack, table, dims; kw...)
    else
        throw(ArgumentError(
            """data must be a table with coordinate columns, an AbstractArray,
            or a Tuple or NamedTuple of AbstractArrays"""
        ))
    end
end
function DimStack(table; kw...)
    if Tables.istable(table)
        table = Tables.columns(table)
        dimstack_from_table(DimStack, table, guess_dims(table; kw...); kw...)
    else
        throw(ArgumentError(
            """data must be a table with coordinate columns, an AbstractArray,
            or a Tuple or NamedTuple of AbstractArrays"""
        ))
    end
end

function dimstack_from_table(::Type{T}, table, dims;
    name=nothing,
    selector=nothing,
    precision=6,
    missingval=missing,
    kw...
) where T<:AbstractDimStack
    table = Tables.columnaccess(table) ? table : Tables.columns(table)
    data_cols = isnothing(name) ? data_col_names(table, dims) : name
    dims = guess_dims(table, dims; precision)
    indices = coords_to_indices(table, dims; selector)
    layers = map(data_cols) do col
        d = Tables.getcolumn(table, col)
        restore_array(d, indices, dims, missingval)
    end
    return T(layers, dims; name = data_cols, kw...)
end
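
When `name` is not given, both table constructors fall back to `data_col_names(table, dims)`, whose body is not shown in this diff. A plausible reading, consistent with the documentation examples in `docs/src/tables.md`, is that it returns the columns that are not dimension columns. The sketch below assumes exactly that; `data_columns` is a hypothetical stand-in, not the PR's function.

```julia
# Hypothetical stand-in for `data_col_names`: keep every column name
# that is not used as a dimension.
data_columns(colnames, dimnames) = Tuple(c for c in colnames if c ∉ dimnames)

colnames = (:X, :Y, :category, :data1, :data2)

# With X and Y as dimensions, three columns remain as stack layers
layers_default = data_columns(colnames, (:X, :Y))
# An array takes only the first remaining column
array_default = first(layers_default)
# Declaring :category a dimension leaves data1 and data2
layers_with_category = data_columns(colnames, (:X, :Y, :category))
```

Under this assumption the results line up with the documented behaviour: `DimArray(df)` reads `category`, while `DimArray(df, (X,Y,:category))` reads `data1`.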

layerdims(s::DimStack{<:Any,<:Any,<:Any,<:Any,<:Any,<:Any,Nothing}, name::Symbol) = dims(s)

### Skipmissing on DimStacks
Expand Down Expand Up @@ -573,4 +631,4 @@ Base.eltype(::Type{Base.SkipMissing{T}}) where {T<:AbstractDimStack{<:Any, NT}}
_nonmissing_nt(NT)

@generated _nonmissing_nt(NT::Type{<:NamedTuple{K,V}}) where {K,V} =
NamedTuple{K, Tuple{map(Base.nonmissingtype, V.parameters)...}}
NamedTuple{K, Tuple{map(Base.nonmissingtype, V.parameters)...}}