@marciw marciw commented Sep 30, 2025

Work in progress

Part of https://github.com/elastic/docs-team/issues/31?issue=elastic%7Cdocs-team%7C41

Status
🟢 Ready for PM/engineer review
🚧 Not ready for tech writer review

❗ Note for reviewers: We're going for "MVP" docs for now and tracking additional improvements in #3179

Changes

  • Revised overview: simplified, clarified
  • Revised setup: removed component templates, simplified
  • New advanced section (reindex, advanced concepts)

TODO:

  • Reconcile with recent changes to general data stream docs
  • Check docs patterns/style/etc.

### Create the destination data stream and reindex [tsds-reindex-op]

Invoke the reindex api, for instance:
Run the reindex operation using `op_type: create` to prevent overwrites:

Maybe skip "to prevent overwrites"? This is a new data stream, so no overwriting is possible.
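
For reference, a minimal sketch of the request body this hunk describes, written in Python so it can be checked without a cluster. The data stream names are hypothetical. The `op_type: create` setting is kept because data streams are append-only, which is a separate concern from whether the wording about overwrites stays.

```python
import json


def build_reindex_body(source: str, dest: str) -> dict:
    """Build a _reindex request body that copies documents into a data stream."""
    return {
        "source": {"index": source},
        # Data streams are append-only, so the destination uses op_type "create".
        "dest": {"index": dest, "op_type": "create"},
    }


# Both names below are placeholders, not names from the docs under review.
body = build_reindex_body("my-regular-data-stream", "my-tsds")
print("POST /_reindex")
print(json.dumps(body, indent=2))
```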

Both a regular data stream and a time series data stream can store timestamped metrics data.

Use a time series data stream for metrics data only. For other timestamped data, such as logs or traces, use a [logs data stream](logs-data-stream.md) or regular data stream.
Choose a time series data stream if you typically add metrics data to {{es}} in near real-time and in `@timestamp` order. For other timestamped data, such as logs or traces, use a [logs data stream](logs-data-stream.md) or [regular data stream](/manage-data/data-store/data-streams.md).

Consider expanding what metrics data means. Here, we're looking for a sequence of data point-timestamp pairs, identified by one or more dimension fields that can be used for slicing in aggregation queries.

This is nicely described in Time-series concepts below. Maybe add a cross reference, or have this section follow that one?

* **Required fields:** In a TSDS, each document contains:
* A `@timestamp` field
* One or more [dimension fields](#time-series-dimension), set with `time_series_dimension: true`
* One or more [metric fields](#time-series-metric) (not strictly required, but typical for a TSDS)

Let's remove the not strictly required part. A time-series requires a metric field with non-null values.

* One or more [dimension fields](#time-series-dimension), set with `time_series_dimension: true`
* One or more [metric fields](#time-series-metric) (not strictly required, but typical for a TSDS)
* **Document IDs:** Time series documents use two IDs:
* An internal [`_tsid`](#tsid) metadata field, generated by {{es}} for each document in a TSDS and used for sorting and compression

Still thinking about whether we need to expose the `_tsid` so readily in our documentation. If we do, we probably want to mention that it's calculated over all dimension values.

Another option is to have a section towards the end, shedding some light into how data gets structured internally.

+1, here I would mention that the id is calculated by es and cannot be provided, and if we want to elaborate more it should be in an implementation section. The reason is that I see it as an implementation detail and not as part of the API, if that makes sense.

* One or more [metric fields](#time-series-metric) (not strictly required, but typical for a TSDS)
* **Document IDs:** Time series documents use two IDs:
* An internal [`_tsid`](#tsid) metadata field, generated by {{es}} for each document in a TSDS and used for sorting and compression
* The document `_id`, a generated hash of the document's dimensions and `@timestamp` (custom `_id` values are not supported)

Maybe mention that it's auto-generated: passing a doc ID during indexing results in an error.

* An internal [`_tsid`](#tsid) metadata field, generated by {{es}} for each document in a TSDS and used for sorting and compression
* The document `_id`, a generated hash of the document's dimensions and `@timestamp` (custom `_id` values are not supported)
* **Backing indices:** A TSDS uses [time-bound indices](/manage-data/data-store/data-streams/time-bound-tsds.md) to store data from the same time period in the same backing index.
* **Dimension-based routing:** The matching index template for a TSDS must contain the `index.routing_path` index setting, which specifies dimensions for routing documents to shards.

This is not correct, the setting gets auto-generated if not present in the templates. We actually prefer to auto-generate, and it gets replaced by a different, internal setting that users can no longer touch.

It may suffice to note here that routing logic uses dimension field values to map data to shards per time series, improving storage efficiency and query performance.

* **Backing indices:** A TSDS uses [time-bound indices](/manage-data/data-store/data-streams/time-bound-tsds.md) to store data from the same time period in the same backing index.
* **Dimension-based routing:** The matching index template for a TSDS must contain the `index.routing_path` index setting, which specifies dimensions for routing documents to shards.
* **Sorting:** A TSDS uses internal [index sorting](elasticsearch://reference/elasticsearch/index-settings/sorting.md) to order shard segments by `_tsid` and `@timestamp`, for better compression. Time series data streams do not use `index.sort.*` settings.
* **Synthetic source:** A TSDS uses [synthetic `_source`](elasticsearch://reference/elasticsearch/mapping-reference/mapping-source-field.md#synthetic-source), which has some [restrictions](elasticsearch://reference/elasticsearch/mapping-reference/mapping-source-field.md#synthetic-source-restrictions) and [modifications](elasticsearch://reference/elasticsearch/mapping-reference/mapping-source-field.md#synthetic-source-modifications).

I wonder if we should be mentioning this here. Synthetic source is orthogonal to TSDS these days and is only available with an enterprise license. When available, it reduces the storage footprint with no loss of functionality, but everything works without it too.

One difference, though, is that it's not possible to disable source for a TSDS. It's either standard or synthetic source mode.

* A TSDS uses internal [index sorting](elasticsearch://reference/elasticsearch/index-settings/sorting.md) to order shard segments by `_tsid` and `@timestamp`.
* TSDS documents only support auto-generated document `_id` values. For TSDS documents, the document `_id` is a hash of the document’s dimensions and `@timestamp`. A TSDS doesn’t support custom document `_id` values.
* A TSDS uses [synthetic `_source`](elasticsearch://reference/elasticsearch/mapping-reference/mapping-source-field.md#synthetic-source), and as a result is subject to some [restrictions](elasticsearch://reference/elasticsearch/mapping-reference/mapping-source-field.md#synthetic-source-restrictions) and [modifications](elasticsearch://reference/elasticsearch/mapping-reference/mapping-source-field.md#synthetic-source-modifications) applied to the `_source` field.
You can use the {{esql}} [`TS` command](elasticsearch://reference/query-languages/esql/commands/ts.md) to query time series data streams. The `TS` command is optimized for time series data. It also enables the use of aggregation functions that efficiently process metrics per time series, before aggregating results.

Shall we mention that it's in tech preview?

:::

In a TSDS, each {{es}} document represents an observation, or data point, in a specific time series. Although a TSDS can contain multiple time series, a document can only belong to one time series. A time series can’t span multiple data streams.
In a TSDS, each {{es}} document represents an observation, or data point, in a specific time series. Although a TSDS can contain multiple time series, a document can belong to only one time series. A single time series can't span multiple data streams.

That's not 100% correct. The proper definition of a time series includes the metric name. Since we can have multiple metric fields populated in a single doc, these map to multiple time series. @felixbarny too for thoughts here.


### Time series fields

Compared to a regular data stream, a TSDS uses some additional fields specific to time series: dimension fields (required) and metric fields (optional but usually defined), plus an internal `_tsid` metadata field.

Ditto, let's not call metric fields optional.


A TSDS document is uniquely identified by its time series and timestamp, both of which are used to generate the document `_id`. So, two documents with the same dimensions and the same timestamp are considered to be duplicates. When you use the `_bulk` endpoint to add documents to a TSDS, a second document with the same timestamp and dimensions overwrites the first. When you use the `PUT /<target>/_create/<_id>` format to add an individual document and a document with the same `_id` already exists, an error is generated.
:::{tip}
{{es}} uses dimensions and timestamps to generate time series document `_id` values. Two documents with the same dimensions and timestamp are considered duplicates.

This renders the reference on _id above redundant, imho. Let's just keep this one.
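
To make the duplicate rule in the tip concrete, here is a rough illustration. It is not how {{es}} derives `_id` (that logic is internal); the hash function, field names, and values are all assumptions chosen for the sketch. The only point is that the same dimensions plus the same `@timestamp` yield the same identifier, regardless of field order.

```python
import hashlib
import json


def sketch_doc_id(dimensions: dict, timestamp: str) -> str:
    """Illustrative stand-in for a TSDS document _id (not the real algorithm)."""
    # sort_keys makes the result independent of the order the dimensions arrive in.
    payload = json.dumps({"dims": dimensions, "ts": timestamp}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


a = sketch_doc_id({"sensor_id": "s-1", "location": "roof"}, "2025-10-01T00:00:00Z")
b = sketch_doc_id({"location": "roof", "sensor_id": "s-1"}, "2025-10-01T00:00:00Z")
c = sketch_doc_id({"sensor_id": "s-1", "location": "roof"}, "2025-10-01T00:01:00Z")

print(a == b)  # same dimensions and timestamp: treated as duplicates
print(a == c)  # different timestamp: a distinct document
```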

@kkrik-es kkrik-es requested a review from felixbarny October 1, 2025 06:54
To work with a flattened field, use the `time_series_dimensions` parameter to configure an array of fields as dimensions. For details, refer to [`flattened`](elasticsearch://reference/elasticsearch/mapping-reference/flattened.md#flattened-params).

You can also simplify dimension definitions by using [pass-through](elasticsearch://reference/elasticsearch/mapping-reference/passthrough.md#passthrough-dimensions) fields.
:::

Let's not hide this, it's probably the simplest and recommended way to define dimensions.

#### Metrics [time-series-metric]

Metrics are fields that contain numeric measurements, as well as aggregations and/or downsampling values based off of those measurements. While not required, documents in a TSDS typically contain one or more metric fields.
Metrics are numeric measurements that change over time. Although metrics are not required, documents in a TSDS typically contain one or more metric fields.

Suggested change
Metrics are numeric measurements that change over time. Although metrics are not required, documents in a TSDS typically contain one or more metric fields.
Metrics are numeric measurements that change over time. Documents in a TSDS typically contain one or more metric fields.

To mark a field as a metric, you must specify a metric type using the `time_series_metric` mapping parameter. The following field types support the `time_series_metric` parameter:
To mark a field as a metric, use the `time_series_metric` mapping parameter. This parameter ensures data is stored in an optimal way for time series analysis. The following field types support the `time_series_metric` parameter:

* [`aggregate_metric_double`](elasticsearch://reference/elasticsearch/mapping-reference/aggregate-metric-double.md)

Nit: move this second, it's very rare that users populate it. It gets internally generated during downsampling, mostly.

Due to the cumulative nature of counter fields, the following aggregations are supported and expected to provide meaningful results with the `counter` field: `rate`, `histogram`, `range`, `min`, `max`, `top_metrics` and `variable_width_histogram`. In order to prevent issues with existing integrations and custom dashboards, we also allow the following aggregations, even if the result might be meaningless on counters: `avg`, `box plot`, `cardinality`, `extended stats`, `median absolute deviation`, `percentile ranks`, `percentiles`, `stats`, `sum` and `value count`.
::::

: A cumulative metric that only monotonically increases or resets to `0` (zero). For example, a count of errors or completed tasks.

, resetting when a serving process restarts.
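
A small sketch of why the reset behavior matters when reading a counter. It is not the implementation of the `rate` aggregation; it only shows the usual way to total a counter's increase when a drop in value is interpreted as a restart from `0`. The function name and sample values are made up for illustration.

```python
def counter_increase(samples: list[float]) -> float:
    """Total increase of a counter across samples, treating any drop as a reset."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # The counter restarted from 0, so everything up to `cur` is new.
            total += cur
    return total


# The drop from 20 to 3 models a process restart.
print(counter_increase([10, 15, 20, 3, 8]))  # 18.0
```

A plain `max - min` over the same samples would give 17 and miss part of the increase after the restart, which is why averaging or summing raw counter values tends to be meaningless.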


#### `_tsid` metadata field [tsid]

The `_tsid` is an automatically generated object containing the document’s dimensions. It's intended for internal {{es}} use, so in most cases you won't need to work with it.

Since this is defined here properly, let's skip the reference at the top.

{{es}} uses [compression algorithms](elasticsearch://reference/elasticsearch/index-settings/index-modules.md#index-codec) to compress repeated values. This compression works best when repeated values are stored near each other: in the same index, on the same shard, and side-by-side in the same shard segment.

Most time series data contains repeated values. Dimensions are repeated across documents in the same time series. The metric values of a time series may also change slowly over time.
- You **can't** query or update the internal `_tsid` field.

I'd skip these 3 points and just keep the last one, to highlight why it should not be used in queries.

- **Index patterns:** One or more wildcard patterns matching the name of your TSDS, such as `weather-sensors-*`. For best results, use the [data stream naming scheme](/reference/fleet/data-streams.md#data-streams-naming-scheme).
- **Data stream object:** The template must include `"data_stream": {}`.
- **Time series mode:** Set `index.mode: time_series`.
- **Field mappings:** Define at least one `keyword` dimension field and typically one or more metric fields:

Suggested change
- **Field mappings:** Define at least one `keyword` dimension field and typically one or more metric fields:
- **Field mappings:** Define at least one dimension field and typically one or more metric fields:

Dimensions are no longer required to be keyword fields.

- **Time series mode:** Set `index.mode: time_series`.
- **Field mappings:** Define at least one `keyword` dimension field and typically one or more metric fields:
- To define a dimension, set `time_series_dimension` to `true`. Dimension fields like `counter` only increase over time. For more details, refer to [Dimensions](/manage-data/data-store/data-streams/time-series-data-stream-tsds.md#time-series-dimension).
- To define a metric, use the `time_series_metric` mapping parameter. Metric fields like `gauge` can increase or decrease over time. For more details, refer to [Metrics](/manage-data/data-store/data-streams/time-series-data-stream-tsds.md#time-series-metric).

I think we should either mention counters too, or just stick to the cross reference for more details.

- **Data stream object:** The template must include `"data_stream": {}`.
- **Time series mode:** Set `index.mode: time_series`.
- **Field mappings:** Define at least one `keyword` dimension field and typically one or more metric fields:
- To define a dimension, set `time_series_dimension` to `true`. Dimension fields like `counter` only increase over time. For more details, refer to [Dimensions](/manage-data/data-store/data-streams/time-series-data-stream-tsds.md#time-series-dimension).

I wonder if we should be referencing pass-through fields here. Dimensions are often defined dynamically, so the pass-through object can be used as a dimension container to simplify definitions. See also:

https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/passthrough#passthrough-dimensions
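
For reviewers who want to see the template requirements from this hunk in one place, a minimal sketch follows. The template name, index pattern, and field names are hypothetical. It sets no `index.routing_path`, on the assumption (per the earlier comment) that the setting is generated automatically, and it defines the dimension explicitly rather than through a pass-through object.

```python
import json

# Hypothetical index template body for a TSDS.
template = {
    "index_patterns": ["weather-sensors-*"],
    "data_stream": {},  # required: marks the template as a data stream template
    "template": {
        "settings": {"index.mode": "time_series"},
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                "sensor_id": {"type": "keyword", "time_series_dimension": True},
                "temperature": {"type": "double", "time_series_metric": "gauge"},
                "error_count": {"type": "long", "time_series_metric": "counter"},
            }
        },
    },
}

# Summarize which fields are dimensions and which are metrics.
props = template["template"]["mappings"]["properties"]
dimensions = [name for name, m in props.items() if m.get("time_series_dimension")]
metrics = {name: m["time_series_metric"] for name, m in props.items()
           if "time_series_metric" in m}

print("PUT /_index_template/my-weather-template")
print(json.dumps(template, indent=2))
print(dimensions, metrics)
```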

:::{dropdown} Create an ILM policy

## Create an index lifecycle policy [tsds-ilm-policy]
If you're using {{stack}}, {{ilm-init}} can help you manage a time series data stream's backing indices. {{ilm-init}} requires an index lifecycle policy.

I am not in favour of adding ILM here for the following reasons:

  • Not available in serverless, meaning that for serverless we have provided no way of automating lifecycle management.
  • Too verbose, I understand it's an optional step, but still it's the first step we show.

I think the example should use data stream lifecycle and propose ILM as an alternative, referencing its documentation for more info.

What do you think?

```
You can convert an existing regular data stream to a TSDS. Follow these steps:

1. Update your existing index template to include time series settings. Also update your index lifecycle policy (if any) and component templates (if any).

The second sentence here is vague; when I read it, I have no idea what kind of updates it is talking about. Maybe something like:

Update your existing index template and/or component templates (if any) to include time series settings.

It is reasonable to bundle the index template and component templates together, because depending on the setup it might be enough to update only one component template.

About the ILM policy, I have no idea what updates it is referring to, which is why I would suggest removing it.

After creating the index template, you can create a time series data stream by [indexing a document](use-data-stream.md#add-documents-to-a-data-stream). The TSDS is created automatically when you index the first document, as long as the index name matches the index template pattern. You can use a bulk API request or a POST request.

:::{important}
To test the following `_bulk` example, update the timestamps to within three hours of your current time. Data added to a TSDS must fit the [accepted time range](/manage-data/data-store/data-streams/time-bound-tsds.md#tsds-accepted-time-range).

Suggested change
To test the following `_bulk` example, update the timestamps to within three hours of your current time. Data added to a TSDS must fit the [accepted time range](/manage-data/data-store/data-streams/time-bound-tsds.md#tsds-accepted-time-range).
To test the following `_bulk` example, update the timestamps to within two hours of your current time. Data added to a TSDS must fit the [accepted time range](/manage-data/data-store/data-streams/time-bound-tsds.md#tsds-accepted-time-range).


Only data that falls within this range is indexed.

To check the accepted time range for writing to a TSDS, use the [get data stream API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-get-data-stream):

We are missing a nuance here: this API returns the time range supported by a TSDB, but writes within that range are not necessarily accepted. If a backing index is marked read-only, for example, they will be rejected.

Not sure how to rephrase this, potentially it could but it's not a given.

```
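
A hedged sketch tying the two hunks above together: it builds a `_bulk` body with timestamps close to "now" and checks them against a time range. The range values, field names, and helper functions are hypothetical, and treating the end of the range as exclusive is an assumption. As noted in the comment above, falling inside the range is necessary but not sufficient, because a read-only backing index can still reject the write.

```python
import json
from datetime import datetime, timedelta, timezone


def parse_utc(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def in_accepted_range(timestamp: str, start_time: str, end_time: str) -> bool:
    """True if timestamp is in [start_time, end_time); exclusive end is assumed."""
    return parse_utc(start_time) <= parse_utc(timestamp) < parse_utc(end_time)


def build_bulk_body(readings, now: datetime) -> str:
    """Build an NDJSON _bulk body; each reading is (minutes_ago, fields)."""
    lines = []
    for minutes_ago, fields in readings:
        ts = (now - timedelta(minutes=minutes_ago)).strftime("%Y-%m-%dT%H:%M:%SZ")
        lines.append(json.dumps({"create": {}}))  # data streams take "create" actions
        lines.append(json.dumps({"@timestamp": ts, **fields}))
    return "\n".join(lines) + "\n"  # a bulk body must end with a newline


now = datetime(2025, 10, 1, 12, 0, 0, tzinfo=timezone.utc)  # fixed for repeatability
body = build_bulk_body(
    [(5, {"sensor_id": "s-1", "temperature": 21.5}),
     (0, {"sensor_id": "s-1", "temperature": 21.7})],
    now,
)

start, end = "2025-10-01T10:00:00Z", "2025-10-01T14:00:00Z"  # hypothetical range
docs = [json.loads(line) for line in body.splitlines()[1::2]]
print(all(in_accepted_range(d["@timestamp"], start, end) for d in docs))
```

In real use, `now` would come from `datetime.now(timezone.utc)` and the range from the cluster rather than from constants.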

::::{tip}
These {{ilm-init}} actions mark the source index as read-only or prevent writes for performance reasons:

I think we should rephrase this because some actions do not fit this explanation. Maybe something along these lines:

The following actions influence the writable time range of a TSDS, either because they make a backing index read-only or remove it:

- [Force merge](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-forcemerge.md)
- [Read only](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-readonly.md)
- [Searchable snapshot](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-searchable-snapshot.md)
- [Shrink](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-shrink.md)

This action could revert the read-only status at the end of the action. Not sure if this is too much information here, but I thought to share it.

::::{tip}
These {{ilm-init}} actions mark the source index as read-only or prevent writes for performance reasons:
- [Delete](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-delete.md)
- [Downsample](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-downsample.md)

Nit: Move Downsample to the top, it's the most relevant here.

@kkrik-es kkrik-es left a comment

This is super nice.

@kkrik-es kkrik-es removed the request for review from felixbarny October 1, 2025 17:18

4 participants