Commit 6049cac

Merge pull request #4534 from ClickHouse/monitoring-guides
monitoring guides with snippets for cloud and self-managed
2 parents ceec789 + 56986c7 commit 6049cac

10 files changed

+2213
-108
lines changed

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
## Community monitoring solutions {#community-monitoring}
2+
3+
The ClickHouse community has developed comprehensive monitoring solutions that integrate with popular observability stacks. [ClickHouse Monitoring](https://github.com/duyet/clickhouse-monitoring) provides a complete monitoring setup with pre-built dashboards. This open source project offers a quick-start approach for teams looking to implement ClickHouse monitoring with established best practices and proven dashboard configurations.
4+
5+
:::note
6+
Like other direct database monitoring approaches, this solution queries ClickHouse system tables directly, which prevents instances from idling and impacts cost optimization.
7+
:::
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
1+
import Image from '@theme/IdealImage';
2+
import AdvancedDashboard from '@site/static/images/cloud/manage/monitoring/advanced_dashboard.png';
3+
import NativeAdvancedDashboard from '@site/static/images/cloud/manage/monitoring/native_advanced_dashboard.png';
4+
5+
### Direct Grafana plugin integration {#direct-grafana}
6+
7+
The ClickHouse data source plugin for Grafana enables visualization and exploration of data directly from ClickHouse using system tables. This approach works well for monitoring performance and creating custom dashboards for detailed system analysis.
8+
For plugin installation and configuration details, see the ClickHouse [data source plugin](/integrations/grafana). For a complete monitoring setup using the Prometheus-Grafana mix-in with pre-built dashboards and alerting rules, see [Monitor ClickHouse with the new Prometheus-Grafana mix-in](https://clickhouse.com/blog/monitor-with-new-prometheus-grafana-mix-in).
9+
10+
### Direct Datadog integration {#direct-datadog}
11+
12+
Datadog offers a ClickHouse monitoring plugin for its agent, which queries system tables directly. This integration provides comprehensive database monitoring with cluster awareness through the `clusterAllReplicas` function.
13+
:::note
14+
This integration is not recommended for ClickHouse Cloud deployments due to incompatibility with cost-optimizing idle behavior and operational limitations of the cloud proxy layer.
15+
:::
16+
17+
### Using system tables directly {#system-tables}
18+
19+
Users can perform deep query performance analysis by querying ClickHouse system tables directly, particularly `system.query_log`. Using either the SQL console or `clickhouse-client`, teams can identify slow queries, analyze resource usage, and track usage patterns across the organization.
20+
21+
**Query Performance Analysis**
22+
23+
Query performance analysis can be run against the query log system table, `system.query_log`.
24+
25+
**Example query**: Find the five longest-running queries from the last hour across all cluster replicas:
26+
27+
```sql
SELECT
    type,
    event_time,
    query_duration_ms,
    query,
    read_rows,
    tables
FROM clusterAllReplicas(default, system.query_log)
WHERE event_time >= (now() - toIntervalMinute(60)) AND type='QueryFinish'
ORDER BY query_duration_ms DESC
LIMIT 5
FORMAT Vertical
```
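
The resource-usage and usage-pattern analysis mentioned above can be sketched in the same way. The query below is an illustrative example, not part of the original guide: it groups finished queries from the last hour by `normalized_query_hash`, so that repeated executions of the same query shape are counted together. The cluster name `default`, the one-hour window, and the limit of 10 are assumptions to adjust per deployment.

```sql
-- Sketch: which query shapes consumed the most time in the last hour.
-- Assumes the cluster is named `default`, as in the example above.
SELECT
    normalized_query_hash,
    any(query) AS sample_query,                        -- one representative query text
    count() AS executions,
    sum(query_duration_ms) AS total_duration_ms,
    round(avg(query_duration_ms)) AS avg_duration_ms,
    max(query_duration_ms) AS max_duration_ms,
    sum(read_rows) AS total_read_rows,
    formatReadableSize(sum(read_bytes)) AS total_read,
    formatReadableSize(max(memory_usage)) AS peak_memory
FROM clusterAllReplicas(default, system.query_log)
WHERE event_time >= (now() - toIntervalMinute(60)) AND type = 'QueryFinish'
GROUP BY normalized_query_hash
ORDER BY total_duration_ms DESC
LIMIT 10
FORMAT Vertical
```

Ordering by total duration rather than maximum duration surfaces cheap queries that run very often, which a top-N list of single slow queries can miss. Like any direct query against system tables, running this on a schedule will keep a ClickHouse Cloud service from idling.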
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
1+
import Image from '@theme/IdealImage';
2+
import AdvancedDashboard from '@site/static/images/cloud/manage/monitoring/advanced_dashboard.png';
3+
import NativeAdvancedDashboard from '@site/static/images/cloud/manage/monitoring/native_advanced_dashboard.png';
4+
5+
## Integration examples {#examples}
6+
7+
External integration allows organizations to maintain established monitoring workflows, leverage existing team expertise with familiar tools, and integrate ClickHouse monitoring with broader infrastructure observability without disrupting current processes or requiring significant retraining investments.
8+
Teams can apply existing alerting rules and escalation procedures to ClickHouse metrics, while correlating database performance with application and infrastructure health within a unified observability platform. This approach maximizes ROI on current monitoring setups and enables faster troubleshooting through consolidated dashboards and familiar tooling interfaces.
9+
10+
### Grafana Cloud monitoring {#grafana}
11+
12+
Grafana provides ClickHouse monitoring through both direct plugin integration and Prometheus-based approaches. The Prometheus endpoint integration maintains operational separation between monitoring and production workloads while enabling visualization within existing Grafana Cloud infrastructure. See [Grafana's ClickHouse documentation](https://grafana.com/docs/grafana-cloud/monitor-infrastructure/integrations/integration-reference/integration-clickhouse/) for configuration guidance.
13+
14+
### Datadog monitoring {#datadog}
15+
Datadog is developing a dedicated API integration that will provide proper cloud service monitoring while respecting service idling behavior. In the interim, teams can use the OpenMetrics integration approach via ClickHouse Prometheus endpoints for operational separation and cost-efficient monitoring. For configuration guidance, see [Datadog's Prometheus and OpenMetrics integration documentation](https://docs.datadoghq.com/integrations/openmetrics/).
16+
17+
### ClickStack {#clickstack}
18+
19+
ClickStack is ClickHouse's recommended observability solution for deep system analysis and debugging, providing a unified platform for logs, metrics, and traces using ClickHouse as the storage engine. This approach relies on HyperDX, the ClickStack UI, connecting directly to the system tables inside your ClickHouse instance.
20+
HyperDX ships with a ClickHouse-focused dashboard with tabs for Selects, Inserts, and Infrastructure. Teams can also use Lucene or SQL syntax to search system tables and logs, and can create custom visualizations via Chart Explorer for detailed system analysis.
21+
This approach is ideal for debugging complex issues, performance analysis, and deep system introspection rather than real-time production alerting.
22+
23+
:::note
24+
This approach wakes idle services because HyperDX queries the system tables directly.
25+
:::

docs/about-us/beta-and-experimental-features.md

Lines changed: 32 additions & 25 deletions
@@ -83,8 +83,34 @@ Please note: no additional experimental features are allowed to be enabled in Cl
8383

8484
| Name | Default |
8585
|------|--------|
86+
| [allow_experimental_replacing_merge_with_cleanup](/operations/settings/merge-tree-settings#allow_experimental_replacing_merge_with_cleanup) | `0` |
87+
| [allow_experimental_reverse_key](/operations/settings/merge-tree-settings#allow_experimental_reverse_key) | `0` |
88+
| [allow_remote_fs_zero_copy_replication](/operations/settings/merge-tree-settings#allow_remote_fs_zero_copy_replication) | `0` |
89+
| [enable_replacing_merge_with_cleanup_for_min_age_to_force_merge](/operations/settings/merge-tree-settings#enable_replacing_merge_with_cleanup_for_min_age_to_force_merge) | `0` |
90+
| [force_read_through_cache_for_merges](/operations/settings/merge-tree-settings#force_read_through_cache_for_merges) | `0` |
91+
| [merge_selector_algorithm](/operations/settings/merge-tree-settings#merge_selector_algorithm) | `Simple` |
92+
| [notify_newest_block_number](/operations/settings/merge-tree-settings#notify_newest_block_number) | `0` |
93+
| [part_moves_between_shards_delay_seconds](/operations/settings/merge-tree-settings#part_moves_between_shards_delay_seconds) | `30` |
94+
| [part_moves_between_shards_enable](/operations/settings/merge-tree-settings#part_moves_between_shards_enable) | `0` |
95+
| [remote_fs_zero_copy_path_compatible_mode](/operations/settings/merge-tree-settings#remote_fs_zero_copy_path_compatible_mode) | `0` |
96+
| [remote_fs_zero_copy_zookeeper_path](/operations/settings/merge-tree-settings#remote_fs_zero_copy_zookeeper_path) | `/clickhouse/zero_copy` |
97+
| [remove_rolled_back_parts_immediately](/operations/settings/merge-tree-settings#remove_rolled_back_parts_immediately) | `1` |
98+
| [shared_merge_tree_activate_coordinated_merges_tasks](/operations/settings/merge-tree-settings#shared_merge_tree_activate_coordinated_merges_tasks) | `0` |
99+
| [shared_merge_tree_enable_coordinated_merges](/operations/settings/merge-tree-settings#shared_merge_tree_enable_coordinated_merges) | `0` |
100+
| [shared_merge_tree_enable_keeper_parts_extra_data](/operations/settings/merge-tree-settings#shared_merge_tree_enable_keeper_parts_extra_data) | `0` |
101+
| [shared_merge_tree_merge_coordinator_election_check_period_ms](/operations/settings/merge-tree-settings#shared_merge_tree_merge_coordinator_election_check_period_ms) | `30000` |
102+
| [shared_merge_tree_merge_coordinator_factor](/operations/settings/merge-tree-settings#shared_merge_tree_merge_coordinator_factor) | `1.1` |
103+
| [shared_merge_tree_merge_coordinator_fetch_fresh_metadata_period_ms](/operations/settings/merge-tree-settings#shared_merge_tree_merge_coordinator_fetch_fresh_metadata_period_ms) | `10000` |
104+
| [shared_merge_tree_merge_coordinator_max_merge_request_size](/operations/settings/merge-tree-settings#shared_merge_tree_merge_coordinator_max_merge_request_size) | `20` |
105+
| [shared_merge_tree_merge_coordinator_max_period_ms](/operations/settings/merge-tree-settings#shared_merge_tree_merge_coordinator_max_period_ms) | `10000` |
106+
| [shared_merge_tree_merge_coordinator_merges_prepare_count](/operations/settings/merge-tree-settings#shared_merge_tree_merge_coordinator_merges_prepare_count) | `100` |
107+
| [shared_merge_tree_merge_coordinator_min_period_ms](/operations/settings/merge-tree-settings#shared_merge_tree_merge_coordinator_min_period_ms) | `1` |
108+
| [shared_merge_tree_merge_worker_fast_timeout_ms](/operations/settings/merge-tree-settings#shared_merge_tree_merge_worker_fast_timeout_ms) | `100` |
109+
| [shared_merge_tree_merge_worker_regular_timeout_ms](/operations/settings/merge-tree-settings#shared_merge_tree_merge_worker_regular_timeout_ms) | `10000` |
110+
| [shared_merge_tree_virtual_parts_discovery_batch](/operations/settings/merge-tree-settings#shared_merge_tree_virtual_parts_discovery_batch) | `1` |
86111
| [allow_experimental_time_time64_type](/operations/settings/settings#allow_experimental_time_time64_type) | `0` |
87112
| [allow_experimental_kafka_offsets_storage_in_keeper](/operations/settings/settings#allow_experimental_kafka_offsets_storage_in_keeper) | `0` |
113+
| [allow_experimental_delta_lake_writes](/operations/settings/settings#allow_experimental_delta_lake_writes) | `0` |
88114
| [allow_experimental_materialized_postgresql_table](/operations/settings/settings#allow_experimental_materialized_postgresql_table) | `0` |
89115
| [allow_experimental_funnel_functions](/operations/settings/settings#allow_experimental_funnel_functions) | `0` |
90116
| [allow_experimental_nlp_functions](/operations/settings/settings#allow_experimental_nlp_functions) | `0` |
@@ -112,6 +138,7 @@ Please note: no additional experimental features are allowed to be enabled in Cl
112138
| [wait_for_window_view_fire_signal_timeout](/operations/settings/settings#wait_for_window_view_fire_signal_timeout) | `10` |
113139
| [stop_refreshable_materialized_views_on_startup](/operations/settings/settings#stop_refreshable_materialized_views_on_startup) | `0` |
114140
| [allow_experimental_database_materialized_postgresql](/operations/settings/settings#allow_experimental_database_materialized_postgresql) | `0` |
141+
| [allow_experimental_qbit_type](/operations/settings/settings#allow_experimental_qbit_type) | `0` |
115142
| [allow_experimental_query_deduplication](/operations/settings/settings#allow_experimental_query_deduplication) | `0` |
116143
| [allow_experimental_database_hms_catalog](/operations/settings/settings#allow_experimental_database_hms_catalog) | `0` |
117144
| [allow_experimental_kusto_dialect](/operations/settings/settings#allow_experimental_kusto_dialect) | `0` |
@@ -131,32 +158,12 @@ Please note: no additional experimental features are allowed to be enabled in Cl
131158
| [allow_experimental_ytsaurus_table_function](/operations/settings/settings#allow_experimental_ytsaurus_table_function) | `0` |
132159
| [allow_experimental_ytsaurus_dictionary_source](/operations/settings/settings#allow_experimental_ytsaurus_dictionary_source) | `0` |
133160
| [distributed_plan_force_shuffle_aggregation](/operations/settings/settings#distributed_plan_force_shuffle_aggregation) | `0` |
161+
| [enable_join_runtime_filters](/operations/settings/settings#enable_join_runtime_filters) | `0` |
162+
| [join_runtime_bloom_filter_bytes](/operations/settings/settings#join_runtime_bloom_filter_bytes) | `524288` |
163+
| [join_runtime_bloom_filter_hash_functions](/operations/settings/settings#join_runtime_bloom_filter_hash_functions) | `3` |
164+
| [rewrite_in_to_join](/operations/settings/settings#rewrite_in_to_join) | `0` |
134165
| [allow_experimental_time_series_aggregate_functions](/operations/settings/settings#allow_experimental_time_series_aggregate_functions) | `0` |
135166
| [promql_database](/operations/settings/settings#promql_database) | `` |
136167
| [promql_table](/operations/settings/settings#promql_table) | `` |
137-
| [evaluation_time](/operations/settings/settings#evaluation_time) | `auto` |
138-
| [allow_experimental_replacing_merge_with_cleanup](/operations/settings/merge-tree-settings#allow_experimental_replacing_merge_with_cleanup) | `0` |
139-
| [allow_experimental_reverse_key](/operations/settings/merge-tree-settings#allow_experimental_reverse_key) | `0` |
140-
| [allow_remote_fs_zero_copy_replication](/operations/settings/merge-tree-settings#allow_remote_fs_zero_copy_replication) | `0` |
141-
| [enable_replacing_merge_with_cleanup_for_min_age_to_force_merge](/operations/settings/merge-tree-settings#enable_replacing_merge_with_cleanup_for_min_age_to_force_merge) | `0` |
142-
| [force_read_through_cache_for_merges](/operations/settings/merge-tree-settings#force_read_through_cache_for_merges) | `0` |
143-
| [merge_selector_algorithm](/operations/settings/merge-tree-settings#merge_selector_algorithm) | `Simple` |
144-
| [notify_newest_block_number](/operations/settings/merge-tree-settings#notify_newest_block_number) | `0` |
145-
| [part_moves_between_shards_delay_seconds](/operations/settings/merge-tree-settings#part_moves_between_shards_delay_seconds) | `30` |
146-
| [part_moves_between_shards_enable](/operations/settings/merge-tree-settings#part_moves_between_shards_enable) | `0` |
147-
| [remote_fs_zero_copy_path_compatible_mode](/operations/settings/merge-tree-settings#remote_fs_zero_copy_path_compatible_mode) | `0` |
148-
| [remote_fs_zero_copy_zookeeper_path](/operations/settings/merge-tree-settings#remote_fs_zero_copy_zookeeper_path) | `/clickhouse/zero_copy` |
149-
| [remove_rolled_back_parts_immediately](/operations/settings/merge-tree-settings#remove_rolled_back_parts_immediately) | `1` |
150-
| [shared_merge_tree_enable_coordinated_merges](/operations/settings/merge-tree-settings#shared_merge_tree_enable_coordinated_merges) | `0` |
151-
| [shared_merge_tree_enable_keeper_parts_extra_data](/operations/settings/merge-tree-settings#shared_merge_tree_enable_keeper_parts_extra_data) | `0` |
152-
| [shared_merge_tree_merge_coordinator_election_check_period_ms](/operations/settings/merge-tree-settings#shared_merge_tree_merge_coordinator_election_check_period_ms) | `30000` |
153-
| [shared_merge_tree_merge_coordinator_factor](/operations/settings/merge-tree-settings#shared_merge_tree_merge_coordinator_factor) | `2` |
154-
| [shared_merge_tree_merge_coordinator_fetch_fresh_metadata_period_ms](/operations/settings/merge-tree-settings#shared_merge_tree_merge_coordinator_fetch_fresh_metadata_period_ms) | `10000` |
155-
| [shared_merge_tree_merge_coordinator_max_merge_request_size](/operations/settings/merge-tree-settings#shared_merge_tree_merge_coordinator_max_merge_request_size) | `20` |
156-
| [shared_merge_tree_merge_coordinator_max_period_ms](/operations/settings/merge-tree-settings#shared_merge_tree_merge_coordinator_max_period_ms) | `10000` |
157-
| [shared_merge_tree_merge_coordinator_merges_prepare_count](/operations/settings/merge-tree-settings#shared_merge_tree_merge_coordinator_merges_prepare_count) | `100` |
158-
| [shared_merge_tree_merge_coordinator_min_period_ms](/operations/settings/merge-tree-settings#shared_merge_tree_merge_coordinator_min_period_ms) | `1` |
159-
| [shared_merge_tree_merge_worker_fast_timeout_ms](/operations/settings/merge-tree-settings#shared_merge_tree_merge_worker_fast_timeout_ms) | `100` |
160-
| [shared_merge_tree_merge_worker_regular_timeout_ms](/operations/settings/merge-tree-settings#shared_merge_tree_merge_worker_regular_timeout_ms) | `10000` |
161-
| [shared_merge_tree_virtual_parts_discovery_batch](/operations/settings/merge-tree-settings#shared_merge_tree_virtual_parts_discovery_batch) | `1` |
168+
| [promql_evaluation_time](/operations/settings/settings#promql_evaluation_time) | `auto` |
162169
<!--AUTOGENERATED_END-->

docs/cloud/onboard/03_tune/resource_tour.md

Lines changed: 5 additions & 4 deletions
@@ -34,10 +34,11 @@ using ClickHouse:
3434

3535
## Monitoring {#monitoring}
3636

37-
| Page | Description |
38-
|-----------------------------------------------------------------|-------------------------------------------------------------------------------|
39-
| [Advanced dashboard](/cloud/manage/monitor/advanced-dashboard) | Use the built in advanced dashboard to monitor service health and performance |
40-
| [Prometheus integration](/integrations/prometheus) | Use Prometheus to monitor Cloud services |
37+
| Page | Description |
38+
|----------------------------------------------------------------------------|-------------------------------------------------------------------------------|
39+
| [Advanced dashboard](/cloud/manage/monitor/advanced-dashboard) | Use the built in advanced dashboard to monitor service health and performance |
40+
| [Prometheus integration](/integrations/prometheus) | Use Prometheus to monitor Cloud services |
41+
| [Cloud Monitoring Capabilities](/use-cases/observability/cloud-monitoring) | Get an overview of built in monitoring capabilities and integration options |
4142

4243
## Security {#security}
4344
