7 changes: 7 additions & 0 deletions docs/_snippets/_community_monitoring.md
@@ -0,0 +1,7 @@
## Community monitoring solutions {#community-monitoring}

The ClickHouse community has developed comprehensive monitoring solutions that integrate with popular observability stacks. [ClickHouse Monitoring](https://github.com/duyet/clickhouse-monitoring) provides a complete monitoring setup with pre-built dashboards. This open source project offers a quick-start approach for teams looking to implement ClickHouse monitoring with established best practices and proven dashboard configurations.

:::note
Like other direct database monitoring approaches, this solution queries ClickHouse system tables directly, which prevents instances from idling and impacts cost optimization.
:::
40 changes: 40 additions & 0 deletions docs/_snippets/_direct_observability_integration_options.md
@@ -0,0 +1,40 @@
import Image from '@theme/IdealImage';
import AdvancedDashboard from '@site/static/images/cloud/manage/monitoring/advanced_dashboard.png';
import NativeAdvancedDashboard from '@site/static/images/cloud/manage/monitoring/native_advanced_dashboard.png';

### Direct Grafana plugin integration {#direct-grafana}

The ClickHouse data source plugin for Grafana enables visualization and exploration of data directly from ClickHouse using system tables. This approach works well for monitoring performance and creating custom dashboards for detailed system analysis.
For plugin installation and configuration details, see the ClickHouse data source plugin. For a complete monitoring setup using the Prometheus-Grafana mix-in with pre-built dashboards and alerting rules, see Monitor ClickHouse with the new Prometheus-Grafana mix-in.

### Direct Datadog integration {#direct-datadog}

Datadog offers a ClickHouse monitoring integration for its agent, which queries system tables directly. This integration provides comprehensive database monitoring with cluster awareness through the `clusterAllReplicas` function.
:::note
This integration is not recommended for ClickHouse Cloud deployments due to incompatibility with cost-optimizing idle behavior and operational limitations of the cloud proxy layer.
:::

### Using system tables directly {#system-tables}

Users can perform deep query performance analysis by querying ClickHouse system tables directly, particularly `system.query_log`. Using either the SQL console or `clickhouse-client`, teams can identify slow queries, analyze resource usage, and track usage patterns across the organization.

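**Query performance analysis**

The `system.query_log` table records details for each query, including its duration, the rows it read, and the tables it touched, which makes it the starting point for query performance analysis.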

**Example query**: Find the top 5 long-running queries across all cluster replicas:

```sql
SELECT
    type,
    event_time,
    query_duration_ms,
    query,
    read_rows,
    tables
FROM clusterAllReplicas(default, system.query_log)
WHERE event_time >= (now() - toIntervalMinute(60)) AND type = 'QueryFinish'
ORDER BY query_duration_ms DESC
LIMIT 5
FORMAT Vertical
```
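
When this query runs on a schedule from a script rather than by hand, it can help to parameterize the cluster name, lookback window, and row limit. The following is a minimal Python sketch, not part of any ClickHouse tooling: `build_slow_query_sql` is a hypothetical helper that only builds the SQL text from the example above. It omits the `FORMAT` clause so that whichever client runs the query can choose its own output format.

```python
import re

# Hypothetical helper for illustration; it only builds SQL text and never
# connects to ClickHouse.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_slow_query_sql(cluster: str = "default", minutes: int = 60, limit: int = 5) -> str:
    """Return SQL listing the slowest finished queries across all replicas."""
    # The values are interpolated into the SQL text, so validate them first.
    if not _IDENTIFIER.match(cluster):
        raise ValueError(f"invalid cluster name: {cluster!r}")
    for label, value in (("minutes", minutes), ("limit", limit)):
        if type(value) is not int or value <= 0:
            raise ValueError(f"{label} must be a positive integer")
    return (
        "SELECT type, event_time, query_duration_ms, query, read_rows, tables\n"
        f"FROM clusterAllReplicas({cluster}, system.query_log)\n"
        f"WHERE event_time >= (now() - toIntervalMinute({minutes})) "
        "AND type = 'QueryFinish'\n"
        "ORDER BY query_duration_ms DESC\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(build_slow_query_sql(minutes=15, limit=10))
```

The returned string can be passed to the SQL console, `clickhouse-client`, or any driver the team already uses. Because it queries `system.query_log`, running it wakes an idle service like any other direct system table query.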
24 changes: 24 additions & 0 deletions docs/_snippets/_observability_integration_options.md
@@ -0,0 +1,24 @@
import Image from '@theme/IdealImage';
import AdvancedDashboard from '@site/static/images/cloud/manage/monitoring/advanced_dashboard.png';
import NativeAdvancedDashboard from '@site/static/images/cloud/manage/monitoring/native_advanced_dashboard.png';

## Integration examples {#examples}

External integration allows organizations to maintain established monitoring workflows, leverage existing team expertise with familiar tools, and integrate ClickHouse monitoring with broader infrastructure observability without disrupting current processes or requiring significant retraining investments.
Teams can apply existing alerting rules and escalation procedures to ClickHouse metrics, while correlating database performance with application and infrastructure health within a unified observability platform. This approach maximizes ROI on current monitoring setups and enables faster troubleshooting through consolidated dashboards and familiar tooling interfaces.

### Grafana Cloud monitoring {#grafana}

Grafana provides ClickHouse monitoring through both direct plugin integration and Prometheus-based approaches. The Prometheus endpoint integration maintains operational separation between monitoring and production workloads while enabling visualization within existing Grafana Cloud infrastructure. See Grafana's ClickHouse documentation for configuration guidance.

"See Grafana's ClickHouse documentation for configuration guidance." -> Please add a link

Datadog monitoring

Should this be a markdown heading?

Datadog is developing a dedicated API integration that will provide proper cloud service monitoring while respecting service idling behavior. In the interim, teams can use the OpenMetrics integration approach via ClickHouse Prometheus endpoints for operational separation and cost-efficient monitoring. For configuration guidance, see Datadog's Prometheus and OpenMetrics integration documentation.

"see Datadog's Prometheus and OpenMetrics integration documentation." -> please link


### ClickStack {#clickstack}

ClickStack is ClickHouse's recommended observability solution for deep system analysis and debugging, providing a unified platform for logs, metrics, and traces using ClickHouse as the storage engine. This approach relies on HyperDX, the ClickStack UI, connecting directly to the system tables inside your ClickHouse instance.
HyperDX ships with a ClickHouse-focused dashboard with tabs for Selects, Inserts, and Infrastructure. Teams can also use Lucene or SQL syntax to search system tables and logs, and can create custom visualizations via Chart Explorer for detailed system analysis.
This approach is ideal for debugging complex issues, performance analysis, and deep system introspection rather than real-time production alerting.

:::note
This approach wakes idle services, because HyperDX queries the system tables directly.
:::
94 changes: 94 additions & 0 deletions docs/use-cases/observability/cloud-monitoring.md
@@ -0,0 +1,94 @@
---
slug: /use-cases/observability/cloud-monitoring
title: 'ClickHouse Cloud Monitoring'
description: 'ClickHouse Cloud Monitoring Guide'
doc_type: 'guide'
---

import AdvancedDashboard from '@site/static/images/cloud/manage/monitoring/advanced_dashboard.png';
import NativeAdvancedDashboard from '@site/static/images/cloud/manage/monitoring/native_advanced_dashboard.png';
import Image from '@theme/IdealImage';
import ObservabilityIntegrations from '@site/docs/_snippets/_observability_integration_options.md';
import DirectIntegrations from '@site/docs/_snippets/_direct_observability_integration_options.md';
import CommunityMonitoring from '@site/docs/_snippets/_community_monitoring.md';

# ClickHouse Cloud monitoring {#cloud-monitoring}

This guide provides enterprise teams evaluating ClickHouse Cloud with comprehensive information on monitoring and observability capabilities for production deployments. Enterprise customers frequently ask about out-of-the-box monitoring features, integration with existing observability stacks including tools like Datadog and AWS CloudWatch, and how ClickHouse’s monitoring compares to self-hosted deployments.

## Advanced observability dashboard {#advanced-observability}

ClickHouse Cloud provides comprehensive monitoring through built-in dashboard interfaces accessible via the Monitoring section. These dashboards visualize system and performance metrics in real-time without requiring additional setup and serve as the primary tools for real-time production monitoring within ClickHouse Cloud.

- **Advanced Dashboard**: The main dashboard interface accessible via Monitoring → Advanced dashboard provides real-time visibility into query rates, resource usage, system health, and storage performance. This dashboard doesn't require separate authentication, won't prevent instances from idling, and doesn't add query load to your production system. Each visualization is powered by customizable SQL queries, with out-of-the-box charts grouped into ClickHouse-specific, system health, and Cloud-specific metrics. Users can extend monitoring by creating custom queries directly in the SQL console.

:::note
Accessing these metrics does not issue a query to the underlying service and will not wake idle services.
:::

<Image img={AdvancedDashboard} size="lg" alt="Advanced dashboard"/>

Users looking to extend these visualizations can use the dashboards feature in ClickHouse Cloud, querying system tables directly.

- **Native advanced dashboard**: An alternative dashboard interface accessible through "You can still access the native advanced dashboard" within the Monitoring section. This opens in a separate tab with authentication and provides an alternative UI for system and service health monitoring. This dashboard allows advanced analytics, where users can modify the underlying SQL queries.

<Image img={NativeAdvancedDashboard} size="lg" alt="Native advanced dashboard"/>

Both dashboards offer immediate visibility into service health and performance without external dependencies, distinguishing them from external debugging-focused tools like ClickStack.

For detailed dashboard features and available metrics, see the [advanced dashboard documentation](/cloud/manage/monitor/advanced-dashboard).

## Query insights and resource monitoring {#query-insights}

ClickHouse Cloud includes additional monitoring capabilities:

- Query Insights: Built-in interface for query performance analysis and troubleshooting
- Resource Utilization Dashboard: Tracks memory, CPU allocation, and data transfer patterns

See the [query insights](/cloud/get-started/query-insights) and [resource utilization](/operations/monitoring#resource-utilization) documentation for detailed features.

## Prometheus-compatible metrics endpoint {#prometheus}

ClickHouse Cloud provides a Prometheus endpoint. This allows users to maintain current workflows, leverage existing team expertise, and integrate ClickHouse metrics into enterprise monitoring platforms including Grafana, Datadog, and other Prometheus-compatible tools.

The organization-level endpoint federates metrics from all services, while per-service endpoints provide granular monitoring. Key features include:
- Filtered metrics option: The optional `filtered_metrics=true` parameter reduces the payload from 1000+ available metrics to 125 'mission critical' metrics for cost optimization and easier monitoring focus.
- Cached metric delivery: Uses materialized views refreshed every minute to minimize query load on production systems.

:::note
This approach respects service idling behavior, allowing for cost optimization when services are not actively processing queries. This API endpoint relies on ClickHouse Cloud API credentials. For complete endpoint configuration details, see the cloud [Prometheus documentation](/integrations/prometheus).
:::
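
As a rough illustration of the organization-level and per-service endpoints and the `filtered_metrics=true` parameter, the Python sketch below builds the endpoint URL and an authenticated request. The base URL `https://api.clickhouse.cloud/v1`, the path layout, and the use of HTTP Basic authentication with an API key ID and secret are assumptions here and should be checked against the Prometheus documentation linked above. `prometheus_url` and `build_request` are hypothetical helper names.

```python
import base64
import urllib.request
from urllib.parse import urlencode

# Assumed base URL of the ClickHouse Cloud API; verify before use.
API_BASE = "https://api.clickhouse.cloud/v1"


def prometheus_url(organization_id, service_id=None, filtered=True):
    """Build the organization-level or per-service Prometheus endpoint URL."""
    path = f"{API_BASE}/organizations/{organization_id}"
    if service_id is not None:
        # Per-service endpoint for granular monitoring (assumed path layout).
        path += f"/services/{service_id}"
    path += "/prometheus"
    # filtered_metrics=true requests the reduced "mission critical" metric set.
    if filtered:
        return f"{path}?{urlencode({'filtered_metrics': 'true'})}"
    return path


def build_request(url, key_id, key_secret):
    """Attach API credentials using HTTP Basic auth (assumed auth scheme)."""
    token = base64.b64encode(f"{key_id}:{key_secret}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


if __name__ == "__main__":
    # Placeholder identifiers; nothing is sent over the network here.
    print(prometheus_url("my-org-id"))
    print(prometheus_url("my-org-id", service_id="my-service-id", filtered=False))
```

A scraper would pass the result of `build_request` to `urllib.request.urlopen`. In most deployments the same URL and credentials go into a Prometheus scrape configuration instead.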

<ObservabilityIntegrations/>

### ClickStack {#clickstack}

ClickStack is ClickHouse's recommended observability solution for deep system analysis and debugging, providing a unified platform for logs, metrics, and traces using ClickHouse as the storage engine. This approach relies on HyperDX, the ClickStack UI, connecting directly to the system tables inside your ClickHouse instance.
HyperDX ships with a ClickHouse-focused dashboard with tabs for Selects, Inserts, and Infrastructure. Teams can also use Lucene or SQL syntax to search system tables and logs, and can create custom visualizations via Chart Explorer for detailed system analysis.
This approach is ideal for debugging complex issues, performance analysis, and deep system introspection rather than real-time production alerting.

:::note
This approach wakes idle services, because HyperDX queries the system tables directly.
:::

### ClickStack deployment options {#clickstack-deployment}

- **HyperDX in ClickHouse Cloud** (private preview): HyperDX can be launched on any ClickHouse Cloud service.
- [Helm](/use-cases/observability/clickstack/deployment/helm): Recommended for Kubernetes-based debugging environments. Supports integration with ClickHouse Cloud and allows for environment-specific configuration, resource limits, and scaling via `values.yaml`.
- [Docker Compose](/use-cases/observability/clickstack/deployment/docker-compose): Deploys each component (ClickHouse, HyperDX, OTel collector, MongoDB) individually. When integrating with ClickHouse Cloud, users can modify the compose file to remove any unused components, specifically ClickHouse and the OpenTelemetry Collector.
- [HyperDX Only](/use-cases/observability/clickstack/deployment/hyperdx-only): Standalone HyperDX container.

For complete deployment options and architecture details, see the [ClickStack documentation](/use-cases/observability/clickstack/overview) and [data ingestion guide](/use-cases/observability/clickstack/ingesting-data/overview).

:::note
Users can also collect metrics from the ClickHouse Cloud Prometheus endpoint via an OpenTelemetry Collector and forward them to a separate ClickStack deployment for visualization.
:::

<DirectIntegrations/>

<CommunityMonitoring/>

## System impact considerations {#system-impact}

The approaches above fall into three groups: those that rely on Prometheus endpoints, those managed by ClickHouse Cloud, and those that query system tables directly.
The last group queries the production ClickHouse service. This adds query load to the system under observation and prevents ClickHouse Cloud instances from idling, impacting cost optimization. Additionally, if the production system fails, monitoring may also be affected, since the two are coupled. This approach works well for deep introspection and debugging but is less appropriate for real-time production monitoring. Consider these trade-offs between detailed system analysis capabilities and operational overhead when evaluating direct Grafana integration versus the external tool integration approaches discussed above.
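
The grouping above can be written down as a small lookup, which may be handy when deciding what to wire into always-on alerting. This is only a Python sketch summarizing statements made in this guide: the `Approach` type and `idle_friendly` function are illustrative names, and the list covers only the options for which this guide states the idling behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Approach:
    name: str
    # True when the approach queries system tables on the production service,
    # which adds query load and prevents the service from idling.
    queries_system_tables: bool


# Classification taken from the descriptions in this guide.
APPROACHES = [
    Approach("Advanced dashboard", False),
    Approach("Prometheus endpoint", False),
    Approach("ClickStack (HyperDX)", True),
    Approach("Direct Grafana plugin", True),
    Approach("Direct Datadog agent integration", True),
    Approach("Community monitoring", True),
]


def idle_friendly(approaches):
    """Return the names of approaches that let idle services stay idle."""
    return [a.name for a in approaches if not a.queries_system_tables]


if __name__ == "__main__":
    print(idle_friendly(APPROACHES))
```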
41 changes: 41 additions & 0 deletions docs/use-cases/observability/self-managed-monitoring.md
@@ -0,0 +1,41 @@
---
slug: /use-cases/observability/oss-monitoring
title: 'Self-Managed Monitoring'
description: 'Self-Managed Monitoring Guide'
doc_type: 'guide'
---

import ObservabilityIntegrations from '@site/docs/_snippets/_observability_integration_options.md';
import DirectIntegrations from '@site/docs/_snippets/_direct_observability_integration_options.md';
import CommunityMonitoring from '@site/docs/_snippets/_community_monitoring.md';

# Self-managed monitoring {#cloud-monitoring}

This guide provides enterprise teams evaluating open-source ClickHouse with comprehensive information on monitoring and observability capabilities for production deployments. Enterprise customers frequently ask about out-of-the-box monitoring features, integration with existing observability stacks including tools like Datadog and AWS CloudWatch, and how ClickHouse’s monitoring compares to self-hosted deployments.

### Prometheus-based integration architecture {#prometheus}
ClickHouse exposes Prometheus-compatible metrics through different endpoints depending on your deployment model, each with distinct operational characteristics:

**Self-Managed/OSS ClickHouse**

Direct server Prometheus endpoint accessible via the standard `/metrics` endpoint on your ClickHouse server. This approach provides:
- Complete metric exposure: Full range of available ClickHouse metrics without built-in filtering
- Real-time metrics: Generated directly from system tables when scraped
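
To make the shape of the scraped data concrete, here is a minimal Python sketch that parses the Prometheus text exposition format returned by a `/metrics` endpoint. The sample metric names and values are illustrative, and `parse_metrics` is a hypothetical helper. The parser is deliberately simple: it assumes each sample line ends with its value and carries no trailing timestamp.

```python
def parse_metrics(text):
    """Parse Prometheus text exposition lines into {metric name: value}."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value follows the last space; any labels stay part of the name.
        name, _, value = line.rpartition(" ")
        if not name:
            continue
        try:
            metrics[name] = float(value)
        except ValueError:
            # Ignore lines this simple parser does not understand.
            continue
    return metrics


# Illustrative sample; real output contains many more metrics.
SAMPLE = """\
# HELP ClickHouseMetrics_Query Number of executing queries
# TYPE ClickHouseMetrics_Query gauge
ClickHouseMetrics_Query 3
# TYPE ClickHouseProfileEvents_SelectQuery counter
ClickHouseProfileEvents_SelectQuery 1200
"""

if __name__ == "__main__":
    for name, value in parse_metrics(SAMPLE).items():
        print(name, value)
```

In practice a Prometheus server or agent does this parsing. The text would come from the address configured in the server's `prometheus` settings, for example `http://localhost:9363/metrics` if the commonly documented port is used (an assumption here).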

**Direct system access**

Queries production system tables, which adds monitoring load and prevents cost-saving idle states.

<ObservabilityIntegrations/>

### ClickStack deployment options {#clickstack-deployment}

- [Helm](/use-cases/observability/clickstack/deployment/helm): Recommended for Kubernetes-based debugging environments. Allows for environment-specific configuration, resource limits, and scaling via `values.yaml`.
- [Docker Compose](/use-cases/observability/clickstack/deployment/docker-compose): Deploys each component (ClickHouse, HyperDX, OTel collector, MongoDB) individually.
- [HyperDX Only](/use-cases/observability/clickstack/deployment/hyperdx-only): Standalone HyperDX container.

For complete deployment options and architecture details, see the [ClickStack documentation](/use-cases/observability/clickstack/overview) and [data ingestion guide](/use-cases/observability/clickstack/ingesting-data/overview).

<DirectIntegrations/>

<CommunityMonitoring/>