Skip to content

OpenTelemetry

Observability is consistently one of the top feature requests by customers. Valkey GLIDE 2.0 introduces support for OpenTelemetry (OTel), enabling developers to gain deep insights into client-side performance and behavior in distributed systems.

OTel is an open source, vendor-neutral framework that provides APIs, SDKs, and tools for generating, collecting, and exporting telemetry data—such as traces, metrics, and logs. It supports multiple programming languages and integrates with various observability backends like Prometheus, Jaeger, and AWS CloudWatch.

GLIDE’s OpenTelemetry integration is designed to be both powerful and easy to adopt. Once an OTel collector endpoint is configured, GLIDE begins emitting default metrics and traces automatically—no additional code changes are required. This simplifies the path to observability best practices and minimizes disruption to existing workflows.

GLIDE emits several built-in metrics out of the box. These metrics can be used to build dashboards, configure alerts, and monitor performance trends:

  • Timeouts: Number of requests that exceeded their timeout duration.
  • Retries: Count of operations retried due to transient errors or topology changes.
  • Moved Errors: Number of MOVED responses received, indicating key reallocation in the cluster.

These metrics are emitted to your configured OpenTelemetry collector and can be viewed in any supported backend (Prometheus, CloudWatch, etc.).

GLIDE creates a trace span for each Valkey command, giving detailed visibility into client-side performance. Each trace captures:

  • The entire command lifecycle: from creation to completion or failure.
  • A nested send_command span, measuring communication time with the Valkey server.
  • A status tag indicating success or error for each span, helping you identify failure patterns.

This distinction helps developers separate client-side queuing latency from server communication delays, making it easier to troubleshoot performance issues.

Even with these exceptions, GLIDE 2.0 provides comprehensive insights across the vast majority of standard operations, making it easy to adopt observability best practices with minimal effort.

To begin collecting telemetry data with GLIDE 2.0:

  • Set up an OpenTelemetry Collector to receive trace and metric data.
  • Configure the GLIDE client with the endpoint to your collector.
  • Alternatively, you can configure GLIDE to export telemetry data directly to a local file for development or debugging purposes, without requiring a running collector.

GLIDE does not export data directly to third-party services—instead, it sends data to your collector, which routes it to your backend (e.g., CloudWatch, Prometheus, Jaeger).

from glide import OpenTelemetry, OpenTelemetryConfig, OpenTelemetryTracesConfig, OpenTelemetryMetricsConfig
OpenTelemetry.init(OpenTelemetryConfig(
traces=OpenTelemetryTracesConfig(
endpoint="http://localhost:4318/v1/traces",
sample_percentage=10 # Optional, defaults to 1. Can also be changed at runtime via set_sample_percentage().
),
metrics=OpenTelemetryMetricsConfig(
endpoint="http://localhost:4318/v1/metrics"
),
flush_interval_ms=1000 # Optional, defaults to 5000
))

When initializing OpenTelemetry, you can customize behavior using the configuration object.

ConfigurationTypeRequiredDefaultDescription
traces.endpointStringYes (if traces enabled)-The trace collector endpoint URL. Supports http://, https://, grpc://, or file:// protocols.
traces.samplePercentageIntegerNo1Percentage (0–100) of commands to sample for tracing. For production, a low sampling rate (1–5%) is recommended to balance performance and insight. Can be changed at runtime.
metrics.endpointStringYes (if metrics enabled)-The metrics collector endpoint URL. Supports http://, https://, grpc://, or file:// protocols.
flushIntervalMsIntegerNo5000Time in milliseconds between flushes to the collector. Must be a positive integer.

You can configure the OTel collector endpoint using one of the following protocols:

  • http:// or https:// - Send data via HTTP(S)
  • grpc:// - Use gRPC for efficient telemetry transmission
  • file:// - Write telemetry data to a local file (ideal for local dev/debugging)

If using file:// as the endpoint:

  • The path must begin with file://.
  • If a directory is provided (or no file extension), data is written to signals.json in that directory.
  • If a filename is included, it will be used as-is.
  • The parent directory must already exist.
  • Data is appended, not overwritten.
  • Flush interval must be a positive integer.
  • Sample percentage must be between 0 and 100.
  • File exporter paths must start with file:// and have an existing parent directory.
  • Invalid configuration will throw an error synchronously when calling initialization.