Tracing Tools: Jaeger, Zipkin, Tempo, OpenTelemetry Collector
Introduction
Distributed tracing is essential for understanding request flows across microservices. When a single user request hits 10-50 services, traditional logging cannot show you the full picture. Tracing captures the causality chain: which service called which, how long each call took, and where failures occurred. This article covers Jaeger, Zipkin, Grafana Tempo, and the OpenTelemetry Collector.
OpenTelemetry Collector
The vendor-neutral foundation of a modern observability pipeline: it receives, processes, and exports telemetry data.
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
  attributes:
    actions:
      - key: environment
        value: production
        action: insert
  filter:
    error_mode: ignore
    traces:
      span:
        - 'attributes["http.method"] == "OPTIONS"'  # drop CORS preflight spans
  # Sampling for cost control
  probabilistic_sampler:
    sampling_percentage: 10  # only send 10% of traces

exporters:
  otlp:
    endpoint: jaeger:4317
    tls:
      insecure: true
  prometheus:
    endpoint: 0.0.0.0:8889
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      # memory_limiter first; batch last, after filtering and sampling
      processors: [memory_limiter, attributes, filter, probabilistic_sampler, batch]
      exporters: [otlp, debug]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
# Run the collector (the contrib build includes the filter and sampling processors)
otelcol-contrib --config otel-collector-config.yaml
# Or run with Docker, mounting the config at the contrib image's default path
docker run -p 4317:4317 -p 4318:4318 \
  -v $(pwd)/otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml \
  otel/opentelemetry-collector-contrib
**Key features**: Vendor-agnostic data collection, tail-based sampling, attribute enrichment, batch processing, multi-destination export, service graph computation.
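Once the collector is up, you can smoke-test the traces pipeline end to end. A minimal sketch, assuming the collector is reachable on localhost:4317 and the `opentelemetry-sdk` and `opentelemetry-exporter-otlp` packages are installed:

# smoke_test.py: emit one span and watch for it in the debug exporter's output
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider(resource=Resource.create({"service.name": "smoke-test"}))
provider.add_span_processor(BatchSpanProcessor(
    OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)
))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("smoke-test") as span:
    span.set_attribute("http.method", "GET")  # OPTIONS spans would be dropped by the filter

provider.shutdown()  # flush pending spans before the process exits

Note that with `sampling_percentage: 10` only about one run in ten reaches the exporters; raise it temporarily while testing.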
Jaeger
Uber's distributed tracing system, now a CNCF graduated project:
# docker-compose.yml
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    ports:
      - "16686:16686"  # UI
      - "4317:4317"    # OTLP gRPC
      - "4318:4318"    # OTLP HTTP
# Python instrumentation with OpenTelemetry
from flask import Flask
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor

app = Flask(__name__)

# Set up tracing
provider = TracerProvider()
processor = BatchSpanProcessor(OTLPSpanExporter(
    endpoint="http://jaeger:4317",
    insecure=True,
))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

# Auto-instrument libraries (inbound Flask requests, outbound HTTP calls)
FlaskInstrumentor().instrument_app(app)
RequestsInstrumentor().instrument()

# Manual instrumentation
tracer = trace.get_tracer(__name__)

@app.route("/api/orders/<order_id>")
def get_order(order_id):
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_id)
        span.set_attribute("order.value", 99.50)
        with tracer.start_as_current_span("validate_cache") as child:
            cached = cache.get(order_id)  # cache and db are application-level objects
            child.set_attribute("cache.hit", cached is not None)
        with tracer.start_as_current_span("query_database") as db_span:
            order = db.query("SELECT * FROM orders WHERE id = ?", order_id)
            db_span.set_attribute("db.rows", 1)
        return order
**Key features**: Rich UI with trace search and filtering, service dependency graph, deep span detail view, comparison view for similar traces, OTLP native support.
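Jaeger can only assemble a cross-service trace if the trace context travels with each outbound call. The `RequestsInstrumentor` above handles this automatically; the sketch below shows the same mechanics done by hand with the W3C `traceparent` header (the downstream `inventory-service` URL is a hypothetical example):

# Manual context propagation between two services
import requests
from opentelemetry import trace
from opentelemetry.propagate import inject, extract

tracer = trace.get_tracer(__name__)

# Caller side: inject the current span context into outgoing headers
def call_downstream():
    headers = {}
    inject(headers)  # adds the W3C 'traceparent' header
    return requests.get("http://inventory-service/api/stock", headers=headers)

# Callee side: extract the caller's context and parent the local span to it
def handle_request(incoming_headers):
    ctx = extract(incoming_headers)
    with tracer.start_as_current_span("check_stock", context=ctx) as span:
        span.set_attribute("service.role", "callee")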
Grafana Tempo
Grafana's tracing backend with object storage for cost-effective retention:
# tempo-config.yaml
server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317

ingester:
  trace_idle_period: 10s
  max_block_duration: 5m

storage:
  trace:
    backend: s3
    s3:
      bucket: grafana-tempo-data
      endpoint: s3.us-east-1.amazonaws.com
      access_key: ${AWS_ACCESS_KEY_ID}
      secret_key: ${AWS_SECRET_ACCESS_KEY}
    pool:
      max_workers: 100
      queue_depth: 10000

compactor:
  compaction:
    block_retention: 336h  # 14 days
query_frontend:
  search:
    max_duration: 168h  # 7 days of searchable data
# Run Tempo
docker run -p 3200:3200 -p 4317:4317 \
  -v $(pwd)/tempo-config.yaml:/etc/tempo.yaml \
  grafana/tempo:latest -config.file=/etc/tempo.yaml
# Query via Grafana
# Grafana datasource: Tempo
# TraceQL query:
# { resource.service.name = "payment-service" && span.http.status_code >= 500 }
**Key features**: Object storage backend (S3, GCS, Azure) for low-cost long retention, TraceQL query language, seamless Grafana integration, high scalability.
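TraceQL also works outside Grafana through Tempo's HTTP API. A hedged sketch, assuming Tempo is reachable on localhost:3200 as configured above and that the response shape matches recent Tempo releases:

# Search Tempo directly over its HTTP API (Grafana uses the same endpoints)
import requests

TEMPO = "http://localhost:3200"
query = '{ resource.service.name = "payment-service" && span.http.status_code >= 500 }'

resp = requests.get(f"{TEMPO}/api/search", params={"q": query, "limit": 20})
resp.raise_for_status()
for t in resp.json().get("traces", []):
    print(t["traceID"], t.get("rootServiceName"), t.get("durationMs"))

# A full trace can then be fetched by ID from {TEMPO}/api/traces/<traceID>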
Zipkin
Twitter's distributed tracing system, modeled on Google's Dapper paper and an early inspiration for the OpenTracing API:
# docker-compose.yml
services:
  zipkin:
    image: openzipkin/zipkin:latest
    ports:
      - "9411:9411"  # UI and collector API
# Zipkin with OpenTelemetry
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.zipkin.json import ZipkinExporter

zipkin_exporter = ZipkinExporter(
    endpoint="http://zipkin:9411/api/v2/spans",
)

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(zipkin_exporter))
trace.set_tracer_provider(provider)
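Existing Zipkin-instrumented services usually propagate context with B3 headers rather than the W3C `traceparent` default. To interoperate, swap the global propagator; a sketch assuming the `opentelemetry-propagator-b3` package is installed:

# Emit and accept X-B3-TraceId / X-B3-SpanId headers instead of traceparent
from opentelemetry.propagate import set_global_textmap
from opentelemetry.propagators.b3 import B3MultiFormat

set_global_textmap(B3MultiFormat())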
Comparison
| Feature | Jaeger | Zipkin | Tempo | OTel Collector |
|---------|--------|--------|-------|----------------|
| Storage | Cassandra, ES, Badger | Cassandra, ES, in-memory | S3, GCS, Azure | N/A (pass-through) |
| UI | Standalone | Standalone | Grafana | None |
| Query language | Tag-based search | Tag-based search | TraceQL | N/A |
| Scalability | High | Medium | Very high | High |
| Sampling | Head, adaptive | Head | Head | Head, tail |
| Cost at scale | Medium | Medium | Low (S3) | N/A |
Recommendations
* **Best all-around**: Jaeger with OTLP ingestion. Rich UI, good scalability, active community.
* **Grafana ecosystem**: Tempo for seamless integration with Grafana dashboards and Loki logs. TraceQL is powerful.
* **Minimal setup**: Zipkin for quick local development tracing.
* **Data pipeline**: OpenTelemetry Collector as the central hub for receiving and routing all telemetry.
The OpenTelemetry Collector should be the first component in any tracing infrastructure. It receives traces from instrumented services, applies sampling and enrichment, and forwards to the backend of your choice (Jaeger, Tempo, or both). This decouples instrumentation from storage decisions.
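One practical consequence: if services configure their exporter through the standard `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable (defined by the OpenTelemetry specification) instead of a hard-coded address, swapping backends never touches application code. A minimal sketch:

# app.py: no backend address in code; deployment config decides where traces go
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
# With no endpoint argument the exporter honors OTEL_EXPORTER_OTLP_ENDPOINT
# (default http://localhost:4317), typically pointed at the collector:
#   OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317 python app.py
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)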