Introduction
Service discovery is the mechanism by which microservices locate each other on a network. In dynamic environments where containers come and go, IP addresses and ports cannot be hard-coded. A robust discovery layer is essential for resilient, scalable microservice communication. This article covers the core patterns, tools, and best practices for implementing service discovery in production.
Client-Side vs Server-Side Discovery
Client-Side Discovery
The client queries a service registry directly and handles load balancing itself. This pattern avoids an extra network hop but couples every client to registry and load-balancing logic:
```python
import random

import requests
from consul import Consul

class ServiceClient:
    def __init__(self):
        self.consul = Consul(host="consul.service.consul")

    def get_service_url(self, service_name: str) -> str:
        # health.service with passing=True returns only instances
        # whose health checks are currently passing
        _, nodes = self.consul.health.service(service_name, passing=True)
        if not nodes:
            raise LookupError(f"No healthy instances of {service_name}")
        # Crude client-side load balancing: pick a random healthy instance
        svc = random.choice(nodes)["Service"]
        return f"http://{svc['Address']}:{svc['Port']}"

    def call_user_service(self, user_id: str):
        base_url = self.get_service_url("user-service")
        resp = requests.get(f"{base_url}/users/{user_id}", timeout=5)
        resp.raise_for_status()
        return resp.json()
```
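Client-side discovery usually also means caching the instance list and rotating among instances rather than re-querying the registry per request. A minimal round-robin picker, sketched in pure Python (the instance URLs are illustrative):

```python
import itertools
import threading


class RoundRobinBalancer:
    """Rotate across a fixed list of instance URLs, thread-safely."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("need at least one instance")
        self._cycle = itertools.cycle(instances)
        self._lock = threading.Lock()

    def next_instance(self):
        with self._lock:
            return next(self._cycle)


balancer = RoundRobinBalancer([
    "http://10.0.1.5:8080",
    "http://10.0.1.6:8080",
])
picks = [balancer.next_instance() for _ in range(4)]
# picks alternates between the two instances
```

In practice the instance list would be refreshed from the registry on a timer or via a watch, replacing the cycle when membership changes.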
Server-Side Discovery
A load balancer or gateway handles discovery transparently. Clients only know the gateway address:
```yaml
# Kubernetes: Service handles DNS-based discovery
apiVersion: v1
kind: Service
metadata:
  name: user-service
spec:
  selector:
    app: user-service
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
---
# EndpointSlices are normally managed by the control plane;
# shown here to illustrate what the Service resolves to
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: user-service-1
  labels:
    kubernetes.io/service-name: user-service
addressType: IPv4
endpoints:
  - addresses: ["10.0.1.5"]
    conditions:
      ready: true
  - addresses: ["10.0.1.6"]
    conditions:
      ready: true
ports:
  - name: http
    protocol: TCP
    port: 8080
```
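With server-side discovery, clients never see the endpoint IPs; they address the Service by its in-cluster DNS name, which CoreDNS resolves to the Service's virtual IP. A tiny helper showing the naming convention (namespace and service names are illustrative):

```python
def cluster_dns_name(service: str, namespace: str = "default",
                     cluster_domain: str = "cluster.local") -> str:
    """Build the fully qualified in-cluster DNS name for a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


print(cluster_dns_name("user-service"))
# user-service.default.svc.cluster.local
```

Within the same namespace, the short name `user-service` resolves too; the fully qualified form is needed for cross-namespace calls.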
Consul
HashiCorp Consul provides service registration, health checking, and a distributed key-value store:
```hcl
# service registration configuration
service {
  name = "payment-service"
  id   = "payment-service-v1"
  port = 9090
  tags = ["v1", "production", "critical"]

  check {
    id       = "payment-health"
    name     = "Payment Service Health"
    http     = "http://localhost:9090/health"
    method   = "GET"
    interval = "10s"
    timeout  = "2s"
    deregister_critical_service_after = "5m"
  }

  connect {
    sidecar_service {
      proxy {
        upstreams {
          destination_name = "order-service"
          local_bind_port  = 8080
        }
      }
    }
  }
}
```
Programmatic registration via the API:
```go
package main

import (
	"log"

	"github.com/hashicorp/consul/api"
)

func registerService() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	registration := &api.AgentServiceRegistration{
		ID:   "order-svc-1",
		Name: "order-service",
		Port: 8080,
		Check: &api.AgentServiceCheck{
			HTTP:                           "http://localhost:8080/healthz",
			Interval:                       "10s",
			DeregisterCriticalServiceAfter: "3m",
		},
	}

	if err := client.Agent().ServiceRegister(registration); err != nil {
		log.Fatal(err)
	}
}
```
etcd
etcd is a strongly consistent key-value store frequently used for service discovery; it is also the backing store for Kubernetes' own cluster state:
```go
package main

import (
	"context"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// Error handling elided for brevity.
func registerWithLease() {
	cli, _ := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})

	lease, _ := cli.Grant(context.Background(), 10) // 10-second TTL

	key := "/services/payment-service/instance-1"
	value := `{"address": "10.0.1.10", "port": 9090}`
	cli.Put(context.Background(), key, value,
		clientv3.WithLease(lease.ID))

	// Keep the lease (and therefore the registration) alive
	ch, _ := cli.KeepAlive(context.Background(), lease.ID)
	go func() {
		for range ch {
			// Lease refreshed
		}
	}()
}

func discoverService(name string) []string {
	cli, _ := clientv3.New(clientv3.Config{
		Endpoints: []string{"localhost:2379"},
	})

	resp, _ := cli.Get(context.Background(),
		"/services/"+name, clientv3.WithPrefix())

	var instances []string
	for _, kv := range resp.Kvs {
		instances = append(instances, string(kv.Value))
	}
	return instances
}
```
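The lease mechanics are the heart of this pattern: a registration survives only as long as its lease keeps getting refreshed, so a crashed instance disappears automatically after the TTL. The semantics can be modeled with a small in-memory sketch using an injectable clock (the class and key names are illustrative, not an etcd API):

```python
import time


class TTLRegistry:
    """In-memory registry where entries expire unless refreshed."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def put(self, key, value):
        self._entries[key] = (value, self.clock() + self.ttl)

    def keep_alive(self, key):
        value, _ = self._entries[key]
        self._entries[key] = (value, self.clock() + self.ttl)

    def get_prefix(self, prefix):
        now = self.clock()
        return [v for k, (v, exp) in self._entries.items()
                if k.startswith(prefix) and exp > now]


# Fake clock so expiry is deterministic
now = [0.0]
reg = TTLRegistry(ttl_seconds=10, clock=lambda: now[0])
reg.put("/services/payment-service/instance-1", "10.0.1.10:9090")
now[0] = 8
reg.keep_alive("/services/payment-service/instance-1")  # lease refreshed at t=8
now[0] = 15
alive = reg.get_prefix("/services/payment-service")    # still registered
now[0] = 30
expired = reg.get_prefix("/services/payment-service")  # lease expired -> []
```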
Health Checking Strategies
Effective health checks prevent routing traffic to unhealthy instances:
```yaml
# Kubernetes: multi-probe health checking
apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  containers:
    - name: app
      image: web-app:latest
      livenessProbe:        # Restart the container if this fails
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 5
        failureThreshold: 3
      readinessProbe:       # Remove from Service endpoints if this fails
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 2
        failureThreshold: 1
      startupProbe:         # Holds off the other probes until it succeeds
        httpGet:
          path: /startup
          port: 8080
        initialDelaySeconds: 15
        periodSeconds: 5
        failureThreshold: 30
```
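The `failureThreshold` semantics above amount to a small state machine: an instance flips to unhealthy only after N consecutive failures, and back to healthy after M consecutive successes, which damps flapping. A sketch of that general pattern (not the kubelet's implementation):

```python
class ProbeTracker:
    """Mark a target unhealthy after N consecutive failures,
    healthy again after M consecutive successes."""

    def __init__(self, failure_threshold=3, success_threshold=1):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.healthy = True
        self._failures = 0
        self._successes = 0

    def record(self, probe_ok: bool) -> bool:
        if probe_ok:
            self._failures = 0
            self._successes += 1
            if not self.healthy and self._successes >= self.success_threshold:
                self.healthy = True
        else:
            self._successes = 0
            self._failures += 1
            if self.healthy and self._failures >= self.failure_threshold:
                self.healthy = False
        return self.healthy


t = ProbeTracker(failure_threshold=3)
results = [t.record(ok) for ok in [True, False, False, False, True]]
# results == [True, True, True, False, True]: two failures are tolerated,
# the third trips the threshold, one success restores health
```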
Consul gRPC checks for streaming services:
```hcl
check {
  id           = "grpc-health"
  name         = "gRPC Health Check"
  grpc         = "localhost:50051"
  grpc_use_tls = true
  interval     = "15s"
  timeout      = "3s"
  notes        = "Uses the gRPC health checking protocol"
}
```
Blue-Green Deployments with Discovery
Service discovery enables seamless traffic switching during blue-green deployments:
```hcl
# Consul: traffic splitting via service resolver
kind = "service-resolver"
name = "web-service"

subsets = {
  blue = {
    filter = "Service.Meta.version == blue"
  }
  green = {
    filter = "Service.Meta.version == green"
  }
}

default_subset = "blue"
```
Switch traffic atomically:
```shell
# Switch from blue to green
consul config write - <<EOF
kind = "service-resolver"
name = "web-service"
default_subset = "green"
EOF
```
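The effect of flipping `default_subset` can be modeled as a filter over registered instances: every instance carries a version in its metadata, and the resolver routes only to instances matching the active subset. An illustrative model (not Consul's implementation; addresses and versions are made up):

```python
def resolve(instances, default_subset):
    """Return addresses of instances whose version matches the active subset."""
    return [i["addr"] for i in instances
            if i["meta"]["version"] == default_subset]


instances = [
    {"addr": "10.0.2.1:8080", "meta": {"version": "blue"}},
    {"addr": "10.0.2.2:8080", "meta": {"version": "green"}},
]

blue_targets = resolve(instances, "blue")    # all traffic to blue
green_targets = resolve(instances, "green")  # after the switch, all traffic to green
```

Because the switch changes only the resolver config, not the instances, rollback is equally instant: write `default_subset = "blue"` back.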
Registry Patterns
Choose your registration approach based on operational maturity:
**Self-Registration**: Services register themselves on startup and deregister on shutdown. Simplest to operate, but every service (or its framework) must implement the registration logic.
**Third-Party Registration**: An external process (such as a registrar sidecar or the Kubernetes control plane) watches instances and updates the registry on their behalf. More resilient to misbehaving services but adds operational complexity.
```yaml
# Kubernetes: third-party registration via the endpoints controller
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-v2
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
      version: v2
  template:
    metadata:
      labels:
        app: my-app
        version: v2
    spec:
      containers:
        - name: app
          image: my-app:v2
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
```
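The self-registration pattern, by contrast, boils down to "register on startup, deregister on shutdown." A sketch of that lifecycle against a stub registry, using `atexit` for the clean-shutdown path (the registry class is a stand-in, not a real Consul or etcd client):

```python
import atexit


class StubRegistry:
    """Stand-in for a real registry client (Consul, etcd, ...)."""

    def __init__(self):
        self.services = {}

    def register(self, service_id, name, address, port):
        self.services[service_id] = (name, address, port)

    def deregister(self, service_id):
        self.services.pop(service_id, None)


def self_register(registry, service_id, name, address, port):
    registry.register(service_id, name, address, port)
    # Deregister on clean interpreter shutdown; crashes still rely on
    # the registry's own TTL or critical-check deregistration
    atexit.register(registry.deregister, service_id)


registry = StubRegistry()
self_register(registry, "order-svc-1", "order-service", "10.0.3.4", 8080)
```

The comment in the code points at the key weakness of self-registration: only a registry-side safety net (lease TTLs, `deregister_critical_service_after`) cleans up after a crashed process.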
Kubernetes DNS-based discovery (`my-svc.namespace.svc.cluster.local`) remains the simplest approach for cloud-native workloads, while Consul offers richer health checking and multi-datacenter support for hybrid or VM-based infrastructure.