Webhooks are the backbone of event-driven architectures: they power payment notifications, CI/CD triggers, and SaaS integrations. But implementing webhooks reliably is harder than it looks, because you need retry logic, idempotency, security, and monitoring. This guide covers a production-grade webhook implementation from both sides, sender and receiver.
Webhook Architecture Overview
Sender (You)                                   Receiver (Third-Party)
  |                                            |
  | 1. Event occurs (payment.created)          |
  | 2. Look up webhook URL + secret            |
  | 3. Build payload + signature               |
  | 4. POST ─────────────────────────────────→ | 5. Verify signature
  |                                            | 6. Process event
  | ←────────────────────────────────── 200 OK | 7. Return 200 OK
  |                                            |
  | 8. If not 200: retry with backoff          |
  |    Attempt 1: immediate                    |
  |    Attempt 2: +5s                          |
  |    Attempt 3: +25s (30s total)             |
  |    Attempt 4+: exponential (up to 3 days)  |
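The retry schedule in the diagram can be sketched as a small function. This is a minimal illustration, not a definitive implementation: the names `retry_delay` and `retry_schedule` are hypothetical, and it assumes "up to 3 days" means the sender stops once the cumulative wait would exceed three days.

```python
from datetime import timedelta

BASE_SECONDS = 5      # delay before attempt 2, from the schedule above
FACTOR = 5            # each later delay is 5x the previous one: 5s, 25s, 125s, ...
# Assumption: "up to 3 days" is a cap on total elapsed time, not on a single delay.
MAX_RETRY_WINDOW = int(timedelta(days=3).total_seconds())


def retry_delay(attempt: int) -> int:
    """Seconds to wait before `attempt` (1-based). Attempt 1 is immediate."""
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    if attempt == 1:
        return 0
    return BASE_SECONDS * FACTOR ** (attempt - 2)


def retry_schedule() -> list[int]:
    """Cumulative offsets, in seconds since the event, of every attempt in the window."""
    offsets, elapsed, attempt = [], 0, 1
    while True:
        elapsed += retry_delay(attempt)
        if elapsed > MAX_RETRY_WINDOW:
            return offsets
        offsets.append(elapsed)
        attempt += 1
```

Under these assumptions the first attempts land at 0s, 5s, 30s, and 155s after the event, which matches the "30s total" noted for attempt 3.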
Webhook Sender: Implementation Checklist
| Feature | Why It Matters | Implementation |
| Signature (HMAC-SHA256) | Proves the webhook came from you | Header: X-Webhook-Signature: t=timestamp,v1=HMAC(secret, timestamp + "." + body) |
| Idempotency Key | Prevents duplicate processing | Header: X-Webhook-Id: unique_event_id |
| Retry with Backoff | Handles transient failures | Exponential backoff: 5s, 25s, 125s, 625s... max 3 days |
| Delivery Logging | Debugging failed deliveries | Store: event_id, URL, status_code, request_body, response_body, duration_ms |
| Manual Retry UI | Let users re-trigger failed deliveries | Admin panel showing failed deliveries with "Retry" button |
| Timeout | Don't hang your workers | 30-second timeout (most webhook handlers complete in <5s) |
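The delivery-logging and timeout rows can be combined in a single delivery attempt. The sketch below is illustrative only: `DeliveryRecord`, `deliver`, and the `Transport` callable are hypothetical names, and the HTTP call is injected so the example stays independent of any particular HTTP client. Treating every 2xx status as success is also an assumption; the diagram above only mentions 200.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

TIMEOUT_SECONDS = 30  # from the checklist above

# Hypothetical transport: (url, headers, body, timeout) -> (status_code, response_body)
Transport = Callable[[str, dict, bytes, float], Tuple[int, str]]


@dataclass
class DeliveryRecord:
    """One row of the delivery log, using the fields listed in the checklist."""
    event_id: str
    url: str
    status_code: Optional[int]  # None when no HTTP response arrived (timeout, DNS failure)
    request_body: str
    response_body: str
    duration_ms: int

    @property
    def succeeded(self) -> bool:
        # Assumption: any 2xx counts as delivered.
        return self.status_code is not None and 200 <= self.status_code < 300


def deliver(event_id: str, url: str, headers: dict, body: bytes,
            send: Transport) -> DeliveryRecord:
    """Make one delivery attempt and return the record to persist."""
    started = time.monotonic()
    try:
        status, response_body = send(url, headers, body, TIMEOUT_SECONDS)
    except Exception as exc:  # network failures are logged, not raised
        status, response_body = None, repr(exc)
    duration_ms = int((time.monotonic() - started) * 1000)
    return DeliveryRecord(event_id, url, status,
                          body.decode("utf-8", "replace"),
                          response_body, duration_ms)
```

In a real sender, `send` would wrap an HTTP client call that honors the timeout argument, and a record with `succeeded == False` would schedule the next retry.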
Webhook Security: Signature Verification
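# Python: Webhook sender (generate signature)
import hmac, hashlib, time, json, uuid

def generate_event_id() -> str:
    # Placeholder so the example runs; in practice, reuse the stored event's ID
    # so that every retry of the same event carries the same X-Webhook-Id
    return f"evt_{uuid.uuid4().hex}"

def sign_webhook(secret: str, body: dict) -> dict:
    timestamp = str(int(time.time()))
    # Serialize once: the signed string and the bytes sent on the wire must be identical
    payload = json.dumps(body)
    signed = hmac.new(
        secret.encode(),
        f"{timestamp}.{payload}".encode(),
        hashlib.sha256
    ).hexdigest()
    return {
        "X-Webhook-Id": generate_event_id(),
        "X-Webhook-Signature": f"t={timestamp},v1={signed}",
        "body": payload
    }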
# Python: Webhook receiver (verify signature)
def verify_webhook(secret: str, signature: str, raw_body: bytes) -> bool:
    # Parse: "t=1234567890,v1=abc123..."
    try:
        parts = dict(p.split("=", 1) for p in signature.split(","))
        timestamp, expected = parts["t"], parts["v1"]
        age = abs(time.time() - int(timestamp))
    except (KeyError, ValueError):
        return False  # Malformed or incomplete header
    # Reject old timestamps (prevent replay attacks)
    if age > 300:  # 5 min tolerance
        return False
    # Sign the raw bytes as received; do not decode or re-serialize the body
    computed = hmac.new(
        secret.encode(),
        f"{timestamp}.".encode() + raw_body,
        hashlib.sha256
    ).hexdigest()
    # Constant-time comparison; compare bytes so non-ASCII input cannot raise
    return hmac.compare_digest(computed.encode(), expected.encode())
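One detail of the receiver is easy to get wrong: the signature has to be checked against the raw request bytes, not against a parsed and re-serialized body. The sketch below illustrates why, using made-up values for the secret and timestamp and a hypothetical helper `signature_for` that follows the same "timestamp.body" scheme.

```python
import hashlib
import hmac
import json


def signature_for(secret: str, timestamp: str, body: bytes) -> str:
    """HMAC-SHA256 over "<timestamp>.<body>", hex encoded."""
    message = timestamp.encode() + b"." + body
    return hmac.new(secret.encode(), message, hashlib.sha256).hexdigest()


secret, timestamp = "whsec_example", "1700000000"  # made-up values for illustration

raw_body = b'{"id":"evt_1","amount":100}'  # bytes exactly as sent (compact JSON)
sent_signature = signature_for(secret, timestamp, raw_body)

# Mistake: parse the JSON, serialize it again, and verify that.
# json.dumps inserts a space after ":" and ",", so the bytes no longer match.
reserialized = json.dumps(json.loads(raw_body)).encode()

matches_raw = hmac.compare_digest(
    sent_signature, signature_for(secret, timestamp, raw_body))
matches_reserialized = hmac.compare_digest(
    sent_signature, signature_for(secret, timestamp, reserialized))

print(matches_raw, matches_reserialized)
```

Here `matches_raw` is true while `matches_reserialized` is false, even though both bodies decode to the same JSON object. Frameworks that parse the body automatically usually offer a way to read the raw bytes; use that for verification.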
Webhook Receiver: Implementation Checklist
| Feature | Why It Matters | Implementation |
| Signature Verification | Prevents spoofed webhooks | Verify HMAC before processing (see above) |
| Idempotency | Handle retries safely | Store processed event_ids, return 200 for duplicates |
| Fast 200 Response | Sender knows delivery succeeded | Respond 200 immediately, process asynchronously (job queue) |
| Event Ordering | Handle out-of-order delivery | Use event version/sequence number; ignore stale events |
| IP Allowlisting | Additional security layer | Only accept webhooks from known sender IPs |
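The idempotency, fast-response, and ordering rows fit together as an accept step and a worker step. The following is a minimal in-memory sketch under stated assumptions: the payload fields `resource_id` and `sequence` are hypothetical, the set and dict stand in for database tables, and the signature is assumed to have been verified before `handle_webhook` is called.

```python
import json
import queue

jobs = queue.Queue()    # stand-in for a real job queue
seen_event_ids = set()  # stand-in for a table with a unique index on event_id
latest_sequence = {}    # resource_id -> highest sequence number applied so far


def handle_webhook(event_id: str, raw_body: bytes) -> int:
    """Request handler: return the HTTP status to send back."""
    if event_id in seen_event_ids:
        return 200  # duplicate from a retry: acknowledge and do nothing
    seen_event_ids.add(event_id)
    jobs.put(raw_body)  # defer the real work to a worker
    return 200


def process_job(raw_body: bytes) -> bool:
    """Worker: apply the event unless a newer one was already applied."""
    event = json.loads(raw_body)
    resource, sequence = event["resource_id"], event["sequence"]
    if sequence <= latest_sequence.get(resource, -1):
        return False  # stale or out-of-order event: ignore it
    latest_sequence[resource] = sequence
    # ... apply the change to your own data here ...
    return True
```

With a real database, recording the event ID and enqueueing the job should happen atomically, otherwise a crash between the two steps could lose the event.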
Common Webhook Pitfalls
| Pitfall | Problem | Solution |
| Processing in the request handler | Slow processing → timeout → sender retries → duplicates | Accept webhook, enqueue job, return 200 |
| No idempotency | Retries create duplicate orders/transactions | Store event_id, skip duplicates |
| Ignoring signature | Anyone can POST fake events | Always verify signature before processing |
| No delivery monitoring | Failed deliveries go unnoticed for days | Alert when the delivery success rate drops below 95% |
| Hardcoded URLs | Cannot update endpoints without deploy | Store webhook endpoints in database, with UI for management |
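The monitoring rule in the table can be sketched as a rolling success-rate check. This is one possible shape, not a prescribed design: `DeliveryMonitor` is a hypothetical name, and the window size and minimum sample count are assumptions chosen for illustration. Only the 95% threshold comes from the table.

```python
from collections import deque

ALERT_THRESHOLD = 0.95  # from the table above
WINDOW_SIZE = 100       # assumption: judge the most recent 100 attempts
MIN_SAMPLES = 20        # assumption: do not alert on a handful of attempts


class DeliveryMonitor:
    """Tracks recent delivery outcomes for one endpoint."""

    def __init__(self) -> None:
        # True = delivered, False = failed; old entries fall off automatically
        self.outcomes = deque(maxlen=WINDOW_SIZE)

    def record(self, delivered: bool) -> None:
        self.outcomes.append(delivered)

    def success_rate(self) -> float:
        if not self.outcomes:
            return 1.0  # no data yet: nothing to alert on
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self) -> bool:
        enough_data = len(self.outcomes) >= MIN_SAMPLES
        return enough_data and self.success_rate() < ALERT_THRESHOLD
```

Calling `record()` after every delivery attempt and checking `should_alert()` on a schedule is enough to surface an endpoint that has started failing, well before users notice missing events.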
Bottom line: A production-grade webhook system needs four things: HMAC signatures (security), idempotency keys (reliability), exponential backoff retries (deliverability), and a delivery log (debugging). The most common mistake is processing webhooks synchronously in the request handler — always accept, enqueue, and return 200 immediately. See also: Rate Limiting Strategies and CI/CD Pipeline Guide.