Data Loss Prevention Strategies
DLP Overview
Data Loss Prevention (DLP) monitors and controls sensitive data in use, in motion, and at rest. A comprehensive DLP strategy covers three enforcement domains: network, endpoint, and cloud.
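The three states map naturally onto a shared finding schema, so alerts from network, endpoint, and cloud controls can be triaged in one place. A minimal sketch of such a schema (the `DataState` and `Finding` names are illustrative, not from any particular product):

    from dataclasses import dataclass
    from enum import Enum

    class DataState(Enum):
        IN_USE = "in_use"        # open or being processed on an endpoint
        IN_MOTION = "in_motion"  # crossing the network
        AT_REST = "at_rest"      # stored in a database, bucket, or file share

    @dataclass
    class Finding:
        state: DataState
        info_type: str     # e.g. "CREDIT_CARD_NUMBER"
        source: str        # host, connection, or storage path that produced it
        action_taken: str  # "alert", "block", or "allow_with_audit"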
Network DLP
Inspect traffic for sensitive data leaving the network:
    from scapy.all import *
    import math
    import re
    from collections import Counter

    def packet_inspector(packet):
        if packet.haslayer(Raw):
            # Decode the raw payload, ignoring bytes that are not valid UTF-8
            payload = packet[Raw].load.decode("utf-8", errors="ignore")
            # Check for credit card patterns
            if re.search(r"\b(?:\d[ -]*?){13,16}\b", payload):
                print(f"[ALERT] Potential CC leak from {packet[IP].src}")
                # Trigger block or alert
                return False
            # Check for API keys (length > 20, high entropy)
            if len(payload) > 20 and has_high_entropy(payload):
                print(f"[ALERT] High-entropy data from {packet[IP].src}")
                return False
        return True

    def has_high_entropy(data, threshold=4.5):
        # Shannon entropy in bits per character; random keys and tokens score high
        freq = Counter(data)
        entropy = -sum((c / len(data)) * math.log2(c / len(data)) for c in freq.values())
        return entropy > threshold
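To exercise the inspector against live traffic, it can be passed to scapy's `sniff` as a per-packet callback. A minimal sketch, assuming `eth0` is the egress interface (adjust the interface and BPF filter for your environment):

    from scapy.all import sniff

    def on_packet(pkt):
        # packet_inspector returns False when it flags a potential leak; a real
        # deployment would feed that into an inline blocker, here we only alert.
        packet_inspector(pkt)

    # store=0 avoids keeping every captured packet in memory during long runs
    sniff(iface="eth0", filter="tcp", prn=on_packet, store=0)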
Endpoint DLP
Control data movement on endpoints:
    # endpoint-dlp-rules.yaml
    rules:
      - name: Block USB transfer
        trigger: usb_device_connect
        action: block
        conditions:
          - device_type: mass_storage
          - device_not_in_allowlist: true
        user_notification: "USB mass storage is disabled"

      - name: Monitor print of classified docs
        trigger: print_job
        action: alert
        conditions:
          - document_classification: ["confidential", "restricted"]
        notify:
          - security_team
          - manager
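On the agent side, rules like these reduce to matching an endpoint event against each rule's trigger and conditions. A simplified sketch of that evaluation loop, assuming the YAML file above and PyYAML; the event fields are illustrative:

    import yaml  # pip install pyyaml

    def evaluate(event, rules):
        """Return the rules whose trigger and conditions all match an endpoint event."""
        matched = []
        for rule in rules:
            if rule["trigger"] != event["trigger"]:
                continue
            conditions_ok = all(
                event.get(key) == value
                or (isinstance(value, list) and event.get(key) in value)
                for condition in rule.get("conditions", [])
                for key, value in condition.items()
            )
            if conditions_ok:
                matched.append(rule)
        return matched

    with open("endpoint-dlp-rules.yaml") as f:
        rules = yaml.safe_load(f)["rules"]

    event = {
        "trigger": "usb_device_connect",
        "device_type": "mass_storage",
        "device_not_in_allowlist": True,
    }
    for rule in evaluate(event, rules):
        print(f"{rule['action'].upper()}: {rule['name']}")  # BLOCK: Block USB transfer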
Cloud DLP
Protect data in SaaS and IaaS environments:
    # Google Cloud DLP inspection job
    resource "google_data_loss_prevention_job_trigger" "bigquery_scan" {
      parent = "projects/my-project/locations/us"

      triggers {
        schedule {
          recurrence_period_duration = "86400s" # run daily
        }
      }

      inspect_job {
        # Inspect templates are referenced by their full resource name
        inspect_template_name = "projects/my-project/locations/us/inspectTemplates/dlp-sensitive-data-scanner"

        storage_config {
          big_query_options {
            table_reference {
              project_id = "my-project"
              dataset_id = "customer_data"
              table_id   = "users"
            }
          }
        }

        actions {
          save_findings {
            output_config {
              table {
                project_id = "my-project"
                dataset_id = "dlp_findings"
              }
            }
          }
        }
      }
    }
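For ad-hoc checks outside the scheduled trigger, the same kinds of info types can be inspected from application code with the google-cloud-dlp client library. A minimal sketch, assuming the `my-project` project ID from the Terraform above and a couple of built-in info types:

    from google.cloud import dlp_v2  # pip install google-cloud-dlp

    client = dlp_v2.DlpServiceClient()
    parent = "projects/my-project/locations/us"

    response = client.inspect_content(
        request={
            "parent": parent,
            "inspect_config": {
                "info_types": [{"name": "CREDIT_CARD_NUMBER"}, {"name": "EMAIL_ADDRESS"}],
                "min_likelihood": dlp_v2.Likelihood.POSSIBLE,
            },
            "item": {"value": "Contact me at jane@example.com, card 4111 1111 1111 1111"},
        }
    )

    # Each finding reports which info type matched and how confident the detector is
    for finding in response.result.findings:
        print(finding.info_type.name, finding.likelihood)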
Policy Design Principles
Effective DLP policies follow these guidelines:
1. **Start in monitor mode**: Log without blocking to understand data flows
2. **Use exceptions**: Provide secure channels for legitimate transfers
3. **Tiered responses**: Alert, warn, then block progressively
4. **User education**: Show policy rationale when blocking

The decision function below sketches how monitor mode and tiered responses combine:
    def dlp_decision(data, context):
        # Monitor mode: record the finding but never block
        if context["env"] == "monitor":
            log_finding(data, context)
            return "allow"
        if data.sensitivity == "restricted":
            if context["destination"] == "approved_bucket":
                return "allow"
            elif context["user_justification"]:
                log_with_justification(data, context)
                return "allow_with_audit"
            else:
                return "block"
        # Non-restricted data falls through to the default policy
        return "allow"
Incident Response Integration
DLP alerts should feed into your incident response pipeline:
-- Find users triggering most DLP alerts
    SELECT
        user_email,
        COUNT(*) AS alert_count,
        ARRAY_AGG(DISTINCT rule_name) AS triggered_rules
    FROM dlp_alerts
    WHERE created_at > NOW() - INTERVAL '30 days'
    GROUP BY user_email
    ORDER BY alert_count DESC
    LIMIT 10;
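From there, high-volume offenders can be escalated automatically. A sketch that posts the query results to a ticketing or SOAR webhook; the endpoint URL, threshold, and row source are all placeholders:

    import json
    import urllib.request

    # Hypothetical: rows returned by the query above via your database client
    rows = [
        {"user_email": "alice@example.com", "alert_count": 42,
         "triggered_rules": ["Block USB transfer", "Monitor print of classified docs"]},
    ]

    WEBHOOK_URL = "https://sec-tooling.example.com/api/dlp-escalations"  # placeholder

    for row in rows:
        if row["alert_count"] < 10:  # illustrative escalation threshold
            continue
        payload = {
            "summary": f"DLP: {row['alert_count']} alerts for {row['user_email']} in 30 days",
            "rules": row["triggered_rules"],
            "severity": "high" if row["alert_count"] > 25 else "medium",
        }
        req = urllib.request.Request(
            WEBHOOK_URL,
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req)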
Conclusion
Effective DLP requires coverage across the network, endpoint, and cloud domains. Design policies iteratively: start in monitor mode, layer in blocking controls as you learn the data flows, and feed findings into your incident response workflow. The goal is to prevent data loss without obstructing legitimate work.