Data Classification
Why Classify Data?
Data classification ensures sensitive information receives appropriate protection. Without classification, you either over-protect everything (wasting resources) or under-protect critical data (inviting breaches).
Classification Levels
Define clear tiers:
| Level | Label | Examples | Controls |
|-------|-------|----------|----------|
| 4 | Restricted | PII, trade secrets | Encryption, MFA, DLP |
| 3 | Confidential | Financial reports | Encryption at rest |
| 2 | Internal | HR policies | Access control |
| 1 | Public | Marketing materials | No restrictions |
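One way to make these tiers usable in code is an ordered enumeration, so that labels can be compared and the stricter of two wins. The following is a minimal sketch; the names `Classification`, `CONTROLS`, and `stricter` are illustrative assumptions, not part of any standard or library.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Tiers from the table above; a higher value means more sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


# Controls per tier, copied from the table (illustrative mapping).
CONTROLS = {
    Classification.RESTRICTED: ["Encryption", "MFA", "DLP"],
    Classification.CONFIDENTIAL: ["Encryption at rest"],
    Classification.INTERNAL: ["Access control"],
    Classification.PUBLIC: [],  # "No restrictions"
}


def stricter(a: Classification, b: Classification) -> Classification:
    """Return the more sensitive of two labels."""
    return max(a, b)
```

Using an ordered type means a document that combines content from two sources can simply inherit the higher of the two labels.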
Automated Classification
Use content inspection to classify data automatically:
import re


class DataClassifier:
    def __init__(self):
        # Regexes for sensitive data types; matches are candidates, not confirmed hits.
        self.patterns = {
            "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
            "email": r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
            "credit_card": r"\b(?:\d[ -]*?){13,16}\b",
        }

    def classify_document(self, content, metadata):
        """Return (label, findings) for a document's text and metadata dict."""
        score = 0
        findings = []
        for label, pattern in self.patterns.items():
            matches = re.findall(pattern, content)
            if matches:
                # Each match adds 10 points to the sensitivity score.
                score += len(matches) * 10
                findings.append({"type": label, "count": len(matches)})
        # Six or more matches: restricted; two to five: confidential.
        if score > 50:
            return "restricted", findings
        elif score > 10:
            return "confidential", findings
        elif metadata.get("internal"):
            return "internal", findings
        return "public", findings
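Pattern matching alone tends to over-count: any run of 13 to 16 digits looks like a card number to the regex. A common refinement, not part of the classifier above, is to keep only candidates that pass the Luhn checksum. The sketch below assumes the same `credit_card` pattern; `luhn_valid` and `find_card_numbers` are hypothetical helper names.

```python
import re

# Same candidate pattern as the classifier's "credit_card" entry.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]*?){13,16}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(content: str) -> list[str]:
    """Return regex candidates that also pass the Luhn check."""
    return [m for m in CARD_PATTERN.findall(content) if luhn_valid(m)]
```

Filtering this way lowers false positives from order numbers or phone lists, at the cost of missing mistyped card numbers.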
Handling Procedures
Define procedures for each classification level:
# handling-policies.yaml
restricted:
  storage: encrypted_bucket_kms
  transmission: require_tls_1.3
  retention: 7_years
  destruction: shred_and_degauss
  sharing: require_nda_and_approval
confidential:
  storage: encrypted_bucket
  transmission: require_tls_1.2
  retention: 3_years
  destruction: shred
  sharing: require_approval
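A policy file like this is only useful if every level defines every handling field. The sketch below shows one possible completeness check. It assumes the YAML has already been parsed into a dictionary (the literal here mirrors the file by hand), and `REQUIRED_FIELDS` and `missing_fields` are hypothetical names.

```python
# Hand-written mirror of handling-policies.yaml; in practice this would
# come from a YAML parser.
POLICIES = {
    "restricted": {
        "storage": "encrypted_bucket_kms",
        "transmission": "require_tls_1.3",
        "retention": "7_years",
        "destruction": "shred_and_degauss",
        "sharing": "require_nda_and_approval",
    },
    "confidential": {
        "storage": "encrypted_bucket",
        "transmission": "require_tls_1.2",
        "retention": "3_years",
        "destruction": "shred",
        "sharing": "require_approval",
    },
}

REQUIRED_FIELDS = {"storage", "transmission", "retention", "destruction", "sharing"}


def missing_fields(policies: dict) -> dict:
    """Map each level to the sorted list of handling fields it lacks."""
    problems = {}
    for level, rules in policies.items():
        missing = sorted(REQUIRED_FIELDS - set(rules or {}))
        if missing:
            problems[level] = missing
    return problems
```

An empty result means every level is fully specified; running a check like this in CI would catch a new level added without, say, a destruction rule.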
Labeling Implementation
Apply labels at multiple layers:
// S3 object tagging for classification
const AWS = require("aws-sdk");
const s3 = new AWS.S3();

async function tagObject(bucket, key, classification) {
  await s3.putObjectTagging({
    Bucket: bucket,
    Key: key,
    Tagging: {
      TagSet: [
        { Key: "classification", Value: classification },
        { Key: "classified-by", Value: "auto-classifier-v2" },
        { Key: "classified-at", Value: new Date().toISOString() }
      ]
    }
  }).promise();
}
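For pipelines written in Python, the same tagging step might look like the sketch below. It assumes the caller passes in a client created with `boto3.client("s3")`, whose `put_object_tagging` call takes `Bucket`, `Key`, and `Tagging`. The helper names and the `ALLOWED_LABELS` set are illustrative assumptions. Note that this S3 operation replaces the object's whole tag set, so existing tags would need to be merged in first if they must be kept.

```python
from datetime import datetime, timezone

# Labels from the classification table; anything else is rejected.
ALLOWED_LABELS = {"public", "internal", "confidential", "restricted"}


def build_tagging(classification: str, classified_by: str = "auto-classifier-v2") -> dict:
    """Build the Tagging payload for an S3 object."""
    if classification not in ALLOWED_LABELS:
        raise ValueError(f"unknown classification: {classification!r}")
    return {
        "TagSet": [
            {"Key": "classification", "Value": classification},
            {"Key": "classified-by", "Value": classified_by},
            {"Key": "classified-at", "Value": datetime.now(timezone.utc).isoformat()},
        ]
    }


def tag_object(s3_client, bucket: str, key: str, classification: str) -> None:
    """Apply classification tags; `s3_client` is assumed to be boto3.client("s3")."""
    s3_client.put_object_tagging(
        Bucket=bucket, Key=key, Tagging=build_tagging(classification)
    )
```

Passing the client in as a parameter keeps the function easy to test with a stub and avoids creating a new client on every call.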
Integration with DLP
Classification feeds directly into DLP policies:
-- Block restricted data leaving the network
CREATE DLP POLICY block_restricted_exfiltration
  MATCHES classification = 'restricted'
    AND operation IN ('email.send', 'usb.copy', 'cloud.upload')
  ACTION block;
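The logic of the rule above can be sketched as a plain function, which is handy for unit-testing policy decisions before deploying them to a DLP product. This is only an illustration of the rule's semantics; `evaluate` and `BLOCKED_OPERATIONS` are hypothetical names, and the default of allowing everything else is an assumption.

```python
# Operations the policy treats as data leaving the network.
BLOCKED_OPERATIONS = {"email.send", "usb.copy", "cloud.upload"}


def evaluate(classification: str, operation: str) -> str:
    """Return "block" or "allow" for an attempted operation."""
    if classification == "restricted" and operation in BLOCKED_OPERATIONS:
        return "block"
    # Assumed default: anything the rule does not match is allowed.
    return "allow"
```

For example, `evaluate("restricted", "usb.copy")` yields `"block"`, while the same operation on a confidential file is allowed under this single rule.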
Conclusion
Data classification is foundational to information security. Automate where possible, define clear handling procedures, and integrate classification labels across your data protection stack. Start with the most sensitive data and expand coverage iteratively.