SOC Operations
Introduction
A Security Operations Center (SOC) is a centralized team responsible for monitoring, detecting, analyzing, and responding to security incidents. Building an effective SOC requires structured processes, skilled personnel, appropriate tools, and continuous improvement.
SOC Tier Model
The SOC team structure typically follows a three-tier model that provides clear career progression and escalation paths.
Tier 1 — Triage
Tier 1 analysts monitor dashboards, triage alerts, and determine initial severity. They handle known false positives and escalate suspicious events to Tier 2.
Responsibilities:
* Monitor SIEM dashboards and alert queues
* Perform initial alert triage and categorization
* Execute basic investigation steps per playbooks
* Create tickets for escalated incidents
* Maintain shift logs

    # Tier 1 triage automation example
    # Assumes false_positive_patterns and enrich_iocs are defined elsewhere.
    def triage_alert(alert):
        # Auto-close alerts that match a known false positive pattern
        for fp_pattern in false_positive_patterns:
            if fp_pattern.matches(alert):
                alert.auto_close()
                return

        # Enrich with threat intelligence
        alert.iocs = enrich_iocs(alert.extract_iocs())

        # Everything else is escalated to Tier 2; critical alerts also page on-call
        if alert.severity == 'critical':
            alert.assign_tier(2)
            alert.notify('pagerduty')
        else:
            alert.assign_tier(2, queue='standard')

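One way to make "determine initial severity" repeatable is a small scoring table. The sketch below is illustrative only: the `Alert` fields, the 1 to 3 scales, and the label mapping are assumptions made for this example, not part of any particular SIEM.

```python
from dataclasses import dataclass

# Hypothetical mapping: score = asset criticality x detection confidence,
# where both inputs are rated 1 (low) to 3 (high).
SEVERITY_LABELS = {1: "low", 2: "low", 3: "medium", 4: "medium", 6: "high", 9: "critical"}


@dataclass
class Alert:
    rule_name: str
    asset_criticality: int     # assumed to come from an asset inventory
    detection_confidence: int  # assumed to be assigned per detection rule


def initial_severity(alert: Alert) -> str:
    """Return a severity label from the criticality x confidence score."""
    for value in (alert.asset_criticality, alert.detection_confidence):
        if value not in (1, 2, 3):
            raise ValueError("scores must be 1, 2, or 3")
    return SEVERITY_LABELS[alert.asset_criticality * alert.detection_confidence]


print(initial_severity(Alert("suspicious_login", 3, 2)))  # high
```

A lookup table keeps the decision auditable: an analyst can see why an alert received its label, and the mapping can be adjusted without touching the logic.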
Tier 2 — Investigation
Tier 2 analysts perform deep investigation, containment, and remediation. They correlate data from multiple sources and determine the full scope of incidents.
Responsibilities:
* Deep-dive analysis of escalated alerts
* Host and network forensic analysis
* Malware triage and reverse engineering
* Incident containment and remediation
* Playbook refinement
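Correlating data from multiple sources often starts with grouping events by host and time. The following is a minimal sketch under assumed conditions: events are plain dictionaries with hypothetical `host`, `source`, and `time` keys, and a 30-minute gap is taken to separate unrelated activity.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def correlate_by_host(events, window=timedelta(minutes=30)):
    """Group events per host into clusters in which consecutive
    events are no more than `window` apart."""
    by_host = defaultdict(list)
    for event in events:
        by_host[event["host"]].append(event)

    clusters = []
    for host_events in by_host.values():
        host_events.sort(key=lambda e: e["time"])
        current = [host_events[0]]
        for event in host_events[1:]:
            if event["time"] - current[-1]["time"] <= window:
                current.append(event)
            else:
                clusters.append(current)
                current = [event]
        clusters.append(current)
    return clusters


events = [
    {"host": "ws-01", "source": "edr", "time": datetime(2024, 1, 1, 9, 0)},
    {"host": "ws-01", "source": "proxy", "time": datetime(2024, 1, 1, 9, 10)},
    {"host": "ws-01", "source": "edr", "time": datetime(2024, 1, 1, 14, 0)},
    {"host": "db-02", "source": "firewall", "time": datetime(2024, 1, 1, 9, 5)},
]
clusters = correlate_by_host(events)
for cluster in clusters:
    print(cluster[0]["host"], [e["source"] for e in cluster])
```

Clusters that combine several sources on one host within a short window are usually the ones worth examining first when scoping an incident.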
Tier 3 — Advanced Analysis
Tier 3 analysts handle the most complex incidents, develop detection rules, perform threat hunting, and conduct post-incident reviews.
Responsibilities:
* Advanced malware analysis and reverse engineering
* Threat hunt development and execution
* SIEM content development and tuning
* Red/purple team collaboration
* Incident review and lessons learned
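A common starting point for a threat hunt is frequency analysis (sometimes called stacking): count how often each behavior occurs and review the rarest. The sketch below applies this to parent and child process pairs. The event format and the sample process names are assumptions chosen for illustration.

```python
from collections import Counter


def rare_process_pairs(process_events, max_count=1):
    """Return (parent, child) pairs seen at most `max_count` times."""
    counts = Counter((e["parent"], e["child"]) for e in process_events)
    return sorted(pair for pair, n in counts.items() if n <= max_count)


process_events = [
    {"parent": "explorer.exe", "child": "chrome.exe"},
    {"parent": "explorer.exe", "child": "chrome.exe"},
    {"parent": "services.exe", "child": "svchost.exe"},
    {"parent": "services.exe", "child": "svchost.exe"},
    {"parent": "winword.exe", "child": "powershell.exe"},
]
print(rare_process_pairs(process_events))
# [('winword.exe', 'powershell.exe')]
```

Rarity alone does not prove malice. It only narrows the set of behaviors an analyst reviews by hand, and pairs confirmed as malicious can then be turned into SIEM detection content.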
SIEM Tuning
SIEM tuning reduces noise while maintaining detection coverage. A well-tuned SIEM produces an alert volume small enough that analysts can investigate every alert rather than skim or ignore the queue.

    # Example: correlation rule tuning cycle
    SIEM_ALERTS=10000
    FALSE_POSITIVES=8500
    TRUE_POSITIVES=1000
    ESCALATIONS=500
    echo "Alert volume: $SIEM_ALERTS"
    echo "False positive rate: $((FALSE_POSITIVES * 100 / SIEM_ALERTS))%"
    echo "Escalation rate: $((ESCALATIONS * 100 / SIEM_ALERTS))%"

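Aggregate rates show that tuning is needed but not where. A per-rule breakdown, sketched below, points at the rules producing the noise. The rule names and counts are invented for illustration, the 30% threshold mirrors the false positive target in the KPI table, and the `min_alerts` cutoff is an assumption to avoid judging rules on too few samples.

```python
def rules_to_tune(rule_stats, fp_threshold=0.30, min_alerts=50):
    """Return (rule, fp_rate) pairs for rules above the threshold, noisiest first."""
    flagged = []
    for rule, stats in rule_stats.items():
        total = stats["true_positives"] + stats["false_positives"]
        if total < min_alerts:
            continue  # too few alerts to judge the rule
        fp_rate = stats["false_positives"] / total
        if fp_rate > fp_threshold:
            flagged.append((rule, round(fp_rate, 2)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical per-rule outcomes recorded by analysts when closing alerts
rule_stats = {
    "brute_force_login": {"true_positives": 40, "false_positives": 960},
    "dns_tunneling": {"true_positives": 90, "false_positives": 10},
    "new_admin_account": {"true_positives": 5, "false_positives": 5},
}
print(rules_to_tune(rule_stats))
# [('brute_force_login', 0.96)]
```

This only works if analysts record a true or false positive disposition when closing each alert, so that closure field is worth making mandatory.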
Playbooks
Playbooks provide step-by-step instructions for handling specific scenarios. They reduce mean time to respond (MTTR) and ensure consistency.

    # Incident response playbook example
    playbook:
      id: IR-001
      name: "Ransomware Detection and Response"
      severity: critical
      steps:
        - phase: identification
          actions:
            - task: "Verify alert from EDR or user report"
            - task: "Identify affected systems"
            - task: "Determine ransomware variant via IOC hash"
        - phase: containment
          actions:
            - task: "Isolate affected hosts from network"
            - task: "Disable compromised accounts"
            - task: "Block C2 infrastructure at firewall"
        - phase: eradication
          actions:
            - task: "Remove malware from affected systems"
            - task: "Patch vulnerability used for initial access"
            - task: "Reset credentials for affected accounts"
        - phase: recovery
          actions:
            - task: "Restore from clean backups"
            - task: "Verify system integrity"
            - task: "Gradually restore connectivity"

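Because playbooks are structured data, they can be checked automatically before analysts rely on them. The sketch below is one possible approach: it represents an abbreviated copy of the playbook above as a Python dictionary (to avoid depending on a YAML parser) and uses two hypothetical helpers, `validate_playbook` and `checklist`. The required fields and phase order are taken from the example, not from any standard schema.

```python
EXPECTED_PHASES = ["identification", "containment", "eradication", "recovery"]

# Abbreviated copy of the example playbook, one task per phase
playbook = {
    "id": "IR-001",
    "name": "Ransomware Detection and Response",
    "severity": "critical",
    "steps": [
        {"phase": "identification", "actions": [{"task": "Verify alert from EDR or user report"}]},
        {"phase": "containment", "actions": [{"task": "Isolate affected hosts from network"}]},
        {"phase": "eradication", "actions": [{"task": "Remove malware from affected systems"}]},
        {"phase": "recovery", "actions": [{"task": "Restore from clean backups"}]},
    ],
}


def validate_playbook(pb):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    for key in ("id", "name", "severity", "steps"):
        if key not in pb:
            problems.append(f"missing field: {key}")
    steps = pb.get("steps", [])
    phases = [step.get("phase") for step in steps]
    if phases != EXPECTED_PHASES:
        problems.append(f"unexpected phase order: {phases}")
    for step in steps:
        if not step.get("actions"):
            problems.append(f"phase {step.get('phase')} has no actions")
    return problems


def checklist(pb):
    """Flatten the playbook into an ordered list of checklist lines."""
    return [
        f"[{step['phase']}] {action['task']}"
        for step in pb["steps"]
        for action in step["actions"]
    ]


print(validate_playbook(playbook))
for line in checklist(playbook):
    print(line)
```

Running a check like this whenever a playbook changes catches structural mistakes, such as a phase with no tasks, before they surface during an incident.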
SOC KPIs
Key performance indicators measure SOC effectiveness and efficiency.
| KPI | Target | Measurement |
|-----|--------|-------------|
| Mean Time to Detect (MTTD) | < 1 hour | Time from compromise to detection |
| Mean Time to Respond (MTTR) | < 4 hours | Time from detection to containment |
| Alert Triage Time | < 15 minutes | Time to categorize initial alert |
| False Positive Rate | < 30% | False alerts / total alerts |
| Escalation Rate | 5-15% | Escalated alerts / total alerts |
| Coverage Gap | < 5% | Unmonitored assets / total assets |
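The time-based KPIs can be computed directly from incident timestamps. The sketch below assumes each incident record carries hypothetical `compromised`, `detected`, and `contained` timestamps, matching the measurement definitions in the table. The sample incidents are invented.

```python
from datetime import datetime
from statistics import mean


def mean_hours(incidents, start_key, end_key):
    """Average elapsed time in hours between two timestamps across incidents."""
    return mean(
        (incident[end_key] - incident[start_key]).total_seconds() / 3600
        for incident in incidents
    )


incidents = [
    {
        "compromised": datetime(2024, 1, 1, 8, 0),
        "detected": datetime(2024, 1, 1, 8, 30),
        "contained": datetime(2024, 1, 1, 11, 30),
    },
    {
        "compromised": datetime(2024, 1, 2, 10, 0),
        "detected": datetime(2024, 1, 2, 11, 30),
        "contained": datetime(2024, 1, 2, 13, 30),
    },
]

mttd = mean_hours(incidents, "compromised", "detected")  # (0.5 + 1.5) / 2
mttr = mean_hours(incidents, "detected", "contained")    # (3.0 + 2.0) / 2
print(f"MTTD: {mttd:.1f} h, MTTR: {mttr:.1f} h")
# MTTD: 1.0 h, MTTR: 2.5 h
```

In practice the time of compromise is often only established during investigation, so MTTD tends to be revised after an incident closes.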
Shift Handoff
Effective shift handoffs prevent incidents from falling through the cracks.

    shift_handoff:
      sections:
        - name: "Active Incidents"
          fields: [id, severity, status, owner, summary, next_steps]
        - name: "Pending Investigations"
          fields: [alert_id, initial_findings, pending_actions]
        - name: "Maintenance and Outages"
          fields: [system, type, eta, impact]
        - name: "Notable Events"
          fields: [timestamp, description, action_taken]
        - name: "Tool Status"
          fields: [tool, status, known_issues]

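A template like this is most useful when incomplete entries are caught before the outgoing shift leaves. As a rough sketch, the check below reports blank or absent fields in a handoff report. The section and field names come from the template above, while the report structure (a dictionary of entry lists) and the sample data are assumptions.

```python
REQUIRED_FIELDS = {
    "Active Incidents": ["id", "severity", "status", "owner", "summary", "next_steps"],
    "Pending Investigations": ["alert_id", "initial_findings", "pending_actions"],
    "Maintenance and Outages": ["system", "type", "eta", "impact"],
    "Notable Events": ["timestamp", "description", "action_taken"],
    "Tool Status": ["tool", "status", "known_issues"],
}


def missing_fields(handoff):
    """Return {section: [(entry_index, field), ...]} for blank or absent fields."""
    gaps = {}
    for section, fields in REQUIRED_FIELDS.items():
        for index, entry in enumerate(handoff.get(section, [])):
            for field in fields:
                if entry.get(field) in (None, ""):
                    gaps.setdefault(section, []).append((index, field))
    return gaps


# Hypothetical report: the active incident has no next steps recorded
handoff = {
    "Active Incidents": [
        {
            "id": "INC-1",
            "severity": "high",
            "status": "contained",
            "owner": "analyst-a",
            "summary": "Phishing on ws-01",
            "next_steps": "",
        },
    ],
    "Tool Status": [{"tool": "SIEM", "status": "ok", "known_issues": "none"}],
}
print(missing_fields(handoff))
# {'Active Incidents': [(0, 'next_steps')]}
```

Sections with no entries are treated as valid here, since a quiet shift may have nothing to report under a given heading.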
Conclusion
A well-structured SOC combines skilled personnel, documented processes, and appropriate technology. Focus on reducing alert fatigue through continuous tuning, maintaining comprehensive playbooks, measuring performance with meaningful KPIs, and ensuring smooth shift transitions.