Incident Response Playbook for Developers
Incident response is the structured process of handling security breaches and cyber attacks. Every development team needs a plan, because it is not a matter of if an incident will happen, but when. This article presents a practical incident response playbook based on the NIST SP 800-61 framework.
The NIST Incident Response Framework
The NIST framework defines four phases: Preparation; Detection and Analysis; Containment, Eradication, and Recovery; and Post-Incident Activity. We treat triage as an explicit step within Detection and Analysis rather than as a separate phase.
Phase 1: Preparation
Preparation is the most important phase. Without preparation, every incident becomes a chaotic scramble.
**Build a response team**: Identify who handles security incidents. The team should include an incident commander, a security analyst, a system owner, a communications lead, and a legal representative.
**Create runbooks**: Document step-by-step procedures for common incident types: phishing, malware outbreak, data breach, ransomware, denial of service, and insider threat.
**Set up tooling**: Ensure the team has access to:
* Centralized logging (a SIEM such as Splunk, the ELK Stack, or Microsoft Sentinel)
* Endpoint detection and response (an EDR such as CrowdStrike or Microsoft Defender for Endpoint)
* Network monitoring and packet capture
* Secure communication channels (Slack, Teams, or Signal)
* Evidence collection and analysis tools (FTK Imager, Volatility, tcpdump); a quick availability check is sketched after this list
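A small readiness check can catch missing tools before an incident rather than during one. A minimal sketch (the tool list is illustrative; extend it to match your runbooks):

```bash
#!/usr/bin/env bash
# Check that core command-line collection utilities are installed
# on a responder workstation. The tool list is illustrative.
missing=0
for tool in tcpdump dd sha256sum script; do
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "MISSING: $tool"
        missing=1
    fi
done
exit "$missing"
```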
**Practice regularly**: Run tabletop exercises every quarter. Simulate a ransomware attack, a data exposure, or a compromised credential. Practice builds muscle memory.
Phase 2: Detection and Analysis
Detection relies on monitoring and alerting. Every alert is a candidate incident until triage says otherwise.
**Alert sources**:
* SIEM correlation rules detecting anomalous patterns
* EDR alerts for malware execution or suspicious process behavior
* Cloud provider alerts (GuardDuty, Security Command Center, Defender); a minimal GuardDuty setup is sketched after this list
* Application logs showing unusual error rates or access patterns
* User reports of suspicious activity
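For the cloud-provider source, a minimal sketch of turning on Amazon GuardDuty with the AWS CLI (assuming credentials with the necessary permissions; Security Command Center and Defender have their own equivalents):

```bash
# Enable GuardDuty in the current region and capture the detector ID
detector_id=$(aws guardduty create-detector \
    --enable \
    --finding-publishing-frequency FIFTEEN_MINUTES \
    --query DetectorId --output text)

# Pull finding IDs for triage
aws guardduty list-findings --detector-id "$detector_id"
```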
**Triage questions**:
* What happened? What systems are affected?
* When did it start? Is it ongoing?
* What is the impact? Data loss? Service disruption?
* Is this a true positive or a false alarm?
* What severity level applies?
**Severity classification** (an escalation sketch follows this list):
* SEV-1: Critical. Active data exfiltration, ransomware, or service-wide compromise. Immediate response required.
* SEV-2: High. Confirmed intrusion but contained. Credential compromise affecting multiple users.
* SEV-3: Medium. Potential compromise under investigation. Phishing campaign targeting employees.
* SEV-4: Low. Minor policy violations. Automated scans with no evidence of exploitation.
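Severity should translate directly into who gets woken up. A hypothetical sketch of that mapping (`page_oncall` and `open_ticket` are stand-ins for whatever paging and ticketing integration you actually run):

```bash
#!/usr/bin/env bash
# Route an incident based on its severity classification.
# page_oncall and open_ticket are hypothetical stubs; replace them with
# your real integration (PagerDuty, Opsgenie, Jira, ...).
page_oncall() { echo "PAGE ON-CALL: $*"; }
open_ticket() { echo "OPEN TICKET: $*"; }

severity="$1"   # e.g. SEV-1
summary="$2"

case "$severity" in
    SEV-1|SEV-2) page_oncall "$severity: $summary" ;;   # immediate human response
    SEV-3|SEV-4) open_ticket "$severity: $summary" ;;   # investigate in working hours
    *) echo "Unknown severity: $severity" >&2; exit 1 ;;
esac
```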
Phase 3: Containment, Eradication, and Recovery
Containment stops the attack from spreading. Eradication removes the attacker's presence. Recovery returns systems to normal operation.
**Short-term containment**:
* Disconnect affected systems from the network.
* Disable compromised user accounts.
* Block attacker IP addresses at the firewall.
* Rotate credentials for affected services (see the sketches below).
```bash
# Example: Block an IP at the firewall
iptables -A INPUT -s 203.0.113.50 -j DROP

# Example: Deactivate a compromised AWS IAM user's access key
aws iam update-access-key \
    --access-key-id AKIAIOSFODNN7EXAMPLE \
    --status Inactive \
    --user-name compromised-user
```
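Two more containment sketches in the same vein: locking a compromised local account and terminating its sessions, and rotating a secret in AWS Secrets Manager (this assumes a rotation Lambda is already attached to the secret; the account and secret names are illustrative):

```bash
# Example: Lock a compromised local account and kill its sessions
usermod --lock compromised-user
pkill -KILL -u compromised-user

# Example: Rotate a secret in AWS Secrets Manager
# (requires a rotation Lambda already configured for the secret)
aws secretsmanager rotate-secret --secret-id prod/app/db-credentials
```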
**Long-term containment**:
* Apply security patches.
* Implement additional monitoring for affected systems (a host-level sketch follows this list).
* Deploy WAF rules to block attack patterns.
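One lightweight way to add host-level monitoring on an affected Linux system is an audit watch. A sketch assuming auditd is installed (the watched paths are illustrative):

```bash
# Example: Watch sensitive paths on an affected host for writes and attribute changes
auditctl -w /etc/ssh/sshd_config -p wa -k ir-watch
auditctl -w /etc/passwd -p wa -k ir-watch

# Review anything the watches caught
ausearch -k ir-watch --start today
```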
**Eradication**:
* Remove malware using EDR tools.
* Rebuild compromised servers from known-good images.
* Revoke all session tokens and API keys (an AWS sketch follows this list).
* Reset root passwords and privileged credentials.
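A sketch of the key-revocation step in an AWS account (the user name is illustrative; console access and long-lived access keys are removed together):

```bash
# Example: Revoke console access and delete all access keys for a user
user=compromised-user

# Fails harmlessly if the user never had console access
aws iam delete-login-profile --user-name "$user"

for key in $(aws iam list-access-keys --user-name "$user" \
        --query 'AccessKeyMetadata[].AccessKeyId' --output text); do
    aws iam delete-access-key --user-name "$user" --access-key-id "$key"
done
```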
**Recovery**:
* Restore systems from clean backups.
* Verify system integrity before returning to production (see the sketch after this list).
* Gradually reintroduce traffic while monitoring for recurrence.
* Communicate recovery status to stakeholders.
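A minimal sketch of the integrity check before a host returns to production, assuming a known-good checksum manifest was saved when the system was built (the manifest path is illustrative):

```bash
# Example: Verify restored files against a known-good checksum manifest
sha256sum -c /evidence/known-good-manifest.sha256

# On RPM-based systems, also verify installed packages against package metadata
# (debsums -s plays a similar role on Debian-based systems)
rpm -Va
```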
Phase 4: Post-Incident Activity
The post-mortem is where the team learns from the incident and improves processes.
**Post-mortem meeting**: Within one week of containment, gather everyone involved. Blameless culture is essential — the goal is to improve systems, not assign blame.
**Post-mortem document**:
* Timeline of the incident
* Root cause analysis
* What went well and what went wrong
* Detection gaps and containment delays
* Remediation items with owners and deadlines
* Changes to runbooks, tooling, or architecture
## Post-Mortem: Service Credential Leak
**Date**: 2026-04-15
**Severity**: SEV-2
### Timeline
- 2026-04-15 09:23 UTC — GuardDuty alert for anomalous API calls
- 09:25 — Triage begins
- 09:45 — Compromised key identified and revoked
- 10:30 — Containment confirmed
- 14:00 — All affected resources rotated
### Root Cause
A GitHub Actions workflow accidentally logged AWS_SECRET_ACCESS_KEY to debug output. The logs were publicly accessible.
### Action Items
- [ ] Remove debug logging from CI/CD workflows (owner: DevOps, due: 04-22)
- [ ] Enable secret scanning on GitHub repository (owner: Security, due: 04-18)
- [ ] Add alert for API keys used outside expected regions (owner: Platform, due: 04-30)
Forensic Evidence Collection
Proper evidence collection preserves data for legal action and root cause analysis.
* Capture memory dumps with a tool like LiME before powering off systems; analyze the dumps afterward with Volatility.
* Collect disk images using `dd` or FTK Imager rather than copying files live.
* Record command output with timestamps using the `script` command (see the sketch below).
* Maintain chain of custody documentation for all evidence.
```bash
# Capture memory dump with LiME
insmod lime.ko "path=/evidence/memory.dump format=lime"

# Capture disk image
dd if=/dev/sda of=/evidence/disk.img bs=4M conv=noerror,sync
```
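Two further sketches for the collection steps above: recording a responder session with `script`, and hashing evidence files so the chain-of-custody record can show they were not altered (paths are illustrative):

```bash
# Record the responder's terminal session, with timing data for later replay
script --timing=/evidence/session.timing /evidence/session.log

# Hash collected evidence for the chain-of-custody record
sha256sum /evidence/memory.dump /evidence/disk.img > /evidence/evidence.sha256
```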
Conclusion
A well-practiced incident response process turns a potential disaster into a manageable event. Preparation separates professional teams from those that panic. Detection without response is just noise. And every incident, no matter how small, is an opportunity to improve.