Incident Response Playbook for Developers


Incident response is the structured process of handling security breaches and cyber attacks. Every development team needs a plan, because it is not a matter of if an incident will happen, but when. This article presents a practical incident response playbook based on the NIST SP 800-61 framework.





The NIST Incident Response Framework





The NIST framework defines four phases: Preparation; Detection and Analysis; Containment, Eradication, and Recovery; and Post-Incident Activity. This playbook follows the same four phases, folding triage into Detection and Analysis.





Phase 1: Preparation





Preparation is the most important phase. Without preparation, every incident becomes a chaotic scramble.





**Build a response team**: Identify who handles security incidents. The team should include an incident commander, a security analyst, a system owner, a communications lead, and a legal representative.





**Create runbooks**: Document step-by-step procedures for common incident types: phishing, malware outbreak, data breach, ransomware, denial of service, and insider threat.
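
One simple convention is a version-controlled runbook file per incident type, kept alongside the systems they cover; a minimal sketch (file names are illustrative):

```bash
# Scaffold one runbook per common incident type (file names are illustrative)
mkdir -p runbooks
touch runbooks/{phishing,malware-outbreak,data-breach,ransomware,denial-of-service,insider-threat}.md
```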





**Set up tooling**: Ensure the team has access to:


* Centralized logging (SIEM like Splunk, ELK, or Sentinel)

* Endpoint detection and response (EDR like CrowdStrike or Defender)

* Network monitoring and packet capture

* Secure communication channels (Slack, Teams, or Signal)

* Evidence collection tools (FTK Imager, Volatility, tcpdump)




**Practice regularly**: Run tabletop exercises every quarter. Simulate a ransomware attack, a data exposure, or a compromised credential. Practice builds muscle memory.





Phase 2: Detection and Analysis





Detection relies on monitoring and alerting. Every alert is a candidate incident until triage proves otherwise.





**Alert sources**:


* SIEM correlation rules detecting anomalous patterns

* EDR alerts for malware execution or suspicious process behavior

* Cloud provider alerts (GuardDuty, Security Command Center, Defender)

* Application logs showing unusual error rates or access patterns (see the sketch after this list)

* User reports of suspicious activity
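
Unusual error rates can often be spotted straight from the logs. A quick sketch, assuming an nginx-style access log in the default combined format at `/var/log/nginx/access.log`:

```bash
# Count HTTP 5xx responses per minute in an nginx "combined" access log
# ($9 is the status code, $4 holds "[day/month/year:HH:MM:SS")
awk '$9 ~ /^5/ {print substr($4, 2, 17)}' /var/log/nginx/access.log \
  | sort | uniq -c | sort -rn | head
```

A sudden spike in 5xx counts for a single minute is a cue to pull the matching request lines and start triage.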




**Triage questions**:


* What happened? What systems are affected?

* When did it start? Is it ongoing?

* What is the impact? Data loss? Service disruption?

* Is this a true positive or a false alarm?

* What severity level applies?




**Severity classification**:


* SEV-1: Critical. Active data exfiltration, ransomware, or service-wide compromise. Immediate response required.

* SEV-2: High. Confirmed intrusion but contained. Credential compromise affecting multiple users.

* SEV-3: Medium. Potential compromise under investigation. Phishing campaign targeting employees.

* SEV-4: Low. Minor policy violations. Automated scans with no evidence of exploitation.
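
Severity drives urgency. As an illustration, a hypothetical notification helper might page on-call only for SEV-1 and SEV-2 (it assumes a Slack incoming-webhook URL in `SLACK_WEBHOOK_URL`):

```bash
# Hypothetical triage helper: page on-call for SEV-1/SEV-2, log the rest.
# Assumes SLACK_WEBHOOK_URL points at a Slack incoming webhook.
notify_incident() {
  local sev="$1" summary="$2"
  case "$sev" in
    SEV-1|SEV-2)
      curl -s -X POST -H 'Content-Type: application/json' \
        -d "{\"text\":\"[$sev] $summary (paging on-call)\"}" \
        "$SLACK_WEBHOOK_URL"
      ;;
    *)
      logger -t incident "[$sev] $summary"
      ;;
  esac
}

# Usage: notify_incident SEV-2 "Credential compromise on build server"
```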




Phase 3: Containment, Eradication, and Recovery





Containment stops the attack from spreading. Eradication removes the attacker's presence. Recovery returns systems to normal operation.





**Short-term containment**:


* Disconnect affected systems from the network.

* Disable compromised user accounts.

* Block attacker IP addresses at the firewall.

* Rotate credentials for affected services.





```bash
# Example: block an attacker IP at the firewall
iptables -A INPUT -s 203.0.113.50 -j DROP

# Example: deactivate a compromised AWS IAM access key
aws iam update-access-key \
    --access-key-id AKIAIOSFODNN7EXAMPLE \
    --status Inactive \
    --user-name compromised-user
```







**Long-term containment**:


* Apply security patches.

* Implement additional monitoring for affected systems.

* Deploy WAF rules to block attack patterns.
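
WAF rule syntax is product-specific, but the same idea applies at the network layer: maintain a reusable block list rather than one-off rules. A sketch using `ipset`, assuming it is installed:

```bash
# Keep a reusable block list with ipset instead of one-off iptables rules
ipset create blocklist hash:ip
ipset add blocklist 203.0.113.50
iptables -I INPUT -m set --match-set blocklist src -j DROP
```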




**Eradication**:


* Remove malware using EDR tools.

* Rebuild compromised servers from known-good images.

* Revoke all session tokens and API keys (example below).

* Reset root passwords and privileged credentials.
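
For the key revocation step, the AWS CLI can enumerate and delete a user's access keys; a sketch for a single affected user (the user name is illustrative):

```bash
# Delete every access key belonging to a compromised IAM user
user="compromised-user"   # illustrative user name
for key in $(aws iam list-access-keys --user-name "$user" \
               --query 'AccessKeyMetadata[].AccessKeyId' --output text); do
  aws iam delete-access-key --user-name "$user" --access-key-id "$key"
done
```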




**Recovery**:


* Restore systems from clean backups.

* Verify system integrity before returning to production (see the sketch after this list).

* Gradually reintroduce traffic while monitoring for recurrence.

* Communicate recovery status to stakeholders.
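
The integrity check can be as simple as comparing restored files against a manifest of known-good hashes captured before the incident; a sketch, assuming such a manifest exists:

```bash
# Verify restored files against a known-good hash manifest
# (manifest created earlier, e.g. sha256sum /usr/local/bin/* > manifest.sha256)
sha256sum --check /backups/manifest.sha256 || echo "INTEGRITY CHECK FAILED"
```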




Phase 4: Post-Incident Activity





The post-mortem is where the team learns from the incident and improves processes.





**Post-mortem meeting**: Within one week of containment, gather everyone involved. Blameless culture is essential — the goal is to improve systems, not assign blame.





**Post-mortem document**:


* Timeline of the incident

* Root cause analysis

* What went well and what went wrong

* Detection gaps and containment delays

* Remediation items with owners and deadlines

* Changes to runbooks, tooling, or architecture





## Post-Mortem: Service Credential Leak




**Date**: 2026-04-15


**Severity**: SEV-2




### Timeline


- 2026-04-15 09:23 UTC — GuardDuty alert for anomalous API calls


- 09:25 — Triage begins


- 09:45 — Compromised key identified and revoked


- 10:30 — Containment confirmed


- 14:00 — All affected resources rotated




### Root Cause


A GitHub Actions workflow accidentally logged `AWS_SECRET_ACCESS_KEY` to its debug output, and the workflow logs were publicly accessible.




### Action Items


- [ ] Remove debug logging from CI/CD workflows (owner: DevOps, due: 04-22)


- [ ] Enable secret scanning on GitHub repository (owner: Security, due: 04-18)


- [ ] Add alert for API keys used outside expected regions (owner: Platform, due: 04-30)







Forensic Evidence Collection





Proper evidence collection preserves data for legal action and root cause analysis.




* Capture memory dumps with a tool like LiME before powering off systems; analyze them afterward with Volatility.

* Collect disk images using `dd` or FTK Imager rather than copying files live.

* Record command output with timestamps using the `script` command.

* Maintain chain of custody documentation for all evidence.





```bash
# Capture a memory dump with LiME (module parameters set output path and format)
insmod lime.ko "path=/evidence/memory.dump format=lime"

# Capture a raw disk image; write to separate media, never the disk being imaged
dd if=/dev/sda of=/evidence/disk.img bs=4M conv=noerror,sync
```
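
To support chain of custody, the `script` session recording and evidence hashing mentioned above might look like this (paths are illustrative):

```bash
# Record the investigation shell session as evidence (exit the shell to stop)
script /evidence/session-$(date -u +%Y%m%dT%H%M%SZ).log

# Hash artifacts immediately; re-verify at every custody transfer
sha256sum /evidence/memory.dump /evidence/disk.img > /evidence/evidence.sha256
```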







Conclusion





A well-practiced incident response process turns a potential disaster into a manageable event. Preparation separates professional teams from those that panic. Detection without response is just noise. And every incident, no matter how small, is an opportunity to improve.