Incident response is the structured process of handling security breaches and cyber attacks. Every development team needs a plan, because it is not a matter of if an incident will happen, but when. This article presents a practical incident response playbook based on the NIST SP 800-61 framework.


The NIST Incident Response Framework


The NIST framework defines four phases: Preparation; Detection and Analysis; Containment, Eradication, and Recovery; and Post-Incident Activity. This playbook follows those four phases and folds an explicit triage step into Detection and Analysis.


Phase 1: Preparation


Preparation is the most important phase. Without preparation, every incident becomes a chaotic scramble.


**Build a response team**: Identify who handles security incidents. The team should include an incident commander, a security analyst, a system owner, a communications lead, and a legal representative.


**Create runbooks**: Document step-by-step procedures for common incident types: phishing, malware outbreak, data breach, ransomware, denial of service, and insider threat.


**Set up tooling**: Ensure the team has access to:

  • Centralized logging (SIEM like Splunk, ELK, or Sentinel)
  • Endpoint detection and response (EDR like CrowdStrike or Defender)
  • Network monitoring and packet capture
  • Secure communication channels (Slack, Teams, or Signal)
  • Evidence collection tools (FTK Imager, Volatility, tcpdump)
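
Verify in advance that responders can actually run these tools under pressure. As one example, capturing traffic to and from a suspect host is a `tcpdump` one-liner (the interface name and evidence path are assumptions; the IP is the example address used later in this article):

    # Capture traffic to/from a suspect host into an evidence file
    tcpdump -i eth0 -w /evidence/suspect-host.pcap host 203.0.113.50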

**Practice regularly**: Run tabletop exercises every quarter. Simulate a ransomware attack, a data exposure, or a compromised credential. Practice builds muscle memory.


Phase 2: Detection and Analysis


Detection relies on monitoring and alerting. Every alert is a potential incident.


**Alert sources**:

  • SIEM correlation rules detecting anomalous patterns
  • EDR alerts for malware execution or suspicious process behavior
  • Cloud provider alerts (GuardDuty, Security Command Center, Defender)
  • Application logs showing unusual error rates or access patterns (see the sketch after this list)
  • User reports of suspicious activity
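
Not every detection needs heavyweight tooling; even a shell one-liner over application logs can surface an error-rate spike worth triaging. A minimal sketch, assuming nginx-style combined access logs at a hypothetical path:

    # Count 5xx responses per minute; a sudden spike merits a closer look
    awk '$9 ~ /^5/ {print substr($4, 2, 17)}' /var/log/nginx/access.log \
      | sort | uniq -c | sort -rn | head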

**Triage questions**:

  • What happened? What systems are affected?
  • When did it start? Is it ongoing?
  • What is the impact? Data loss? Service disruption?
  • Is this a true positive or a false alarm?
  • What severity level applies?

**Severity classification**:

  • SEV-1: Critical. Active data exfiltration, ransomware, or service-wide compromise. Immediate response required.
  • SEV-2: High. Confirmed intrusion but contained. Credential compromise affecting multiple users.
  • SEV-3: Medium. Potential compromise under investigation. Phishing campaign targeting employees.
  • SEV-4: Low. Minor policy violations. Automated scans with no evidence of exploitation.
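
Severity should drive notification automatically rather than by judgment call at 3 a.m. A minimal bash sketch; the Slack webhook and queue-log path are placeholders for your own channels:

    #!/usr/bin/env bash
    # Route a triaged alert by severity level.
    severity="$1"   # e.g. SEV-1
    summary="$2"    # one-line description

    case "$severity" in
      SEV-1|SEV-2)
        # High urgency: post to the incident channel immediately
        curl -s -X POST "$SLACK_WEBHOOK" \
          -H 'Content-Type: application/json' \
          -d "{\"text\": \"[$severity] $summary\"}"
        ;;
      SEV-3|SEV-4)
        # Lower urgency: queue for review during business hours
        echo "$(date -u +%FT%TZ) $severity $summary" >> /var/log/incident-queue.log
        ;;
      *)
        echo "unknown severity: $severity" >&2
        exit 1
        ;;
    esac
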

Phase 3: Containment, Eradication, and Recovery


Containment stops the attack from spreading. Eradication removes the attacker's presence. Recovery returns systems to normal operation.


**Short-term containment**:

  • Disconnect affected systems from the network.
  • Disable compromised user accounts.
  • Block attacker IP addresses at the firewall.
  • Rotate credentials for affected services.

    # Example: Block an attacker's IP at the firewall
    iptables -A INPUT -s 203.0.113.50 -j DROP

    # Example: Deactivate the compromised IAM user's access key
    aws iam update-access-key \
      --access-key-id AKIAIOSFODNN7EXAMPLE \
      --status Inactive \
      --user-name compromised-user


**Long-term containment**:

  • Apply security patches.
  • Implement additional monitoring for affected systems.
  • Deploy WAF rules to block attack patterns.
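
As a sketch of the WAF step, if you maintain an AWS WAF IP set for known-bad sources, the attacker's address can be added with the CLI. The set name and ID below are placeholders; note that `update-ip-set` replaces the entire address list and requires the current lock token:

    # Fetch the current lock token, required for any IP-set update
    LOCK_TOKEN=$(aws wafv2 get-ip-set \
      --name blocked-ips --scope REGIONAL \
      --id a1b2c3d4-example-id \
      --query 'LockToken' --output text)

    # Write the updated address list, including the attacker's IP
    aws wafv2 update-ip-set \
      --name blocked-ips --scope REGIONAL \
      --id a1b2c3d4-example-id \
      --lock-token "$LOCK_TOKEN" \
      --addresses 203.0.113.50/32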

**Eradication**:

  • Remove malware using EDR tools.
  • Rebuild compromised servers from known-good images.
  • Revoke all session tokens and API keys (see the sketch after this list).
  • Reset root passwords and privileged credentials.
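
For the key-revocation step, here is a minimal sketch for a single IAM user. The user name is the example from the containment block above, and AWSCompromisedKeyQuarantineV2 is AWS's managed quarantine policy; verify the ARN for your partition before relying on it:

    USER=compromised-user

    # Deactivate every access key belonging to the user
    for key in $(aws iam list-access-keys --user-name "$USER" \
                   --query 'AccessKeyMetadata[].AccessKeyId' --output text); do
      aws iam update-access-key --user-name "$USER" \
        --access-key-id "$key" --status Inactive
    done

    # Attach the managed quarantine policy to deny high-risk actions
    aws iam attach-user-policy --user-name "$USER" \
      --policy-arn arn:aws:iam::aws:policy/AWSCompromisedKeyQuarantineV2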

**Recovery**:

  • Restore systems from clean backups.
  • Verify system integrity before returning to production (see the sketch after this list).
  • Gradually reintroduce traffic while monitoring for recurrence.
  • Communicate recovery status to stakeholders.
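
The integrity check can start with the package manager's own records. A minimal sketch for an RPM-based host (Debian-family systems have `debsums` for the same purpose):

    # Verify installed files against the RPM database;
    # a '5' in the third column means a file's digest changed
    rpm -Va | grep -E '^..5' || echo "no digest mismatches found"
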

Phase 4: Post-Incident Activity


The post-mortem is where the team learns from the incident and improves processes.


**Post-mortem meeting**: Within one week of containment, gather everyone involved. Blameless culture is essential — the goal is to improve systems, not assign blame.


**Post-mortem document**:

  • Timeline of the incident
  • Root cause analysis
  • What went well and what went wrong
  • Detection gaps and containment delays
  • Remediation items with owners and deadlines
  • Changes to runbooks, tooling, or architecture

A condensed example:

    ## Post-Mortem: Service Credential Leak

    **Date**: 2026-04-15
    **Severity**: SEV-2

    ### Timeline

    - 2026-04-15 09:23 UTC — GuardDuty alert for anomalous API calls
    - 09:25 — Triage begins
    - 09:45 — Compromised key identified and revoked
    - 10:30 — Containment confirmed
    - 14:00 — All affected resources rotated

    ### Root Cause

    GitHub Actions workflow accidentally logged AWS_SECRET_ACCESS_KEY to debug output. Logs were publicly accessible.

    ### Action Items

    - [ ] Remove debug logging from CI/CD workflows (owner: DevOps, due: 04-22)
    - [ ] Enable secret scanning on GitHub repository (owner: Security, due: 04-18)
    - [ ] Add alert for API keys used outside expected regions (owner: Platform, due: 04-30)

Forensic Evidence Collection


Proper evidence collection preserves data for legal action and root cause analysis.


  • Capture memory dumps with a tool like LiME before powering off systems; analyze them afterward with Volatility.
  • Collect disk images using `dd` or FTK Imager rather than copying files live.
  • Record command output with timestamps using the `script` command.
  • Maintain chain of custody documentation for all evidence.

    # Capture memory dump with LiME
    insmod lime.ko "path=/evidence/memory.dump format=lime"

    # Capture disk image
    dd if=/dev/sda of=/evidence/disk.img bs=4M conv=noerror,sync
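
To round out the list above, the `script` command preserves the investigator's terminal session verbatim, and hashing each artifact at collection time supports the chain of custody:

    # Record the full terminal session to a timestamped log (type "exit" to stop)
    script /evidence/session-$(date -u +%Y%m%dT%H%M%SZ).log

    # Hash artifacts immediately; store the digests alongside the evidence
    sha256sum /evidence/memory.dump /evidence/disk.img > /evidence/SHA256SUMS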
    
    

Conclusion


A well-practiced incident response process turns a potential disaster into a manageable event. Preparation separates professional teams from those that panic. Detection without response is just noise. And every incident, no matter how small, is an opportunity to improve.