Organisations face an increasing number of sophisticated cyberattacks, and for larger organisations the sheer volume of cyber alerts can be overwhelming. To protect critical assets and data, and to keep the business functioning, security teams must be able to respond quickly and effectively to a potential incident. One of the most important steps in achieving this is a well-structured incident response runbook.
What is a Runbook?
A runbook is a set of steps to follow when a specific incident has occurred. These specific scenarios are called use cases. When an alert for a use case is triggered, the matching runbook should be followed to guide the investigation.
Why are Runbooks important?
Runbooks are important for several reasons. Below are the ones we believe are key:
- Response Time
- Allows your team to respond to an incident quickly and appropriately
- Efficiency
- Ensures that your team take the correct steps in investigating and remediating the incident
- Ensures quality and consistency
- Incident handling is standardised, reducing the risk of error from inexperienced team members
- Knowledge retention
- Runbooks are built by experienced members of the team, so if they leave the business, their knowledge does not leave with them
How should a Runbook be structured?
Some companies build a step-by-step guide without aligning to a framework. This can lead to inconsistencies between Runbooks, because the steps may not be recorded in the right order and the quality may vary depending on the author. Using a framework can help resolve this problem. In the past I have used the SANS 6-step Incident Response Lifecycle, but most recently I have been using a Detect, Respond and Recover approach like the NIST Cybersecurity Framework (CSF). Today we will use this to create a runbook.
- Detect
- Identify: Here you triage the alert and confirm it is a genuine threat (true positive).
- Scope: Scoping is an extremely important element within Identify that cannot be stressed enough. Once you have confirmed the alert is a threat, you need to understand how big a threat it is. Here your analyst is looking to answer two questions:
- Has there been any lateral movement (how many devices are impacted)?
- Has there been any privilege escalation (are other accounts, specifically admin accounts, compromised)?
- Respond
- Contain: Contain the spread of the incident.
- Eradicate: Fix the root cause.
- Recover: Get everything up and running as normal.
When my team creates a runbook, I require them to build several “How To” guides to support this document. A “How To” guide helps the person following the runbook carry out the actions mentioned in the runbook. For example, “Raise a ticket” links to a “How To” guide that shows the analyst how we expect a security incident ticket to be created. A simple rule to follow is: if a new level 1 analyst would question how to do something from a runbook, it should have a linked “How To” guide.
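The structure described above can be sketched as a small data model. This is a minimal illustration, not a prescribed format: the `Step` and `Runbook` classes, the `how_to` field and the guide identifiers such as `how-to/raise-a-ticket` are all hypothetical names chosen for the example. The check simply lists every step without a linked guide, so a reviewer can decide whether a new level 1 analyst would need one.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    """One action in a runbook, optionally linked to a "How To" guide."""
    action: str
    how_to: Optional[str] = None  # hypothetical guide identifier or URL


@dataclass
class Runbook:
    """A runbook whose steps are grouped by framework phase."""
    name: str
    phases: dict[str, list[Step]] = field(default_factory=dict)

    def steps_missing_how_to(self) -> list[str]:
        """Return the actions that have no linked "How To" guide."""
        return [
            f"{phase}: {step.action}"
            for phase, steps in self.phases.items()
            for step in steps
            if step.how_to is None
        ]


# Illustrative content only; a real runbook would list every step.
runbook = Runbook(
    name="Malware Infection",
    phases={
        "Detect - Identify and Scope": [
            Step("Raise a ticket", how_to="how-to/raise-a-ticket"),
            Step("Check IOCs against threat intelligence"),
        ],
        "Respond - Contain": [
            Step("Isolate affected systems", how_to="how-to/isolate-device"),
        ],
        "Respond - Eradicate": [],
        "Respond - Recover": [],
    },
)

for gap in runbook.steps_missing_how_to():
    print("Review for a How To guide:", gap)
```

Keeping the phases as an ordered mapping means every runbook presents its steps in the same framework order, which is the consistency the framework is meant to provide.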
Let's create a Runbook.
Malware Infection Runbook
Detect – Identify and Scope:
- Raise a ticket
- Actions to take:
- Check any provided Indicators of Compromise (IOCs) (IPs, domains, hashes) against reputable open-source threat intelligence tools such as VirusTotal, AlienVault OTX, IBM X-Force, Cisco Talos, SophosLabs, urlscan and Google Safe Browsing.
- Check the logs in the EDR/SIEM to confirm the scope. Have those IOCs been seen elsewhere in the network? How many devices and accounts are affected?
- Investigate the device looking for signs of compromise:
- Internal recon (“net” series of commands or nmap scan, etc.)
- Privilege escalation (password dumping).
- Persistence (creation or updates to services or scheduled tasks)
- External communication (connection to C2 server)
- Lateral movement (look for any compromised accounts logging into other devices)
- Confirm the impact. What is stored on the compromised system(s)? What services could be impacted? What is exposed due to this incident?
- Check the CMDB or speak to Service owners
- Use this information to confirm priority and which teams need to be included, such as risk and compliance.
- All key events, decisions and actions should be recorded on a timeline
- If P1, trigger the major incident process
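The scoping step (have these IOCs been seen elsewhere, and how many devices and accounts are affected?) can be sketched as below. This is only an illustration under stated assumptions: the `LogEvent` record and its fields are hypothetical, and a real EDR or SIEM export will have its own schema and query language. The sample data uses a documentation IP range and example domains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    """A simplified log record (hypothetical schema for this sketch)."""
    device: str
    account: str
    indicator: str  # IP, domain or file hash observed in the event


def scope_incident(events, iocs):
    """Return the devices and accounts that touched any known IOC."""
    known = {ioc.lower() for ioc in iocs}  # compare case-insensitively
    hits = [e for e in events if e.indicator.lower() in known]
    return {
        "devices": sorted({e.device for e in hits}),
        "accounts": sorted({e.account for e in hits}),
    }


# Illustrative data only.
events = [
    LogEvent("LAPTOP-01", "alice", "203.0.113.10"),
    LogEvent("LAPTOP-02", "bob", "files.example.com"),
    LogEvent("SERVER-01", "svc-backup", "203.0.113.10"),
    LogEvent("LAPTOP-03", "carol", "intranet.example.org"),
]
iocs = ["203.0.113.10", "FILES.EXAMPLE.COM"]

scope = scope_incident(events, iocs)
print("Devices affected:", scope["devices"])
print("Accounts affected:", scope["accounts"])
```

More than one device in the result suggests possible lateral movement, and any service or admin account in the list is a prompt to look for privilege escalation.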
Respond – Contain:
- Choose the correct steps based on incident scope:
- Isolate affected systems using your EDR
- Block malicious IPs and domains, following the guides for blocking in your security tools
- Contact IT (use the process agreed with IT for incident communication and collaboration) to disable compromised accounts (revoke privileged access for affected accounts)
- Stop processes or services that are identified as malicious.
- Investigate the user in your EDR tool to learn the root cause (a forensics partner may be needed at this stage). Look for:
- Suspicious emails
- File downloads
- Login attempts
- Execution of software
- Run a malware scan
- Use any information found to rescan/look for these indicators across the network to identify any other devices that may have been compromised
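Choosing containment steps based on scope can be sketched as a function that turns the findings from the Detect phase into an ordered action list. The `Findings` class and its field names are hypothetical, and the output is a plan for an analyst to follow rather than calls to any real EDR or firewall API.

```python
from dataclasses import dataclass, field


@dataclass
class Findings:
    """Hypothetical summary of what the Detect phase found."""
    devices: list = field(default_factory=list)
    accounts: list = field(default_factory=list)
    malicious_indicators: list = field(default_factory=list)
    malicious_processes: list = field(default_factory=list)


def containment_plan(findings):
    """Build the containment actions that apply to this incident's scope."""
    plan = []
    for device in findings.devices:
        plan.append(f"Isolate {device} using the EDR")
    for indicator in findings.malicious_indicators:
        plan.append(f"Block {indicator} in security tools")
    for account in findings.accounts:
        plan.append(f"Ask IT to disable {account} and revoke privileged access")
    for process in findings.malicious_processes:
        plan.append(f"Stop process or service {process}")
    return plan


# Illustrative findings only.
findings = Findings(
    devices=["LAPTOP-01"],
    accounts=["alice"],
    malicious_indicators=["203.0.113.10"],
)

for action in containment_plan(findings):
    print(action)
```

Only the steps that match the findings appear in the plan, which mirrors the instruction to choose steps based on incident scope instead of running every step every time.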
Respond – Eradicate:
- Choose the correct steps based on the root cause of the incident:
- Remove malicious software or code
- This may require IT to rebuild the user device and/or profile
- Delete attacker-created or unauthorised accounts
- Contact IT for this action
- Reset passwords for genuine accounts that may have been compromised.
- Look for and remove any persistence that was found in previous steps
- IT would be required to help make the changes, depending on the findings, e.g. deleting services or changing registry settings
- Patch vulnerabilities
- IT for end-user devices; service owners for other applications
- Rotate any compromised certificates or keys
- Service owners
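Because each eradication step names who carries it out, the chosen steps can be grouped into one task list per team. The sketch below is illustrative: the owners come from the steps above, while the `"Unassigned"` label is an assumption for steps where the runbook does not name an owner, such as password resets.

```python
from collections import defaultdict

# Owners as listed in the Eradicate steps; "Unassigned" is an assumed
# placeholder for steps where no owner is named.
ERADICATION_OWNERS = {
    "Remove malicious software or code": ["IT"],
    "Delete attacker-created or unauthorised accounts": ["IT"],
    "Reset passwords for compromised accounts": ["Unassigned"],
    "Remove persistence": ["IT"],
    "Patch vulnerabilities": ["IT", "Service owner"],
    "Rotate compromised certificates or keys": ["Service owner"],
}


def tasks_by_owner(selected_actions):
    """Group the chosen eradication actions by the team that performs them."""
    grouped = defaultdict(list)
    for action in selected_actions:
        # Unknown actions fall back to "Unassigned" so they are not lost.
        for owner in ERADICATION_OWNERS.get(action, ["Unassigned"]):
            grouped[owner].append(action)
    return dict(grouped)


# Illustrative selection for one incident.
selected = [
    "Remove malicious software or code",
    "Patch vulnerabilities",
    "Rotate compromised certificates or keys",
]

for owner, actions in tasks_by_owner(selected).items():
    print(owner, "->", actions)
```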
Respond – Recover:
- Choose the correct steps based on incident requirements:
- Ensure any corrupted data is restored from a known good backup
- IT or service owner
- Begin to re-enable services, in a controlled manner.
- Run EDR scan
- Monitor for any sign of recurrence and ensure no malicious code remains.
- Set up temporary monitoring.
- Ensure there are no vulnerabilities on the system.
- Run an ad hoc vulnerability scan.
- Test system operations.
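The recovery steps can be treated as a checklist that must be settled before the incident is closed. The sketch below is one possible way to track that, with the check wording paraphrased from the steps above and the function names chosen for this example. Since steps are selected based on incident requirements, a check can be marked as not applicable as well as completed.

```python
RECOVERY_CHECKS = [
    "Corrupted data restored from a known good backup",
    "Services re-enabled in a controlled manner",
    "EDR scan run",
    "Temporary monitoring set up",
    "Ad hoc vulnerability scan run",
    "System operations tested",
]


def outstanding_checks(completed, not_applicable=()):
    """Return the recovery checks that are neither completed nor ruled out."""
    settled = set(completed) | set(not_applicable)
    return [check for check in RECOVERY_CHECKS if check not in settled]


def ready_to_close(completed, not_applicable=()):
    """True only when every recovery check has been settled."""
    return not outstanding_checks(completed, not_applicable)


# Illustrative state part-way through recovery.
completed = [
    "Services re-enabled in a controlled manner",
    "EDR scan run",
    "Temporary monitoring set up",
]
not_applicable = ["Corrupted data restored from a known good backup"]

print("Still to do:", outstanding_checks(completed, not_applicable))
print("Ready to close:", ready_to_close(completed, not_applicable))
```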
Talk to DigiF9 for vCISO Services
Security operations can be challenging to implement and mature effectively. If you need practical guidance on developing and operationalising your incident response capabilities, DigiF9’s vCISO services can help. We work hands-on with your team to build and enhance security operations that actually work, focusing on practical solutions over complex tools. Our experienced security professionals can assist with everything from creating effective runbooks to implementing efficient incident response processes tailored to your organisation’s needs. Contact us to discuss how we can help strengthen your security operations with straightforward, results-focused solutions.