The SSG is prepared to respond to an event or alert and is regularly included in the incident response process, either by creating its own incident response capability or by regularly interfacing with the organization’s existing team. A standing meeting between the SSG and the incident response team keeps information flowing in both directions. Having prebuilt communication channels with critical vendors (e.g., ISP, monitoring, IaaS, SaaS, PaaS) is also very important.
Defects identified in production through operations monitoring are fed back to development and used to change engineering behavior. Useful sources of production defects include incidents, bug bounty (see [CMVM3.4]), responsible disclosure (see [CMVM3.7]), SIEMs, production logs, customer feedback, and telemetry from cloud security posture monitoring, container configuration monitoring, RASP, and similar technologies. Entering production defect data into an existing bug-tracking system (perhaps by making use of a special security flag) can close the information loop and make sure that security issues get fixed. In addition, it’s important to capture lessons learned from production defects and use these lessons to change the organization’s behavior. In the best of cases, processes in the SSDL can be improved based on operations data (see [CMVM3.2]).
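As an illustration of closing this information loop, the sketch below files a production-sourced finding into an existing bug tracker with a security flag. The tracker endpoint, field names, and label values are assumptions for the example, not part of any prescribed tooling.

```python
import json
import urllib.request

# Hypothetical bug-tracker endpoint and field names -- adapt to the tracker's real API.
TRACKER_URL = "https://bugtracker.example.com/api/issues"

def file_production_defect(source: str, title: str, detail: str) -> None:
    """File a production-sourced defect with a security flag so it is tracked to closure."""
    issue = {
        "title": title,
        "description": detail,
        "labels": ["security", f"source:{source}"],  # special security flag closes the loop
        "component": "production-monitoring",
    }
    req = urllib.request.Request(
        TRACKER_URL,
        data=json.dumps(issue).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # network call; only succeeds against a real tracker
        print("Filed issue:", resp.status)

# Example: a finding surfaced by a RASP alert is routed back to engineering.
# file_production_defect("rasp", "SQL injection attempt reached ORM layer", "details from the incident record")
```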
Defects found in operations (see [CMVM1.2]) are entered into established defect management systems and tracked through the fix process. This tracking ability could come in the form of a two-way bridge between defect finders and defect fixers or possibly through intermediaries (e.g., the vulnerability management team), but make sure the loop is closed completely. Defects can appear in all types of deployable artifacts, deployment automation, and infrastructure configuration. Setting a security flag in the defect tracking system can help facilitate tracking.
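To make "close the loop completely" concrete, one hedged sketch (again assuming a hypothetical tracker query API, a security label, and illustrative SLA fields) is a periodic report of security-flagged defects that have not yet reached a fixed state:

```python
import json
import urllib.request

# Hypothetical query API returning open issues carrying the security flag.
TRACKER_QUERY = "https://bugtracker.example.com/api/issues?label=security&state=open"

def unresolved_security_defects() -> list[dict]:
    """Return open, security-flagged defects so finders and fixers can see what hasn't closed."""
    with urllib.request.urlopen(TRACKER_QUERY) as resp:
        issues = json.load(resp)
    # Surface anything still open past its SLA so the vulnerability management team can chase it.
    return [i for i in issues if i.get("days_open", 0) > i.get("sla_days", 30)]
```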
The organization can make quick code and configuration changes when software (e.g., application, API, microservice, infrastructure) is under attack. An emergency response team works in conjunction with stakeholders such as application owners, engineering, operations, and the SSG to study the code and the attack, find a resolution, and fix the production code (e.g., push a patch into production, roll back to a known-good state, deploy a new container). Often, the emergency response team is the engineering team itself. A well-defined process is a must here; a process that has never been used might not actually work.
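A minimal sketch of one such emergency change, assuming a Kubernetes deployment with a configured kubectl and retained revision history (the deployment name is a placeholder), is a scripted rollback to the previous known-good revision:

```python
import subprocess

def emergency_rollback(deployment: str, namespace: str = "production") -> None:
    """Roll a deployment back to its previous known-good revision during an active attack."""
    # `kubectl rollout undo` reverts to the prior ReplicaSet.
    subprocess.run(
        ["kubectl", "rollout", "undo", f"deployment/{deployment}", "-n", namespace],
        check=True,
    )
    # Watch the rollback complete before declaring the emergency change done.
    subprocess.run(
        ["kubectl", "rollout", "status", f"deployment/{deployment}", "-n", namespace],
        check=True,
    )

# emergency_rollback("payments-api")  # placeholder deployment name
```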
The organization has a map of its software deployments and related containerization, orchestration, and deployment automation code, along with the respective owners. If a software asset needs to be changed or decommissioned, operations or DevOps teams can reliably identify both the stakeholders and all the places where the change needs to occur. Common components can be noted so that when an error occurs in one application, other applications sharing the same components can be fixed as well. Building an accurate representation of an inventory will likely involve enumerating at least the source code, the open source incorporated both during the build and during dynamic production updates, the orchestration software incorporated into production images, and any service discovery or invocation that occurs in production.
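A minimal sketch of such a map is shown below; the field names and values are illustrative rather than a required schema. Recording owners, deployment targets, and shared components makes it possible to find every asset affected when a common component fails:

```python
# Illustrative inventory records; field names and values are placeholders.
INVENTORY = [
    {
        "asset": "payments-api",
        "owner": "payments-team@example.com",
        "deployed_to": ["prod-cluster-eu", "prod-cluster-us"],
        "components": ["openssl", "log4j", "internal-auth-lib"],
        "source_repo": "git.example.com/payments/api",
    },
    {
        "asset": "billing-batch",
        "owner": "billing-team@example.com",
        "deployed_to": ["prod-cluster-eu"],
        "components": ["log4j", "internal-report-lib"],
        "source_repo": "git.example.com/billing/batch",
    },
]

def assets_sharing(component: str) -> list[str]:
    """When an error is found in one application, list every asset using the same component."""
    return [a["asset"] for a in INVENTORY if component in a["components"]]

print(assets_sharing("log4j"))  # ['payments-api', 'billing-batch']
```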
When a security defect is found in operations (see [CMVM1.2]), the organization searches for and fixes all occurrences of the defect in operations, not just the one originally reported. Doing this proactively requires the ability to reexamine the entire operations software inventory (see [CMVM2.3]) when new kinds of defects come to light. One way to approach reexamination is to create a ruleset that generalizes deployed defects into something that can be scanned for via automated code review. In some environments, addressing a defect might involve removing it from production immediately and making the actual fix in some priority order before redeployment. Use of orchestration can greatly simplify deploying the fix for all occurrences of a software defect (see [SE2.7]).
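One way to picture "generalizing a deployed defect into something scannable" is a small ruleset of patterns derived from past production defects, applied across a codebase. The patterns below are illustrative only; a real program would more likely encode them in its existing static analysis or code review tooling.

```python
import re
from pathlib import Path

# Each rule generalizes a defect class that previously reached production.
RULES = {
    "hardcoded-secret": re.compile(r"(password|api_key)\s*=\s*['\"][^'\"]+['\"]", re.IGNORECASE),
    "disabled-tls-verify": re.compile(r"verify\s*=\s*False"),
}

def scan_tree(root: str) -> list[tuple[str, int, str]]:
    """Scan all Python files under root and report every occurrence of each generalized defect."""
    findings = []
    for path in Path(root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), start=1):
            for rule, pattern in RULES.items():
                if pattern.search(line):
                    findings.append((str(path), lineno, rule))
    return findings

for file, line, rule in scan_tree("."):
    print(f"{file}:{line}: {rule}")
```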
Experience from operations leads to changes in the SSDL (see [SM1.1]), which can in turn be strengthened to prevent the reintroduction of defects. To make this process systematic, the incident response postmortem includes a feedback-to-SSDL step. The outcomes of the postmortem might result in changes such as updates to tool-based policy rulesets in a CI/CD pipeline and adjustments to automated deployment configuration (see [SE2.2]). This works best when root-cause analysis pinpoints where in the software lifecycle an error could have been introduced or slipped by uncaught (e.g., a defect escape). DevOps engineers might have an easier time with this because all the players are likely involved in the discussion and the solution. An ad hoc approach to SSDL improvement isn’t sufficient for prevention.
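A hedged sketch of the feedback-to-SSDL step is a structured postmortem finding that ties each defect escape to the lifecycle stage where it could have been caught and to the concrete SSDL change; the record format here is an assumption, not a standard.

```python
from dataclasses import dataclass

@dataclass
class PostmortemFinding:
    """Record where in the lifecycle a defect escaped and which SSDL control should change."""
    incident_id: str
    root_cause: str
    escaped_at: str                   # lifecycle stage where the defect could have been caught
    ssdl_change: str                  # systematic change, not an ad hoc fix
    pipeline_rule: str | None = None  # optional tool-based policy rule to add to CI/CD

findings = [
    PostmortemFinding(
        incident_id="example-incident",
        root_cause="Debug endpoint left enabled in a container image",
        escaped_at="build",
        ssdl_change="Add an image configuration check to the deployment gate",
        pipeline_rule="fail the build if DEBUG=true in the baked image environment",
    ),
]

# Feed each finding to the owners of the affected SSDL stage rather than fixing it ad hoc.
for f in findings:
    print(f"{f.incident_id}: change the '{f.escaped_at}' stage -> {f.ssdl_change}")
```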
The SSG simulates high-impact software security crises to ensure that software incident detection and response capabilities minimize damage. Simulations could test for the ability to identify and mitigate specific threats or could begin with the assumption that a critical system or service is already compromised and evaluate the organization’s ability to respond. Planned chaos engineering can be effective at triggering unexpected conditions during simulations. The exercises must include attacks or other software security crises at the appropriate software layer to generate useful results (e.g., at the application layer for web applications and at lower layers for IoT devices). When simulations model successful attacks, an important question to consider is the time required for cleanup. Regardless, simulations must focus on security-relevant software failure, not on natural disasters or other types of emergency response drills. Organizations that are highly dependent on vendor infrastructure (e.g., cloud service providers, SaaS, PaaS) and security features will naturally include those things in crisis simulations.
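As one narrow illustration of planned chaos engineering during a simulation, the sketch below wraps a dependency client and injects failures at a controlled rate; the class and its interface are hypothetical, and real exercises rely on purpose-built tooling and careful scoping.

```python
import random

class FaultInjectingClient:
    """Wrap a dependency client and randomly inject failures during a planned simulation window."""

    def __init__(self, real_client, failure_rate: float = 0.2):
        self.real_client = real_client
        self.failure_rate = failure_rate

    def call(self, *args, **kwargs):
        if random.random() < self.failure_rate:
            # Simulate the dependency being compromised or unavailable mid-exercise.
            raise ConnectionError("injected fault: dependency unavailable")
        return self.real_client.call(*args, **kwargs)
```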
The organization solicits vulnerability reports from external researchers and pays a bounty for each verified and accepted vulnerability received. Payouts typically follow a sliding scale linked to multiple factors, such as vulnerability type (e.g., remote code execution is worth $10,000 vs. CSRF is worth $750), exploitability (demonstrable exploits command much higher payouts), or specific service and software versions (widely deployed or critical services warrant higher payouts). Ad hoc or short-duration activities, such as capture-the-flag contests or informal crowdsourced efforts, don’t constitute a bug bounty program.
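A minimal sketch of a sliding-scale payout follows; the base amounts (other than the two figures mentioned above) and the multipliers are illustrative assumptions that a real program would tune to its own scope and budget.

```python
# Illustrative payout schedule only; real programs tune amounts, scope, and multipliers.
BASE_PAYOUT = {
    "remote-code-execution": 10_000,
    "sql-injection": 5_000,
    "csrf": 750,
}

def payout(vuln_type: str, demonstrable_exploit: bool, critical_service: bool) -> int:
    """Compute a bounty from vulnerability type, exploitability, and the affected service."""
    amount = BASE_PAYOUT.get(vuln_type, 250)       # floor for accepted but low-impact reports
    if demonstrable_exploit:
        amount *= 2                                # working exploits command higher payouts
    if critical_service:
        amount = int(amount * 1.5)                 # widely deployed or critical services pay more
    return amount

print(payout("csrf", demonstrable_exploit=False, critical_service=False))  # 750
print(payout("remote-code-execution", True, True))                         # 30000
```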
The SSG works with engineering teams to use automation to verify the security properties (e.g., adherence to agreed-upon security hardening) of infrastructure generated from controlled self-service processes. Engineers use self-service processes to create networks, storage, containers, and machine instances, to orchestrate deployments, and to perform other tasks that were once IT’s sole responsibility. In facilitating verification, the organization uses machine-readable policies and configuration standards (see [SE2.2]) to automatically detect issues and report on infrastructure that does not meet expectations. In some cases, the automation makes changes to running environments to bring them into compliance, and in many cases, organizations use a single policy to manage automation in different environments, such as in multi- and hybrid-cloud environments.
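A minimal sketch of a machine-readable policy applied to self-service infrastructure is shown below; the policy keys and resource fields are illustrative assumptions rather than any particular cloud provider's schema.

```python
# Minimal sketch: a machine-readable policy applied to descriptions of resources
# provisioned through a self-service process. Field names are illustrative.
POLICY = {
    "storage_public_access": False,   # buckets created via self-service must not be public
    "encryption_at_rest": True,
    "allowed_regions": {"eu-west-1", "us-east-1"},
}

def check_resource(resource: dict) -> list[str]:
    """Return policy violations for one provisioned resource; an empty list means compliant."""
    violations = []
    if resource.get("public_access") and not POLICY["storage_public_access"]:
        violations.append("public access enabled")
    if POLICY["encryption_at_rest"] and not resource.get("encrypted"):
        violations.append("encryption at rest disabled")
    if resource.get("region") not in POLICY["allowed_regions"]:
        violations.append(f"region {resource.get('region')} not allowed")
    return violations

# The same policy can drive reporting in one environment and auto-remediation in another.
resource = {"name": "customer-uploads", "public_access": True, "encrypted": False, "region": "ap-south-2"}
for v in check_resource(resource):
    print(f"{resource['name']}: {v}")
```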
The organization collects and publishes risk information about the applications, services, APIs, containers, and other software it deploys. Whether captured through manual processes or telemetry automation, published information extends beyond basic software security (see [SM2.1]) and inventory data (see [CMVM2.3]) to include risk information. This information usually includes the composition of the software (e.g., BOMs, see [SE3.6]), provenance data about what group created it and how, and the risks associated with known vulnerabilities, deployment models, security controls, or other security characteristics intrinsic to each artifact. This approach stimulates cross-functional coordination and helps stakeholders take informed risk management action. Making a list of risks that isn’t used for decision support won’t achieve useful results.
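One hedged sketch of a published risk record for a single deployed artifact appears below; the schema, paths, and identifiers are placeholders for illustration, not a standard format.

```python
import json

# Illustrative risk record for one deployed artifact; the schema is an assumption, not a standard.
risk_record = {
    "artifact": "payments-api:2.4.1",
    "bom": "sboms/payments-api-2.4.1.cdx.json",        # see [SE3.6]; path is a placeholder
    "provenance": {
        "built_by": "payments-team",
        "pipeline": "ci.example.com/payments/api/builds/1842",
    },
    "deployment_model": "internet-facing, multi-tenant",
    "known_vulnerabilities": [
        {"id": "CVE-YYYY-NNNN", "component": "example-lib", "status": "mitigated by WAF rule"},
    ],
    "security_controls": ["mTLS between services", "RASP agent enabled"],
}

# Publishing the record (here just printed) only helps if stakeholders use it for decisions.
print(json.dumps(risk_record, indent=2))
```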
The organization provides external bug reporters with a line of communication to internal security experts through a low-friction, public entry point. These experts work with bug reporters to invoke any necessary organizational responses and to coordinate with external entities throughout the defect management lifecycle. Successful disclosure processes require insight from internal stakeholders, such as legal, marketing, and public relations roles, to simplify and expedite decision-making during software security crises (see [CMVM3.3]). Although bug bounties might be important to motivate some researchers (see [CMVM3.4]), proper public attribution and a low-friction reporting process are often sufficient motivation for researchers to participate in a coordinated disclosure. Most organizations will use a combination of easy-to-find landing pages, common email addresses (security@), and embedded product documentation when appropriate (security.txt) as an entry point for external reporters to invoke this process.
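As a small sketch of the security.txt entry point (RFC 9116, where Contact and Expires are the required fields), the script below generates the file contents; the contact address and URLs are placeholders.

```python
from datetime import datetime, timedelta, timezone

# Sketch of generating a /.well-known/security.txt entry point (RFC 9116).
# Contact and Expires are required; the address and URLs below are placeholders.
expires = (datetime.now(timezone.utc) + timedelta(days=365)).strftime("%Y-%m-%dT%H:%M:%SZ")
security_txt = "\n".join([
    "Contact: mailto:security@example.com",
    f"Expires: {expires}",
    "Policy: https://example.com/security/disclosure-policy",
    "Acknowledgments: https://example.com/security/hall-of-fame",
    "Preferred-Languages: en",
])

with open("security.txt", "w") as f:
    f.write(security_txt + "\n")
```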
Operations standards and procedures proactively minimize application attack surfaces by using attack intelligence and application weakness data to limit vulnerable conditions. Finding and fixing software defects in operations is important (see [CMVM1.2]), but so is finding and fixing errors in cloud security models, VPNs, segmentation, security configurations for networks, hosts, and applications, etc., to limit the ability to successfully attack deployed applications. Combining attack intelligence (see [AM1.5]) with information about software assets (see [AM2.9]) and a continuous view of application weaknesses helps ensure that attack surface management keeps pace with attacker methods. SBOMs (see [SE3.6]) are also an important information source when doing attack surface management in a crisis.
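A hedged sketch of combining these sources is shown below: it cross-references SBOM data and exposure information against a list of actively exploited components to rank what to fix first. All component names, assets, and scoring weights are illustrative assumptions.

```python
# Sketch: combine attack intelligence (actively exploited components) with asset and SBOM data
# to rank which deployed applications to harden or patch first. All data below is illustrative.
ACTIVELY_EXPLOITED = {"example-lib 1.2.3", "other-lib 4.0"}   # from attack intelligence (see [AM1.5])

DEPLOYMENTS = [
    {"asset": "payments-api", "internet_facing": True,  "sbom": {"example-lib 1.2.3", "internal-auth-lib 2.0"}},
    {"asset": "billing-batch", "internet_facing": False, "sbom": {"example-lib 1.2.3"}},
]

def prioritize() -> list[tuple[str, int]]:
    """Score each asset by exposed, actively exploited components; higher scores get fixed first."""
    scored = []
    for d in DEPLOYMENTS:
        hits = len(d["sbom"] & ACTIVELY_EXPLOITED)
        score = hits * (2 if d["internet_facing"] else 1)   # exposure raises priority
        if hits:
            scored.append((d["asset"], score))
    return sorted(scored, key=lambda s: s[1], reverse=True)

print(prioritize())   # [('payments-api', 2), ('billing-batch', 1)]
```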