ISO/IEC DIS 9837:2024(en)
ISO/IEC JTC 1/SC 7/N9548
Secretariat: BIS
Date: 2024-09-04
Systems and software engineering — Systems resilience concepts
© ISO/IEC 2024
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
3.1 Systems resilience fundamentals
4 Key resilience concepts and their relationships
4.1 The system context for systems resilience
4.2 Understanding resilience
4.4 Relation of resilience to other system qualities
5.2 Fundamental objectives layer
5.4 Resilience techniques layer
5.5 Resilience considerations during systems engineering life cycle activities
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO's adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Joint Technical Committee ISO/JTC 1/SC 7/WG 30.
Any feedback or questions on this document should be directed to the user’s national standards body. A complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
As the complexity of systems continues to increase and the list of capabilities required of those systems continues to grow, the systems are expected to deliver those capabilities under various conditions, including adverse ones. Resilience is the quality characteristic that enables systems to achieve this.
This document focuses on establishing systems resilience concepts that form the basis for understanding, building, and enhancing the resilience of systems. It also provides a resilience framework that includes fundamental objectives, means objectives, and techniques for achieving systems resilience. It is compatible with a systems engineering approach and systems life cycle processes.
This document serves as a foundation for other documents related to various aspects of systems resilience.
Systems and software engineering — Systems resilience concepts
1 Scope
This document establishes concepts for understanding and improving systems resilience. Systems resilience addresses the capabilities of systems under adversity. Broadly, systems resilience involves the capabilities of systems to avoid, withstand, and recover from adversity.
Adversities can be known or unknown and can arise from many sources, such as security threats, dangers affecting safety, financial and business impacts of external system disruptions, internal system faults and defects, and adverse effects of disclosure or loss of data and information. Resilience goals are realized through application of techniques during requirements, architecture, design or operations processes of a system.
This document is applicable to human-created systems that can be either physical or conceptual, or a combination of both. Systems include services and products. It is not intended to apply to naturally occurring systems.
NOTE “System” as used in this document follows the definition and scope of systems in ISO/IEC/IEEE 15288:2023.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp/ui
— IEC Electropedia: available at https://www.electropedia.org/
NOTE For additional terms and definitions in the field of systems and software engineering, see ISO/IEC/IEEE 24765, which is published periodically as a “snapshot” of the SEVOCAB (Systems and software engineering — Vocabulary) database and is publicly accessible at https://www.computer.org/sevocab.
3.1 Systems resilience fundamentals
3.1.1
resilience
ability to provide required capability (3.1.2) when faced with adversity (3.1.3)
[SOURCE: ISO/IEC/IEEE 24641:2023, 3.1.28]
Note 1 to entry: Under adversity, what is required of a system can be distinct from the capability required during normal operation.
Note 2 to entry: Resilience includes the ability to anticipate and adapt to, resist or quickly recover from a potentially disruptive event.
3.1.2
capability
ability to do something useful under a particular set of conditions
[SOURCE: ISO/IEC/IEEE 24641:2023, 3.1.3, modified — Note to entry has been removed.]
3.1.3
adversity
anything that can degrade the required capability (3.1.2) of the system, directly or indirectly
Note 1 to entry: In contrast to risk, which is the effect of uncertainty on outcome [ISO/IEC/IEEE 15288:2023, 3.40], adversity can be anything actual or possible.
3.1.4
stress
force, demand or influence on a system due to adversity (3.1.3) that directly affects the system
3.1.5
error recovery
automatic detection, control, and correction of an internal discrepancy
[SOURCE: IEC 60050:192-10-16, modified — “erroneous state” is changed to “discrepancy”.]
3.1.6
fault
defect in a system or a representation of a system that if executed/activated could potentially result in an error
[SOURCE: ISO/IEC/IEEE 15026-1]
3.2 Fundamental objectives
3.2.1
fundamental objective
end goal for achieving resilience (3.1.1)
Note 1 to entry: In this document, there are three fundamental objectives of resilience: avoiding, withstanding and recovering from adversity.
3.2.2
avoiding
eliminating or reducing exposure to adversity (3.1.3)
3.2.3
recovering
replenishing lost capability (3.1.2) after degradation
Note 1 to entry: The degree of recovery can be less than, the same as, or greater than the degree of degradation.
3.2.4
withstanding
resisting degradation of capability (3.1.2) when stressed
3.3 Means objectives
3.3.1
means objective
objective which enables the achievement of other objectives
EXAMPLE Anticipation (3.3.4) is a means objective used to achieve the fundamental objectives of avoiding, withstanding and recovering from adversity.
Note 1 to entry: Means objectives are used to achieve fundamental objectives or other means objectives.
Note 2 to entry: A means objective can be implemented through one or more resilience techniques.
Note 3 to entry: Means objectives are named by noun phrases in this document. Some names are based on the approach that is provided, such as preparation, prevention, rearchitecting and redeployment. Some names are based on the outcome achieved, such as agility, integrity, robustness and situational awareness.
3.3.2
adversity management
acting to reduce the effectiveness of adversities (3.1.3)
3.3.3
agility
ability of a system to adapt to deliver required capability (3.1.2) in unpredictably evolving conditions
3.3.4
anticipation
establishing awareness of the nature of potential adversities, their likely consequences and appropriate responses, prior to the adversity (3.1.3) stressing the system
3.3.5
damage control
limiting the propagation of malfunction within a system
3.3.6
evolution
modifying a system to address changes over time due to adversity (3.1.3) or emergent needs
3.3.7
graceful degradation
ability of a system to transition to acceptable states after damage
Note 1 to entry: See also fail soft (3.4.17).
3.3.8
integrity
means by which a system remains complete and unaltered under adversity (3.1.3)
3.3.9
preparation
developing and maintaining courses of action that address predicted adversity (3.1.3)
3.3.10
prevention
precluding the realization of adversity (3.1.3)
3.3.11
rearchitecting
modifying an architecture for improved resilience (3.1.1)
3.3.12
redeployment
putting system capabilities (3.1.2) into operation following stress (3.1.4)
3.3.13
robustness
means to enable a system to function correctly in the presence of invalid inputs or stressful environmental conditions
3.3.14
service continuity
means to deliver required capability (3.1.2) under stress (3.1.4)
3.3.15
situational awareness
perception of elements in the environment, and a comprehension of their meaning, and could include a projection of the future status of perceived elements and the risk associated with that status
[SOURCE: ISO 17757:2019, 3.1.23]
Note 1 to entry: Situational awareness of the environment is complemented by resilience modeling (3.4.36) of the internal elements of the system.
3.4 Resilience techniques
3.4.1
resilience technique
method to realize a means objective (3.3.1)
Note 1 to entry: A resilience technique can contribute to the realization of one or more means objectives.
Note 2 to entry: Resilience techniques are named by noun phrases in this document. Some names are derived from the ability provided by the technique, such as absorption (3.4.2), dynamic repositioning (3.4.16), forward recovery (3.4.19), and replacement (3.4.35). Other names are derived from the outcome achieved by application of the resilience technique, such as fail soft (3.4.17), human-in-the-loop (3.4.20), safe state (3.4.38), and self-modeling (3.4.40).
3.4.2
absorption
withstanding (3.2.4) stress (3.1.4) without unacceptable degradation in the system’s capability (3.1.2)
3.4.3
adaptive response
dynamic reaction to limit or avoid consequences of an adverse situation
Note 1 to entry: An adaptive response can occur before or after adversity stresses the system.
3.4.4
anomaly detection
discovering salient irregularities or abnormalities in the system or in its environment in a timely manner that enables effective response action
3.4.5
boundary enforcement
implementing process, temporal, and spatial limits intended to protect the system
3.4.6
buffering
reducing degradation due to stress (3.1.4) by means of excess capacity
3.4.7
coordinated defense
using multiple, synergistic mechanisms to protect required capability (3.1.2)
3.4.8
deception
confusing and thus impeding an adversary
3.4.9
defense-in-depth
hierarchical deployment of different levels of diverse equipment and procedures (known as barriers) to prevent the escalation of faults to a hazardous condition
[SOURCE: ISO 1709:2018, 3.12]
3.4.10
detection avoidance
reducing an adversary’s awareness of a system
3.4.11
disaggregation
dispersing missions, functions, subsystems, or components across multiple systems or subsystems
3.4.12
distributed privilege
requiring multiple authorized entities to act in a coordinated manner before a system function is allowed to proceed
3.4.13
diversification
using a heterogeneous set of technologies, data sources, processing locations, equipment locations, supply chains, communications paths, etc., to minimize common vulnerabilities and common mode failures
3.4.14
domain separation
physically or logically isolating items with distinctly different protection needs
3.4.15
drift correction
preventive action to keep system operation within the boundaries of acceptable performance
3.4.16
dynamic repositioning
relocation of system functionality or components
3.4.17
fail soft
technique to enable prioritized, gradual termination of affected functions, in the case of a fault, or when failure is imminent
[SOURCE: IEC 60050:192-10-07, modified — “capable of” is replaced by “technique to enable”.]
3.4.18
fault tolerance
technique to continue functioning with certain faults present
[SOURCE: IEC 60050:192-10-09, modified — “ability” is replaced by “technique”.]
3.4.19
forward recovery
error recovery in which a system, program, database, or other system resource is restored to a new, not previously occupied state in which it can perform required functions
[SOURCE: IEC 60050:192-10-18]
3.4.20
human-in-the-loop
including persons as part of a system for adaptive capability
3.4.21
least functionality
technique by which each element of a system has the ability to accomplish its required functions, but no more
3.4.22
least persistence
technique by which system elements are available, accessible, and able to fulfil their design intent only for the time they are needed
3.4.23
least privilege
technique by which system elements are allocated authorizations that are necessary to accomplish their specified functions, but no more
3.4.24
least sharing
technique by which system resources are accessible by multiple system elements only when necessary, and among as few system elements as possible
3.4.25
loose coupling
technique by which dependencies between elements of a system are intentionally reduced to limit the potential for propagation of damage
3.4.26
maintainability
technique by which a system has the ability to be retained in, or restored to, a state to perform as required
3.4.27
mediated access
controlling the ability to use system elements
3.4.28
modularity
technique by which a system is composed of discrete elements such that a change to one element has minimal impact on other elements
3.4.29
neutral state
technique by which a system assumes a condition in which it is acceptable to make no changes while awaiting stakeholder evaluation of possible changes
3.4.30
non-persistence
retaining information, services, and connectivity or functions for a limited time, thereby reducing an adversary’s opportunity to exploit vulnerabilities and establish a persistent foothold
3.4.31
privilege restriction
restricting authorization assigned to an entity by an authority
3.4.32
protective default
technique by which a predetermined configuration of a system safeguards its effectiveness
3.4.33
protective recovery
ensuring that recovery of a system element does not result in, nor lead to, unacceptable loss
3.4.34
redundancy
technique by which a system has more than one means at a given time for providing required capability (3.1.2)
3.4.35
replacement
changing parts of an existing item to regain its functionality
[SOURCE: ISO 20887:2020, 3.32]
3.4.36
resilience modeling
developing and maintaining useful representations of required system capabilities (3.1.2), how those capabilities are generated, the system environment, and the potential for degradation due to adversity (3.1.3)
3.4.37
resilience monitoring
gathering, fusing, and analyzing data to identify vulnerabilities, adverse conditions and system degradation and evaluating the efficacy of system countermeasures
3.4.38
safe state
providing the ability to transition to a state that does not lead to critical or catastrophic consequences
[SOURCE: ISO 14620-1:2018, 2.1.16, modified — ‘providing the ability to transition to’ has been added.]
3.4.39
segmentation
technique by which system elements are separated, logically or physically, to limit the spread of damage
3.4.40
self-modeling
providing a system with a model of itself to enable it to achieve resilience (3.1.1)
Note 1 to entry: A system provided with a model of itself can have capability to adapt to adversities.
3.4.41
substantiated integrity
providing the ability to ensure that system components have not been corrupted
3.4.42
substitution
use of system elements from an alternate source or with differences in form or function to provide or restore capability (3.1.2)
3.4.43
system reconfiguration
technique to change the location or functionality of system elements, in the event of failure or external disturbance, to enable the system to continue operation
[SOURCE: IEC 60050:192-10-15, modified — “process” is replaced by “technique”.]
3.4.44
tolerance
technique to provide capability (3.1.2) in spite of the effects of stress (3.1.4) on the system
3.4.45
virtualization
use of digital entities to represent a system, allowing for redundancy and simultaneous use of physical system resources
4 Key resilience concepts and their relationships
4.1 The system context for systems resilience
The concepts presented in this document are grounded in systems engineering. Systems produce required capability. The term capability is used in this document to represent a system’s ability to achieve desired effects. It is the delivery of such capability, under adversity, that resilience addresses. Systems interact with their environments. The concept of resilience makes explicit the consideration of adversity in requirements analysis, architecture, design, and operations tasks to develop a system that addresses adverse conditions under which it needs to operate. When resilience is a mission objective, it can be reflected in stakeholder and system requirements.
System resilience is determined by properties of many other systems in addition to the properties of the system of interest. Resilience often depends on enabling systems that conceive, deliver, support, maintain, repair, and overhaul the system of interest. Resilience is affected by the entire capability acquisition and support chain.
Resilience directs the systems engineering focus to the system’s ability to deliver required capability when faced with adversity. This perspective can be important to stakeholders but is sometimes overlooked. Resilience in the realm of systems engineering involves identifying: 1) the capabilities that are required of the system; 2) the adverse conditions under which the system is required to deliver those capabilities; and 3) the architecture, design, and operations decisions that enable the system to provide required capabilities when facing adversity.
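The three items identified above can be recorded together so that gaps are easy to spot. The following Python fragment is only an illustrative sketch and is not part of the concepts defined in this document: the class names, field names and the example scenario are hypothetical, and it assumes that a capability is tracked by a single numeric measure.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass(frozen=True)
class RequiredCapability:
    """Item 1: a capability required of the system (hypothetical structure)."""
    name: str
    measure: str            # how delivery is measured, e.g. "requests per second"
    requisite_level: float  # assumed numeric level of delivery


@dataclass
class ResilienceScenario:
    """Links a required capability to an adverse condition and to decisions."""
    capability: RequiredCapability
    adverse_condition: str                               # item 2
    decisions: List[str] = field(default_factory=list)   # item 3

    def is_addressed(self) -> bool:
        # A scenario counts as addressed once at least one architecture,
        # design or operations decision has been recorded against it.
        return bool(self.decisions)


def unaddressed(scenarios: Iterable[ResilienceScenario]) -> List[ResilienceScenario]:
    """Return the scenarios that have no recorded decision yet."""
    return [s for s in scenarios if not s.is_addressed()]


# Hypothetical example data.
throughput = RequiredCapability("order intake", "orders per minute", 100.0)
scenarios = [
    ResilienceScenario(throughput, "loss of primary data centre",
                       ["operate a second site"]),
    ResilienceScenario(throughput, "supplier interface unavailable"),
]
open_items = unaddressed(scenarios)
```

Here `open_items` contains only the second scenario, which flags an adverse condition for which no decision has been recorded.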
4.2 Understanding resilience
Resilience encompasses a system’s ability to avoid, withstand, and recover from adversity. The following aspects of resilience are important in systems analysis for resilience and resilience engineering:
— the system’s capabilities of interest, how they are to be measured, and the requisite levels of delivery; understanding a system’s capability can further involve:
    — the system’s architecture and/or designs under consideration;
    — the system’s functional behavior, data and control flows that deliver the required capability;
— the adversities that can affect the system: the risks to capability, the sources and types of adversity, the time scales of those adversities, and how adversity can affect the system and any consequences if it does;
— the system’s capabilities in response to adversities: the means by which capability can be restored to an agreed state, in the event of failure.
It is not always possible or necessary to know the nature of the adversity. For this reason, the ability to adapt and respond is often important. Understanding required system capability and how to restore it, whatever the source of failure, is one key to engineering resilience.
Similarly, it is not possible to predict how a complex system will respond to all possible adverse circumstances, or even to know what all adverse circumstances might occur. Therefore, a focus on how to recover from possible effects, outcomes, cascading outcomes and their potential effects needs consideration. It is also important to gain an understanding of how capability can be restored in the event of failure, prior to having to do it in adverse circumstances.
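As a rough illustration of the aspects listed in this subclause (a capability of interest, its measure, the requisite level of delivery, and restoration to an agreed state), the Python sketch below assesses a recorded series of capability measurements. It is an assumption-laden example rather than a prescribed method: the names `TraceAssessment` and `assess_trace` are hypothetical, capability is assumed to be a single number sampled at regular intervals, and the agreed state is assumed to be the requisite level itself.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TraceAssessment:
    withstood: bool                  # capability never fell below the requisite level
    min_capability: float            # lowest delivered capability in the trace
    recovered: bool                  # capability returned to the requisite level after a drop
    steps_to_recover: Optional[int]  # samples from first drop to first return, if any


def assess_trace(samples: Sequence[float], requisite_level: float) -> TraceAssessment:
    """Summarize how delivered capability behaved relative to the requisite level."""
    if not samples:
        raise ValueError("samples must not be empty")

    # Index of the first sample at which capability is below the requisite level.
    first_drop = next(
        (i for i, value in enumerate(samples) if value < requisite_level), None
    )
    if first_drop is None:
        # No degradation below the requisite level, so there is nothing to recover from.
        return TraceAssessment(True, min(samples), False, None)

    # Index of the first later sample at which capability is back at the requisite level.
    first_return = next(
        (i for i in range(first_drop + 1, len(samples)) if samples[i] >= requisite_level),
        None,
    )
    return TraceAssessment(
        withstood=False,
        min_capability=min(samples),
        recovered=first_return is not None,
        steps_to_recover=None if first_return is None else first_return - first_drop,
    )


# Hypothetical trace: capability as a fraction of nominal, requisite level 0.8.
result = assess_trace([1.0, 1.0, 0.4, 0.6, 0.9, 1.0], requisite_level=0.8)
```

For this hypothetical trace, `result.min_capability` is 0.4 and `result.steps_to_recover` is 2. Such values could be compared with an acceptable time to recovery and a minimum acceptable capability where these have been defined.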
In the context of resilience, adversity refers to anything that could reduce or degrade the capability provided by a system under nominal conditions. Westrum identified three dimensions to adversity: its source or origin, its predictability and its severity or degree of disruption.[20] Achieving resilience requires consideration of a range of sources and types of adversity. Sources of adversity include environmental sources, human sources, and system failures that can arise from adversarial, friendly, or neutral parties. Adversities can be malicious or accidental, expected or unexpected. Adversities can be known issues, risks or unknown-unknowns. Some adversities can be expected to occur predictably; others are unpredictable, and some are unexpected. Adversities can arise from inside or outside the system. An adversity can arise from a single event, or from a complex, causal chain of conditions and events that stress the system over many periods of time. Adversities include direct disruptions to the system and the result of indirect causal chains.
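The dimensions described above (source, predictability and severity), together with the internal or external and malicious or accidental distinctions, can be captured in a simple adversity register. The Python sketch below is one possible representation under stated assumptions: the enumeration values follow the wording of this subclause, while the 1 to 5 severity scale, the class and function names, and the example entries are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List


class Source(Enum):
    ENVIRONMENTAL = "environmental"
    HUMAN = "human"
    SYSTEM_FAILURE = "system failure"


class Predictability(Enum):
    PREDICTABLE = "predictable"
    UNPREDICTABLE = "unpredictable"
    UNEXPECTED = "unexpected"


@dataclass(frozen=True)
class Adversity:
    name: str
    source: Source
    predictability: Predictability
    severity: int    # hypothetical scale: 1 (minor) to 5 (severe)
    internal: bool   # True if the adversity arises inside the system
    malicious: bool  # False if accidental


def group_by_source(adversities: Iterable[Adversity]) -> Dict[Source, List[str]]:
    """Group adversity names by their source, preserving input order."""
    groups: Dict[Source, List[str]] = defaultdict(list)
    for adversity in adversities:
        groups[adversity.source].append(adversity.name)
    return dict(groups)


# Hypothetical register entries.
register = [
    Adversity("flooding of equipment room", Source.ENVIRONMENTAL,
              Predictability.UNPREDICTABLE, 4, internal=False, malicious=False),
    Adversity("operator configuration error", Source.HUMAN,
              Predictability.PREDICTABLE, 2, internal=True, malicious=False),
    Adversity("intrusion attempt", Source.HUMAN,
              Predictability.UNEXPECTED, 5, internal=False, malicious=True),
]
by_source = group_by_source(register)
```

Grouping the register in this way is only a convenience for review. It does not imply that the listed sources or predictability classes are exhaustive.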
4.3 Aspects of resilience
In applying the definition of resilience, there are a range of means by which resilience is achieved. There are three fundamental objectives in achieving resilience: avoiding adversity, withstanding adversity and recovering from adversity. Due to the possibly complex nature of adversity, one or more of these can be observed in any specific situation.
Classically, the focus of resilience was on recovering or rebounding from disruptions. It has evolved to include withstanding stress, via “an increased ability to absorb perturbations”[21]. Most recently, resilience has come to include avoiding adversity. For the purpose of human-created systems, avoiding adversity is considered a means of achieving resilience. Engineering resilience has expanded to consider the system’s ability to evolve and adapt to future adversities and unknown-unknowns.
This document takes the broader view to include both proactive and reactive means of achieving resilience. As a result, the scope of means for resilience includes those taken before adversity, under stress, and after disruption.
NOTE Clause 5 provides a framework of means to achieve resilience, rooted in the definition given in Clause 3 (3.1.1) and the three fundamental objectives (3.2).
4.4 Relation of resilience to other system qualities
There are many quality characteristics that can be relevant to a system. Resilience is one among them. There are implicit and explicit relationships between quality characteristics, and the importance of each depends on the specific project and domain. Their relative importance influences the nature of the relationships between resilience and other quality characteristics of a system. This clause highlights the various relationships of systems resilience with quality characteristics defined in other standards, without implying any relative importance or hierarchy.
Resilience shares characteristics with various other system qualities including availability, dependability, recoverability, and reliability. Resilience is one way in which a system can be regarded as dependable and trustworthy. Resilience is also a cross-cutting concern related to reliability, security, and maintainability.
ISO/IEC 25010 identifies nine product quality characteristics, which are further composed of subcharacteristics.[6] Each is defined as a capability. Those characteristics (and their subcharacteristics) related to resilience include:
— reliability (availability, fault tolerance, recoverability)
— security (integrity, resistance)
— maintainability (analysability, modifiability)
— flexibility (adaptability)
— safety (fail safe)
ISO/IEC 25023 develops measures for quality characteristics that are applicable to aspects of resilience (such as measures for recoverability, fault tolerance, and availability)[8].
Resilience can be realized in association with dependability, which collectively includes availability, reliability, recoverability, maintainability, and maintenance support performance. Not all dependable systems are affected by stress or need to recover from stress. Recoverability is a characteristic of some systems that have experienced a system fault or failure. Resilient systems can operate in limited or degraded modes while under abnormal or extreme stress, and then return to normal operations.
IEC 60300-1:2014 considers resilience in association with dependability. Dependability is a broad concept encompassing a system’s availability, reliability, recoverability, maintainability, and maintenance support performance. Dependability of a system is defined as the “ability to perform as and when required”[1]. While dependability focuses on the overall assurance that a system will perform its intended function under stated conditions for a specified period, resilience specifically addresses the system’s ability to continue delivering its capabilities under adverse situations and to recover from disruptions. Thus, all resilient systems can be considered dependable, but not all dependable systems are necessarily resilient, especially if they do not account for extreme stress or recovery from significant adversities.
NOTE Further information about dependability can be found in the IEC 60300 series.
Resilience is also used in open systems dependability, which limits resilience to a system’s “adaptive capacity in a complex and changing environment”[2]. IEC 62853 contrasts traditional notions of resilience, which focused on restoring normal operation after disturbances, with the open systems recognition that normal operation changes over time and with point of view. In the present document, resilience is not limited to complex and changing environments but includes other adversities (see 4.2) and broadens the means of achieving resilience to include avoiding adversity.
Trustworthiness encompasses a system’s ability to maintain its integrity, reliability, and security in the face of various internal and external challenges. Recent work on trustworthiness defines resilience as the “capability of a system to maintain its functions and structure in the face of internal and external change, and to degrade gracefully when this is necessary”[10].
Boehm et al. present a two-level hierarchy of system qualities.[15] Stakeholder-based Value System Quality Ends appear on the first level and Contributing System Quality Means appear at the second level. That paper revises an earlier version of the hierarchy. The revision emphasizes Changeability and Dependability as the contributing means to resilience. These two contributing means are in turn supported by Maintainability. Unlike the present document, Boehm et al. do not break down resilience into strategies and techniques, but simply relate these qualities to other qualities.
The European Network and Information Security Agency (ENISA) ontology addresses resilience in the context of cybersecurity. It focuses on the context of resilience, rather than the refinement of resilience as a goal into resilience objectives and techniques.[19] As in this document, ENISA identifies several “means” but at the level of disciplines such as Security, Governance, Trust Management and Risk Management without delineating specific objectives and resilience techniques.
5 A resilience framework
5.1 Overview
A framework for achieving resilience can help an engineer arrive at a design that considers all potential solutions. A resilience framework is presented in this clause. It includes fundamental objectives on the first layer; means objectives, for achieving one or more of the fundamental objectives or other means objectives, on the second layer; and resilience techniques, applicable during architecture, design, and operations for realizing means objectives and thereby achieving resilience, on the third layer. The three layers of this framework are interconnected by many-to-many relationships.
5.2 Fundamental objectives layer
The first layer is derived from the definition of resilience. Resilience equates to achieving one or more of the three fundamental objectives: avoiding adversity, withstanding adversity and recovering from adversity. These fundamental objectives bound the possible alternative means to be considered.[18] These three are expected to be immutable. These fundamental objectives can be achieved through one or more means objectives (see 5.3).
5.3 Means objectives layer
Means objectives name high-level approaches to achieving resilience. Means objectives provide intermediate goals for realizing one or more of the fundamental objectives for a system. Unlike the fundamental objectives, means objectives are not a closed set. Means objectives include:
— adversity management
— agility
— anticipation
— damage control
— evolution
— graceful degradation
— integrity
— preparation
— prevention
— rearchitecting
— redeployment
— robustness
— service continuity
— situational awareness
5.4 Resilience techniques layer
The third layer of the framework identifies resilience techniques. These techniques provide specific methods to implement means objectives. The set of resilience techniques is not a closed set. The relationship between means objectives and resilience techniques is many-to-many: in some cases, an individual technique can be used to realize several different means objectives. Resilience techniques include:
— absorption
— adaptive response
— anomaly detection
— boundary enforcement
— buffering
— coordinated defense
— deception
— defense-in-depth
— detection avoidance
— disaggregation
— distributed privilege
— diversification
— domain separation
— drift correction
— dynamic repositioning
— fail soft
— fault tolerance
— forward recovery
— human-in-the-loop
— least functionality
— least persistence
— least privilege
— least sharing
— loose coupling
— maintainability
— mediated access
— modularity
— neutral state
— non-persistence
— privilege restriction
— protective default
— protective recovery
— redundancy
— replacement
— resilience modeling
— resilience monitoring
— safe state
— segmentation
— self-modeling
— substantiated integrity
— substitution
— system reconfiguration
— tolerance
— virtualization
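The many-to-many relationships between the three layers can be represented as two mappings and traversed to find which fundamental objectives a given technique can contribute to. The Python sketch below is illustrative only. The link from anticipation to all three fundamental objectives follows the EXAMPLE in 3.3.1; every other link, and the names `MEANS_SUPPORTS`, `TECHNIQUE_REALIZES` and `fundamental_objectives_for`, are assumptions made for the example and are not mappings defined by this document.

```python
from typing import Dict, Set

FUNDAMENTAL: Set[str] = {"avoiding", "withstanding", "recovering"}

# Means objective -> objectives it helps achieve (fundamental or other means objectives).
MEANS_SUPPORTS: Dict[str, Set[str]] = {
    "anticipation": {"avoiding", "withstanding", "recovering"},  # per EXAMPLE in 3.3.1
    "robustness": {"withstanding"},                              # illustrative assumption
    "situational awareness": {"anticipation"},                   # illustrative assumption
}

# Resilience technique -> means objectives it can realize (illustrative assumptions).
TECHNIQUE_REALIZES: Dict[str, Set[str]] = {
    "buffering": {"robustness"},
    "anomaly detection": {"situational awareness", "robustness"},
}


def fundamental_objectives_for(technique: str) -> Set[str]:
    """Follow technique -> means objective -> ... links up to the fundamental objectives."""
    reached: Set[str] = set()
    seen: Set[str] = set()
    frontier = list(TECHNIQUE_REALIZES.get(technique, ()))
    while frontier:
        objective = frontier.pop()
        if objective in seen:
            continue  # guards against cycles between means objectives
        seen.add(objective)
        if objective in FUNDAMENTAL:
            reached.add(objective)
        else:
            frontier.extend(MEANS_SUPPORTS.get(objective, ()))
    return reached


buffering_goals = fundamental_objectives_for("buffering")
detection_goals = fundamental_objectives_for("anomaly detection")
```

With these assumed links, buffering traces only to withstanding, whereas anomaly detection traces to all three fundamental objectives through situational awareness and anticipation. Because neither the set of means objectives nor the set of techniques is closed, both mappings would be expected to grow for a real system.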
5.5 Resilience considerations during systems engineering life cycle activities
To achieve resilience in system operations, resilience concepts can be applied through many of the life cycle processes of ISO/IEC/IEEE 15288:2023. Specific considerations to be included in the life cycle activities as specified in ISO/IEC/IEEE 15288:2023 are shown in Table 1.
Table 1 — Resilience considerations during life cycle activities
| Life Cycle Process Groups | Life Cycle Processes | Resilience Considerations |
| Agreement Processes | Acquisition | — Preparing a request for the supplier of a system (product or service) to deliver required capabilities under adversity — Developing an agreement with the supplier that includes acceptance criteria for the system to deliver required capabilities under adversity — Providing data related to resilience needed by the supplier to resolve issues in a timely manner |
| | Supply | — Evaluating requests for the supply of a system related to resilience to determine feasibility and how to respond — Negotiating agreement with the acquirer that includes acceptance criteria for the system to deliver required capabilities under adversity — Identifying necessary provisions to the agreement related to resilience |
| Organizational Project-Enabling Processes | Portfolio Management | — Coordinating among projects any required extent of system capability to deliver required capabilities under adversity — Identifying any multi-project interfaces and dependencies to be managed or supported by each project for resilience |
| | Human Resource Management | — Identifying and developing individuals skilled in resilience engineering practices |
| Technical Management Processes | Project Planning | — Preparing plans for recovery, including in situations where resources are compromised — Defining roles, responsibilities and authorities specific to recovery situations — Planning for alternatives when organizational capability is compromised — Defining capability recovery objectives including acceptable time to recovery and minimum acceptable capability — Evaluating recovery plans through scenario analysis |
| | Risk Management | — Identifying adversities and their potential effects — Planning risk management activities to handle risks, issues, and opportunities identified by resilience activities — Developing strategies of risk management to handle risks, issues, and opportunities identified by resilience activities |
| Technical Processes | Business or Mission Analysis | — Defining the problem space to include identification of adversities and expectations for performance under those adversities — Devising operational concepts and potential solution classes to consider capabilities to avoid, withstand, and recover from adversity — Evaluating alternative solution classes to consider capabilities to deliver required capability under adversity |
| | Stakeholder Needs and Requirements Definition | — Identifying stakeholders who understand potential adversities and stakeholder resilience needs — Identifying stakeholder needs and expectations for required capability under adversity and degraded modes of operation — Devising operational concept scenarios to include resilience scenarios — Transforming stakeholder needs to stakeholder requirements to include stakeholder resilience requirements — Analyzing stakeholder requirements to include resilience scenarios in the adverse operational environment |
| | System Requirements Definition | — Achieving resilience and other adversity-driven considerations to be addressed in system requirements |
| | System Architecture Definition | — Selecting viewpoints to support the representation of resilience — Achieving resilience and other adversity-driven considerations to be addressed by architecture — Constraining architecture solutions in consideration of resilience requirements |
| | Design Definition | — Achieving resilience and the other adversity-driven considerations to be addressed |
| | System Analysis | — Identifying necessary enabling systems or services needed to support system analysis of resilience — Reviewing the analysis results for quality and validity of resilience |
| | Verification | — Identifying any constraints that potentially limit the feasibility for verification of resilience — Selecting appropriate verification methods or techniques and associated criteria for verification of resilience — Defining the verification procedures of resilience |
| | Validation | — Identifying constraints that potentially limit the feasibility for validation of resilience — Selecting appropriate validation methods or techniques and associated criteria for validation of resilience — Identifying necessary enabling systems or services needed to support validation of resilience — Defining the validation procedures of resilience |
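The Project Planning row of Table 1 refers to capability recovery objectives expressed as an acceptable time to recovery and a minimum acceptable capability. The following is a minimal sketch of how such an objective could be checked against an observed capability profile. The class name, field names, units (hours, fraction of nominal capability) and the sample data used with it are assumptions made for illustration and are not defined by this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryObjective:
    # All fields are hypothetical example parameters.
    max_time_to_recovery: float        # acceptable time to recovery, in hours
    min_acceptable_capability: float   # fraction of nominal capability, 0.0 to 1.0
    recovered_capability: float = 1.0  # capability level that counts as recovered


def meets_objective(samples, objective):
    """Check a time-ordered list of (hours_since_adversity, capability_fraction)."""
    # The capability must never drop below the minimum acceptable level.
    if any(cap < objective.min_acceptable_capability for _, cap in samples):
        return False
    # Recovery time is the first sample from which capability stays recovered.
    recovery_time = None
    for hours, cap in samples:
        if cap >= objective.recovered_capability:
            if recovery_time is None:
                recovery_time = hours
        else:
            recovery_time = None
    return recovery_time is not None and recovery_time <= objective.max_time_to_recovery
```

A scenario analysis of a recovery plan could feed simulated capability profiles to `meets_objective` and record which adversities cause the objective to be missed.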
Bibliography
[1] IEC 60300‑1:2014, Dependability management — Part 1: Guidance for management and application
[2] IEC 62853:2018, Open systems dependability
[3] ISO 1709:2018, Nuclear energy — Fissile materials — Principles of criticality safety in storing, handling and processing
[4] ISO 20887:2020, Sustainability in buildings and civil engineering works — Design for disassembly and adaptability — Principles, requirements and guidance
[5] ISO 22301:2019, Security and resilience — Business continuity management systems — Requirements
[6] ISO/IEC 25010:2023, Systems and software engineering — Systems and software Quality Requirements and Evaluation (SQuaRE) — Product quality model
[7] ISO/IEC TS 25011:2017, Information technology — Systems and software Quality Requirements and Evaluation (SQuaRE) — Service quality models
[8] ISO/IEC 25023:2016, Systems and software engineering — Systems and software Quality Requirements and Evaluation (SQuaRE) — Measurement of system and software product quality
[9] ISO/IEC 27031, Information technology — Security techniques — Guidelines for information and communication technology readiness for business continuity
[10] ISO/IEC TS 5723:2022, Trustworthiness — Vocabulary
[11] ISO/IEC/IEEE 15288:2023, Systems and software engineering — System life cycle processes
[12] ISO/IEC/IEEE 24641:2023, Systems and software engineering — Methods and tools for model-based systems and software engineering
[13] ISO/IEC/IEEE 24765, Systems and software engineering — Vocabulary
[14] NIST SP 800-160 Vol. 2 Rev. 1, Developing Cyber-Resilient Systems: A Systems Security Engineering Approach, doi: 10.6028/NIST.SP.800-160v2r1, December 2021
[15] Boehm B., Chen C., Srisopha K., Shi L. The Key Roles of Maintainability in an Ontology for System Qualities. INCOSE International Symposium, 26(1), 2026–2040, 2016
[16] Brtis J.S. How to Think About Resilience in a DoD Context. MITRE Technical Report MTR-160138, 2016
[17] Clemen, R. and Reilly, T., Making Hard Decisions with DecisionTools, Cengage Learning, 2014
[18] Keeney, R., Value-Focused Thinking, Harvard University Press, 1992
[19] Vlacheas, P., V. Stavroulaki, P. Demestichas, S. Cadzow, S. Gorniak, D. Ikonomou, Ontology and taxonomies of resilience. European Network and Information Security Agency (ENISA). Heraklion, Greece, 2011
[20] Westrum, R., A Typology of Resilience Situations, chapter 5 of Resilience Engineering Concepts and Precepts, E. Hollnagel, D.D. Woods, N. Leveson (editors), Ashgate, 2006
[21] Winstead M., Hild D., McEvilley M. Principles of Trustworthy Design of Cyber-Physical Systems, MITRE Technical Report #210263, 2021