A new way of thinking about operational resilience and cyber resilience. The C3S Resilience Framework identifies and represents the dependencies and interdependencies of people, process and technology within business services, with a unique focus on continuity of service.
The Framework models the complex web of dependent and interdependent relationships between entities. Like a fault tree, it drills down to determine every element needed to operate resiliently.
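As an illustration only, the drill-down can be pictured as a depth-first walk over a map of direct dependencies that collects everything a business service transitively needs. The entity names, the `DEPENDENCIES` dictionary and the `required_elements` function are hypothetical; the Framework does not prescribe this representation.

```python
from typing import Dict, List, Set

# Hypothetical dependency map: each entity (people, process or technology)
# lists the entities it directly needs in order to operate.
DEPENDENCIES: Dict[str, List[str]] = {
    "payments_service": ["payments_app", "operations_team"],
    "payments_app": ["app_server", "database"],
    "app_server": ["data_centre_power"],
    "database": ["storage_array", "data_centre_power"],
    "storage_array": ["data_centre_power"],
    "operations_team": [],
    "data_centre_power": [],
}


def required_elements(root: str, deps: Dict[str, List[str]]) -> Set[str]:
    """Drill down from `root` and collect every element it transitively depends on."""
    needed: Set[str] = set()
    stack = [root]
    while stack:
        entity = stack.pop()
        for child in deps.get(entity, []):
            if child not in needed:  # shared dependencies are expanded only once
                needed.add(child)
                stack.append(child)
    return needed


print(sorted(required_elements("payments_service", DEPENDENCIES)))
# → ['app_server', 'data_centre_power', 'database', 'operations_team', 'payments_app', 'storage_array']
```

Note how `data_centre_power` is reached through three different branches; a shared element like this is exactly what a fault-tree style drill-down is meant to surface.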
The Framework examines dependencies not only in the operational state but also in transient and failure states, and is therefore able to explore beyond the space of known knowns. Charting a path into the unknown events that threaten the proper continuation of business operations is key to solving some of the mysteries of complex systems.
A Resilience Framework Principle
Dependency is a very important concept in our Resilience Framework.
It is dependency that allows a fault in a small part of the system to grow into a disruption that brings the whole system to a grinding halt. It is also dependency that allows the system to recover from a potential failure: once a fault is healed quickly enough, the restoration can propagate to other dependent systems.
The Resilience Framework defines seven interdependent Layers (two Layers are represented in grey). Each Layer defines a functional domain for modular resilience, with a clear functional dependency between adjacent Layers. A Node (indicated by a blue-outlined circle) collectively represents the recoverable entities in a Layer and depicts the complete performance expected of that Layer to achieve its desired outcome.
The proper operation of a Node depends on the proper operation of the Node in the Layer immediately below, to which it is linked by a Path (green vertical line). This dependency has a vertical order: a failure in any Active Node (indicated by a green circle with a green star) renders ineffective the Active Node linked directly above it by an Active Path. A failure at a lower Node therefore propagates upward along the Active Path.
Likewise, recovery is sequenced upward when a Node returns from inactive to active status. When a Node turns active (indicated by a green star), it attempts to promote the Inactive Path (green dotted line) linking it to the Node in the next higher Layer to become the Active Path.
In this way, the resilience behaviour of a system can be expressed, assessed for vulnerabilities, and its recovery mechanisms analysed.
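The upward sequencing of failure and recovery between Layers can be sketched as a small simulation. This is a simplified model under stated assumptions: it tracks a single column of Nodes (one per Layer, seven Layers), a Node is Active only if its own entities are healthy and the Path from the Node below is Active, and an Active Node promotes its upward Path while an inactive one demotes it. The class `ResilienceStack` and the methods `fail` and `recover` are hypothetical names, not part of the C3S notation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResilienceStack:
    """Hypothetical single column of Nodes, one per Layer; index 0 is the lowest Layer."""

    layers: List[str]
    healthy: List[bool] = field(init=False)      # are the Node's own entities working?
    active: List[bool] = field(init=False)       # is the Node currently an Active Node?
    path_active: List[bool] = field(init=False)  # path_active[i] links Layer i to Layer i + 1

    def __post_init__(self) -> None:
        n = len(self.layers)
        self.healthy = [True] * n
        self.active = [True] * n
        self.path_active = [True] * (n - 1)

    def _propagate_up(self, start: int) -> None:
        """Re-evaluate Nodes from `start` upward, one Layer at a time."""
        for i in range(start, len(self.layers)):
            supported = True if i == 0 else self.path_active[i - 1]
            self.active[i] = self.healthy[i] and supported
            if i < len(self.layers) - 1:
                # An Active Node promotes its upward Path to Active;
                # an inactive Node leaves it as an Inactive Path.
                self.path_active[i] = self.active[i]

    def fail(self, i: int) -> None:
        """A fault in Layer i; its effect travels up the Active Path."""
        self.healthy[i] = False
        self._propagate_up(i)

    def recover(self, i: int) -> None:
        """The fault in Layer i is healed; restoration travels upward."""
        self.healthy[i] = True
        self._propagate_up(i)


stack = ResilienceStack([f"Layer {n}" for n in range(1, 8)])
stack.fail(2)        # fault in Layer 3
print(stack.active)  # [True, True, False, False, False, False, False]
stack.fail(4)        # a second, independent fault in Layer 5
stack.recover(2)     # Layer 3 heals; recovery climbs until it meets the Layer 5 fault
print(stack.active)  # [True, True, True, True, False, False, False]
stack.recover(4)
print(stack.active)  # [True, True, True, True, True, True, True]
```

The middle step shows why healing one fault is not always enough: restoration stops at the first Layer whose own entities are still unhealthy.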
Resilience Framework Notation
The C3S Resilience Framework Notation defines a set of simple but essential primitives for describing the resilience characteristics of an ICT system.
The notation has seven layers of interdependencies. A failure in any Active Node must be answered by a new working path that has already been established and tested; otherwise, the system will not be able to continue its intended operation.
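A minimal sketch of that rule, assuming each failed Active Node has a list of candidate alternative paths, each flagged as established and as tested. The `CandidatePath` type, the `select_failover_path` function and the target names are hypothetical and only illustrate the condition stated above: a path that is established but untested does not qualify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CandidatePath:
    """Hypothetical alternative working path to a standby Node."""
    target: str
    established: bool
    tested: bool


def select_failover_path(candidates: List[CandidatePath]) -> Optional[CandidatePath]:
    """Return the first path that is both established and tested, or None."""
    for path in candidates:
        if path.established and path.tested:
            return path
    return None


candidates = [
    CandidatePath("standby_db_site_b", established=True, tested=False),
    CandidatePath("standby_db_site_c", established=True, tested=True),
]
chosen = select_failover_path(candidates)
if chosen is None:
    print("No established and tested path: intended operation cannot continue")
else:
    print(f"Fail over to {chosen.target}")  # Fail over to standby_db_site_c
```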
When the notation is applied at the different stages of ICT system design, mission-critical users can readily understand the vulnerabilities and communicate them clearly.
The Resilience Lifecycle focuses on the key activities surrounding the Actual Event, defined as an unexpected incident that threatens the proper continuation of operations. The objective of good resilience design is first to reduce the likelihood of the Actual Event and, where that cannot be fully achieved, to provide a system of Command and Control (C2) Operations that allows the components affected by the Event to be recovered gracefully without disrupting business processes.
The Unknown Event is the most uncertain domain in the Lifecycle: such events are not yet known but are likely to exist. They are the deadliest Events if they ever occur, because they usually catch the organisation by surprise; and because they have never been studied or analysed, the initial shock and high degree of uncertainty can set back the recovery process in unexpected ways.
Systematic Vulnerability Assessment is used to mine some of the Unknown Events and move them into a new box known as the Virtual Event. A Virtual Event can then be tested and either confirmed as a Potential Event or dismissed entirely as a vulnerability, reducing the list of uncertain vulnerabilities in the Unknown Event box.
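The movement between boxes can be sketched as a small state machine. This is an illustration under assumptions: any scenario not yet recorded is treated as sitting in the Unknown Event box, and the `EventBox` enum, the `EventRegister` class, its `mine` and `test` methods and the scenario names are hypothetical.

```python
from enum import Enum, auto
from typing import Dict


class EventBox(Enum):
    UNKNOWN = auto()
    VIRTUAL = auto()
    POTENTIAL = auto()
    DISMISSED = auto()


class EventRegister:
    """Hypothetical register; unrecorded scenarios are implicitly Unknown Events."""

    def __init__(self) -> None:
        self.boxes: Dict[str, EventBox] = {}

    def mine(self, scenario: str) -> None:
        """Systematic Vulnerability Assessment surfaces a scenario as a Virtual Event."""
        if self.boxes.get(scenario, EventBox.UNKNOWN) is not EventBox.UNKNOWN:
            raise ValueError(f"{scenario!r} has already left the Unknown Event box")
        self.boxes[scenario] = EventBox.VIRTUAL

    def test(self, scenario: str, confirmed: bool) -> EventBox:
        """Testing confirms a Virtual Event as Potential or dismisses it."""
        if self.boxes.get(scenario) is not EventBox.VIRTUAL:
            raise ValueError(f"{scenario!r} is not a Virtual Event")
        outcome = EventBox.POTENTIAL if confirmed else EventBox.DISMISSED
        self.boxes[scenario] = outcome
        return outcome


reg = EventRegister()
reg.mine("loss of both power feeds")
reg.mine("certificate expiry on standby site")
reg.test("loss of both power feeds", confirmed=True)
reg.test("certificate expiry on standby site", confirmed=False)
print({name: box.name for name, box in reg.boxes.items()})
# → {'loss of both power feeds': 'POTENTIAL', 'certificate expiry on standby site': 'DISMISSED'}
```

Enforcing the order (Unknown, then Virtual, then Potential or dismissed) in the methods keeps a scenario from being confirmed before it has been assessed.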