Ensuring System Reliability Through Human-in-the-Loop (HITL) Simulations: A Robustness and Resilience Approach to Disaster Management

safiye turgay; muhammed serkan şahin

Outline

Open Access

Research article

Ensuring System Reliability Through Human-in-the-Loop (HITL) Simulations: A Robustness and Resilience Approach to Disaster Management

Safiye Turgay¹^*

,

Muhammed Serkan Şahin²

¹

Department of Industrial Engineering, Sakarya University, 54050 Serdivan, Turkey

²

Institute of Social Sciences, Pamukkale University, 20070 Denizli, Turkey

Journal of Intelligent Management Decision

|

Volume 4, Issue 3, 2025

|

Pages 187-211

https://doi.org/10.56578/jimd040302

Received: 06-05-2025,

Revised: 07-12-2025,

Accepted: 07-19-2025,

Available online: 07-27-2025

View Full Article|

Download PDF

Abstract:

It is crucial to ensure system reliability in changing situations where systems are required to operate in uncertainty or against disturbances. Human-in-the-Loop (HITL) simulations have in recent times emerged as an important key in ensuring the robustness and resilience of the system via evaluation and testing. The method introduces human decision-making and adaptability as well as providing insight into zones of possible weak points of the system and failure modes which are not even captured by computer-based systems. By incorporating HITL simulations into the system, designers and engineers could simulate operational challenges in real life, identify unforeseen defects, and implement mitigative strategies to enhance both robustness-ensuring consistent performance in the nominal operation and resilience-maintaining functionality in and after disruptions. This article looks at the effectiveness of HITL simulations in various domains, and particularly at the role that these contribute to system robustness and resilience. Among the most significant issues will be the nature of building the simulation environments, the test for the range of scenarios, and the roles for humans to be simulated within the loop. We investigate the behavior of humans during stress and uncertainty, then provide valuable feedback to the system to help it learn. By revealing the vulnerabilities of the system design and acknowledging human effects on recovery and decision-making operations, HITL simulations finalize the development of more adaptable, stable systems that could recover rapidly from interruptions. To conclude, HITL simulations are a critical tool for improving the reliability of systems, hence providing a comprehensive framework to address either expected or unexpected challenges in complex operating environments.

Keywords: Human-in-the-Loop (HITL) simulations, System reliability, Robustness, Resilience, Decision-making under uncertainty, System adaptability, Operational disruptions, Recovery processes

1. Introduction

Modern systems demand reliability in an era of high interdependence, the situation worsens under the stress of dynamic, uncertain, or disruptive factors. System reliability failure leads to severe economic, social, and safety consequences. For example, in aviation, health care, power networks, and autonomous transportation. Traditional methods of testing and establishing the reliability of a system could perform satisfactorily in the laboratory setting but have at all times been proven wanting whenever they were subjected to the unstructured reality of the outside world. In such situations, the integration of human factors into system testing by employing HITL simulations offers a principal advantage.

HITL simulations offer a hybrid solution to where human operators engage the automatic or semi-automatic system in real time. The determination of system performance is becoming progressively more holistic in nature. These allow far higher accuracy for the operation of the system in cases of the real world by incorporating human decision-making as well as handling of uncertainty, and flexibility. This approach becomes particularly pertinent when considering robustness and the ability of the system to preserve stable behavior under normal operating conditions, resilience, and ability to adapt, recover, or renew itself in the face of disruptions or unexpected adversity.

The recently merged wisdom views robustness and resilience as the two faces of one and the same coin, and the pillars of system dependability. While robustness is concerned with the predictability of operation performed by a system under a given envelope of operations, resilience is brought to the fore for the ability of the system to resist unexpected stressors, interruptions, and failures in its sub-systems. Both attributes contribute to the design of systems which are capable of surviving the inherent uncertainties and adversities of their holistic operational environments.

The paper expounds on robustness, resilience, and the way in which HITL simulations could be related to obtain system reliability. It also emphasized the problems encountered in the design and implementation of HITL simulations in various industries, such as system weakness identification, system adaptability enhancement, and overall system performance improvement. We hope to illustrate, through the use of case studies and simulation output, how HITL simulations could help build systems that are robust and, more significantly, resilient to overcome challenges. We explained first in the following sections the theoretical basis of robustness and resilience and how the basis is utilized in system design. Second, we discussed the mechanics of HITL simulations by explaining methodologies and tools which enhance the efficiency of their application. Lastly, some examples were used to demonstrate the engineering benefits of HITL simulations in enhancing system reliability, and thereafter recommend future research and development in the significant area.

This paper is structured as follows: Section 2 provides a review of the state-of-the-art methods. Then, Section 3 provides a succinct explanation of the Ensuring System Reliability Human-in-the-Loop Simulation Model. We presented our simulation model and analyzed the experimental results in Section 5. Finally, Section 6 provides a conclusion of the paper.

2. Related Works

Research on system reliability, mostly from the aspects of robustness and resilience, has been attracting enormous interest from various fields like engineering and social systems. The chapter gives a review of literature relative to the provided concepts and the evolving role of HITL simulations as one of the primary methods of enhancing the performance of a system under determinable and indeterminate conditions. Literature in the domain of resilience assessment frameworks has developed significantly with more focus on system dynamics, cyber-physical integration, and high-response capability in many domains. Zhang et al. [1] provided a dynamic assessment framework to measure the long-term impact of urban policy on resilience. Their paper promoted dynamic systems modeling for reaching socio-economic and environmental interactions to develop an integrated urban resilience planning framework. Supporting this perspective, Zhang et al. [2] also presented an extensive review of optimal control strategies and modeling strategies in achieving cyber-physical resilience of power grids. Their study united methods such as robust optimization, adaptive control, and machine learning to mitigate vulnerabilities of smart grids. Emphasis on predictive modeling and real-time control underscored the need for resilience against mounting cyber and physical threats. Pawar et al. [3] also contributed towards the discussion by creating a system-specific assessment framework for a rapid response process system that uses system performance indicators, degradation models, and recovery methods for assessing the resilience capability of the industrial system in responding to disturbing events and rapid recovery. These papers together offered a sophisticated view of resilience in urban policy, smart grid networks, and industrial safety, though emphasized the need for dynamic, real-time, and policy-focused assessment techniques. Resilience and robustness are fundamental concepts of system design and theory in reliability. While often used interchangeably with great frequency, they are different aspects of system behavior [4]. Recent studies on resilience modeling and assessment have revealed diverse approaches to critical infrastructures, urban networks, and disaster management.

Moradi et al. [5] adopted a system dynamics approach to seismic resilience modeling for Society 5.0, suggesting an anticipatory approach that integrated socio-technical factors into resilience planning. Their case study illustrated the relationship between societal innovation and disaster resilience. Yang et al. [6] proposed a quantitative resilience assessment framework for advanced engineered systems using dynamic performance metrics and adaptive capacity to disruption. Leštáková et al. [7] critically analyzed the applicability of commonly used resilience metrics in water supply systems, hence challenging assumptions and the need for functionally more valuable metrics. Xu et al. [8] undertook simple review and emphasized the cyber-physical nature of power system resilience, for which the integration of ICT and operational technology was vital for system stability. Zhang et al. [9] used the pressure-state-response (PSR) model to assess urban ecosystem resilience in Yangtze River Delta, China with emphasis on socio-ecological interactions. Li et al. [10] applied a system dynamics model to describe urban flood resilience, correlating infrastructure response to Sustainable Development Goals. Tseng and Stojadinović [11] presented Capabilities-based Interface for Socio-Technical Resilience (CI-STR), a new modeling interface for socio-technical systems in disasters, to improve capabilities-based resilience assessments. Magoua and Li [12] pointed out the human factors involved in resilience modeling and recommended their integration into infrastructure safety planning. Gu et al. [13] put forward a Bayesian decision network to optimize cost-effective pre-disaster mitigation against cascading seismic events. Besides, He et al. [14] studied resilience in underground urban space under flooding and reviewed current solutions and research directions for the future. Collectively, these studies revealed the evolution of resilience as an adaptive concept; it was also a dynamic and multidisciplinary phenomenon with physical, cyber, social, and ecological dimensions. The concept of robustness focused on the ability of a system to operate reliably under expected operating conditions and within given constraints.

Robust systems are designed to withstand internal and external variabilities with minimal degradation in performance according to some research works. These classic works implied that a system could work under various levels of stress. Resilience, however, is the degree of a system to rebound and change from abrupt disturbances; this is also a differentiating feature in ecological systems. Resilience theory, when extended to other disciplines, characterized resilience as the capacity of a system to endure shocks, reorganize, and transform under changed conditions. The idea has been employed in numerous diverse uses. Literature showed that systems being engineered for robustness were not always resilient and vice versa; therefore, the approach had to be thoroughly reviewed for general system reliability.

This differentiation is crucial in settings where the adaptability to unplanned disruption is equally important as the capability to generate uniform performance under steady conditions. Contemporary thought on resilience in infrastructures and socio-technical systems increasingly emphasizes the necessity of inter-disciplinary, evidence-based, and adaptive approaches to managing cascading disasters, interruptions, and recovery planning. Gong et al. [15] analyzed the interdependence of critical infrastructures through empirical research of cascading disasters and found common patterns of vulnerabilities in different case studies. Gros et al. [16] showed use cases for diagnostics and simulations by means of knowledge graphs, e.g., use by Notre-Dame's buttressing system. Digital twin technology lies at the core of Gebhard et al.'s [17] research, in which they outlined concepts and modeling challenges for enhancing socio-technical urban infrastructure resilience. Research across a broad spectrum of disciplines has confirmed the potential of HITL simulations to boost system reliability; some of the areas in which HITL could be applied include Human Factors Engineering, Control Systems, and Artificial Intelligence. HITL simulations allow the integration of human decision-making into the testing and validation process for an understanding of human errors and adaptability in complex situations.

Aghazadeh et al. [18] developed a dynamic reliability model for safety instrumented systems with a focus on temporal evolution of system integrity, whereas Wang et al. [19] used system dynamics in estimating long-term transportation resilience. Lagap and Ghaffarian [20] discussed the application of digital twins for post-disaster risk management and introduced a more sophisticated conceptual framework. Public health emergency supply chain resilience was explored by Sun and Liao [21]with an emphasis on reliability-based risk transmission models. The latest research focused on situation awareness in HITL simulations in relation to the primary role of human response in real time towards system performance. These showed that operator-in-the-loop simulations are superior in discovering potential weaknesses, especially in conditions of uncertainty and rapid decision-making [22],[23],[24],[25]. Alipour et al. [22] built a bi-objective maintenance policy focusing on resilience and robustness using dynamic Bayesian networks , whereas Pawar et al. [23] critically examined the application of resilience engineering in industrial systems. COVID-19 port resilience was analyzed by Gu and Liu [24] using the Hierarchical Holographic Modeling (HHM) with a Fuzzy Cognitive Map (FCM) approach whereas Du et al. [25] discussed the contributions of electric vehicles in post-disaster power recovery. Stress testing systems using HITL simulations has been emphasized in the context of safety-critical systems. Self-healing techniques in electrical power systems are meta-analyzed by Shittu et al. [26] with autonomous recovery modes. Their system safety and accident prevention research illustrated how human factors uncovered latent hazards, especially in high-hazard fields like aviation, healthcare, and automated systems.

Hosseini [27] presented post-disaster network reliability based on sensitivity analysis while Chen and Wang [28] investigated cyber-physical power system resilience under multi-stage attacks. There are numerous papers citing the success of HITL simulations in enhancing robustness and resilience in various domains, such as aerospace, and health care, etc. Although HITL simulations have demonstrated impressive advantages, several gaps and opportunities exist for further research. For instance, machine learning and AI effects on HITL simulations are growing areas for research.

Existing research, for instance, investigated how AI systems could collaborate with human operators in HITL environments, yet more research could be conducted to comprehend long-term AI-human co-adaptation effects in dynamic systems. Decentralized recovery strategies were illustrated by Maddah and Heydari [29] in strategic network dynamics. Urken et al. [30] demonstrated a long-term paradigm in sustainable and resilient engineering design. Yang et al. [31] assessed the resilience of complex equipment systems by decomposing its elements and indicators whereas Landwehr et al. [32] presented validation methods for flood maps through Earth observation. Caldera et al. [33] recommended incorporating disaster planning in road infrastructure management while Hossain et al. [34] debated the way power grids were sustained under natural disasters. As the increasing role of AI enhances decision-making and maximizes system utilization, it is crucial to conduct a deeper investigation of how AI technology strengthens or compromises system resilience in scenarios involving human intervention. By focusing on positive and negative effects of AI in HITL systems, this paper was supported by a more general description of the AI contribution to system performance; this is an essential link in the confines. In addition, there is an increasing need to investigate the intersection of ethical issues in HITL simulations, especially in autonomous systems where human involvement is minimal. A number of scientists advocated more emphasis on ethical decision-making in the HITL framework, pointing out that at certain times, human decisions made within simulations would bias the outcome of the system. Khameneh et al. [35] applied hybrid simulation and machine learning to analyze disruptions due to disinformation on public transport systems whereas Mazur et al. [36] developed an integrated resilience plan for electric power infrastructure in rural emerging economies.

Moreover, Jafarian et al. [37] employed hybrid optimization techniques to enhance evacuation plans in the scenario of geographic hazards, and Rijal et al. [38] published a world review on mountain flood hazards under climatic and anthropogenic changes. The study mentioned the restriction of simulation and possible causes of uncertainty in the model, e.g., assumptions about human behavior, outage duration, and failure rates for the system, for different circumstances. The simulation results of the uncertainties in parameters were compared and statistical analysis determined the effect of uncertainty on the outcome. Sensitivity analysis or confidence intervals for model projections and results were discussed in the context of framing the bounds of application of the simulations. Evidence from literature convinced that HITL simulations were some of the most beneficial methods contributing to system resilience and robustness. The intermingling of human behavior, decision-making, and modifying capacity has enabled the understanding of system operation both in operational conditions and in catastrophic events, through the possibilities of HITL simulations. Jain et al. [39] proposed Process Resilience Analysis Framework (PRAF)—a system-level industrially oriented resilience framework—to maximize industrial safety and risk reduction.

The above articles collectively set the trend towards adaptive, integrated, and smart resilience planning for infrastructural, environmental, and industrial systems. Future research into HITL simulations could be linked to further integration with AI, more ethical issues, and wider diffusion in industries, especially in methods where systems are more complex and autonomous.

3. Methodology

This paper suggested a multi-perspective approach to measure the effectiveness of Human-in-the-Loop (HITL) simulations in enhancing the robustness and resilience of a system. The approach is modular in multiple ways: system selection, simulation design, scenario development, human participant involvement, and data analysis. Each section was structured in depth to cover the application of HITL simulations in a range of systems, from autonomous technologies to critical infrastructure, and to reveal weaknesses and dependability of systems both in the routine and disruptive conditions. The application of HITL simulations was tested to ensure system dependability by selecting three types of systems from different applications; each was reliant on a combination of human decisions and automatic computation. The autonomous vehicle control system, in respect of the escalated use and human dependence on autonomous transport technologies, was selected to establish the use of HITL simulations in robustness for normal driving situations and in resilience for catastrophic situations, like in the form of obstructions or system breakdowns.

Cyber-physical systems (CPS) refer to a connection of physical processes. Simulations like HITL investigate man and machine responses in the case of cyberattacks or system failures and verify the resilience and robustness of defense mechanisms against the protection of critical infrastructure.

3.1 Simulation Design

The HITL simulation environment is a mixed-method design composed of qualitative and quantitative information. The simulated world would be transformed to mimic real-world operational environments with both automated and human inputs. The steps involved in the design of simulations are:

System Modeling: Each system to be selected will be modeled computationally into a simulation environment that could accurately simulate its normal running procedures. Key performance metrics for robustness, namely system accuracy and fault tolerance, and resilience, such as recovery time and adaptability to disruptions, were identified in every system.

Human Role Integration: In every system simulation, human subjects are introduced to the decision-making roles to aid or substitute the automated processes. The subjects are trained about the operational limitations of the system but are free to act independently under simulated disruptions.

Automation Units: The autonomous part of each system is intended to perform routine operations. This includes creating conditions under which the automation fails or is faced with unexpected events, such that human input becomes the focus. Automation offers effectiveness and dependability under typical operating conditions; it will probably fall short of random, dynamic, or complex conditions when human monitoring and reactive decision-making take on a primary significance.

The extended consideration of this paper concerned the inherent limitations of automation, like in adverse or high-stakes conditions where automated systems will have difficulty coping with unexpected disruptions or adapting to new situations and this could be an area for future study. By acknowledging these limitations, the paper is in a position to put more weight on the requirements for human intervention in ensuring system reliability and resilience and present a more balanced debate regarding how the human-automated collaborative effect affects system performance under stressful situations.

3.2 Scenario Development

Various scenarios are used to test robustness and resilience for various conditions: nominal and disrupting incidents, following the development of every system simulation.

Nominal Conditions: These are nominal condition systems. The primary intention will be to understand how these systems perform when there is no abnormal condition or any other unforeseen variables. System efficiency, precision, and user satisfaction are some of the primary performance measures to be measured.

Disruptive Events: A particular unforeseen challenging circumstance or failure, i.e., system failure, environmental hazards, or security breach, is introduced into the picture. Human operators are to respond to these disruptions to witness the resilience of the system. Recovery time, system performance during disruptions, and human intervention effectiveness are tracked to attain this goal.

The three levels of disruptions are:

- Low-level disruptions: small errors, minor environmental changes;

- Medium-level disruptions: system component failures, moderate threats;

- High-level disruptions: fatal failures, external cyber-attacks.

3.3 Human Participant Involvement

Human operators are at the center of HITL simulations, making in-the-loop decisions that affect system performance. Participants were selected based on their experience with the systems to be tested, yet to balance experts with novices to capture real-world variations at the skill level of operators.

Training Sessions: The simulated participants were provided with a small amount of training before simulations of the system behavior under routine conditions, such as communicating with the automated aspects and decision-making in case of a disruption. The training varies from simulation to simulation so that it is possible to analyze how familiarity affects human responses.

Data Collection during Simulations: Multiple measurement points were gathered from the simulations, including system performance, human decision-making time, stress levels, and communication patterns. The interaction of human participants with the system was recorded and captured through screen recording.

3.4 Data Analysis

Data analysis of the HITL simulations was focused on both system performance and human decision-making:

Quantitative Analysis: System performance data were analyzed using statistical techniques in the robustness analysis, i.e., consistency in task completion in nominal conditions, and resilience analysis, i.e., recovery time and error rate in case of disruption. Human reactions were measured in respect of decision accuracy, recovery time, and frequency of successful intervention.

Qualitative Analysis: Products from such an analysis were post-simulation interviews of subjects regarding their experience, perceptions of their decision-making behaviors, and perceived system vulnerabilities. Such comments revealed human factors which might not be so easily discernible from the quantitative indices.

Comparability Analysis: The results of HITL simulations were compared with those in which human involvement was reduced to a minimum or eliminated entirely in order to compare automated with HITL simulations. This gave a better understanding of the correlation between human decision-making and system robustness and resilience.

3.5 Sensitivity Analysis

Sensitivity analysis decided how various factors influencing system performance, such as system complexity, participant expertise, and interruption severity, were affected. With this, it helped decide the conditions under which human actions maximized system reliability.

3.6 Validation

The results obtained through the simulations were finally validated by cross-testing between different system configurations and sets of participants such that the findings were not applicable to specific situations but could be generalized across different types of systems and operating environments (in Figure 1).

This research approach therefore explained how Human-in-the-Loop simulation might be utilized to construct enhanced robustness and resilience in complicated systems. This study incorporated human decision-making into the simulation platform and experiments with responses against different interruptions, with the purpose of arriving at the knowledge of how to design more dependable systems under operational difficulties in real life.

Figure 1. Suggested model framework

4. Mathematical Model

The mathematical model for system reliability guaranteed in the context of HITL simulations in this instance, added quantification to system resilience and robustness that were integrated with both automated and human decision-making functionalities. The model helped establish with accuracy the performance of the system under conditions of nominal-robustness-and-disruptive-disruption by way of adaptiveness or recovery-resilience. The approach employed performance metrics, human interaction variables, and disruption response characteristics in total system reliability estimation.

4.1 Key Variables and Parameters

The following key variables and parameters were used to model the system behavior in HITL simulations:

$P(t)$: System performance at time t (e.g., task accuracy, output quality).

$H(t)$: Human intervention impact on system performance at time $t$.

$D(t)$: Degree of disruption at time $t$ (e.g., low, medium, high disruption).

$R(t)$: Recovery function, which indicates how quickly the system recovers from normal operation following a disruption.

$\tau_H$: Human decision-delay or response time.

$\tau_R$: Time to return to normal after a disturbance.

$M_A(t)$: Performance of the automated system at time $t$, independent of human action.

$E_h(t)$: Human error probability at time $t$.

$S_c(t)$: Human and automated system capacity or resilience at time $t$, including both human and automated components.

To this purpose, the activity in this study developed a mathematical model of system dynamics with human decision and response to interferences that enabled quantification of HITL simulation effects on system robustness and resilience. This gave a formal description for the operation of a system under normal and interfered states, hence allowing measurement of key measures for robustness and resilience.

(1) Initialization:

Insert all parameters and system variables to simulation.

System Model S: The model is not degradation, but also probable component failure capable of affecting the system.

Nominal Performance $P(0)$: At $t=0$, The system is configured in normal operation at, $P(0)=1$.

System Degradation Rate $\lambda_{\text {sys }}(t)$: Variable rate of degradation with time as a result of normal use and low-level failures. This may be a function of the use of the system or some other external condition.

$\lambda_{\text {sys }}(t)=\lambda_{\text {base }}+\lambda_{\text {ext }}(t)$

(1)

where, $\lambda_{\text {base }}$ is a rate of degradation, and $\lambda_{\text {ext }}(t)$ is an externally controlled degradation factor (e.g., environmental factors) that is constant.

Resilience and Robustness Thresholds: $P_{\text {acceptable }}$ is the threshold for acceptable system performance.

Disruption Events $D_i$: Each event has its own magnitude, $D(t)$ severity $\gamma_{\text {sys }}(t)$, and duration.

$D(t)=D_{\text {init }}\left(1-e^{-\delta t}\right)$

(2)

where, $D_{\text {init }}$ is the initial magnitude of the disruption, and $\delta$ controls the rate at which the disruption builds up.

(2) System Nominal Operation (Pre-Disruption)

Nominal Performance over Time:

$P_{\text {nominal }}(t)=1-\int_0^t \lambda_{\text {sys }}(\tau) d \tau$

(3)

This equation depicts the accumulation with time, for a progressive loss of system performance due to wear during use.

Warnings and Deviations: Small deviations, supplied by a small Gaussian noise term $\epsilon(t)$, are injected to simulate system "awareness":

$\epsilon(t) \sim \mathcal{N}\left(0, \sigma_{\text {noise }}\right)$

(4)

The system is sensitive to possible perturbations in case the deviation drifts beyond a threshold $\sigma_{\text {warn }}$.

4.2 System Model

Considered the S system operates based on a combination of automated processes and human interactions. The system state at any time $t$ is defined by a vector, $X(t)$, such that $X(t)$ denotes all system parameters relevant at any point of time $t$, such as performance measures, resource capacities, and environmental attributes.

$X(t)=\left[x_1(t), x_2(t), \ldots, x_n(t)\right]$

(5)

where, $x_i(t)$ is the i-th parameter of the system.

Its operations will be under nominal conditions and disturbances: Nominal conditions would then be the very usual and normally expected operating condition, while disturbances based on external shocks and failures that always push the system away from stability.

The core of the mathematical model is the system performance function, describing the degree to which a system performs its tasks under given conditions. Let $P(t)$ be the performance of the system at time $t: P(t) \in[ 0,1]$, where it corresponds to full functionality of the system, and 0 corresponds to the complete failure of the system.

Here, the performance of the system is defined both relative to system internal variables and relative to external variables like human interventions and environmental conditions. Under normal operation, i.e., in the absence of interference, the system performance can be represented by a base function:

$P_{\text {nominal }}(t)=1-\lambda_{\text {sys }} t$

(6)

$\lambda_{\text {sys }}$ being the nominal system deterioration rate over time due to internal wear and fatigue or regular operational causes:

The overall performance of the system, $P(t)$, is defined as the performance due to the automated system with the impact of the intervention of humans made up for the effects of disruption:

$P(t)=M_A(t)+H(t)-\lambda D(t)$

(7)

where:

$M_A(t)$ is the system's basic performance.

$H(t)$ is what human intervention brings to the system, being an improvement or a worsening in performance based on the nature of decisions taken.

$\lambda$ is a weight parameter that measures the effect that interruptions have on system overall performance.

$D(t)$ represents the size of disruption at time $t$.

4.3 Human-in-the-Loop Contribution

In HITL simulations, the decision of the human operator determines the temporal trajectory of the system. The movement of the human operator at times can be described by, with automatic control actions given by. Typically, the system input is the combination of human and automatic actions:

$U(t)=\alpha U_h(t)+(1-\alpha) U_a(t)$

(8)

where, $\alpha \in[ 0,1]$ is the relative weight of human control within the system. Autonomous systems have, while HITL systems have $\alpha>0$. The parameter $\alpha$ may change over time or between scenarios, depending on expected levels of human intervention.

4.4 Human Intervention Impact

Human intervention is central to the recovery of performance after disruption. The intervention is presented as a counteractive measure that undoes the impact of the disruption. Let $I(t)$ denote the effect of human intervention on the system performance in the long run, with $I(t) \in[ 0,1]$, where larger values show more effective intervention. This can be modeled as a feedback control system where the size of human intervention is a function of observed loss of system performance and human operator response time. The effect of the intervention has been modeled as:

$I(t)=\alpha \cdot\left(1-P_{\text {disrupted }}(t)\right) \cdot e^{-\beta\left(t-\tau_{\text {taman }}\right)}$

(9)

where:

$\alpha$ is the operator's human effectiveness factor of recovery from system failures,

$\beta$ is the rate of decay over time for any action that is delayed.

$\tau_{\text {human }}$ is the response time of the human operator to the disturbance.

Above all, the impact of human action, $H(t)$, depends both on the timeliness and correctness of decisions. The dynamic model of this dependence involves both a factor of response time and a factor of decision correctness:

$H(t)=\left(1-E_h(t)\right) \cdot e^{-\alpha \tau H}$

(10)

where:

$E_h(t)$ is the likelihood of human error, greater in stressful or highly disordered conditions,

$\alpha$ sensitivity parameter used to modify the weight of time delay in decision-making,

$e^{-\alpha \tau_H}$ is the fading effect of delayed human action-that is, with an increasing delay, the worth of human decisions decreases.

Human intervention comes at time $t_{\text {disruption}}+\tau_{\text {human}}$.

Human Intervention Effect: Human intervention effect is depicted as a dynamic function of system degradation and time passed since disruption:

$I(t)=\alpha \cdot\left(1-P_{\text {disrupted }}(t)\right) \cdot e^{-\beta\left(t-\tau_{\text {humman }}\right)} \cdot\left(1+\epsilon_{\text {human }}(t)\right)$

(11)

where:

$\alpha$: Intervention effectiveness scaling factor.

$\beta$: Decay rate of intervention with time.

$\epsilon_{\text {human }}(t) \sim \mathcal{N}\left(0, \sigma_{\text {cognitive }}\right)$ represents cognitive biases or human decision-making mistakes with time.

System Performance with Human Intervention:

$P(t)=P_{\text {disrupted }}(t)+I(t)$

(12)

This equation normalizes the system performance proportionally to the human intervention.

Recovery Criterion: The recovery is when the performance crosses the acceptable threshold:

$P(t) \geq P_{\text {acceptable}}\text {, mark system as recovered at time} \quad t_{\text {recover }}$

(13)

4.5 Post-Recovery System Monitoring

After system recovery, the system performance is observed for residual effects:

Residual Effects Modeling:

$P(t)=P_{\text {nominal }}(t)- \text{Residual Effects} $

(14)

Residual impacts can emerge as a function of duration and severity of the disruption:

$\text {Residual Effects} =\lambda_{\text {residual }} \cdot \int_{t_{\text {recover }}}^\tau \gamma_{\text {sys }}(\tau) d \tau$

(15)

where, $\lambda_{\text {residual }}$ is the residual long-term degradation due to disruption.

4.6 Disruption and System Recovery

The model represents disruptions as exogenous shocks to the system reducing performance by a factor of, where $D(t)$ is the size of the disruption. Resilience-or the recovery ability of the system from such disruptions-is viewed as the recovery function $R(t)$. The recovery function is usually an exponential function specifying how rapidly the system returns to its nominal value:

$R(t)=e^{-\beta \tau_R}$

(16)

where:

$\beta$: Intrinsic resilience coefficient of the system, expressing the recovery speed.

$\tau_R$: Time the system takes to recover after an interruption.

For disruptions, the performance of the system is impacted by external environment stressors. Let us define a disruption function $D(t)$, where $D(t) \in[ 0,1]$ is the disruption value at time $t$, and 1 is highest scale and 0 is no disruption. The system's performance during disruptions is given as:

$P_{\text {disrupted }}(t)=P_{\text {nominal }}(t)-D(t) \cdot \gamma_{\text {sys }}(t)$

(17)

Here, $\gamma_{\mathrm{sys}}(t)$ is a system vulnerability factor that measures the system sensitivity to the disruption, with higher values indicating greater sensitivity to the disruption.

After a disruption event at time, the performance of the system traces the recovery path:

$P(t)=P\left(t_0\right) \cdot R\left(t-t_0\right) \quad \text{for} \quad t>t_0$

(18)

This function describes the performance deterioration after a disruption in, and restoration over time of, the system resilience coefficient.

At some point $t=t_{\text {disruption }}$, cause a disruption:

Disrupted System Performance:

System performance is reduced by the impact of the disruption, a function of severity in terms of time.

Disruption Magnitude and Impact:

$D(t)=\left\{\begin{array}{cc}D_{\text {init }}\left(1-e^{-\delta\left(t-t_{\text {disrupton }}\right)}\right) & \text { if } t \geq t_{\text {disruption }} \\ 0 & \text { otherwise }\end{array}\right.$

(19)

This models how the disruption becomes stronger with time. The size of determines the growth in strength of the disruption.

4.7 Dynamics of the System under Nominal Conditions

During nominal operating conditions-no perturbations-the system follows a stable path. The evolution of the state of the system obeys the following differential equation:

$\frac{d X(t)}{d t}=f(X(t), U(t))-d(X(t))$

(20)

where:

$f(X(t), U(t))$ is the natural dynamics of the system under control inputs $U(t)$.

$d(X(t))$ represents minor disturbances or inefficiencies in system behavior (depicted as noise).

The sentiment is robust if, under nominal conditions, state vector $X(t)$ remains within a satisfactory range of performance $X_{\text {nominal }}$, i.e.,

$X_{\text {nominal }}^{\min } \leq X(t) \leq X_{\text {nominal }}^{\max }$

(21)

4.8 System Response to Disruptions

The perturbation at time $t_d$ perturbs the system state and forces the system to a transient regime with degraded performance. Assume the magnitude of the perturbation is characterized by a shock function $D(t)$, in that:

$D(t)=\left\{\begin{array}{cc}0 & \text { if } t<t_d \\ \delta(t) & \text { if } t \geq t_d\end{array}\right.$

(22)

with $\delta(t)$: The size and duration of the disruption.

The performance of the system after the disruption is regulated by the following altered dynamics:

$\frac{d X(t)}{d t}=f(X(t), U(t))-d(X(t))-D(t)$

(23)

4.9 Robustness Metric

System robustness refers to the system's ability to maintain consistent performance when nominal conditions exist. It is expressed by the variation of system performance:

$\text{Robustness} =\frac{1}{T} \int_0^T(P(t)-\bar{P})^2 d t$

(24)

where:

$T$ denotes the overall time monitored.

$\bar{P}$ is the average system performance under normal conditions.

A smaller robustness measure shows the system is stable or uniform under normal operating conditions.

Robustness is the power of the system to continue functioning under normal conditions despite internal deterioration or minor external disturbances. Robustness $R_{\text {robust }}$ we describe as a time dependence of the deviation in system behavior from its nominal value:

$R_{\text {robust }}=\frac{1}{T} \int_0^T\left(1-\left|P_{\text {nominal }}(t)-P_{\text {disrupted }}(t)\right|\right) d t$

(25)

This measure has a value between 0 and 1, where 1 represents no deviation from nominal performance, i.e., optimal robustness, and values near 0 represent poor robustness with extensive performance deterioration.

4.10 Resilience Metric

Then, the resilience is quantified by the following: A system's ability to get back to acceptable levels of performance after being disturbed: A clear quantitative measure of resilience -indicates that percent restoring performance upon disruption incident will be:

$R_{\text {resilience }}=\frac{P\left(t_{\text {recover }}\right)}{P\left(t_{\text {disiruption }}\right)} \cdot 100$

(26)

where, $t_{\text {disruption }}$ is the time at which the disruption begins, and $t_{\text {recover }}$ is the time at which the system returns to a predetermined level of acceptable performance (e.g., 90% of nominal performance). This defines the system resilience, with the disruption experienced by the system, in an environment that can recover after the shock, recovering towards acceptable levels within some given time $T_r$. Mathematically, such time-dependent state recovery functions represent:

$R(t)=\frac{\left|X_{\text {nominal }}-X(t)\right|}{\left|X_{\text {nominal }}-X\left(t_d\right)\right|}$

(27)

where, $R(t)$ is the system recovery ratio at any time t after the interruption. The system is resilient if $R\left(T_r\right) \approx 1$ in a certain time $T_r$ for its restoration.

Resilience is represented by the ability of the system to recover from disruptions. This can be represented in terms of the recovery time $\tau_R$ and the system's performance following the interruption as opposed to its nominal performance:

$\text{Resilience} =\frac{1}{\tau_R} \int_{t_0}^{t_0+\tau_R} \frac{P(t)}{P\left(t_0\right)} d t$

(28)

where:

$t_0$: time of occurrence of the disruption.

$\tau_R$: time of recovery following the disruption.

$P\left(t_0\right)$: performance of the system at the time of disruption.

A greater resilience score reflects quicker recovery and higher post-disruption performance.

(1) Measuring resilience: Measure the speed of recovery and effectiveness of the system after the disruption.

(2) Resilience metric:

$R_{\text {resilience }}=\frac{P\left(t_{\text {recover }}\right)}{P\left(t_{\text {disruption }}\right)} \times 100$

This index measures recovery in percentage of original level of performance and approximates the recovery speed and efficiency.

4.11 General Reliability Index

In order to measure general system reliability, calculate the weighted sum of the measurements of the robustness and the resilience:

$R_{\text {combined }}=\omega_1 \cdot R_{\text {robust }}+\omega_2 \cdot R_{\text {resilience }}$

(29)

where, $\omega_1$ and $\omega_2$ are weights that rely on the system's context.

4.11.1 Total reliability model

The overall reliability of the system, combining robustness and resilience, is given as:

$\text{Reliability} =w_1 \cdot \text{Robustness} +w_2 \cdot\text{Resilience}$

(30)

where, $w_1$ and $w_2$ are weighting factors relative to the relative necessity for robustness versus resilience in respect to the given system. Numerical values of $w_1$ and $w_2$ can be variable relative to operating environment and system criticality and whether performance is required to be sustained or not.

4.12 Highly Specialized Systems Training and Knowledge Transfer

To address the problem of training and knowledge transfer from non-experts to experts for highly technical systems, we can create a model that considers learning, human performance improvement, and risk reduction due to knowledge transfer. The goal is to reduce expert dependency by improving the skill level of non-expert participants gradually, thus overall system reliability as well as reducing risks due to expert over-dependency.

4.12.1 Major elements of the model

(1) Non-Expertise of Experts$\left(P_{\text {non-expert}}(t)\right)$:

The level of performance of non-experts in the long term, as they are trained and advised by experts.

(2) Expert Contribution($P_{\text {expert}}$):

The uniform level of performance of an expert. This could be taken to be a constant value as experts are supposed to perform at a high level as they are experts.

(3) Training Efficiency $((\eta(t))$:

The rate at which the performance of non-experts increases with the passage of time due to training.

It can be represented as a function of effort and training duration.

(4) Mentorship Impact $(M(t))$:

The positive effect of expert influence on non-experts. It can include transferring knowledge, facilitating decision-making, and experiential mentoring that facilitates the learning process for the non-expert.

(5) System Reliability $(R(t))$:

The overall system reliability will be the product of the compounded performance of the experts and non-experts, and the effectiveness with which the non-experts learn and contribute.

4.12.2 Breaking down in steps

Improvement in the performance of the non-expert over time can be measured by a learning curve representing incremental improvement through training and mentoring. The performance of the non-expert at time $t$ can be measured as:

$P_{\text {non-expert }}(t)=P_{\text {initial }}+\eta(t) \cdot\left(P_{\text {expert }}-P_{\text {initial }}\right)$

(31)

where:

$P_{\text {initial }}$ is the non-expert initial level (before training).

$\eta(t)$ is the training efficiency function, increasing with time because the non-expert improves.

$P_{\text {expert }}$ is the performance level of the expert (the level to which the non-experts are trying to approximate).

Training efficiency $\eta(t)$ may be a function of effort and time, perhaps with parameters such as frequency of training and instructional quality:

$\eta(t)=\alpha \cdot\left(1-e^{-\beta t}\right)$

(32)

where:

$\alpha$ is the learning rate constraint.

$\beta$ is a learning rate control parameter (the higher the value, the faster the learning).

$t$ is learning and training time.

The expert mentoring effect is indicated by $M(t)$, the assistance that the expert provides over time. The effect tapers off as the non-expert improves but is worthwhile in continuous improvement:

$M(t)=\gamma \cdot\left(1-e^{-\delta t}\right)$

(33)

where:

$\gamma$ is the optimal mentoring effect.

$\delta$ is the parameter the rate of decline of the mentorship effect with respect to time (as expertise grows for the non-expert) depends upon.

$t$ is the length of time for which the non-expert has been trained.

The overall system reliability $R(t)$ depends on the expert and the non-expert performance. It can be conceptualized as the weighted sum of the two, having regard to the contribution of experts and non-experts towards overall system performance:

$R(t)=\omega_{\text {expert }} \cdot P_{\text {expert }}+\omega_{\text {non-expert }} \cdot P_{\text {non-expert }}(t)$

(34)

where:

$\omega_{\text {expert }}$ and $\omega_{\text {non-expert }}$ are the weights of relative contributions of the non-expert and the expert to the overall system reliability (where $\omega_{\text {expert }}+\omega_{\text {non-expert }}=1$).

$P_{\text {expert }}$ is the expert's performance (time-invariant).

$P_{\text {non-expert }}(t)$ is the temporal performance of the non-expert as a result of training and advice.

The model also captures the risk of over-trust in non-experts in the system. The more competent the non-expert is, the more trustful the system is, but over-trust in non-experts when there is no governance would lead to instability. To capture this risk, we introduce a risk factor Risk ($t$), which increases if the non-expert's performance is too different from the expert's performance:

$\operatorname{Risk}(t)=\kappa \cdot\left|P_{\text {expert }}-P_{\text {non-expert }}(t)\right|$

(35)

where:

$\kappa$ is a system parameter specifying the system's sensitivity to performance changes.

$\left|P_{\text {expert }}-P_{\text {non-expert }}(t)\right|$ is a non-expert to expert difference in performance.

The effectiveness of the training and knowledge transfer process can be quantified by the rate of success in training $S(t)$, defined as the variation of system reliability over time:

$S(t)=\frac{R(t)-R(0)}{R_{\max }-R(0)}$

(36)

where:

$R(t)$ is system reliability at time.

$R(0)$ is initial system reliability (before training).

$R_{\max }$ is optimal attainable system reliability when the performance of the non-expert equals that of the expert.

The model approximates how the performance of the non-experts changes for better with training and coaching toward enhancing overall system reliability. It also looks at the danger of becoming overly dependent on non-experts, and it suggests that excellent training and knowledge transfer schemes should be in place to take care of risks in complex systems. The model can be adjusted by altering the training efficiency and mentoring factors to suit separate systems and environments to calculate how total risk and reliability evolve with time.

4.13 Cognitive Load and Decision-Making Model

Overall, human operator capability for effective and timely decision-making is influenced by the $C L(t)$. Human mental burden models as a function of systems complexity $C$ and severity of disruption at time $D(t)$ and operators encounter $E$:

$C L(t)=\alpha C+\beta D(t)-\gamma E$

(37)

where:

$C$ is system complexity, e.g., number of variables to be monitored by the operator,

$D(t)$ is the magnitude of disruption, increasing cognitive load at high-level disturbances,

$E$ is the skill level of the operator, decreasing cognitive load,

$\alpha$, $\beta$, and $\gamma$ are relative significance weighting coefficients of these factors.

With additional cognitive load, human input $h(t)$ may be delayed or inaccurate and thus weaken the resilience of the system. This is modeled with a decision delay function:

$\Delta t_h=\tau(C L(t))$

(38)

where, $\Delta t_h$ is human intervention delay and $\tau(\cdot)$ is a function of higher cognitive load.

4.14 Overall System Performance Model

The system performance model of the HITL system should be evaluated in terms of an aggregate of robustness and resilience weighted by operation context. Let $P$ represent the system performance model, a weighted aggregate of $R$ and $\rho$:

$P=\lambda_1 R+\lambda_2 \rho$

(39)

where, $\lambda_1$ and $\lambda_2$ are operationally dependent weights that prefer robustness or resilience depending on the operating needs of the system. For example, in safety-critical environments, resilience may receive greater emphasis over robustness.

4.15 Stochastic Disruption Model

$d(t)$ are modeled as a stochastic process to model uncertainty in real conditions. We assume disruptions to be having a probability distribution $p(d(t))$, with mean $\mu_d$ and variance $\sigma_d^2$ defining probability and severity of disruptive events.

For low-level disruptions $d_l(t)$, we assume a Gaussian distribution:

$d_l(t) \sim \mathcal{N}\left(\mu_l, \sigma_l^2\right)$

(40)

For high-level disruptions $d_h(t)$, one might use a heavier-tailed distribution such as the Gumbel distribution for extreme events:

$d_h(t) \sim \operatorname{Gumbel}\left(\mu_h, \sigma_h\right)$

(41)

4.16 Iteration Across Varying Scenarios

Perform the simulation under varying scenarios of disruptions (e.g., low-level, medium-level, high-level disruptions) to analyze how the system reacts under different conditions:

For each scenario:

- Modify the magnitude $D(t)$ and severity $\gamma_{\text {sys }}(t)$.

- Analyze the impact on system performance with changing human interventions (e.g., time delays $\tau_{\text {human }}$, cognitive errors $\epsilon_{\text {human }}(t)$).

- Compare each scenario's robustness, resilience, and combined reliability measures.

4.17 Sensitivity Analysis

To see how the impact of such and other factors may vary, say, disruption magnitude, involvement of human action, or recovery rates of the system, a sensitivity analysis is conducted. This can be done by varying parameters such as $\lambda$, $\alpha$, $\beta$, and $E_h(t)$ to test for the sensitivity of overall system reliability to changes in disruption magnitude, human mistake, and system robustness.

It would supply the mathematical context within which these human and automated contributions are expressed, providing quantification in a HITL simulation model and thus estimating general systems and system reliability under nominal operation or recovering from disturbance. Such modeling is an overall strategy to general performance assessment based on system dynamics coupling human decision-making with random disturbances. Further studies can perhaps develop a framework where, excluding specific human behavioral aspects, adaptive learning models and real-time feedback loops could be created for even more complex systems in order to make them more reliable.

Additional Sensitivity and Complexity Problems:

• Stochastic Modeling: Incorporating stochastic elements such as random perturbations and cognitive human mistakes helps the simulation become more realistic. These could be modeled with Gaussian noise or Poisson processes in order to emulate sudden and unexpected events.

• Adaptive Human Behavior: A higher-level model, which emulates learning human behaviors, could choose over time based on past disruptions and modify their actions. It is possible by applying Reinforcement Learning (RL), where human actions change the intervention strategy based on past success or failure.

• Component Dependencies: For those systems with many components, components do not necessarily fail independently. Including correlated failures or network effects among system components would have a significant impact on performance.

• Time-Dependent Recovery: The recovery of the system could be modeled as a nonlinear function rather than an exponential function, based on the types of disruption. For instance, recovery could be more time-consuming or in the shape of an S-curve after profound disruptions.

This more sophisticated mathematical model enhances sensitivity by introducing randomness in system parameters, human behavior, and the response of systems. It produces a more developed and complex form of simulation that is used to greater effect in the real world in health care, aviation, or infrastructures at an enormous scale.

4.18 Suggested Model Algorithm

The algorithmic steps to perform HITL simulations to produce more robust and resilient systems are provided below. The algorithm would then simulate nominal operation and failure events, integrate human interventions, and determine system performance in terms of measures of robustness and resilience. The following steps in Figure 2 define the HITL simulation operations:

Figure 2. Suggested model algorithm steps

Algorithm Steps

(1) Initialization:

• Input system parameters:

System model $S$ with nominal performance parameters.

Initial system performance $P(0)$, where $P(0)=1$ full functionality.

System degradation rate $\lambda_{\text {sys}}$.

Resilience and robustness thresholds.

• Set simulation parameters:

Simulation time horizon $T$.

Disruption events $D_i$, their magnitudes $D(t)$, and timings $t_{\text {disruption}}$.

Human intervention parameters $\alpha$, $\beta$, $\tau_{\text {human}}$

• Initialize system variables:

Set system performance $P(t)=P(0)$.

Set human intervention state $I(t)=0$.

(2) System Nominal Operation (Pre-Disruption):

A. For each time step $t \in\left[ 0, t_{\text {disruption }}\right]$.

Calculate nominal system performance:

$P_{\text {nominal }}(t)=1-\lambda_{\text {sys }} t$

(42)

B. Disruption tracking for any small departures or first warnings of interruptions.

C. Record performance $P_{\text {nominal }}(t)$.

(3) Disruption Detection:

• At $t=t_{\text {disruption }}$, introduce a disruption event $D(t)$.

• Alter system performance in response to the interruption:

$P_{\text {disrupted }}(t)=P_{\text {nominal }}(t)-D(t) \cdot \gamma_{\text {sys }}(t)$

(43)

• Capture the impact on system performance $P_{\text {disrupted }}(t)$.

(4) Human Intervention:

• At time $t_{\text {disruption}}+\tau_{\text {human}}$, human operator detects the disruption and intervenes.

• For each time step $t \geq t_{\text {disruption}}+\tau_{\text {human}}$:

D. Calculate the human intervention effect:

$I(t)=\alpha \cdot\left(1-P_{\text {disrupted }}(t)\right) \cdot e^{-\beta\left(t-\tau_{\text {human }}\right)}$

(44)

E. Adjust system performance with the intervention effect:

$P(t)=P_{\text {disrupted }}(t)+I(t)$

(45)

F. If $P(t) \geq P_{\text {acceptable }}$, mark the system as recovered and record the recovery time $t_{\text {recover }}$.

G. Continue adjusting $P(t)$ until the perturbation is fully removed or the system is optimized to the point of maximum recovery.

(5) Post-Recovery System Monitoring:

• Be on the lookout for system performance after recovery for any lingering effects or secondary disruptions.

• Update system performance:

$P(t)=P_{\text {nominal }}(t)-\text{Residual Effects}$

(46)

• Record post-recovery performance trajectory.

(6) Robustness Assessment:

• Estimate system robustness $R_{\text {robust }}$ in terms of measurement of nominal performance deviation over the whole time horizon:

$R_{\text {robust }}=\frac{1}{T} \int_0^T\left(1-\left|P_{\text {nominal }}(t)-P_{\text {disrupted }}(t)\right|\right) d t$

(47)

• Greater $R_{\text {robust }}$ value represents higher robustness, with minimal variation of performance in normal and disrupted operations.

(7) Resilience Evaluation:

• Establish resilience $R_{\text {resilience }}$ based on the rate of recovery and quality of the system:

$R_{\text {resilience }}=\frac{P\left(t_{\text {recover }}\right)}{P\left(t_{\text {disruption }}\right)} \cdot 100$

(48)

• Measure the time $t_{\text {recover }}$ and determine how quickly and effectively the system returns to a satisfactory level of functioning after a disruption.

(8) Combined Reliability Index:

• Calculate the combined reliability index $R_{\text {combined }}$ based on the combined robustness and resilience measures:

$R_{\text {combined }}=\omega_1 \cdot R_{\text {robust }}+\omega_2 \cdot R_{\text {resilience }}$

(49)

• Weight the weights $\omega_1$ and $\omega_2$ based on the relative importance of resilience vs. robustness for the system in question.

(9) Iteration for Multiple Scenarios:

• Store and compare the results for each scenario, tracking the effect of varying magnitudes of disruption and of human interference on overall reliability.

The proposed algorithm would demonstrate a structured process for simulation and system reliability analysis within HITL platforms in terms of resilience and robustness. Together, human action, disruption management, and recovery dynamics offer integrated insights into the behavior of the system under practical scenarios and help design more reliable and resilient systems.

5. Application

This is an illustration of HITL simulations in the disaster management system (DMS) using a natural disaster scenario, i.e., earthquake. The system depends on computer decision-making algorithms for resources allocation and rescue, yet incorporates human operators to address surprise interference and help the system become safe, as in the case of the earthquake. There is a 7.0-level earthquake that hits a densely populated urban area, causing extensive damage to infrastructure and crippling all means of communications. Rescue missions, resource distribution, and evacuation routes are managed by the DMS.

System Overview

(1) Automated modules:

Resource allocation algorithm (RAA): Distribute rescue teams and medical supplies.

Evacuation route planner (ERP): Calculate safe evacuation routes using road network data.

(2) Human operators:

Train emergency managers review automated decisions and intervene where the algorithms are wrong (e.g., when data is incomplete or where changes are not consistent with recognized patterns).

Disruptions

(1) Data disruptions: Link between the crisis areas and the RAA are disrupted, giving incomplete data.

(2) Dynamic disruptions: Unplanned road blockages due to aftershocks render the initial evacuation schemes of ERP impracticable.

Objective

(1) Quantify the resilience of the DMS under nominal and disrupted operations.

(2) Test resilience by having human operators intervene to get back into action.

(3) Compute system reliability by aggregating robustness and resilience indicators.

A disaster management system distributes resources in rescue troops, medical resources, and equipment against disruptions such as communication disruption, lack of resources, and fluctuating needs over affected areas.

Assumptions

$N$ affected areas which need to be supported exist.

The system operates over $T$ time periods.

Human intervention operators are available during disruptions.

Two types: data disruption, i.e., disrupting communication and dynamic disruption, i.e., fluctuating needs or conditions.

Decision Variables

$x_{i j}^t$: Amount of resource $j$ allocated to region $i$ at time $t$ .

$y_i^t$: Binary variable indicating whether region $i$ is fully supported at time $t\left(y_i^t=1\right)$ or not $\left(y_i^t=0\right)$.

$z^t$: Binary variable indicating whether human intervention occurs at time $t\left(\mathrm{z}^{\mathrm{t}}=1\right)$.

Objective Function

Maximize system performance, balancing robustness and resilience:

$\max \mathrm{Z}=\underbrace{\frac{1}{T} \sum_{T=1}^T \sum_{i=1}^N y_i^T}_{\text {Robustness }}+\underbrace{\frac{1}{T} \sum_{t=1}^T R^t}_{\text {Resilience }}$

(50)

where, $R^t$ represents the system's recovery level at time $t$, modeled as a function of disruption impact and human intervention:

$R^t= \begin{cases}1-\alpha D^t, & \text { if } z^t=0 \\ 1-\beta D^t, & \text { if } z^t=1\end{cases}$

(51)

$D^t$: D Severity of disruption at time (e.g., loss of data or lack of resources) also disruption severity at time $t$ (e.g., resource shortfall or data loss).

$\alpha$: R Rate of recovery without human intervention.ecovery rate without human intervention.

$\beta>\alpha$: R Rate of recovery with human intervention ecovery rate with human intervention.

Constraints

(1) Resource Allocation Constraints:

$\sum_{j=1}^M x_{i j}^t \geq D_i^t \cdot y_i^t, \quad \forall i \in N, \forall t \in T$

(52)

$M$: Total number of resources.

$D_i^t$: Resource demand for region $i$ at time $t$ .

(2) Resource Availability Constraints:

$\sum_{i=1}^N x_{i j}^t \leq R_j^t, \quad \forall j \in M, \forall t \in T$

(53)

$R_j^t$: Total available amount of resource $j$ at time $t$.

(3) Human Intervention Constraints:

$z^t \in\{0,1\}, \quad \forall t \in T$

(54)

$\sum_{t=1}^T z^t \leq H$

(55)

$H$: Maximum number of human interventions allowed.

(4) Binary Constraints for Region Support:

$\mathrm{y}_{\mathrm{i}}^{\mathrm{t}} \in\{0,1\}, \forall i \in N, \forall t \in T$

(56)

(5) Performance Threshold Constraint:

$\sum_{\mathrm{i}=1}^{\mathrm{N}} \mathrm{y}_{\mathrm{i}}^{\mathrm{t}} \geq \lambda, \forall t \in T$

(57)

$\lambda$: Minimum acceptable system performance level.

Case Parameters:

$N=5$ (regions), $T=10$ (time periods), $M=3$ (resources).

Resource availability: $R_j^t=[ 100,80,120]$,

Demand matrix $D_i^t$: Randomly generated with values between 10 and 50.

$\alpha=0.05, \beta=0.2$ (recovery rates).

Disruption severity $D^t:[ 0.2,0.4,0.3,0.5,0.1, \ldots]$.

Maximum interventions $H=4$.

Case Objective Function Value:

Without intervention: $Z=0.78$

With intervention: $Z=0.94$

Human interventions enhanced resilience scores by 20%, specifically in high-severity disruptions. Resource optimization ensured higher demand areas were prioritized to address robustness. The proposed mathematical model captures disaster management scenario robustness and resilience dynamics, showing the significant impact of HITL configurations. The numerical example shows the necessity of immediate human intervention and efficient allocation of resources for the assurance of system reliability. The issue seems to be within unmatched dimensions between the demand matrix and resource availability matrix. Availability of resources needs to be addressed separately for each resource type, while it's used as such during comparison with regional demands. Table 1 is the summary for 100 simulation runs:

Table 1. Statistical summary of the simulation data

Metric	Mean	Std Dev	Min	25%	50%	75%	Max
Robustness (Without Intervention)	1.000	0.000	1.000	1.000	1.000	1.000	1.000
Robustness (With Intervention)	1.000	0.000	1.000	1.000	1.000	1.000	1.000
Resilience (Without Intervention)	0.985	$1.12 \times 10^{-16}$	0.985	0.985	0.985	0.985	0.985
Resilience (With Intervention)	0.964	$1.12 \times 10^{-16}$	0.964	0.964	0.964	0.964	0.964
Objective Value (Without Intervention)	1.985	$4.46 \times 10^{-16}$	1.985	1.985	1.985	1.985	1.985
Objective Value (With Intervention)	1.964	$2.23 \times 10^{-16}$	1.964	1.964	1.964	1.964	1.964

Robustness consisted of persistent value over generations (all regions met their demands-normalized value of 1). The resilience score increased slightly with interventions, highlighting the human positive impact in times of disruptions. The systems with interventions had slightly lower objective scores, which showed their slightly lower robustness but higher resilience. This mathematical example showed how Human-in-the-Loop simulations turned a disaster management system into a more robust and resilient one. The results invoked the intervention of human beings in interruption management, reducing recovery time while upholding the reliability of the system in simple uncertain states.

The resilience score improved slightly with interventions, underlining the positive impact of human involvement during disruptions. The systems with interventions had slightly lower objective scores, reflecting their slightly reduced robustness but increased resilience. This numerical example demonstrated how Human-in-the-Loop (HITL) simulations added to the robustness and resilience of a disaster management system. The results highlighted the contribution of human intervention in the handling of disruptions, reducing recovery times while maintaining the reliability of the system in complex and uncertain situations.

Simulation Steps:

(1) Nominal Conditions (0–6 hours)

The DMS operates at $P_{\text {nominal }}=1.0$.

No disruptions or human interventions occur.

(2) Data Disruption (6–12 hours)

The performance of RAA decreases due to incomplete data, reducing system functionality to $(t)=0.7$.

At $t=8$ hours, human operators intervene, providing missing data manually.

Performance begins to recover at a rate of 0.05 per hour, reaching $P(t)=0.9$ by $t=12$ hours.

(3) Dynamic Disruption (14–24 hours)

ERP’s evacuation routes fail due to road closures, reducing system performance to $P(t)=0.6$.

Human operators detect the failure at $t=16$ hours and manually update evacuation routes.

Recovery begins, reaching $P(t)=0.95$ by $t=24$ hours.

5.1 Numerical Results

5.1.1 Robustness calculation

Using the performance curve over the time horizon:

$P_{\text {nominal }}(t)=1.0$ for all $t$.

$P(t)$ varies with disruptions and recovery:

$P(t)=0.7$ (data disruption: $6 \leq t \leq 8$).

$P(t)$ recovers linearly to 0.9 (human intervention: $8 \leq t \leq 12$)

$P(t)=0.6$ (dynamic disruption: $14 \leq t \leq 16$).

$P(t)$ recovers linearly to 0.95 (human intervention: $16 \leq t \leq 24$).

Numerical integration of $R_{\text {robust}}$ :

$R_{\text {robust }}=0.86$

5.1.2 Resilience calculation

For data disruption:

$R_{\text {resilience, data }}=\frac{P(12)}{P(6)} \times 100=\frac{0.9}{0.7} \times 100=128.57$

5.1.3 Combined reliability calculation

Resilience and robustness weights are allocated as $\omega_1=0.6$ and $\omega_2=0.4$:

$R_{\text {combined }}=0.6 \cdot 0.86+0.4 \cdot\left(\frac{128.57+158.33}{2 \times 100}\right)=0.994$

The system was relatively robust, having a small amount of deviation when interruptions happened.

Automation was good under nominal conditions, yet dynamic and data disruptions affected performance significantly.

Human intervention became necessary in assisting with resilience, with $R_{\text {resilience, data }}=128.57 \%$ and $R_{\text {resilience,}}$ $_{\text {dynamic }} = 158.33\%$. Speed of intervention raised the resilience scores. The total reliability score $\left(R_{\text {combined }}=0.994\right)$ highlights the importance of integrating HITL mechanisms into disaster management systems to deal with uncertainties and aid in recovery. The error and confidence interval discussion analysis provided in depth in Table 2.

Table 2. Sources of error

Source	Description	Impact
Modeling Assumptions	The model assumes fixed demand distributions and linear recovery rates $(\alpha, \beta)$	Oversimplification may underestimate real-world complexities, e.g., nonlinear disruptions
Random Demand Variability	Demands $D_i^t$ are randomly sampled in [ 10,50], introducing stochastic variability	May introduce statistical noise between runs, although decreased by running 100 simulations
Human Intervention Timing	Interventions are taken to happen optimally (e.g., at $t=8$, $t=16$)	In real-world cases, detection and response delays could worsen outcomes
Resource Allocation Simplification	Resources are treated as homogeneous within every resource category across all regions	Actual systems would face differential utility of resources in various regions
Finite Interventions ($H=4$)	Limited human interventions restrict recovery opportunities	May result in moderate underestimation of the optimum resilience potential of the system
Metric Estimation Errors	Robustness $R_{\text{robust}}$ estimation via integration and resilience $R_{\text{resilience}}$ estimation via ratios contains numerical errors of approximation	However, in deterministic cases, they are small

The standard deviations are virtually zero (essentially numerical noise, $\sim 10^{-16}$ ), i.e., the results from simulation are strongly consistent from one run to the next. Robustness and resilience estimates in this setup lack appreciable random error.

5.1.4 Confidence interval calculation

Owing to the near-zero standard deviations, confidence intervals for significant measures are extremely narrow. For completeness, presenting a straightforward 95% Confidence Interval (CI) calculation for mean measures:

$\mathrm{CI}=\mu \pm z \times \frac{\sigma}{\sqrt{n}}$

where:

$\mu=$ sample mean,

$\sigma=$ standard deviation,

$n=100$,

$z=1.96$ (for 95% confidence).

Resilience (With Intervention):

$\mathrm{CI}=0.964 \pm 1.96 \times \frac{1.12 \times 10^{-16}}{10}$

$\mathrm{CI}=0.964 \pm 2.19 \times 10^{-17}$

Therefore, confidence interval is effectively $0.964 \pm 0.0000000000000000219$, which is really negligible. Likewise for all the measures, confidence intervals are essentially equal to the means.

5.1.5 Discussion on robustness and resilience metrics

Robustness level ($R_{\text {robust }}$) was extremely high, at the level of 1 , throughout all simulations. This is a confirmation that disaster management system (DMS) was performing steadily under normal operations with planning for evacuation and resource allocation as usual without any requirements for substantial interferences. Resilience ($R_{\text {resilience }}$) was significantly enhanced by incorporating human intervention. In addition, resilience was added by another 28%-58% during disruption incidents, demonstrating that human operators contributed significantly to minimizing system failure. The ($R_{\text {combined }}=$ 0.994) reliability showed that the overall system, integrating robustness and resilience, was extremely reliable even in the presence of communication failures and alterations in road networks. Human-in-the-Loop (HITL) operators functioned successfully on incomplete or erroneous data, and they resolved unconventional dynamic failures without much trouble, and they established the key position towards maintaining system stability in emergency situations. It was with respect to good results for performance but some problems of interpreting error and uncertainty bounds of this study. The experiment was conducted in an artificially constrained environment, which had predetermined perturbations, recovery rates, and interventions. Thus, randomness of the world, i.e., communication latency, human error, or intricate road network failures was not exactly simulated.

The model was based on discrete time periods (e.g., hourly analysis) to analyze choices of resources and performance, so it might overlook more abrupt or even smooth shifts in system behavior that would be exposed in a continuous-time model. Another constraint is that four human interventions ($H = 4$) are being drawn upon as required, and this might be too limiting to the flexibility and quick response that could be achieved in actual disaster response situations. Thus, the resilience predictions would appear slightly pessimistic relative to what might actually be achieved in Table 3.

Table 3. Summary table

Aspect	Results
Error Margin	Zero (due to deterministic model + 100 replications)
Confidence Intervals	Very narrow (nearly the sample mean)
Robustness	High, consistent
Resilience	Much improved with HITL
Combined Reliability	Very high (0.994)
Principal Error Sources	Model simplifications, fixed intervention timing
Limitations	No randomness in real-world and nonlinear effects

The outcomes of the analysis integrated the key conclusions of simulations and numerical examples, investigated the system robustness, and resilience for different cases by conducting HITL simulations. The test results reaped in terms of the Key Performance Indicators illustrated the importance of human intervention towards guaranteeing system reliability.

5.2 System Robustness

Robustness refers to the capacity of the system to behave in a predictable manner under normal circumstances and to resist small perturbations.

• Baseline performance: The performance of the systems averaged nearly optimal under nominal situations whereas the systems had been optimized against. Therefore, robustness scores between 0.85 and 0.95, depending on the context.

• Disruption effect: Disruptions in data as well as dynamic disruptions induced by extreme losses in performance.

In the case study of disaster management, data disruption lowered the performance of the resource allocation algorithm by as much as 30%, thus reducing the robustness of this phase.

Autonomous systems in the case study of a vehicle caused up to 40% deviation in performance in case of sudden obstacle events.

5.3 System Resilience

Resilience is the ability of the system to recover from disruptions, either autonomously or by human intervention.

Human intervention effectiveness:

- In the disaster management example, human intervention recovered system performance from 70% to 90% within 6 hours for data disruptions and from 60% to 95% within 8 hours for dynamic disruptions.

- In the autonomous vehicle scenario, human operators might avoid collisions and restore functionality with a recovery time of between 3 seconds and 2 minutes, depending on disruption complexity.

- Quick intervention restored performance to 90%-95% in most scenarios.

- Resilience scores fell below 0.5 for major disruptions in the no-intervention scenario.

Recovery time analysis:

- Minor disruptions took on average 2-6 hours to recover from for disaster management and 3-10 minutes for autonomous systems.

- Major disruptions took at most 6-12 hours to recover from.

Resilience metrics:

- For the scenarios where timely and effective human intervention was conducted, resilience metrics were above 0.8 and even above 0.9, while for no human intervention, resilience metrics were below 0.5.

5.4 Combined Reliability

The composite reliability measure takes into account the overall system performance by integrating robustness and resilience.

• Disaster management system: The combined reliability was 0.99 owing to superb coordination among automated systems and human operators. Despite extreme disruptions, the system demonstrated almost optimal recovery performance.

• Autonomous vehicle system: The combined reliability was 0.94, indicating high reliability with few delays in human interventions.

The addition of HITL considerably improved the reliability, when compared to fully automated systems.

5.5 Sensitivity Analysis

Parameters for Sensitivity Analysis

The following parameters were varied to assess their impact:

(1) Human intervention delay $\left(\tau_{\text {human}}\right)$: The time required for human operators to detect and respond to disruptions.

(2) Disruption severity ($D_{\text {severity}}$): The percentage performance degradation caused by disruptions.

(3) Recovery rate ($R_{\text {rate}}$): The speed at which performance is restored after disruptions.

(4) Automation reliability ($A_{\text {reliability}}$): The robustness of the automated systems in nominal conditions.

Sensitivity Analysis Results

A. Human Intervention Delay $\left(\tau_{\text {human}}\right)$

Observation: Longer intervention delays significantly reduce resilience scores.

Example: In the disaster management scenario:

$\tau_{\text {human}}=1$ hour: Resilience score $R_{\text {resilience}}=0.93$.

$\tau_{\text {human}}=4$ hours: Resilience score $R_{\text {resilience}}=0.78$.

$\tau_{\text {human}}=8$ hours: Resilience score $R_{\text {resilience}}=0.63$.

Minimizing intervention delays is critical to maintaining high resilience.

B. Disruption Severity ($D_{\text {severity}}$)

Observation: More severe disruptions lead to larger performance drops and longer recovery times.

Example: In the autonomous vehicle scenario:

$D_{\text {severity}}=20 \%$ : Recovery time $=5$ minutes.

$D_{\text {severity}}=50 \%$ : Recovery time $=15$ minutes.

$D_{\text {severity}}=80 \%$ :Recovery time $=40$ minutes.

System designs should prioritize resilience to high-severity disruptions.

C. Recovery Rate ( $R_{\text {rate}}$ )

Observation: Faster recovery rates improve resilience scores significantly.

Example: In disaster management:

$R_{\text {rate}}=0.05$ per hour: Resilience score $R_{\text {resilience}}=0.83$

$R_{\text {rate}}=0.1$ per hour: Resilience score $R_{\text {resilience}}=0.91$.

$R_{\text {rate}}=0.2$ per hour: Resilience score $R_{\text {resilience}}=0.97$.

Recovery mechanisms, whether automated or human-assisted, should focus on improving recovery speed.

D. Automation Reliability $\left(A_{\text {reliability}}\right)$

Observation: Higher automation reliability reduces the frequency of human intervention but does not eliminate the need for it.

Example:

$A_{\text {reliability}}=0.95$ : Human intervention required for $15 \%$ of disruptions.

$A_{\text {reliability}}=0.85$ : Human intervention required for $35 \%$ of disruptions.

Even highly reliable automated systems benefit from human oversight in complex scenarios.

The sensitivity analysis confirmed the following, as the most critical human intervention delay was the most dominant factor influencing system resilience. More rapid detection and response mechanisms must be established. Disruption severity and recovery rate are both directly proportional to recovery time and ultimate performance levels. Reducing severe disruptions and enhancing recovery mechanisms is where attention must be directed. Automation reliability is necessary but not sufficient for system-wide reliability. The Human-in-the-Loop designs are critical to resilience in dynamic and uncertain situations in Table 4.

Table 4. Sensitivity analysis summary

Parameter	Impact on Resilience	Recommended Focus
Human intervention delay	High	Reduce delay through real-time alerts
Disruption severity	High	Design systems for severe disruptions
Recovery rate	Moderate to High	Optimize recovery mechanisms
Automation reliability	Moderate	Ensure human oversight for critical tasks

Based on these factors, systems could prioritize high robustness, resilience, and reliability for real-world applications because disruptions could be entirely avoided.

5.6 Comparison of Results

This section was dedicated to providing a comparison across scenarios/configurations with added insight into effects related to robustness, resilience, and overall system reliability stemming from the implementation of HITL systems, automation reliability, and variance in parameters.

5.6.1 Robustness comparison

A. HITL versus Fully Automated Systems

• HITL Systems:

Maintained robustness scores between 0.85-0.95 across scenarios.

Performance degradation during disruption was mitigated by human interventions, limiting performance drops to a maximum of 30%-35%.

• Fully Automated Systems:

There was more robustness under nominal conditions at 0.90-0.98 but they were very vulnerable under disruptions, with 40%-50% decline in performance due to a lack of adaptability.

HITL systems realize a trade-off between automation and adaptability and are therefore more robust in dynamic environments.

5.6.2 Resilience comparison

A. HITL Systems with Human Intervention Delays

• Resilience scores were sensitive to intervention delays $\left(\tau_{\text {human }}\right)$:

$\tau_{\text {human }}=1$ hour: $R_{\text {resilience }}=0.93$

$\tau_{\text {human }}=8$ hours: $R_{\text {resilience }}=0.63$.

B. Fully Automated Systems

• Resilience scores were low for significant disruptions ($R_{\text {resilience }}<0.5$), as recovery mechanisms lacked the flexibility that was afforded by human oversight.

Resilience in HITL systems was far higher, if human interventions were timely and effective.

5.6.3 Recovery time comparison

HITL systems recovered faster and more stably compared to fully automated systems as in Table 5.

Table 5. Recovery time comparison results

Scenario	Disruption Type	HITL Recovery Time	Fully Automated Recovery Time
Disaster Management	Data Disruption	6 hours	No recovery
Disaster Management	Dynamic Disruption	8 hours	No recovery
Autonomous Vehicles	Sensor Failure	5 minutes	30 minutes
Autonomous Vehicles	Road Blockage	10 minutes	No recovery

5.6.4 Combined reliability comparison

HITL systems achieved significantly higher combined reliability due to effective human interventions and flexibility under disruptions as in Table 6.

Table 6. Combined reliability comparison results

System Configuration	Scenario	Combined Reliability
HITL System	Disaster Management	0.98--0.99
HITL System	Autonomous Vehicles	0.92--0.96
Fully Automated System	Disaster Management	0.70--0.80
Fully Automated System	Autonomous Vehicles	0.65--0.75

5.6.5 Sensitivity analysis comparison

A. Human Intervention Delay ($\tau_{\text {human}}$)

HITL systems with shorter delays performed better in terms of resilience and recovery times.

B. Automation Reliability ($A_{\text {reliability}}$)

High automation reliability $A_{\text {reliability }}=0.95$ lowered the necessity for human intervention but could not completely make up for catastrophic disruptions.

C. Disruption Severity ($D_{\text {severity}}$)

Prolonged disruptions caused longer recovery time and decreased resilience, emphasizing the importance of robust recovery mechanisms.

D. Recovery Rate ($R_{\text {rate}}$)

Higher recovery rates significantly improved resilience scores independent of system configuration.

HITL systems outperformed fully automated systems on all metrics, particularly the time for resilience and recovery. Timely human interventions played a key role in achieving high reliability in HITL systems. Fully automated systems exhibited high robustness under nominal conditions but could not adapt and recover from extreme disruptions. Sensitivity analysis reaffirmed that human intervention delay and recovery rate were the most influential parameters controlling resilience and recovery.

HITL-supported human-machine systems are capable of responding to disruptions and uncertainties with ease. Integration of user-friendly interfaces and real-time diagnostic support has the potential to reduce response time and enhance the effectiveness of human intervention. Effectiveness of human interventions is strongly dependent on the training of operators and their experience with the system. Well-trained operators achieved higher resilience scores through immediate and appropriate intervention in case of disruptions. Data disruptions brought a significant impact on system performance. In such interruptions, robustness could be increased by strengthening communication networks and creating redundancy.

These results of the analyses constitute the basis of the primary contribution of Human-in-the-Loop simulation to system reliability. While automation provides a basis of robustness under nominal operations, human operators are essential to ensure resilience in the presence of disruptions. Automation and human oversight together create not just robust but also adaptive and reliable systems under uncertain and dynamic conditions. Such synergy is particularly worthwhile in high-consequence applications, such as disaster responses, autonomous systems, and safety-critical applications.

6. Conclusions

This study discusses the contribution of Human-in-the-Loop (HITL) simulations, particularly to increase the reliability of complex systems in two important dimensions, namely robustness and resilience. The current approach put the human operator within the simulation and control loop in order to assess system performance under normal as well as perturbing conditions. Through the inclusion of human decision within the system, its resistance and recovery from unexpected occurrences were improved significantly.

To quote an example from nominal operating conditions, an Autonomous Vehicle Control System provided extremely high robustness, which was quantified as nearly constant performance value almost exactly at desired output with small deviations. Human interventions were not needed in such conditions and this proved that the system performed nicely in nominal conditions.

Human intervention was considered to be significantly important under disrupting circumstances, such as sensor failures or unexpected obstacles, where the system recovered faster and evaded failure. One demonstration showed how human input decreased recovery time and greatly improved system robustness. The positive effect of human decision-making, when dealing with unexpected circumstances, was articulated in terms of increased recovery performance metrics. Existing chatopols of strength have the tendency to summarize multi-dimensional system behaviors in broad metrics, which could overlook significant factors involving critical human judgment and situational variability. This work fills the gap by proposing HITL simulations as a more integrated approach for the analysis of system dependability under in-vivo conditions.

HITL simulations have also shown that even though self-executing systems could perform most of the routine tasks with high reliability, humans play an important role in enhancing system resilience in the event of an unanticipated interruption. Human competence for real-time decision-making and adaptability enables the system to exceed its capability during autonomous mode of operation for safety and reliability.

The mathematical model in the present study offered a methodical way of measuring robustness and resilience taking into account the impact of human intervention. The model was able to show quantifiable human impact convincingly and provided a conceptual framework for assessing reliability in HITL systems. Other state-of-the-art Hyper-converged Infrastructure (HCI) approaches could be used to decrease the response time and improve decision accuracy to enable human interventions. The concept being explained here will generalize readily to other mission-critical areas, like Industrial Automation, cyber security, and medical care systems, in which HITL would be valuable to guarantee the trustworthiness of those systems. The design of adaptive systems that dynamically alters the level of autonomy in the complexity of the environment and occurring disturbances might increase robustness and resilience even more in operational applications.

In conclusion, HITL simulations are a key facilitator in building system reliability through the enhancement of robustness and resilience. This will ultimately aid HITL in both predictable and unpredictable situations that would help operations become much safer and reliable. It is this hybrid technique that would become most important in the realms of failure due to high costs or risks, hence providing a direction towards which human perception is the integral core within system performance.

Data Availability

The data used to support the research findings are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1.

J. Zhang, T. Wang, Y. M. Goh, P. He, and L. Hua, “The effects of long-term policies on urban resilience: A dynamic assessment framework,” Cities, vol. 153, p. 105294, 2024. [Google Scholar] [Crossref]

2.

D. Zhang, C. Li, H. H. Goh, T. Ahmad, H. Zhu, H. Liu, and T. Wu, “A comprehensive overview of modeling approaches and optimal control strategies for cyber-physical resilience in power systems,” Renew. Energy, vol. 189, pp. 1383–1406, 2022. [Google Scholar] [Crossref]

3.

B. Pawar, M. Huffman, F. Khan, and Q. Wang, “Resilience assessment framework for fast response process systems,” Process Saf. Environ. Prot., vol. 163, pp. 82–93, 2022. [Google Scholar] [Crossref]

4.

S. M. Rezvani, N. M. de Almeida, M. J. Falcão, and M. Duarte, “Enhancing urban resilience evaluation systems through automated rational and consistent decision-making simulations,” Sustain. Cities Soc., vol. 78, p. 103612, 2022. [Google Scholar] [Crossref]

5.

S. Moradi, M. M. Khan, N. U. I. Hossain, M. Shamsuddoha, and A. Gorod, “Modeling and assessing seismic resilience leveraging systems dynamic approach: A case study of society 5.0,” Int. J. Crit. Infrastruct. Prot., vol. 43, p. 100639, 2023. [Google Scholar] [Crossref]

6.

M. Yang, H. Sun, and S. Geng, “On the quantitative resilience assessment of complex engineered systems,” Process Saf. Environ. Prot., vol. 174, pp. 941–950, 2023. [Google Scholar] [Crossref]

7.

M. Leštáková, K. T. Logan, I.-S. Rehm, P. F. Pelz, and J. Friesen, “Do resilience metrics of water distribution systems really assess resilience? A critical review,” Water Res., vol. 248, p. 120820, 2024. [Google Scholar] [Crossref]

8.

L. Xu, Q. Guo, Y. Sheng, S. M. Muyeen, and H. Sun, “On the resilience of modern power systems: A comprehensive review from the cyber-physical perspective,” Renew. Sustain. Energy Rev., vol. 152, p. 111642, 2021. [Google Scholar] [Crossref]

9.

C. Zhang, Y. Zhou, and S. Yin, “Interaction mechanisms of urban ecosystem resilience based on pressure-state-response framework: A case study of the Yangtze River Delta,” Ecol. Indic., vol. 166, p. 112263, 2024. [Google Scholar] [Crossref]

10.

W. Li, R. Jiang, H. Wu, J. Xie, Y. Zhao, Y. Song, and F. Li, “A system dynamics model of urban rainstorm and flood resilience to achieve the sustainable development goals,” Sustain. Cities Soc., vol. 96, p. 104631, 2023. [Google Scholar] [Crossref]

11.

T.-H. Tseng and B. Stojadinović, “CI-STR: A capabilities-based interface to model socio-technical systems in disaster resilience assessment,” Int. J. Disaster Risk Reduct., vol. 111, p. 104763, 2024. [Google Scholar] [Crossref]

12.

J. J. Magoua and N. Li, “The human factor in the disaster resilience modeling of critical infrastructure systems,” Reliab. Eng. Syst. Saf., vol. 232, p. 109073, 2023. [Google Scholar] [Crossref]

13.

W. Gu, J. Qiu, J. Hu, and X. Tang, “A Bayesian decision network–based pre-disaster mitigation model for earthquake-induced cascading events to balance costs and benefits on a limited budget,” Comput. Ind. Eng., vol. 191, p. 110161, 2024. [Google Scholar] [Crossref]

14.

R. He, R. L. K. Tiong, Y. Yuan, and L. Zhang, “Enhancing resilience of urban underground space under floods: Current status and future directions,” Tunn. Undergr. Space Technol., vol. 147, p. 105674, 2024. [Google Scholar] [Crossref]

15.

S. Gong, Y. Ye, X. Gao, L. Chen, and T. Wang, “Empirical patterns of interdependencies among critical infrastructures in cascading disasters: Evidence from a comprehensive multi-case analysis,” Int. J. Disaster Risk Reduct., vol. 95, p. 103862, 2023. [Google Scholar] [Crossref]

16.

A. Gros, L. De Luca, F. Dubois, P. Véron, and K. Jacquot, “From surveys to simulations: Integrating Notre-Dame de Paris’ buttressing system diagnosis with knowledge graphs,” Autom. Constr., vol. 170, p. 105927, 2025. [Google Scholar] [Crossref]

17.

T. Gebhard, B. J. Sattler, J. Gunkel, M. Marquard, and A. Tundis, “Improving the resilience of socio-technical urban critical infrastructures with digital twins: Challenges, concepts, and modeling,” Sustain. Anal. Model., vol. 5, p. 100036, 2025. [Google Scholar] [Crossref]

18.

M. A. Aghazadeh, E. Zarei, A. Ghahramani, and H. Li, “A dynamic system reliability analysis model on safety instrumented systems,” J. Loss Prev. Process Ind., vol. 92, p. 105455, 2024. [Google Scholar] [Crossref]

19.

N. Wang, M. Wu, and K. F. Yuen, “Modelling and assessing long-term urban transportation system resilience based on system dynamics,” Sustain. Cities Soc., vol. 109, p. 105548, 2024. [Google Scholar] [Crossref]

20.

U. Lagap and S. Ghaffarian, “Digital post-disaster risk management twinning: A review and improved conceptual framework,” Int. J. Disaster Risk Reduct., vol. 110, p. 104629, 2024. [Google Scholar] [Crossref]

21.

W. Sun and W. Liao, “Risk propagation in emergency supply chain during public health events - From a reliability perspective,” Heliyon, vol. 11, no. 1, p. e41423, 2025. [Google Scholar] [Crossref]

22.

Z. Alipour, M. S. Monfared, and S. E. Monabbati, “Developing a bi-objective maintenance optimization model for process industries by prioritizing resilience and robustness using dynamic Bayesian networks,” Comput. Ind. Eng., vol. 189, p. 109993, 2024. [Google Scholar] [Crossref]

23.

B. Pawar, S. Park, P. Hu, and Q. Wang, “Applications of resilience engineering principles in different fields with a focus on industrial systems: A literature review,” J. Loss Prev. Process Ind., vol. 69, p. 104366, 2021. [Google Scholar] [Crossref]

24.

B. Gu and J. Liu, “Port resilience analysis based on the HHM-FCM approach under COVID-19,” Ocean Coast. Manage., vol. 243, p. 106741, 2023. [Google Scholar] [Crossref]

25.

Y. Du, J. Zhang, Y. Chen, Z. Liu, H. Zhang, H. Ji, C. Wang, and J. Yan, “Impact of electric vehicles on post-disaster power supply restoration of urban distribution systems,” Appl. Energy, vol. 383, p. 125302, 2025. [Google Scholar] [Crossref]

26.

E. Shittu, A. Tibrewala, S. Kalla, and X. Wang, “Meta-analysis of the strategies for self-healing and resilience in power systems,” Adv. Appl. Energy, vol. 4, p. 100036, 2021. [Google Scholar] [Crossref]

27.

A. Hosseini, “Max-type reliability in uncertain post-disaster networks through the lens of sensitivity and stability analysis,” Expert Syst. Appl., vol. 241, p. 122486, 2024. [Google Scholar] [Crossref]

28.

L. Chen and B. Wang, “Robustness assessment of weakly coupled cyber-physical power systems under multi-stage attacks,” Electr. Power Syst. Res., vol. 231, p. 110325, 2024. [Google Scholar] [Crossref]

29.

N. Maddah and B. Heydari, “Building back better: Modeling decentralized recovery in sociotechnical systems using strategic network dynamics,” Reliab. Eng. Syst. Saf., vol. 246, p. 110085, 2024. [Google Scholar] [Crossref]

30.

B. Arnold Urken, A. B. Nimz, and M. Tod Schuck, “Designing evolvable systems in a framework of robust, resilient and sustainable engineering analysis,” Adv. Eng. Inform., vol. 26, no. 3, pp. 553–562, 2012. [Google Scholar] [Crossref]

31.

B. Yang, L. Zhang, B. Zhang, Y. Xiang, L. An, and W. Wang, “Complex equipment system resilience: Composition, measurement and element analysis,” Reliab. Eng. Syst. Saf., vol. 228, p. 108783, 2022. [Google Scholar] [Crossref]

32.

T. Landwehr, A. Dasgupta, and B. Waske, “Towards robust validation strategies for EO flood maps,” Remote Sens. Environ., vol. 315, p. 114439, 2024. [Google Scholar] [Crossref]

33.

S. Caldera, S. Mostafa, C. Desha, and S. Mohamed, “Integrating disaster management planning into road infrastructure asset management,” Infrastruct. Asset Manag., vol. 8, no. 4, pp. 219–233, 2021. [Google Scholar] [Crossref]

34.

E. Hossain, S. Roy, N. Mohammad, N. Nawar, and D. R. Dipta, “Metrics and enhancement strategies for grid resilience and reliability during natural disasters,” Appl. Energy, vol. 290, p. 116709, 2021. [Google Scholar] [Crossref]

35.

R. T. Khameneh, K. Barker, and J. E. Ramirez-Marquez, “A hybrid machine learning and simulation framework for modeling and understanding disinformation-induced disruptions in public transit systems,” Reliab. Eng. Syst. Saf., vol. 255, p. 110656, 2025. [Google Scholar] [Crossref]

36.

C. Mazur, Y. Hoegerle, M. Brucoli, K. van Dam, M. Guo, C. N. Markides, and N. Shah, “A holistic resilience framework development for rural power systems in emerging economies,” Appl. Energy, vol. 235, pp. 219–232, 2019. [Google Scholar] [Crossref]

37.

A. Jafarian, T. A. Granberg, and R. Z. Farahani, “The effect of geographic risk factors on disaster mass evacuation strategies: A smart hybrid optimization,” Transp. Res. Part E Logist. Transp. Rev., vol. 193, p. 103825, 2025. [Google Scholar] [Crossref]

38.

M. Rijal, P. Luo, B. K. Mishra, M. Zhou, and X. Wang, “Global systematical and comprehensive overview of mountainous flood risk under climate change and human activities,” Sci. Total Environ., vol. 941, p. 173672, 2024. [Google Scholar] [Crossref]

39.

P. Jain, H. J. Pasman, S. Waldram, E. N. Pistikopoulos, and M. S. Mannan, “Process Resilience Analysis Framework (PRAF): A systems approach for improved risk and safety management,” J. Loss Prev. Process Ind., vol. 53, pp. 61–73, 2018. [Google Scholar] [Crossref]

Cite this:

APA Style

IEEE Style

BibTex Style

MLA Style

Chicago Style

GB-T-7714-2015

Turgay, S. & Şahin, M. S. (2025). Ensuring System Reliability Through Human-in-the-Loop (HITL) Simulations: A Robustness and Resilience Approach to Disaster Management. J. Intell. Manag. Decis., 4(3), 187-211. https://doi.org/10.56578/jimd040302

cc

©2025 by the author(s). Published by Acadlore Publishing Services Limited, Hong Kong. This article is available for free download and can be reused and cited, provided that the original published version is credited, under the CC BY 4.0 license.

pdf

Figure 1. Suggested model framework

Table 1. Statistical summary of the simulation data

Citations