A Bio-Inspired Multi-Modal State Evaluation and Game-Theoretic Coordination Approach for Active Safety in Intelligent Public Transport Systems
Abstract:
Ensuring the safety of public transport systems has become increasingly challenging with the growing complexity of traffic environments and vehicle–road–driver interactions. Conventional approaches that rely on single-source information are often insufficient to support comprehensive monitoring and coordinated response. This study proposes a bio-inspired multi-modal state evaluation approach for active safety in intelligent public transport systems. Drawing on principles of biological multi-sensory integration, the proposed method integrates driver physiological signals with heterogeneous road perception data through a multi-sensor fusion framework, enabling real-time assessment of traffic safety states. On this basis, a game-theoretic coordination strategy is developed to support collaborative prevention and response among vehicle, driver, and road-side elements under dynamic traffic conditions. The approach is evaluated across urban roads, expressways, and intersection scenarios. Experimental results show that the proposed method achieves improved accuracy, recall, and real-time performance compared with baseline methods, while maintaining stable performance under noisy and incomplete data conditions. This work provides a system-oriented approach for integrating multi-source sensing and coordinated decision-making in intelligent public transport safety management.
1. Introduction
The continuous progress of multimodal biometric recognition and road perception technologies provides new solutions for intelligent transportation systems (ITS). Biometric sensing can monitor a driver's physiological condition in real time, such as heart rate, breathing rate, and fatigue, to help assess the driver's health, while road sensing technology obtains information about road conditions, obstacles, and traffic flow through sensors. The combination of the two can provide more comprehensive and accurate data support for traffic safety. However, how to efficiently integrate such multi-source heterogeneous information and achieve coordinated prevention and control in a complex traffic environment remains a challenging problem.
As a key technology of ITS, human-vehicle-road collaboration focuses on real-time interaction and information exchange among vehicles, roads, people, and the external environment to further improve the operational efficiency and safety of the entire system. Through advanced algorithms such as reinforcement learning (RL), the system can autonomously learn and optimize decision-making strategies to cope with complex traffic situations. However, most existing studies focus on single-modal data analysis and local collaborative decision-making, ignoring the integration of multimodal data and comprehensive collaborative prevention and control strategies. Therefore, how to jointly exploit drivers' biometrics and road perception information through an innovative algorithmic framework, so as to achieve more accurate state assessment and efficient collaborative decision-making, is the core issue that urgently needs to be solved.
The objective of this paper is to propose a state assessment model based on multimodal biometric recognition and road perception, combined with an RL-driven human-vehicle-road collaborative decision-making algorithm, to achieve active prevention and control of public transportation safety. Specifically, the innovations of this paper include:
•A multimodal data fusion framework is proposed, combining driver biometrics with heterogeneous road perception data to support real-time assessment of driver and road safety states.
•A decision optimization algorithm based on RL is proposed. Through game theory and multi-objective optimization methods, collaborative decision-making among people, vehicles, and roads is realized, effectively improving the response speed and accuracy of traffic safety prevention and control.
•Several experimental scenarios are designed; the effectiveness and superiority of the proposed method are verified through experiments, and its application potential in real traffic systems is demonstrated.
The design philosophy of this research is inspired by the adaptive sensing and cooperative decision-making observed in biological systems. In the human nervous system, multi-sensory integration enables rapid and precise decision-making by fusing visual, auditory, and proprioceptive signals—similar to how our proposed framework fuses biometric and road perception data for comprehensive traffic safety assessment. In nature, collective behaviors such as flocking birds, schooling fish, or cooperative foraging in ants demonstrate how local interactions and simple behavioral rules can yield robust global coordination. Analogously, the vehicle–road–person coordination strategy in this work employs game-theoretic decision-making, akin to swarm intelligence, to dynamically adapt to complex and changing traffic conditions. This bioinspired perspective not only enhances the interpretability of the proposed model but also contributes to its robustness and adaptability under uncertain and dynamic environments.
With the continuous progress of ITS, improving public transportation safety and prevention and control efficiency has become a focus of current research. In recent years, multimodal data fusion and intelligent decision-making technology have gradually become the key means to achieve this goal. Current research mainly focuses on driver biometric recognition, road perception technology, and human-vehicle-road collaborative control and decision optimization.
Biometric technology, as an important research direction in ITS, can monitor the health status of drivers in real time, so as to prevent traffic accidents caused by driver fatigue or health problems. Relevant research mainly focuses on the detection of physiological indicators such as heart rate and respiratory rate, together with eye-movement monitoring. For example, non-contact physiological monitoring based on Remote Photoplethysmography (rPPG) [1], [2] integrates multi-source physiological data such as Heart Rate Variability (HRV), Electrodermal Activity (EDA) [3], and eye tracking, and combines deep neural networks (DNNs) to achieve accurate detection of driver fatigue, stress, and cognitive alertness [4], [5], [6]. Other examples include non-contact ECG measurement with frequency-modulated continuous-wave radar [7], real-time monitoring of driver status based on multi-modal biological information acquisition [8], [9], [10], [11], a detection model of driving cognitive alertness based on uncertainty-aware self-supervised learning [12], and approaches combining Galvanic Skin Response (GSR) with vehicle positioning [13].
Road perception technology plays an important role in autonomous driving and intelligent traffic management, acquiring real-time information about the road environment, including road conditions, obstacles, traffic signs, and traffic flow, through a network of sensors such as radar, cameras, and lidar. For example, the intelligent inspection program launched by Tencent Autonomous Driving on road sections under the jurisdiction of the Futian Administration of the Shenzhen Municipal Transport Bureau has significantly improved inspection efficiency and pavement-distress identification accuracy. In addition, Gaoxing's environment-adaptive method and roadside perception technology, as well as the environmental perception technology of driverless vehicles, demonstrate how traffic efficiency and safety can be improved through advanced sensors and algorithms. To better perceive the road environment, many studies have proposed multi-sensor data fusion methods: a multi-modal road perception framework integrating vision, radar, and high-precision map data [14], a feature fusion method for infrared thermal radiation data [15], a multi-sensor environment perception system [16], a road-state classification model based on an attention mechanism [17], road information acquisition based on tire sensors [18], a vehicle-road-cloud collaborative perception architecture [19], image-based road information detection [20], [21], and multi-task perception networks based on LiDAR [22], [23]. These studies have significantly improved the environmental perception accuracy of road safety management. However, the integration of perception data with driver biometric information is still a field that needs further exploration.
Human-vehicle-road cooperation is a key concept in ITS, improving traffic safety and efficiency by realizing dynamic cooperation between vehicles, roads, and people. Existing research mainly focuses on information exchange and coordination between vehicles and roads, realizing collaborative control with VANET technology. Examples include a multi-modal vehicle-road collaborative perception framework based on the Vision Transformer (ViT) [24], a Federated Reinforcement Learning (FRL)-driven collaborative perception strategy [25], a multi-view graph convolutional network optimized by deep RL [26], a decentralized multi-agent path planning strategy [27], collaborative driving automation simulation and real-vehicle systems [28], a non-zero-sum differential game model [29], and an enhanced graph reinforcement learning algorithm based on a vehicle collaboration graph [30]. Despite remarkable achievements in vehicle-road cooperation and traffic flow optimization, most algorithms remain limited to traffic flow and signal optimization and fail to fully integrate driver biometrics with road perception information, which limits the system's adaptability in complex traffic environments.
RL is increasingly used in ITS, especially for traffic management and collaborative decision-making. By interacting with the environment and learning the optimal strategy from feedback, RL has shown great potential in dynamic, complex traffic scenarios. Examples include enhancing road safety and network security [31], collaborative traffic signal automation [32], network traffic flow design and signal control collaboration [33], signal-speed collaborative optimization [34], hierarchical control for multi-lane automated driving [35], flexible traffic signal control [36], RL methods for traffic signal control [37], [38], [39], [40], [41], [42], robust RL strategies [43], inter-section traffic flow prediction and signal timing optimization [44], and combining RL and deep learning to learn traversal policies at unsignalized intersections [45]. However, most current RL methods focus on traffic flow optimization or isolated vehicle-road collaboration, and few studies combine driver biometric information with road perception data to comprehensively optimize vehicle-road collaborative decision-making strategies.
Multimodal data fusion plays a vital role in traffic safety. Its core lies in comprehensively perceiving the environment and extracting effective decision information by integrating multiple sensor data sources (biometrics, road information, vehicle status, etc.). Examples include cross-device safety authentication integrating iris, fingerprint, and voiceprint [46], a dynamic weighted quality-aware fusion model for multi-biometrics [47], fingerprint and palm-print template encryption and matching [48], multi-modal driver anger recognition combining in-car visual, voice, and physiological signals [49], real-time acquisition and joint authentication of fingerprint-ECG multi-modal data in multi-user mode [50], joint optimization of signal denoising and feature extraction based on the wavelet transform and a convolutional neural network (CNN) [51], synchronous acquisition of static vital signs and dynamic pose features using millimeter-wave radar [52], fusion of iris texture and voice spectrum [53], and integrated multi-modal fatigue detection from eye tracking, steering-wheel grip sensing, and ECG signals [54]. Built-in uncertainty quantification makes multi-modal object detection more transparent and trustworthy, enhancing safety for autonomous driving applications [55]. Although multi-modal fusion has made progress in intelligent transportation, how to effectively integrate data of different modalities under a collaborative decision-making framework remains an urgent problem.
2. Problem Formulation and Optimization Methodology
This paper proposes an innovative public transportation safety prevention and control system that integrates multi-modal biometric recognition and road perception data, aiming to realize human-vehicle-road collaboration and improve public transportation safety. By integrating driver biometrics and traffic environment perception information, the framework uses multi-modal learning and deep RL to carry out real-time risk assessment and generate prevention and control strategies. The framework consists of five main modules:
•Data acquisition: real-time data collection through multi-modal biometric sensors and road sensing devices.
•Data preprocessing and fusion: preprocessing the collected biometric and environmental data and optimizing the data representation through fusion algorithms.
•State assessment and risk analysis: using deep learning models to assess the driver's health and road safety conditions and generate safety risk assessment results.
•Collaborative decision-making and prevention and control: generating prevention and control strategies from the risk assessment results through the optimization algorithm, adjusting driving control or activating the automated driving system in real time.
•Prevention and control execution: implementing the corresponding safety measures according to the generated strategy and feeding the results back to the control system.
Each module undertakes a specific function; together they realize real-time monitoring and active prevention and control of traffic safety.
To realize active prevention and control of public traffic safety based on multi-modal biometric recognition and road perception, this paper proposes a state assessment model that jointly considers the driver's physiological state and road environment information. For example, the ErgoLAB human-machine-environment synchronization platform V3.0 records the driver's physiological data in real time and, combined with a hybrid model of principal component analysis and artificial neural networks, can achieve fast and accurate driver fatigue detection with an average accuracy above 97%. The core purpose of the model is to provide basic data support for subsequent collaborative decision-making and prevention and control strategies by accurately evaluating the driver's health status and road safety conditions. The model is divided into two main parts, driver physiological state assessment and road safety status assessment, which are comprehensively combined through multi-modal data fusion.
Physiological state evaluation model. To realize high-precision evaluation of the driver's physical signs, this study proposes a physiological state evaluation model based on a combination of deep learning and temporal modeling. The model combines the strengths of Long Short-Term Memory (LSTM) networks and CNNs, and introduces a self-attention mechanism and an ensemble learning strategy, so as to realize a multi-dimensional comprehensive evaluation of the driver's physical state. The model is established as follows: after biometric data collection, preprocessing, and feature extraction, feature values that help evaluate the driver's physical state are extracted from the raw data; these directly reflect the driver's fatigue degree, distraction, and potential health problems, and form the basis of the assessment model. The specific steps are described below:
Step 1: Temporal consistency preprocessing. Multi-modal physiological data of the driver, such as heart rate, respiratory rate, eye movement, and electrodermal response, are collected and preprocessed; data cleaning, denoising, normalization, and timing synchronization ensure data quality and consistency.
Step 2: Model training and feature selection. A CNN is used for feature extraction and dimensionality reduction. The network can effectively extract spatial features through convolutional layers, such as periodic changes in the heart-rate waveform and fluctuation patterns of the respiratory rate. For example, in driver fatigue assessment, CNNs are used to extract time-dependent features from multi-channel electroencephalogram (EEG) signals and fuse spatial features through dense layers to achieve classification. These features are fed into the LSTM model, which captures long-term dependencies in the data from a time-series perspective, revealing the evolution of the driver's physical state. During training, the model uses backpropagation to optimize parameters and minimize classification error, so that it learns to accurately capture patterns closely related to fatigue or dangerous states from physiological signals such as heart rate and breathing rate.
CNNs and LSTMs can be combined to efficiently process data that contains both spatial and temporal information. In the hybrid architecture, the CNN is responsible for extracting spatial features of the input data, such as local patterns in the signal, while the LSTM captures temporal dynamics, such as long-term dependencies in the time series. Physiological signal data are fed through a series of convolutional and pooling layers to extract local spatial features; the resulting feature sequence is then passed to the LSTM layers, which model its temporal dependencies and finally generate the evaluation result. In this way the CNN layers learn local spatiotemporal patterns of the signal, while the LSTM layers combine timing information with the local features to evaluate the driver's physical state. The core of the algorithm is to replace the matrix multiplication in the traditional LSTM unit with a convolution operation:
$$
\begin{aligned}
f_t &= \sigma\left(W_f * [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i * [h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C * [h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tilde{C}_t \\
o_t &= \sigma\left(W_o * [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
$$
where $*$ denotes the convolution operation and $\circ$ denotes the Hadamard product. $W_f$, $W_i$, $W_C$ and $W_o$ are the convolutional kernel weights of the forget gate, input gate, candidate cell state, and output gate, respectively. Their dimensions are $k \times k \times c_{\text{in}} \times c_{\text{out}}$, where $k$ is the size of the convolutional kernel, $c_{\text{in}}$ is the number of input channels, and $c_{\text{out}}$ is the number of output channels, thus expressing the spatiotemporal characteristics of the learned data. $b_f$, $b_i$, $b_C$ and $b_o$ are the bias terms of the forget gate, input gate, candidate cell state, and output gate, respectively; their dimensions match the output dimensions of the convolutional kernels. $x_t$ is the input data at the current time step, typically physiological signals or environmental perception data. $h_{t-1}$ is the hidden state of the previous time step, which contains the model's memory of the time series up to that moment. $C_{t-1}$ is the cell state of the previous time step, recording the long-term memory. $\sigma$ is the sigmoid activation function, whose output lies between 0 and 1 and is used to control the flow of information.
$[h_{t-1}, x_t]$ denotes the concatenation operation between the hidden state of the previous time step and the input data of the current time step.
The algorithm can automatically learn local spatial information in the process of time series modeling, especially for complex spatiotemporal data scenes such as videos and images. Compared with traditional LSTM and CNN, which process spatiotemporal information separately, the algorithm reduces the computational complexity by sharing parameters, and can learn spatiotemporal dependence more efficiently.
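To make the ConvLSTM step above concrete, the following numpy sketch implements the gate computations with a 'same'-padded 1D convolution over a short physiological signal. The channel counts, kernel size, and random initialization are illustrative assumptions rather than the configuration used in this paper.

```python
import numpy as np

def conv1d_same(x, w):
    """'Same'-padded 1D convolution. x: (c_in, L), w: (c_out, c_in, k) -> (c_out, L)."""
    c_out, c_in, k = w.shape
    L = x.shape[1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros((c_out, L))
    for o in range(c_out):
        for t in range(L):
            out[o, t] = np.sum(w[o] * xp[:, t:t + k])
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ConvLSTMCell:
    """One ConvLSTM step: gate pre-activations are computed by convolving the
    concatenation [h_{t-1}, x_t] instead of a dense matrix multiply."""
    def __init__(self, c_in, c_hidden, k=3, seed=0):
        rng = np.random.default_rng(seed)
        c_cat = c_in + c_hidden
        # one kernel + bias per gate: forget, input, candidate, output
        self.W = {g: 0.1 * rng.standard_normal((c_hidden, c_cat, k)) for g in "fico"}
        self.b = {g: np.zeros((c_hidden, 1)) for g in "fico"}

    def step(self, x_t, h_prev, C_prev):
        cat = np.concatenate([h_prev, x_t], axis=0)   # [h_{t-1}, x_t]
        f = sigmoid(conv1d_same(cat, self.W["f"]) + self.b["f"])   # forget gate
        i = sigmoid(conv1d_same(cat, self.W["i"]) + self.b["i"])   # input gate
        C_tilde = np.tanh(conv1d_same(cat, self.W["c"]) + self.b["c"])  # candidate
        o = sigmoid(conv1d_same(cat, self.W["o"]) + self.b["o"])   # output gate
        C = f * C_prev + i * C_tilde          # Hadamard products, new cell state
        h = o * np.tanh(C)                    # new hidden state
        return h, C

# toy run: 2 physiological channels laid out over 16 samples, 4 hidden channels
cell = ConvLSTMCell(c_in=2, c_hidden=4)
h = np.zeros((4, 16)); C = np.zeros((4, 16))
x = np.random.default_rng(1).standard_normal((2, 16))
h, C = cell.step(x, h, C)
```

Because the kernels are shared across time positions, the cell has far fewer parameters than a dense LSTM over the same input size, which is the efficiency argument made above.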
Step 3: Multi-modal data fusion. A self-attention mechanism is used to weight physiological data from different sources, dynamically adjusting the weights of different features by computing their relevance in the current context. For example, when the driver is in a high-stress state, the breathing rate may have a greater impact on the evaluation result than the heart rate, and the system automatically assigns it a higher weight.
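As a minimal sketch of this weighting step, the following numpy code applies scaled dot-product self-attention over per-modality feature vectors; the embedding dimension and random projection matrices are illustrative assumptions, not learned parameters from the paper's model.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over modality embeddings.
    X: (m, d), one d-dim embedding per modality (e.g. heart rate,
    respiration, EDA). Returns re-weighted features and the attention
    matrix A, whose rows show how strongly each modality attends to
    the others in the current context."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    A = softmax(scores, axis=1)          # context-dependent weights
    return A @ V, A

rng = np.random.default_rng(0)
d = 8
X = rng.standard_normal((3, d))          # 3 modalities
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.3 for _ in range(3))
fused, A = self_attention(X, Wq, Wk, Wv)
```

Each row of `A` is a probability distribution, so a modality that is highly relevant in the current context receives a proportionally larger share of the fused representation.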
Step 4: Classification and decision output. The fused features are input into a DNN for the final classification of the physical state. Through multi-layer nonlinear mapping, the DNN infers the driver's current state and outputs the corresponding risk level (normal, fatigue, danger) together with a safety warning. The system ensures the accuracy and comprehensiveness of the assessment through indicators such as precision and recall. The model uses a softmax function for classification, outputs the probability distribution over the sign states, and selects the class with the highest probability as the final result.
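The final classification step can be sketched as a single softmax layer over the three sign states; the feature dimension and weights below are placeholders standing in for the trained DNN's last layer.

```python
import numpy as np

STATES = ["normal", "fatigue", "danger"]

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def classify(features, W, b):
    """Final layer: logits -> softmax probability distribution over the
    three sign states; the highest-probability state is the decision."""
    probs = softmax(W @ features + b)
    return STATES[int(np.argmax(probs))], probs

rng = np.random.default_rng(2)
W = rng.standard_normal((3, 8)) * 0.5   # placeholder trained weights
b = np.zeros(3)
state, probs = classify(rng.standard_normal(8), W, b)
```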
The multi-modal data fusion process in this work is conceptually inspired by biological multi-sensory integration, where the human brain continuously combines visual, auditory, and proprioceptive cues to form a coherent perception of the environment. In the proposed model, heterogeneous biometric signals (e.g., heart rate, respiratory rate, skin conductance) and road perception data are dynamically weighted through a self-attention mechanism, analogous to how the nervous system adjusts sensory priorities based on environmental context—for example, prioritizing visual cues under poor auditory conditions, or focusing on tactile feedback when visibility is low. This bio-inspired weighting improves the system’s adaptability and robustness in real-time safety assessment.
Road safety assessment is an important analytical process based on multi-modal data. This paper designs a multi-dimensional evaluation model that combines vehicle perception, road conditions, environmental factors, and traffic conditions to comprehensively evaluate road safety, so as to provide drivers with effective early-warning information and decision support. When assessing road safety, a series of risk indicators is constructed from the relevant factors (road conditions, traffic flow, weather, vehicle behavior, pedestrian/passenger behavior, etc.). Each indicator assesses a different source of risk and supports the comprehensive assessment. The main risk indicators used in this paper are as follows:
Road Condition Risk Index (RCRI): This index is based on a comprehensive evaluation of the road safety factor, slope, road surface condition, and traffic signs and markings. The safety factor is obtained from real-time road conditions, and road surface condition is assessed with advanced image processing algorithms that can accurately identify potential hazards such as cracks and potholes.
Traffic Flow Risk Index (TFRI): This index combines information such as traffic flow, vehicle density, and speed to assess traffic congestion and potential collision risk. Combined with on-board radar, camera monitoring data and traffic signal information, the system can calculate traffic flow in real time, and use professional models to accurately predict traffic conditions based on data such as traffic density and speed.
Weather conditions: Research shows that adverse weather such as fog, icy road surfaces, snow, and rain has a significant impact on highway traffic safety. Real-time weather data are obtained through an external weather monitoring system, and weighted algorithms are used to calculate the impact of weather on traffic safety.
Driver Physiological Risk Index (DPRI): The driver's physiological state has an important influence on driving safety. Using multi-modal biometric recognition, the driver's heart rate, breathing rate, and facial expression data are collected and, combined with mental workload and a fatigue index, used to comprehensively assess the driver's physiological health and analyze its impact on driving safety.
Vehicle Behavior Risk Index (VBRI): Vehicle behavior reflects the driver's reaction and behavior pattern under the current traffic conditions. Behaviors such as sharp braking and swerving can significantly increase road safety risks. The index assesses the safety of driving behavior through information such as vehicle acceleration and steering angle.
In order to comprehensively consider various risk factors in road safety assessment, this paper designs a multi-level and multi-modal data fusion and risk assessment model.
Suppose there are $n$ different risk indicators $R_1, R_2, \cdots, R_n$, representing the contributions of different risk sources to road safety. Each indicator has a weight $\omega_i$, and the weights sum to 1: $\omega_1+\omega_2+\cdots+\omega_n=1$. The weighted average method is used for risk fusion, giving a comprehensive road safety risk score:
$$S_{\text{total}} = \sum_{i=1}^{n} \omega_i R_i$$
where $S_{\text{total}}$ is the final comprehensive safety score.
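A minimal numeric sketch of this weighted fusion, using hypothetical indicator values on a common [0, 1] risk scale (the real weights are learned from data, not hand-set as here):

```python
# hypothetical indicator values on a common [0, 1] risk scale
indicators = {"RCRI": 0.35, "TFRI": 0.60, "DPRI": 0.20, "VBRI": 0.45}
weights    = {"RCRI": 0.30, "TFRI": 0.30, "DPRI": 0.25, "VBRI": 0.15}

# the weights must sum to 1, as required by the fusion formula
assert abs(sum(weights.values()) - 1.0) < 1e-9

# S_total = sum_i w_i * R_i
S_total = sum(weights[k] * indicators[k] for k in indicators)
```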
A key issue in risk fusion is how to determine the weights $\omega_1, \omega_2, \cdots, \omega_n$. Traditional methods often rely on expert experience or manually set weights and may not be universally applicable. To improve the adaptability and accuracy of the model, this paper adopts automatic weight learning based on machine learning, using the Support Vector Regression (SVR) algorithm to learn the weights from historical data.
Suppose there is a training data set $D=\left\{\left(R_1^{(k)}, R_2^{(k)}, \ldots, R_n^{(k)}, S_{\text{total}}^{(k)}\right)\right\}_{k=1}^K$, where $K$ is the number of training samples and $S_{\text{total}}^{(k)}$ is the true comprehensive safety score of the $k$-th sample. The optimal weight vector $\omega=(\omega_1, \omega_2, \cdots, \omega_n)$ can be learned by minimizing the following loss function:
$$L(\omega) = \sum_{k=1}^{K}\left(S_{\text{total}}^{(k)} - \sum_{i=1}^{n} \omega_i R_i^{(k)}\right)^2$$
By minimizing this loss function, the optimal weights $\omega=(\omega_1, \omega_2, \cdots, \omega_n)$ are obtained, making the fusion of the model more accurate.
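Since a full SVR pipeline is beyond the scope of a short sketch, the following numpy example minimizes the squared loss by ordinary least squares on synthetic data and then projects the result onto the simplex constraint; it is a simplified stand-in for the SVR-based weight learning described above.

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 4, 200
true_w = np.array([0.4, 0.3, 0.2, 0.1])   # hypothetical ground-truth weights

R = rng.uniform(0.0, 1.0, size=(K, n))             # indicator samples
S = R @ true_w + 0.01 * rng.standard_normal(K)     # noisy true scores

# least-squares fit of the squared loss, then project onto the constraint
w_hat, *_ = np.linalg.lstsq(R, S, rcond=None)
w_hat = np.clip(w_hat, 0.0, None)                  # keep weights non-negative
w_hat = w_hat / w_hat.sum()                        # weights sum to 1
```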
In practical applications, safety evaluation results often contain uncertainty and fuzziness, and a single weighted fusion may not handle complex, fuzzy evaluation conditions. Therefore, to further enhance the robustness and accuracy of the model, we introduce a fuzzy logic controller (FLC) to provide decision support: fuzzy logic can handle imprecise, nonlinear situations, and expert rules and fuzzy reasoning are added to the evaluation process. To further improve accuracy and robustness, we also adopt the ensemble learning method XGBoost, which integrates the predictions of multiple learners; the features extracted by the CNN and the other risk indicators serve as inputs, and an integrated safety score is output.
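The fuzzy-logic component can be illustrated with a tiny Mamdani-style sketch: triangular memberships, two hand-written rules, and weighted-average defuzzification. The rule base and membership parameters below are invented for illustration only and do not reproduce the paper's FLC.

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b, zero outside (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_risk(traffic, weather):
    """Tiny Mamdani-style sketch: two inputs in [0, 1], one risk output.
    Rule 1: IF traffic high AND weather bad THEN risk high.
    Rule 2: IF traffic low  OR  weather good THEN risk low."""
    hi_t   = tri(traffic, 0.4, 1.0, 1.6)    # membership in 'high traffic'
    bad_w  = tri(weather, 0.4, 1.0, 1.6)    # membership in 'bad weather'
    lo_t   = tri(traffic, -0.6, 0.0, 0.6)   # membership in 'low traffic'
    good_w = tri(weather, -0.6, 0.0, 0.6)   # membership in 'good weather'
    fire_hi = min(hi_t, bad_w)              # fuzzy AND -> min
    fire_lo = max(lo_t, good_w)             # fuzzy OR  -> max
    # weighted-average defuzzification over output singletons 1.0 / 0.0
    denom = fire_hi + fire_lo
    return 0.5 if denom == 0 else (fire_hi * 1.0 + fire_lo * 0.0) / denom
```

For example, heavy traffic in bad weather fires Rule 1 strongly and yields a risk score near 1, while light traffic in good weather yields a score near 0.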
In the active prevention and control system for public traffic safety based on human-vehicle-road coordination, making efficient decisions and responding to traffic situations in real time is the key to successful implementation. Under the constraints of multi-party information and complex environments, traditional decision-making algorithms often cannot adapt to dynamically changing traffic conditions. Therefore, a collaborative decision algorithm based on RL is an innovative and efficient solution. This section introduces the RL-driven human-vehicle-road collaborative decision-making algorithm in detail, including model design, algorithm implementation, and optimization strategy.
RL is a machine learning method that learns the optimal decision strategy by interacting with the environment. Through the interaction between the agent and the environment, RL uses the reward signal to guide the learning process, and finally realizes the learning of the optimal strategy. The collaborative decision-making algorithm driven by RL proposed in this study can optimize the decision-making process in the public transportation system by integrating the three major elements of people, vehicles and roads, effectively reducing the accident risk and improving the traffic efficiency.
The design of the human-vehicle-road collaborative decision algorithm needs to consider three main subjects: the driver (human), the vehicle, and the road (including pedestrians and other road users). After the driver, vehicle, and road states have been modeled, the RL decision process can be formalized as a joint decision problem: the driver, vehicle, and road states are integrated into a multidimensional environment input, and the agent continuously optimizes its strategy by interacting with this environment. Concretely, the state of the environment at each moment $t$ can be expressed as:
$$s_t = \left[s_t^{\text{driver}}, s_t^{\text{vehicle}}, s_t^{\text{road}}\right]$$
The decision problem of the agent is: in a given state $s_t$, choose the optimal action $a_t$ to maximize the future cumulative reward.
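A minimal sketch of the joint state and the resulting action choice, assuming illustrative sub-state components and an epsilon-greedy policy (a standard exploration scheme in Q-learning, not necessarily the one used in this paper):

```python
import numpy as np

def build_state(driver, vehicle, road):
    """Concatenate the three sub-states into the joint state s_t."""
    return np.concatenate([driver, vehicle, road])

def epsilon_greedy(q_values, eps, rng):
    """Explore with probability eps, otherwise exploit argmax Q."""
    if rng.random() < eps:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rng = np.random.default_rng(4)
s_t = build_state(np.array([0.2, 0.7]),       # e.g. fatigue, stress level
                  np.array([12.5, 0.1]),      # e.g. speed, steering angle
                  np.array([0.4, 0.3, 0.6]))  # e.g. RCRI, TFRI, weather risk
a_t = epsilon_greedy(np.array([0.1, 0.5, 0.2]), eps=0.0, rng=rng)
```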
To realize human-vehicle-road collaborative decision-making, this paper uses Deep Reinforcement Learning (DRL), combining a DNN with Deep Q-learning (DQN) for strategy learning. In traditional Q-learning, the state-action value function $Q(s, a)$ estimates the expected return of taking an action in a given state. To handle the high-dimensional state space, this paper uses a DNN to approximate the Q-value function, constructing a strategy network $Q(s, a; \theta)$, where $\theta$ denotes the network parameters.
The goal of DQN is to learn Q values by minimizing the following loss functions:
$$L(\theta)=\mathbb{E}_{(s, a, r, s^{\prime})}\left[\left(r+\gamma \max_{a^{\prime}} Q\left(s^{\prime}, a^{\prime}; \bar{\theta}\right)-Q(s, a; \theta)\right)^{2}\right]$$
where $\gamma$ is the discount factor, $r$ is the reward, $\bar{\theta}$ are the parameters of the target network, and $\max_{a^{\prime}} Q\left(s^{\prime}, a^{\prime}; \bar{\theta}\right)$ is the maximum $Q$ value of the next state.
To avoid overfitting and high variance during training, DQN introduces experience replay and a target network. Experience replay breaks the correlation between samples by storing the transitions generated during the agent's interaction with the environment (i.e., the quadruple of state, action, reward, and next state) and training on randomly drawn mini-batches. The target network is used to compute the target $Q$ value, which avoids excessively large gradient updates during training and improves the stability of the algorithm.
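The two mechanisms can be sketched as follows: a replay buffer that samples uncorrelated mini-batches, and the TD target r + gamma * max Q(s', a') computed with a frozen target network (here a toy linear Q function standing in for the DNN):

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Stores (s, a, r, s') transitions and samples random mini-batches,
    breaking the temporal correlation in the training data."""
    def __init__(self, capacity=10000, seed=0):
        self.buf = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, batch_size):
        return self.rng.sample(list(self.buf), batch_size)

def td_targets(batch, q_target, gamma=0.99):
    """y = r + gamma * max_a' Q(s', a'; theta_bar) for each transition,
    where q_target is the frozen target network."""
    return np.array([r + gamma * np.max(q_target(s_next))
                     for (_, _, r, s_next) in batch])

# toy linear "target network" over a 3-dim state with 2 actions
theta_bar = np.array([[0.5, -0.2, 0.1],
                      [0.0,  0.3, 0.4]])
q_target = lambda s: theta_bar @ s

buf = ReplayBuffer()
rng = np.random.default_rng(5)
for _ in range(32):
    s, s_next = rng.standard_normal(3), rng.standard_normal(3)
    buf.push(s, int(rng.integers(2)), float(rng.random()), s_next)
batch = buf.sample(8)
y = td_targets(batch, q_target)
```

In full training, `theta_bar` would be copied from the online network only every fixed number of steps, which is what keeps the targets stable.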
In an ITS, decisions must take into account multiple goals, including maximizing traffic efficiency, minimizing traffic accidents, maximizing system efficiency, maximizing driver health, and minimizing energy consumption. The strategy optimization process therefore needs to adopt multi-objective optimization, whose goal is to find a balanced scheme by weighing the contradictions and conflicts between different objectives so that each reaches a good level. To flexibly adjust the weight of each objective according to the specific traffic situation, a multi-objective optimization method with adaptive weight adjustment is proposed in this paper. In particular, the weight of each objective function, $\omega_k(t)$, is allowed to change with time $t$, and the weight update is adjusted according to feedback on the current status and environment.
The adaptive weight adjustment strategy is calculated by the following formula:
$$\omega_k(t)=\omega_k(t-1)+\eta \, \delta \omega_k(t)$$
where $\omega_k(t-1)$ is the weight at the previous time step, $\eta$ is the learning rate, and $\delta \omega_k(t)$ is the weight adjustment computed from real-time feedback, reflecting the change in importance of the objective relative to the other objectives. When traffic flow is high, the system may lean toward optimizing efficiency and adjust the weights to reduce waiting time; during bad weather or sudden traffic accidents, the system may increase the weight of the safety objectives.
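The update rule above can be sketched as follows. The renormalization and clipping steps are our own assumptions, added so the weights remain a valid convex combination; the paper only states the additive update.

```python
import numpy as np

def update_weights(w_prev, delta, eta=0.1):
    """omega_k(t) = omega_k(t-1) + eta * delta_omega_k(t), then renormalized.

    Clipping and renormalization are assumptions of this sketch, keeping
    every objective minimally active and the weights summing to one.
    """
    w = np.asarray(w_prev, dtype=float) + eta * np.asarray(delta, dtype=float)
    w = np.clip(w, 1e-6, None)
    return w / w.sum()

# Illustrative objectives: [traffic efficiency, safety, energy consumption].
w = np.array([0.4, 0.4, 0.2])

# Bad weather detected: feedback pushes the safety weight up.
w = update_weights(w, delta=[-0.5, +1.0, -0.5], eta=0.1)
```

With this feedback vector the safety weight rises at the expense of the other objectives, mirroring the bad-weather example in the text.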
The adaptive multi-objective optimization mechanism is inspired by swarm intelligence in natural systems, such as flocking birds or schooling fish, where individual agents adjust their movement based on local interactions while collectively maintaining optimal group formations. Similarly, in our vehicle–road–person coordination framework, each decision-making agent (representing human, vehicle, or infrastructure) continuously updates its strategy weights based on real-time environmental feedback. This local adaptation, coupled with global optimization via game theory, mirrors the distributed yet coordinated decision-making observed in biological collectives, ensuring rapid and stable system-level responses to dynamic traffic conditions.
3. Experiments and Results Analysis
Through a series of experiments, this section verifies the effectiveness and advantages of the proposed proactive prevention and control model for public transportation safety based on multi-modal biometric recognition and road perception. This builds on prior research in biometric identification, which shows that combining multiple biometric characteristics improves recognition accuracy and robustness and effectively prevents identity fraud. Specifically, several experimental scenarios were designed to evaluate the model's performance in different environments, including driver vital-sign monitoring, road safety state assessment, and the synergy of the decision algorithms. Comparison with traditional methods demonstrates the advantages of the proposed method in traffic safety, decision accuracy, and system response speed.
To obtain biometric and road perception data, a variety of sensors were used, including heart rate sensors, respiratory rate sensors, accelerometers, satellite positioning systems, vehicle cameras, and lidar devices. With the support of the relevant authorities, 62 bus, taxi, and passenger shuttle drivers from Guangzhou were recruited for the test. Each driver participated for an average of 7 hours a day, and 34 days of data were accumulated in total. Data collection covered driver physiological monitoring and road environment sensing, with all data transmitted to the experimental platform in real time for processing and analysis. In the driver monitoring and safety evaluation system, the driver's physiological data is sampled every 10 seconds to capture changes in physiological and psychological state in time, while road perception data is sampled once per second to ensure real-time monitoring of road conditions and provide immediate information for driving safety. The experimental environment is a complex public transportation network covering a variety of traffic scenarios such as urban roads, expressways, and traffic intersections.
Several typical traffic scenarios, including urban roads, expressways, and traffic intersections, were selected for the experiments to comprehensively evaluate the performance of the proposed safety early-warning model in practical applications. To ensure the diversity and authenticity of the experiment, the volunteer group included drivers operating under different conditions, such as fixed and non-fixed routes and the presence or absence of pedestrians. Their data reflect driving behavior characteristics and physiological data across different traffic environments.
This paper synthesizes several groups of key data, including driver vital-sign monitoring indicators (such as heart rate, respiratory rate, and galvanic skin response), road safety perception indicators (such as traffic flow, speed, and traffic signal status), driving behavior indicators (such as rapid acceleration and sudden braking), traffic environment warning indicators (such as weather conditions and road events), and system status and performance indicators (such as vehicle health status and ADAS function state), to form multi-modal data fusion and collaborative decision-making indicators. The effectiveness and efficiency of the proposed method are tested against traditional methods on prediction and error evaluation indicators: accuracy, recall, false alarm rate, missing alarm rate, response time, and processing speed.
During the experiment, this paper focused on typical traffic scenarios such as lane change, traffic congestion and sudden accidents, and tested the model's performance under these scenarios respectively. In each scenario, the model's predictive performance was compared with traditional methods (traditional rules-based algorithms, single-sensor systems), and the evaluation indicators included:
Accuracy: the rate at which the model correctly identifies traffic safety events.
Recall: the percentage of all actual safety incidents that the model correctly identifies.
False Alarm Rate: the frequency with which the model raises alarms for normal conditions.
Missing Alarm Rate: the percentage of actual safety events for which the model failed to raise an alert.
Response Time: the time from the occurrence of a safety event until the model produces a response.
Processing Speed: the rate at which the model processes data and makes decisions, measured in frames per second.
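The classification indicators above can all be derived from confusion-matrix counts. The counts used here are illustrative, not the paper's data.

```python
def safety_metrics(tp, fp, fn, tn):
    """Evaluation indicators from confusion counts.

    accuracy      = correct predictions / all predictions
    recall        = tp / (tp + fn)   -- share of real events caught
    false_alarm   = fp / (fp + tn)   -- alarms raised on normal conditions
    missing_alarm = fn / (tp + fn)   -- real events with no alert (1 - recall)
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),
        "false_alarm_rate": fp / (fp + tn),
        "missing_alarm_rate": fn / (tp + fn),
    }

# Illustrative counts over 200 evaluation windows.
m = safety_metrics(tp=94, fp=3, fn=6, tn=97)
```

Note that the missing alarm rate is the complement of recall over actual events, which is why the two move together in the result tables.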
In the urban road scenario, the safety assessment effects of lane change, traffic congestion and sudden accidents are mainly investigated. Specifically, the experiment conducted a comprehensive test of the vehicle's driving state and emergency response ability under intersection turns, traffic congestion and sudden accidents. The experiment results of safety assessment in urban road scenarios are shown in Table 1.
Scenarios | Indicators | The Method Proposed in This Paper | Traditional Method
Lane changes | Accuracy | 0.9325 | 0.8644
Lane changes | Recall rate | 0.9415 | 0.8502
Lane changes | False alarm rate | 0.0324 | 0.0911
Lane changes | Missing alarm rate | 0.0432 | 0.1020
Lane changes | Response time | 0.82 s | 1.48 s
Lane changes | Processing speed | 2.32 frames/s | 1.52 frames/s
Traffic congestion | Accuracy | 0.9085 | 0.8275
Traffic congestion | Recall rate | 0.9151 | 0.8092
Traffic congestion | False alarm rate | 0.0422 | 0.1044
Traffic congestion | Missing alarm rate | 0.0521 | 0.1232
Traffic congestion | Response time | 0.90 s | 1.56 s
Traffic congestion | Processing speed | 2.12 frames/s | 1.31 frames/s
Breakdowns | Accuracy | 0.9213 | 0.8441
Breakdowns | Recall rate | 0.9322 | 0.8152
Breakdowns | False alarm rate | 0.0293 | 0.0820
Breakdowns | Missing alarm rate | 0.0421 | 0.1150
Breakdowns | Response time | 0.78 s | 1.39 s
Breakdowns | Processing speed | 2.44 frames/s | 1.42 frames/s
In the lane-change scenario, the proposed method achieved an accuracy of 93.25% and a recall of 94.15%, improvements of 6.81 and 9.13 percentage points over the traditional method. The marked decrease in false alarm rate and missing alarm rate indicates that the model can identify potential risks more accurately and respond quickly in complex and changeable lane-change scenarios. The response time of 0.82 seconds is a substantial improvement over the traditional method's 1.48 seconds, and the processing speed is also clearly higher (2.32 frames/s versus 1.52 frames/s).
In the traffic congestion scenario, the proposed method also performs well. Accuracy and recall reached 90.85% and 91.51% respectively, 8.10 and 10.59 percentage points above the traditional method. Although congestion makes the prediction task more difficult, the proposed method still identifies events accurately while reducing false alarms (4.22% vs. 10.44%) and missing alarms (5.21% vs. 12.32%). By contrast, a prior case study applying a high-order multivariable Markov model to congestion prediction on Section 3 of Renmin South Road in Chengdu reported a recognition rate of only 48% with a misjudgment rate of 16%. The response time and processing speed also show clear advantages, improved by 0.66 seconds and 0.79 frames/s respectively.
For emergency handling, the proposed method outperforms the traditional method in accuracy (92.13%) and recall (93.22%). It is especially superior in missing alarm rate and false alarm rate, indicating that the model can identify and deal with emergencies more effectively. The shorter response time shows that the model can make emergency decisions faster, reducing post-accident risk.
In the expressway scenario, the impact of lane changes, traffic congestion, and sudden traffic accidents on the safety assessment is evaluated, including the model's response and judgment during fast lane changes and emergencies. The experimental results of the safety assessment in expressway scenarios are shown in Table 2.
Scenarios | Indicators | The Method Proposed in This Paper | Traditional Method
Lane changes | Accuracy | 0.9415 | 0.8785
Lane changes | Recall rate | 0.9532 | 0.8513
Lane changes | False alarm rate | 0.0250 | 0.0782
Lane changes | Missing alarm rate | 0.0324 | 0.0852
Lane changes | Response time | 1.01 s | 1.80 s
Lane changes | Processing speed | 2.62 frames/s | 1.74 frames/s
Traffic congestion | Accuracy | 0.9252 | 0.8524
Traffic congestion | Recall rate | 0.9382 | 0.8290
Traffic congestion | False alarm rate | 0.0301 | 0.0842
Traffic congestion | Missing alarm rate | 0.0441 | 0.1037
Traffic congestion | Response time | 1.05 s | 1.92 s
Traffic congestion | Processing speed | 2.42 frames/s | 1.63 frames/s
Breakdowns | Accuracy | 0.9472 | 0.8857
Breakdowns | Recall rate | 0.9552 | 0.8573
Breakdowns | False alarm rate | 0.0213 | 0.0721
Breakdowns | Missing alarm rate | 0.0352 | 0.0865
Breakdowns | Response time | 1.03 s | 1.79 s
Breakdowns | Processing speed | 2.54 frames/s | 1.65 frames/s
In the expressway lane-change scenario, the proposed method achieves an accuracy of 94.15% and a recall of 95.32%, improvements of 6.30 and 10.19 percentage points over the traditional method. The false alarm rate and missing alarm rate are greatly reduced; in particular, under high-speed driving the proposed method markedly lowers the probability of both. The response time of 1.01 seconds is 0.79 seconds shorter than the traditional method's, allowing emergencies to be handled more quickly.
In the expressway traffic congestion scenario, the proposed method still maintains high performance. Accuracy was 92.52% and recall 93.82%, improvements of 7.28 and 10.92 percentage points over the traditional method. The false alarm rate dropped to 3.01% and the missing alarm rate to 4.41%, a significant decrease that further enhances the reliability of the system. The processing speed improved by 0.79 frames/s over the traditional method, and the response time was reduced by 0.87 seconds.
In the identification and handling of emergencies, the proposed method shows its clearest advantage: an accuracy of 94.72% and a recall of 95.52%, 6.15 and 9.79 percentage points higher than the traditional method. The reduction in false alarm and missing alarm rates makes the model more dependable in expressway emergencies, reducing unnecessary warnings and improving the timeliness of decision making.
The traffic intersection scenario covers complex situations such as lane change, traffic signal switching and pedestrian crossing. In this environment, sudden accidents and traffic congestion greatly increase the traffic risk, which requires the model to have stronger emergency response ability. The experiment results of safety assessment under traffic intersection scenarios are shown in Table 3.
Scenarios | Indicators | The Method Proposed in This Paper | Traditional Method
Lane changes | Accuracy | 0.9132 | 0.8351
Lane changes | Recall rate | 0.9215 | 0.8022
Lane changes | False alarm rate | 0.0392 | 0.0961
Lane changes | Missing alarm rate | 0.0513 | 0.1082
Lane changes | Response time | 0.85 s | 1.58 s
Lane changes | Processing speed | 2.32 frames/s | 1.41 frames/s
Traffic congestion | Accuracy | 0.8972 | 0.8192
Traffic congestion | Recall rate | 0.9044 | 0.7913
Traffic congestion | False alarm rate | 0.0453 | 0.1011
Traffic congestion | Missing alarm rate | 0.0562 | 0.1252
Traffic congestion | Response time | 0.92 s | 1.73 s
Traffic congestion | Processing speed | 2.22 frames/s | 1.51 frames/s
Breakdowns | Accuracy | 0.9052 | 0.8314
Breakdowns | Recall rate | 0.9122 | 0.8068
Breakdowns | False alarm rate | 0.0364 | 0.0896
Breakdowns | Missing alarm rate | 0.0482 | 0.1043
Breakdowns | Response time | 0.88 s | 1.62 s
Breakdowns | Processing speed | 2.42 frames/s | 1.53 frames/s
In the lane-change scenario at traffic intersections, the proposed method achieves high accuracy (91.32%) and recall (92.15%), 7.81 and 11.93 percentage points above the traditional method. Both the false alarm rate and the missing alarm rate are significantly reduced (each by 5.69 percentage points), lightening the burden on traffic management personnel. The model also responds and processes data faster in the complex intersection environment.
In the traffic congestion case, the proposed method attains an accuracy of 89.72% and a recall of 90.44%, improvements of 7.80 and 11.31 percentage points over the traditional method. The false alarm rate and missing alarm rate are significantly reduced, and the model can effectively identify congestion and respond accordingly. It also shows faster response and processing (response time 0.81 seconds shorter, processing speed 0.71 frames/s faster).
In emergency handling, the proposed method achieves an accuracy of 90.52% and a recall of 91.22%, 7.38 and 10.54 percentage points higher than the traditional method. The large reductions in false alarm and missing alarm rates indicate strong reliability in emergencies, with timely responses and accurate judgment of accident type.
This section comprehensively evaluates the performance of the proposed collaborative prevention and control model based on multi-modal biometric recognition and road perception in complex traffic scenarios, through a discussion of model efficacy, ablation results, and robustness analysis. The experiments verify the model's practical effect and reveal the contribution of each module to overall performance.
The following analysis examines the model's effectiveness in various complex scenarios using diversified evaluation indicators, in particular its actual performance in driver vital-sign monitoring and road safety assessment. The superiority of the model is demonstrated by comparing the traditional method and the proposed method on F1-Score and on a comprehensive efficiency index for the multi-objective optimization system.
The F1-Score is the harmonic mean of precision and recall, balancing the correctness of positive predictions against coverage of actual events, and it is particularly suitable for applications that depend heavily on accurate prediction. To demonstrate the effectiveness of the proposed model in different traffic environments, the three scenarios of urban roads, expressways, and traffic intersections were evaluated with the F1-Score. The experimental results of the F1-Score analysis are shown in Table 4.
Type of Scene | F1-Score (Proposed Model) | F1-Score (Traditional Method) | False Alarm Rate (Proposed Model) | False Alarm Rate (Traditional Method)
Urban roads | 0.9052 | 0.8365 | 0.0423 | 0.0731
Highways | 0.9241 | 0.8553 | 0.0315 | 0.0622
Traffic intersection | 0.8842 | 0.8122 | 0.0543 | 0.0811
The comparison shows that the proposed model achieves a significantly higher F1-Score than the traditional method in all test scenarios. In the expressway scenario in particular, the model's F1-Score is about 7 percentage points higher, indicating that prediction accuracy improves markedly in fast, frequently changing environments such as expressways.
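For reference, the F1-Score is computed as F1 = 2PR/(P + R). As an illustration only, the urban-road lane-change accuracy and recall figures from Table 1 are used below as stand-ins for precision and recall; the resulting value is not a figure from Table 4.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall: F1 = 2PR / (P + R)."""
    return 2 * precision * recall / (precision + recall)

# Table 1 urban-road lane-change figures used as P/R proxies, purely
# for illustration of the formula.
f1 = f1_score(0.9325, 0.9415)
```

The harmonic mean penalizes imbalance: a model with high recall but poor precision (or vice versa) scores much lower than one with both in balance.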
The comprehensive performance evaluation combines multiple evaluation indicators and reflects the overall performance of the model. By combining accuracy, recall, F1-Score, false alarm rate, and response time, a comprehensive performance score is calculated for each scenario.
The comprehensive performance score is a weighted average of these evaluation indicators. Based on the judgment of industry experts, accuracy, recall, F1-Score, false alarm rate, and response time are weighted at 25%, 25%, 20%, 15%, and 15% respectively. Using this scoring method, the comprehensive performance score for each scenario is calculated, and the proposed model is compared with the traditional method. The results of the comprehensive performance evaluation of the system are shown in Table 5.
Type of Scene | Comprehensive Efficiency Score (Proposed Method) | Comprehensive Efficiency Score (Traditional Method)
Urban roads | 0.8925 | 0.7848
Highways | 0.9028 | 0.8212
As can be seen from the table, the comprehensive efficiency score of the proposed model in each traffic scenario is significantly higher than that of the traditional method, indicating clear advantages in practical applications. In high-risk scenarios such as expressways in particular, the proposed model effectively improves safety and reduces the false alarm rate, offering higher applicability and reliability.
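The weighted scoring described above can be sketched as follows. The paper does not state how the false alarm rate and response time (where lower is better) are mapped onto a 0-1 "higher is better" scale, so the simple complements used here (and the 2-second response-time cap) are our own assumptions; the result will therefore not reproduce Table 5 exactly.

```python
def composite_score(accuracy, recall, f1, false_alarm, response_time,
                    max_response_time=2.0):
    """Weighted composite score with the expert weights 25/25/20/15/15.

    false_alarm and response_time are inverted so that higher is better;
    the inversion convention is an assumption of this sketch.
    """
    fa_score = 1.0 - false_alarm
    rt_score = 1.0 - min(response_time / max_response_time, 1.0)
    return (0.25 * accuracy + 0.25 * recall + 0.20 * f1
            + 0.15 * fa_score + 0.15 * rt_score)

# Urban-road figures from Tables 1 and 4 (lane-change row) as inputs.
score = composite_score(accuracy=0.9325, recall=0.9415, f1=0.9052,
                        false_alarm=0.0423, response_time=0.82)
```

With these inputs the score lands in the same range as the urban-road entry of Table 5, which suggests the published scores use a similar normalization.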
This paper verifies the influence of the different modules on the traffic safety early-warning system through ablation experiments, covering the physical signs monitoring module, the road perception module, the multi-modal data fusion module, and the decision and early-warning module. Six core indicators were selected for in-depth analysis: accuracy, recall, false alarm rate, missing alarm rate, response time, and processing speed. These indicators comprehensively reveal the performance, accuracy, and processing capacity of the model in different application scenarios. In the ablation tests, each of the four modules was replaced in turn by its traditional counterpart, and the performance indicators after each replacement were recorded. The same three typical scenarios of urban roads, expressways, and traffic intersections were used for testing. The model accuracy in different scenarios under the ablation experiment is shown in Table 6.
Type of Scene | Accuracy of Complete Model | Accuracy of Alternative Biometric | Accuracy of Alternative Road Perception | Accuracy of Alternative Multimodal Fusion | Accuracy of Alternative Decision Module |
Urban roads | 0.9252 | 0.8627 | 0.8485 | 0.8234 | 0.8113 |
Highways | 0.9432 | 0.8795 | 0.8665 | 0.8425 | 0.8325 |
Traffic intersection | 0.9010 | 0.8431 | 0.8314 | 0.8054 | 0.7927 |
Accuracy
Accuracy measures the proportion of all predictions the model classifies correctly. As the table shows, accuracy dropped the most after the physical signs monitoring module was replaced, especially in the two complex scenarios of urban roads and traffic intersections. By monitoring the driver's physiological state in real time, this module can accurately determine whether the driver is fatigued, over-stressed, or otherwise impaired, so that preventive measures can be taken in time. According to relevant studies, fatigue is one of the main causes of traffic accidents, and fatigue-related crashes occurring between midnight and 6 a.m. or in the evening are reported to have a fatality rate as high as 83%. A monitoring system can therefore help drivers stop and rest in time and avoid accidents caused by fatigued driving. This information is crucial for determining whether a driver is in a dangerous state: fatigue slows reactions and creates risk even when the surrounding environment shows no obvious hazard. If this module is replaced, the system can no longer capture the driver's condition in real time, and accuracy falls.
The replacement of the road awareness module also resulted in a significant drop in accuracy. The road awareness module is mainly responsible for collecting real-time information about the traffic environment, such as lanes, obstacles, and the location of other vehicles. Without the module, the system cannot fully understand the complexity of the traffic environment to effectively identify potentially dangerous situations (such as vehicles suddenly changing lanes ahead or obstacles on the road). As a result, the accuracy rate also drops after the module is replaced, especially in complex environments such as traffic intersections, where traffic signals and lane information are critical.
Although the replacement of the multimodal data fusion module has a relatively small impact, it will still lead to a certain degree of performance degradation. This is because multimodal data fusion can combine driver biometrics and road perception data to provide more comprehensive information to the decision module. Without the fusion module, the system can only rely on a single data source to make judgments, and such judgments are affected by missing information, which reduces accuracy.
Although the substitution of the decision module is less consequential than that of the preceding modules, it still degrades the overall performance of the system. The decision module converts the fused multi-modal information into specific early-warning signals; without it, the system struggles to issue warnings quickly and effectively, which limits accuracy. The model recall in different scenarios under the ablation experiment is shown in Table 7.
Type of Scene | Recall of Complete Model | Recall of Alternative Biometric | Recall of Alternative Road Perception | Recall of Alternative Multimodal Fusion | Recall of Alternative Decision Module |
Urban roads | 0.9172 | 0.8578 | 0.8332 | 0.8020 | 0.7854 |
Highways | 0.9361 | 0.8726 | 0.8581 | 0.8355 | 0.8190 |
Traffic intersection | 0.8942 | 0.8325 | 0.8152 | 0.7831 | 0.7713 |
Recall
The recall rate reflects the ability of the system to identify all real cases (i.e., actual hazardous events). The impact of alternative biometrics and road awareness modules on recall rates is very clear, especially in the two complex driving scenarios of urban roads and traffic intersections.
After the physical signs monitoring module was replaced, recall dropped significantly. This module monitors the driver's physiological state in real time and determines whether the driver is fatigued or under high stress, conditions directly related to accident risk. Without biometric data, the system can miss warning signs rooted in the driver's condition, such as reduced attention or slowed reaction times. Recall therefore falls substantially, especially in scenarios involving extended periods of driving.
After replacing the road awareness module, the recall rate also dropped significantly. The module is essential for a comprehensive perception of the traffic environment, especially in a dynamic and changing traffic environment. The road awareness module can capture the abnormal movements of other vehicles and potential road obstacles in real time, ensuring that early warning signals are sent in time. If this module is replaced, the system will not have a full picture of potential threats in the environment, resulting in a lower recall rate.
In complex driving scenarios such as traffic intersections, the replacement of the road awareness module and the sign monitoring module significantly reduces the system's recall rate. This is because the complex traffic signals, pedestrian dynamics, and unexpected situations in this scenario place high demands on the driver's health and accurate perception of the traffic environment. Analysis of false alarm rate of the model in different scenarios under ablation experiment is shown in Table 8.
Type of Scene | Complete Model | Alternative Biometric | Alternative Road Perception | Alternative Multimodal Fusion | Alternative Decision Module |
Urban roads | 0.0452 | 0.0680 | 0.0732 | 0.0811 | 0.0872 |
Highways | 0.0382 | 0.0521 | 0.0572 | 0.0645 | 0.0724 |
Traffic intersection | 0.0537 | 0.0653 | 0.0717 | 0.0823 | 0.0891 |
False Alarm Rate
The false alarm rate is the percentage of normal conditions that the system mistakenly identifies as dangerous. Although the false alarm rate generally rises after the biometric and road perception modules are replaced, studies on reducing the false alarm rate of biometric technology show that false positives can be reduced through technical optimization in some applications, indicating that the increased risk of false positives is not irreversible.
On the expressway, a relatively simple but high-speed driving environment, the false alarm rate increased significantly after the physical signs monitoring module was replaced. A driver's condition usually changes smoothly during expressway driving, but when a health problem does arise it can quickly lead to an unexpected reaction. Without the monitoring module, the system cannot recognize fatigue or reduced concentration in a timely manner and is more prone to raising alarms when there is no real danger. The rise in false alarms therefore suggests that biometric data is crucial to keeping the false alarm rate low.
The replacement of the road perception module also has a great impact on the false alarm rate, especially in complex traffic scenes such as traffic intersections. The road awareness module is responsible for monitoring the traffic environment and identifying possible hazards ahead (e.g., approaching vehicles, changes in traffic signals, etc.). Without the module, the system could incorrectly judge routine traffic situations as dangerous, which could lead to false alarms. Analysis of model missing alarm rate in different scenarios under ablation experiment is shown in Table 9.
Type of Scene | Complete Model | Alternative Biometric | Alternative Road Perception | Alternative Multimodal Fusion | Alternative Decision Module |
Urban roads | 0.0322 | 0.0472 | 0.0528 | 0.0678 | 0.0633 |
Highways | 0.0240 | 0.0352 | 0.0414 | 0.0462 | 0.0512 |
Traffic intersection | 0.0414 | 0.0544 | 0.0593 | 0.0681 | 0.0720 |
Missing Alarm Rate
The missing alarm rate is the percentage of actual hazardous events for which the system fails to issue a warning, and it reflects the sensitivity of the system. In the ablation experiment, the increase in the missing alarm rate after module replacement directly reflects the system's failure to respond to real danger in time.
After the physical signs monitoring module was replaced, the missing alarm rate rose significantly: when the driver was highly stressed or fatigued, the system failed to recognize these states in time and thus failed to raise an alarm in advance. Fatigued drivers react slowly, and the system relies on biometrics to judge the driver's condition; without this module, the driver's state is under-monitored and missed alarms increase.
After the road perception module was replaced, the missing alarm rate also increased. Without this module, the system cannot accurately perceive potential threats in the traffic environment, such as vehicles suddenly changing lanes ahead or obstacles, and therefore fails to recognize dangerous situations in time, resulting in missed alarms. The model response time in different scenarios under the ablation experiment is shown in Table 10.
Type of Scene | Complete Model (s) | Alternative Biometric (s) | Alternative Road Perception (s) | Alternative Multimodal Fusion (s) | Alternative Decision Module (s)
Urban roads | 0.8520 | 1.1204 | 1.1502 | 1.2237 | 1.3140 |
Highways | 0.7221 | 0.9527 | 1.0218 | 1.0807 | 1.1821 |
Traffic intersection | 0.9518 | 1.2351 | 1.3017 | 1.3514 | 1.4526 |
Response Time
Response time is a measure of how long it takes a system to go from identification to issuing an alert. The response time of the system generally increases after any module is replaced. In particular, the response time increases the most after replacing the decision module.
The function of the decision module is to make real-time decision according to the information after the fusion of multi-modal data, and to issue alarm quickly. After the module was replaced, the system response slowed down, resulting in significantly longer decision times. This is especially important for high-speed driving scenarios, where a timely response is essential in case of danger. Analysis of model processing speed in different scenarios under ablation experiment is shown in Table 11.
Type of Scene | Complete Model (FPS) | Alternative Biometric (FPS) | Alternative Road Perception (FPS) | Alternative Multimodal Fusion (FPS) | Alternative Decision Module (FPS) |
Urban roads | 28.3154 | 30.1548 | 29.6651 | 27.4457 | 26.2781 |
Highways | 32.7267 | 34.5421 | 33.8147 | 31.9525 | 30.8147 |
Traffic intersection | 25.6268 | 27.2582 | 26.4545 | 24.8871 | 23.6127 |
Processing Speed
Processing speed reflects how efficiently the system processes data. In the ablation experiment, replacing the physical signs monitoring module slightly improved processing speed. Biometric data is usually more complex to process than environmental data (such as vehicle speed and road conditions), so removing this module reduces the system's processing burden and raises overall throughput, though at the cost of the accuracy and recall losses described above.
However, replacing the road perception module did not noticeably improve processing speed. Although the replacement reduced the computation needed for perceptual data, overall throughput changed little: the road perception module has relatively low computational complexity, so removing it frees little compute, even though its contribution to system performance is critical.
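The FPS figures in Table 11 can be interpreted as frames processed per second of wall-clock compute time. A minimal measurement sketch, assuming a `process_frame` callable standing in for the full pipeline (an assumption for illustration, not the paper's implementation):

```python
import time

def measure_fps(process_frame, frames) -> float:
    """Estimate average processing speed (FPS) of a per-frame pipeline.

    process_frame: callable applied to each frame (hypothetical stand-in
    for the complete model or an ablated variant).
    frames: an iterable of input frames with a known length.
    """
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    # Guard against a zero-length timing window on trivially fast loops.
    return len(frames) / elapsed if elapsed > 0 else float("inf")
```

Measured this way, a lighter replacement module raises FPS only in proportion to the compute it removes, which is consistent with the small FPS differences reported for the road perception ablation.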
According to the ablation results, the physical signs monitoring module, the road perception module, and the multi-modal data fusion module all play crucial roles in system performance. The physical signs monitoring module strongly affects the system's accuracy, recall, and false alarm rate, especially in monitoring the driver's health status. The road perception module is key to accurately understanding the traffic environment and identifying potential hazards in time. The multi-modal data fusion module strengthens the system's information integration and its adaptability to complex scenes. Although the decision module has limited impact on accuracy, it plays a key role in response speed and in ensuring real-time performance.
As shown in Figure 1, with advanced driver status detection the system distinguishes driving states effectively: the accuracy for driver danger levels R1, R2, and R3 reached 95.8%, 94.5%, and 96.3%, respectively. The four curves show how system accuracy changes after replacing the signals of the physical signs monitoring module, the road perception module, the multimodal data fusion module, and the decision and warning module. Their trends reveal each module's importance in different driving states and how much system performance degrades in each scenario when a module is replaced.

Figure 2 shows the contribution of each module's signals (physical signs monitoring, road perception, multi-modal data fusion, and decision and warning) to system performance under different driving states, represented by the weight change curve of each modal signal. These curves reveal how the active safety prevention and control system adaptively adjusts signal weights across driving scenarios and highlight the contribution of each signal.

In addition, to strengthen the analysis, Figure 3 plots the correlation between the driver's reaction time and the physiological state detected by the system under different driving states. It relates reaction time to the fatigue or tension states the system identifies, revealing how reaction time co-varies with physiological changes and how the active safety prevention and control system captures them.

The physical signs monitoring module has a significant impact on model accuracy and recall, especially in evaluating driver health status: after replacement, the false alarm rate rises and early warning ability weakens accordingly. Since its replacement nevertheless benefits response time and processing speed somewhat, it can be regarded as a high-impact module. The road perception module is crucial for accurately assessing road conditions; after replacement, recall and accuracy drop significantly, especially in complex traffic scenarios. Because its replacement has the most pronounced negative effect on system performance, it can be regarded as the core component of the system. After the multi-modal data fusion module is replaced, the system loses multi-dimensional data support and may rely on a single data source, degrading prediction accuracy, recall, response time, and other indicators. The system then recognizes and handles varied situations less well, and its risk warning ability declines markedly, especially in complex traffic scenarios, because it can no longer fully evaluate the interaction between the driver's physiological state and road conditions, undermining the accuracy and timeliness of traffic safety prediction. After the decision-making algorithm module is replaced, accuracy and recall fall significantly and the different data sources can no longer be effectively integrated, weakening decision effectiveness. The decision algorithm is therefore irreplaceable in the system, particularly for integrating multi-modal data and realizing the risk warning function.
To further validate the bio-inspired principles embedded in the proposed system, additional experiments were conducted focusing on two aspects: sensory integration robustness and cooperative adaptability. For sensory integration robustness, partial sensor failure and noisy data conditions were simulated by randomly masking 20–40% of input channels (either biometric or road perception streams). Results show that the multi-modal fusion architecture maintained over 90% of its baseline accuracy, reflecting the fault tolerance observed in biological nervous systems during partial sensory loss. For cooperative adaptability, dynamic traffic scenarios with rapidly changing road conditions and heterogeneous driver states were simulated, analogous to environmental perturbations in animal groups. The game-theoretic coordination strategy demonstrated stable decision-making with less than 5% degradation in response time and accuracy, indicating its resilience to environmental uncertainty. These findings support the analogy between the proposed vehicle–road–person coordination mechanism and swarm intelligence in nature, where decentralized entities adjust behaviors through local interactions to achieve globally optimal outcomes.
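The masking protocol above can be sketched as follows. This is a hedged illustration, not the paper's implementation: `evaluate` is a hypothetical stand-in for the full multi-modal pipeline, and the 20–40% masking range is taken from the experiment description.

```python
import random

def mask_channels(channels, mask_ratio, rng):
    """Zero out a random subset of channels to simulate partial sensor failure."""
    n_mask = int(len(channels) * mask_ratio)
    masked_idx = set(rng.sample(range(len(channels)), n_mask))
    return [0.0 if i in masked_idx else c for i, c in enumerate(channels)]

def retained_accuracy(evaluate, channels, trials=100, seed=0):
    """Average accuracy under random masking, as a fraction of baseline accuracy.

    evaluate: hypothetical callable scoring a channel vector (stand-in for
    the fusion pipeline's accuracy on a fixed test set).
    """
    rng = random.Random(seed)
    baseline = evaluate(channels)
    scores = []
    for _ in range(trials):
        ratio = rng.uniform(0.20, 0.40)  # 20-40% masking, as in the protocol
        scores.append(evaluate(mask_channels(channels, ratio, rng)))
    return sum(scores) / (trials * baseline)
```

Under this protocol, the robustness claim corresponds to `retained_accuracy` staying above 0.9 for the fusion architecture across masking trials.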
4. Conclusion and Future Work
The public transport safety prevention and control system proposed in this paper, based on multi-modal biometric identification technology, significantly improves traffic safety and risk warning capability by integrating drivers' biometric data with road perception information. Experimental verification across multiple complex traffic scenarios shows that its accuracy and recall exceed those of traditional methods and its response time is shorter, demonstrating strong potential for traffic safety early warning. Nevertheless, practical application of the system still faces several challenges, including the following:
First, the accuracy of biometric data is limited by both sensor performance and individual driver differences. Drivers' physical signs vary under many external factors, which can cause sensor detection accuracy to fluctuate and weaken the early warning system. Future research should therefore focus on improving sensor technology and optimizing data processing algorithms, so as to reduce the negative impact of individual differences on system performance and ensure reliable monitoring data across driver states.
Second, the real-time performance and accuracy of road perception information pose another challenge. The system currently relies on traffic monitoring equipment and sensor networks for real-time road condition information, but data collection accuracy may be limited in complex urban traffic environments or inclement weather. Future research should therefore further improve the accuracy and timeliness of road perception technology and explore more efficient data acquisition approaches, such as fusing data from vehicular networking technology and high-precision maps.
In addition, although the collaborative decision algorithm designed in this paper copes well with multi-objective optimization problems, it still has room to improve in more complex and dynamic traffic environments. Handling extreme situations such as sudden traffic accidents, unpredictable weather changes, or driver physiological abnormalities will be a focus of future research. In particular, achieving real-time dynamic adjustment in the decision-making process, so as to respond flexibly to complex and changing environments, will be central to future algorithm optimization.
This study proposes a public traffic safety prevention and control method based on multi-modal biometric recognition and road perception, aiming to improve the accuracy and timeliness of traffic safety early warning. By integrating drivers' biometric data with road perception information, a collaborative decision-making algorithm driven by reinforcement learning is designed, effectively enhancing intelligent collaborative decision-making between people and vehicles. Experimental results show that the proposed method achieves high accuracy and low response time in multiple complex scenarios, especially in changeable environments, and can significantly improve traffic safety. Response time in particular is one of the key indicators of real-time system performance and bears directly on system reliability and effectiveness; in traffic systems, a fast response reduces processing errors and delays and improves the reliability of the system.
This research offers a new line of thinking for risk prediction and decision optimization in ITS: by monitoring the driver's physiological state in real time and combining it with road information, the safety of the public transportation system is further improved. Although traffic safety education measures, such as the simulated traffic environments and games used in primary school experiments to raise students' safety awareness, have shown good results, more verification is still needed in practical applications to ensure the reliability and adaptability of such measures and to optimize their effect in more complex and changeable traffic environments.
Although this study has made progress in traffic safety prevention and control, there is still room for improvement. Future research can expand in the following directions:
Optimization and innovation of sensor technology: More types of biometric sensors and higher-precision road sensing devices can be explored to enhance the system's data acquisition capability. At the same time, reducing sensor interference and improving adaptability in complex environments will further improve data accuracy and system reliability.
Algorithm optimization and deep learning application: Deep learning methods, especially self-supervised learning and transfer learning, can be introduced to improve the system's adaptability to complex traffic scenarios. By learning from large-scale data, the system can better understand traffic behavior patterns and make the decision-making process more intelligent and personalized.
Multi-scenario verification and deployment: Future research should expand experiments to a variety of real traffic scenarios, especially high-traffic, extreme-weather, and emergency situations, to further test system stability and reliability. For example, in developing intelligent driving assistance systems, Extrestone Automobile uses advanced sensor technology together with strict durability and reliability testing to ensure stable operation in harsh environments. Reliability analyses of urban road traffic operating status, which construct indicator systems such as lane smoothness rate and traffic accident rate to evaluate how the road traffic system is operating, likewise provide a theoretical and practical basis for improving system stability and more comprehensive data support for actual deployment. To meet real-time traffic safety requirements and improve system responsiveness, the computational efficiency and hardware architecture of the algorithm can also be optimized, for example by adopting high-performance platforms such as edge computing to significantly reduce response time and keep processing efficient even under heavy traffic. ITS case studies show that real-time monitoring and intelligent scheduling can effectively improve traffic flow and enhance traffic safety.
Data privacy protection and security research: As ITS becomes widespread, securing the biometric data of drivers and passengers and preventing privacy disclosure is an urgent problem. Future research can further explore data privacy protection techniques, such as encryption and differential privacy, to ensure the security and legitimate use of data.
In general, the public transportation safety prevention and control system based on multi-modal biometric recognition and road perception has broad application prospects in intelligent transportation: it can effectively improve traffic safety and provide new solutions for future traffic management and public safety. As the technology continues to advance, the system proposed in this paper can play a greater role in the real world and promote the further development and application of intelligent transportation technology.
Conceptualization, L.W. and J.F.Y.; methodology, J.C.X.; software, W.T.J.; validation, W.T.J. and L.F.L.; data curation, W.T.J. and L.H.Z.; writing—original draft preparation, L.W. and J.F.Y.; writing—review and editing, W.T.J. and Y.Y.W.; visualization, W.T.J. All authors have read and agreed to the published version of the manuscript.
The datasets generated and analyzed during the current study are owned and managed by the local government authorities. Due to data sensitivity and confidentiality agreements, the datasets are not publicly available.
The authors declare that no financial or other contractual agreements between the company and the authors or their institutions influenced the study design, results, interpretation, or reporting of this work.
The authors declare that no generative artificial intelligence (AI) or AI-assisted technologies were used in the writing, analysis, interpretation, or scientific content generation of this manuscript.
Baidu Translate was used solely as a language translation tool to assist with translation and language refinement. The authors take full responsibility for the accuracy, integrity, and originality of the content presented in this manuscript.
