1. F. Tehranizadeh, K. R. Berenji, and E. Budak, “Dynamics and chatter stability of crest-cut end mills,” Int. J. Mach. Tools Manuf., vol. 171, p. 103813, 2021.
2. H. Gao, J. Wang, H. Shen, H. Shen, S. Xing, Y. Yang, and M. Ali, “Detection framework for milling chatter based on improved Coati Optimization Algorithm,” IEEE Sens. J., vol. 25, no. 17, pp. 32217–32234, 2025.
3. C. A. K. A. Kounta, L. Arnaud, B. Kamsu-Foguem, and F. Tangara, “Deep learning for the detection of machining vibration chatter,” Adv. Eng. Softw., vol. 180, p. 103445, 2023.
4. P. Bakhshandeh, Y. Mohammadi, Y. Altintas, and F. Bleicher, “Digital twin assisted intelligent machining process monitoring and control,” CIRP J. Manuf. Sci. Technol., vol. 49, pp. 180–190, 2024.
5. M. Q. Tran, M. K. Liu, and M. Elsisi, “Effective multi-sensor data fusion for chatter detection in milling process,” ISA Trans., vol. 125, pp. 514–527, 2022.
6. G. S. Sestito, G. S. Venter, K. S. B. Ribeiro, A. Rodrigues, and M. Silva, “In-process chatter detection in micro-milling using acoustic emission via machine learning classifiers,” Int. J. Adv. Manuf. Technol., vol. 120, no. 11, pp. 7293–7303, 2022.
7. W. K. Wang, M. Wan, W. H. Zhang, and Y. Yang, “Chatter detection methods in the machining processes: A review,” J. Manuf. Process., vol. 77, pp. 240–259, 2022.
8. D. E. Matthew, J. Shi, M. Hou, and H. Cao, “Improved STFT analysis using time-frequency masking for chatter detection in the milling process,” Measurement, vol. 225, p. 113899, 2024.
9. H. Gao, H. Shen, L. Yu, Y. Wang, R. Li, and B. Nazir, “Milling chatter detection system based on multi-sensor signal fusion,” IEEE Sens. J., vol. 21, no. 22, pp. 25243–25251, 2021.
10. S. Yan and Y. Sun, “Early chatter detection in thin-walled workpiece milling process based on multi-synchrosqueezing transform and feature selection,” Mech. Syst. Signal Process., vol. 169, p. 108622, 2022.
11. B. Sener, M. U. Gudelek, A. M. Ozbayoglu, and H. Unver, “A novel chatter detection method for milling using deep convolution neural networks,” Measurement, vol. 182, p. 109689, 2021.
12. Y. Song, J. Cao, and Y. Hu, “In-process feature extraction of milling chatter based on second-order synchroextracting transform and fast kurtogram,” Mech. Syst. Signal Process., vol. 208, p. 111018, 2024.
13. K. Jauhari, A. Z. Rahman, M. Al Huda, and T. Prahasto, “Building digital-twin virtual machining for milling chatter detection based on VMD, synchro-squeeze wavelet, and pre-trained network CNNs with vibration signals,” J. Intell. Manuf., vol. 35, no. 7, pp. 3083–3114, 2024.
14. S. Wan, X. Li, Y. Yin, and J. Hong, “Milling chatter detection by multi-feature fusion and Adaboost-SVM,” Mech. Syst. Signal Process., vol. 156, p. 107671, 2021.
15. M. C. Yesilli, F. A. Khasawneh, and A. Otto, “On transfer learning for chatter detection in turning using wavelet packet transform and ensemble empirical mode decomposition,” CIRP J. Manuf. Sci. Technol., vol. 28, pp. 118–135, 2020.
16. H. Gao, H. Wang, H. Shen, S. Xing, Y. Yang, and Y. Wang, “Multi-modal denoised data-driven milling chatter detection using an optimized hybrid neural network architecture,” Sci. Rep., vol. 15, no. 1, p. 3953, 2025.
17. H. O. Unver and B. Sener, “A novel transfer learning framework for chatter detection using convolutional neural networks,” J. Intell. Manuf., vol. 34, no. 3, pp. 1105–1124, 2023.
18. K. Jauhari, A. Z. Rahman, M. Al Huda, A. Widodo, and T. Prahasto, “A hybrid deep learning-based approach for on-line chatter detection in milling using deep stem-inception networks and residual channel-spatial attention mechanisms,” Mech. Syst. Signal Process., vol. 226, p. 112357, 2025.
19. K. Attouri, K. Dhibi, M. Mansouri, M. Hajji, K. Bouzrara, and M. Nounou, “Effective uncertain fault diagnosis technique for wind conversion systems using improved ensemble learning algorithm,” Energy Rep., vol. 10, pp. 3113–3124, 2023.
20. F. Pacheco, A. Drimus, L. Duggen, C. Mariela, C. Diego, and S. René-Vinicio, “Deep ensemble-based classifier for transfer learning in rotating machinery fault diagnosis,” IEEE Access, vol. 10, pp. 29778–29787, 2022.
21. H. Gao, H. Shen, C. Yue, R. Li, S. Y. Liang, and Y. Wang, “A monitoring method of milling chatter based on optimized hybrid neural network with attention mechanism,” Facta Univ. Ser. Mech., vol. 23, no. 2, pp. 227–250, 2025.
22. M. Ghasemi, M. Zare, P. Trojovský, R. Rao, T. Eva, and V. Kandasamy, “Optimization based on the smart behavior of plants with its engineering applications: Ivy algorithm,” Knowl.-Based Syst., vol. 295, p. 111850, 2024.
23. Y. Liu, T. Hu, H. Zhang, H. Wu, S. Wang, L. Ma, and M. Long, “iTransformer: Inverted transformers are effective for time series forecasting,” arXiv preprint arXiv:2310.06625, 2023.
24. E. K. Ganesh and G. Diwakar, “Prediction of remaining useful life of rolling bearing using hybrid DCNN-BiGRU model,” J. Vib. Eng. Technol., vol. 11, no. 3, pp. 997–1010, 2023.
25. Y. Cheng, S. Zhou, J. Xue, M. Lu, X. Gai, and R. Guan, “Research on tool wear prediction based on the RF optimized by NGO algorithm,” Mach. Sci. Technol., vol. 28, no. 4, pp. 523–546, 2024.
26. D. Domingo, A. B. Kareem, C. N. Okwuosa, P. M. Custodio, and J. W. Hur, “Transformer core fault diagnosis via current signal analysis with Pearson correlation feature selection,” Electronics, vol. 13, no. 5, p. 926, 2024.
27. J. Spierings, V. H. Ong, P. M. J. Welsing, M. Hughes, J. D. Pauling, F. D. Galdo, A. L. Herrick, and C. P. Denton, “P139 Self-assessment of scleroderma skin: Validation of the PASTUL questionnaire,” Rheumatology, vol. 63, no. 63, p. 178, 2024.
28. S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, 2009.
Research article

Milling chatter detection based on an optimized iTransformer-BiGRU-random forest ensemble model

Hongdan Shen 1,2, Haining Gao 1*, Lin Yang 2, Rongyi Li 2, Yang Yong 1, Shule Xing 1
1 Henan Province International Joint Laboratory of New Energy Digitalization Technology, Huanghuai University, 463000 Zhumadian, China
2 Key Laboratory of Advanced Manufacturing and Intelligent Technology, Harbin University of Science and Technology, 150080 Harbin, China
Precision Mechanics & Digital Fabrication | Volume 2, Issue 4, 2025 | Pages 222-233
Received: 08-01-2025, Revised: 09-15-2025, Accepted: 09-28-2025, Available online: 10-04-2025

Abstract:

To address the inadequate accuracy in chatter detection during milling operations, this study proposes a novel milling chatter detection methodology based on an optimized iTransformer-BiGRU-Random Forest (iTBU-RF) hybrid model. Initially, sensitivity analysis of time-frequency domain features is conducted employing Pearson correlation coefficients and significance levels to identify the features most sensitive to chatter detection. Subsequently, a chatter detection model integrating iTBU and RF is constructed. The hyperparameters of the ensemble model are optimized through the Ivy optimization algorithm. Following hyperparameter optimization, the model's accuracy is substantially enhanced, achieving a maximum improvement of 2.40% compared to the pre-optimized configuration. Upon feature optimization, the model maintains superior classification performance while simultaneously reducing training time from 153.83 seconds to 116.74 seconds, thereby improving computational efficiency by approximately 24.11%. In comparison with benchmark methodologies, the proposed approach demonstrates optimal performance across all evaluation metrics, including accuracy. This investigation provides a novel technological framework for enhancing the precision of chatter detection in milling operations.

Keywords: Chatter detection, Feature selection, Hyperparameter optimization, Ensemble model

1. Introduction

Chatter represents the most detrimental manifestation of self-excited vibrations in milling operations, precipitating severe degradation of workpiece surface integrity, accelerated tool wear, and potentially catastrophic equipment failure, thereby substantially compromising machining efficiency and product quality. To address this critical challenge, researchers have predominantly pursued two complementary avenues of investigation: chatter prediction [1] and chatter detection [2]. The predictive approach leverages stability models to facilitate the judicious selection of machining parameters. However, owing to the inherent complexity and variability of practical machining conditions, chatter phenomena may still materialize even with theoretically optimized parameters. In contrast, chatter detection methodologies enable real-time monitoring of the machining state with instantaneous warning capabilities, garnering considerable attention within the research community due to their superior temporal responsiveness and diagnostic accuracy.

Signal acquisition constitutes the foundational step in chatter detection frameworks. Commonly employed sensing modalities encompass vibration signals [3], cutting force signals [4], and acoustic emissions [5], [6]. Empirical evidence substantiates that both cutting force and acceleration signals exhibit remarkable efficacy for chatter identification [7]. Nevertheless, subsequent investigations have revealed that acceleration signals demonstrate superior suitability for chatter detection applications relative to cutting force measurements, primarily attributable to their reduced computational complexity and implementation requirements [8]. Presently, acceleration-based sensing has emerged as the predominant measurement paradigm in chatter detection methodologies.

Chatter feature extraction constitutes the subsequent critical step in chatter detection. Prevalent methodologies encompass time-domain, frequency-domain, and time-frequency domain approaches. Time-domain features predominantly comprise dimensionless characteristics such as root mean square [2], [9], [10] and mean values [2], [9]. Gao et al. [9] formulated a composite time-domain feature by integrating mean value, standard deviation, and kurtosis. Frequency-domain features predominantly include spectral mean [2], [9] and spectral centroid frequency [2]. Time-frequency domain methodologies have garnered extensive adoption owing to their inherent capability of furnishing simultaneous temporal and spectral information, encompassing techniques such as wavelet energy entropy [2], continuous wavelet transform [11], and synchrosqueezing transform [12], [13].

Chatter detection classification algorithms predominantly encompass traditional machine learning paradigms and deep learning architectures. Among conventional machine learning approaches, support vector machines [14], [15] and random forests [5], [10] have achieved the most widespread deployment. Tran et al. [5] employed random forest algorithms to train multi-sensor fusion data for machining state identification. Yan and Sun [10] leveraged random forests in conjunction with optimally selected feature subsets to accomplish intelligent diagnosis of machining stability. Nevertheless, these methodologies exhibit fundamental dependence on manual feature engineering, thereby constraining their capacity to autonomously extract deep nonlinear patterns and temporal dependencies.

In recent years, deep learning has attracted considerable scholarly attention by virtue of its formidable automatic feature learning and classification capabilities, progressively permeating the domain of chatter detection. Gao et al. [16] developed a hybrid neural network architecture integrating dual-scale parallel convolutional neural networks, BiGRU, and multi-head attention mechanisms for chatter detection. Unver and Sener [17] proposed a transfer learning framework that synergistically combines analytical solutions with convolutional neural networks for chatter identification. Jauhari et al. [18] introduced a novel network architecture that amalgamates Stem modules, Inception modules, and embedded residual attention mechanisms for chatter detection. Despite demonstrating exceptional performance in feature representation, deep learning approaches remain susceptible to overfitting under small-sample scenarios and exhibit insufficient model interpretability.

Existing chatter detection methodologies are encumbered by three critical limitations. Firstly, prevailing research predominantly relies upon monolithic models, which struggle to maintain robust performance across complex and variable operational conditions. Secondly, although ensemble models have demonstrated substantial efficacy in other fault diagnosis domains [19], [20], their application in chatter detection remains notably limited. Thirdly, contemporary hyperparameter optimization techniques [21] are primarily tailored for individual models, lacking systematic mechanisms for the synergistic optimization of hyperparameters across multiple constituent models within ensemble architectures, as well as their corresponding weighting coefficients.

To address the aforementioned challenges, this study presents an optimized iTransformer-BiGRU-Random Forest (iTBU-RF) ensemble model for milling chatter detection. The principal contributions of this methodology encompass: (1) The simultaneous optimization of hyperparameters across both the iTBU model and the RF model through an integrated optimization algorithm, whereby a global search strategy facilitates the acquisition of optimal model configurations while circumventing the suboptimal solutions inherent in conventional stepwise optimization procedures. (2) The development of a dynamic weighting ensemble framework that synergistically integrates the iTBU model and the RF model, wherein the contribution weights of individual constituent models are adaptively modulated in accordance with varying operational conditions and data characteristics, thereby substantially enhancing the generalization capability and detection accuracy of the ensemble architecture.

2. Ivy Optimization Algorithm

The Ivy algorithm was proposed by Ghasemi et al. in 2024 [22]. This algorithm primarily emulates the distinct life stages of Ivy plants, including growth, climbing, and propagation within the Ivy population.

Upon algorithm initialization, the initial positions of the Ivy population within the search space are stochastically determined using Eq. (1).

$ I_i=I_{\min }+rand(1\text{,}\ D) \odot\left(I_{\max }-I_{\min }\right)\text{,}\ i=1\text{,}\ \cdots\text{,}\ N_{p o p} $
(1)

where, $I_{\max }$ and $I_{\min }$ denote the upper and lower bounds of the search space, respectively. $\odot$ represents the Hadamard product of two vectors. ${rand}$(1, $D$) denotes a D-dimensional vector comprising uniformly distributed random numbers within the interval [0, 1]. $N_{pop}$ signifies the total population size of Ivy plants.

Drawing upon data-intensive experimental and simulation procedures, the difference equation governing the growth velocity $G v_i(t)$ of member $I_i$ is formulated as Eq. (2).

$ \Delta G v_i(t+1)=rand^2 \odot\left(N(1\text{,}\ D) \odot \Delta G v_i(t)\right) $
(2)

Vectors $\Delta G v_i(t + 1)$ and $\Delta G v_i(t)$ represent the growth rates of the discrete-time system at time instants $t+1$ and $t$, respectively. $rand^2$ denotes a random number drawn from a random variable whose probability density function is $1/(2 \sqrt{x})$. $N$(1, $D$) signifies a random vector of dimensionality $D$.

The equation governing the climbing and logical movement of member $I_i$ toward the light source direction via leveraging member $I_{i i}$ is expressed in Eq. (3).

$ \left\{\begin{array}{c} I_i^{\text {new}}=I_i+|N(1\text{,}\ D)| \odot\left(I_{i i}-I_i\right)+N(1\text{,}\ D) \odot \Delta G v_i\text{,}\ i=1\text{,}\ 2\text{,}\ \cdots\text{,}\ N_{p o p} \\ \Delta G v_i=\left\{\begin{array}{cc} I_i \oslash\left(I_{\max }-I_{\min }\right) & Iter=1 \\ rand^{\text {2}} \odot\left(N(1, D) \odot \Delta G v_i\right)\text{,}\ & Iter>1 \end{array}\right. \end{array}\right. $
(3)

where, $|N(1\text{,}\ D)|$ denotes the absolute value of $N(1\text{,}\ D)$. $\oslash$ represents the Hadamard division of two vectors.

Member $I_i$ explores the vicinity of member $I_{\text {Best }}$ in search of superior optimal solutions. The computational procedure is presented in Eq. (4).

$ \left\{\begin{array}{l} I_i^{\text {new }}=I_{\text {Best }} \odot\left(rand(1\text{,}\ D)+N(1\text{,}\ D) \odot \Delta G v_i\right) \\ \Delta G v_i^{\text {new}}=I_i^{\text {new }} \oslash\left(I_{\max }-I_{\min }\right) \end{array}\right. $
(4)

When the objective function value $f\left(I_i\right)$ of member $I_i$ is less than $\beta f\left(I_{\text {Best }}\right)$, where $\beta$ = (2 + $rand$)$/$2, the Ivy commences expansion of its branch and leaf width (as governed by Eq. (3)). Conversely, the Ivy undergoes upward growth and climbing (as prescribed by Eq. (4)).
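The update loop described by Eqs. (1)-(4) can be sketched as follows. This is a minimal illustrative Python implementation, not the authors' code: the neighbour selection, greedy replacement rule, and stopping criterion are simplifying assumptions, and the demonstration objective (the sphere function) is purely hypothetical.

```python
import numpy as np

def ivy_optimize(f, lb, ub, n_pop=20, n_iter=100, seed=0):
    """Simplified sketch of the Ivy optimization loop (Eqs. (1)-(4))."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    D = len(lb)
    # Eq. (1): stochastic initial positions within the search bounds
    pop = lb + rng.random((n_pop, D)) * (ub - lb)
    gv = pop / (ub - lb)                       # initial growth rates (Iter = 1 case of Eq. (3))
    fit = np.array([f(x) for x in pop])
    best, best_f = pop[fit.argmin()].copy(), float(fit.min())
    for _ in range(n_iter):
        beta = (2.0 + rng.random()) / 2.0      # threshold multiplier on f(I_Best)
        for i in range(n_pop):
            if fit[i] < beta * best_f:
                # Eq. (3): climb toward a neighbouring member (here simply the next one)
                ii = (i + 1) % n_pop
                new = (pop[i] + np.abs(rng.normal(size=D)) * (pop[ii] - pop[i])
                       + rng.normal(size=D) * gv[i])
                # Eq. (2): growth-rate update; rand**2 has pdf 1/(2*sqrt(x)) on (0, 1)
                gv[i] = rng.random(D) ** 2 * rng.normal(size=D) * gv[i]
            else:
                # Eq. (4): explore the vicinity of the best member
                new = best * (rng.random(D) + rng.normal(size=D) * gv[i])
                gv[i] = new / (ub - lb)
            new = np.clip(new, lb, ub)
            fn = f(new)
            if fn < fit[i]:                    # greedy replacement (assumption)
                pop[i], fit[i] = new, fn
                if fn < best_f:
                    best, best_f = new.copy(), fn
    return best, best_f

# Example: minimize the 2-D sphere function over [-5, 5]^2
x, fx = ivy_optimize(lambda v: float(np.sum(v * v)), lb=[-5, -5], ub=[5, 5])
```

In the full method, the same loop searches the hyperparameter space of Table 1, with the validation accuracy of Eq. (9) as the fitness function.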

3. Chatter Detection Model

3.1 iTransformer Model

The iTransformer model constitutes a refined neural network architecture predicated upon the Transformer framework [23]. In contrast to conventional Transformer models, iTransformer treats individual time series as tokens and leverages the self-attention mechanism to capture multivariate correlations. Furthermore, it employs layer normalization and feedforward neural network modules to more effectively acquire the global representation of sequences for time series forecasting applications. These architectural enhancements enable iTransformer to exhibit superior performance in processing long-sequence tasks with enhanced versatility. The specific architecture of the iTransformer model is illustrated in Figure 1.

Figure 1. Model architecture of iTransformer

The architecture of iTransformer encompasses local layer normalization, feedforward neural networks, and self-attention modules. The normalization operation is applied to the time series representations of variables, normalizing all tokens to a normal distribution, thereby facilitating the reduction of variance introduced by measurements and addressing the non-stationarity problem.

The multivariate attention mechanism ensures that highly correlated variables obtain elevated weighting values in subsequent representation interactions. This computational process involves calculating attention scores, scaling, applying the softmax function, and performing weighted summation. The formulation is expressed in Eq. (5).

$ Attention (Q\text{,}\ K\text{,}\ V)=softmax\left(\frac{Q K^T}{\sqrt{d_k}}\right) V $
(5)

where, $Q$, $K$, and $V$ denote the query vector, key vector, and value vector, respectively. $d_k$ represents the projection dimensionality.

The feedforward neural network constitutes another essential component within the iTransformer model, comprising linear transformations and nonlinear activation functions. Its computational procedure is formulated in Eq. (6).

$ FFN(x)=\text{max} \left(0\text{,}\ x W_1+b_1\right) W_2+b_2 $
(6)

where, $x$ denotes the input vector; $W_1$, $b_1$, $W_2$, and $b_2$ represent the learnable weight matrices and bias terms; $\text{max} (0\text{,}\ \cdots)$ signifies the ReLU activation function; and $FFN(x)$ designates the output of the feedforward neural network.
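Eqs. (5) and (6) can be illustrated with a small NumPy sketch. The variate-token arrangement (one row per whole variable series) follows the iTransformer idea, but the dimensions and random weights below are illustrative assumptions rather than the paper's configuration.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Eq. (5): scaled dot-product attention with weighted summation over V
    d_k = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def ffn(x, W1, b1, W2, b2):
    # Eq. (6): linear -> ReLU -> linear
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

# In iTransformer each row of X embeds one entire variate series (a "variate
# token"), so self-attention mixes information ACROSS variables, not time steps.
rng = np.random.default_rng(0)
n_vars, d = 4, 8
X = rng.normal(size=(n_vars, d))
out = attention(X, X, X)                      # self-attention over variate tokens
out = ffn(out, rng.normal(size=(d, 16)), np.zeros(16),
          rng.normal(size=(16, d)), np.zeros(d))
```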

3.2 BiGRU Model

BiGRU addresses the propagation directionality issue by employing two GRU layers [24]. The forward GRU processes the input sequence from the initial to the terminal time step, while the backward GRU processes it in the reverse direction. Each GRU layer maintains its respective hidden state, updating it based on the current input and the hidden state from the preceding time step.

The computational formulation for the hidden layer units within BiGRU is expressed in Eq. (7).

$ \left\{\begin{array}{c} \overrightarrow{h}_t=GRU\left(x_t\text{,}\ \overrightarrow{h}_{t-1}\right) \\ \overleftarrow{h}_t=GRU\left(x_t\text{,}\ \overleftarrow{h}_{t-1}\right) \\ h_t=f\left(W_{\overrightarrow{h}_t} \cdot \overrightarrow{h}_t+W_{\overleftarrow{h}_t} \cdot \overleftarrow{h}_t+b_t\right) \end{array}\right. $
(7)

where, $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ denote the forward and backward states of the hidden layer at time instant $t$, respectively. $W_{\overrightarrow{h}_t}$ and $W_{\overleftarrow{h}_t}$ represent the corresponding forward and backward weights, while $b_t$ signifies the bias term of the hidden layer at time instant $t$.
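A minimal NumPy sketch of Eq. (7) follows. The standard GRU gating equations and the small dimensions used here are assumptions for illustration, not the trained model's parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h, p):
    """One standard GRU cell update; weights held in dict p."""
    z = sigmoid(x @ p["Wz"] + h @ p["Uz"])          # update gate
    r = sigmoid(x @ p["Wr"] + h @ p["Ur"])          # reset gate
    h_tilde = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"])
    return (1 - z) * h + z * h_tilde

def bigru(X, p_fwd, p_bwd, Wf, Wb, b):
    """Eq. (7): forward and backward GRU passes fused through a tanh output layer."""
    T, _ = X.shape
    H = Wf.shape[0]
    hf, hb = np.zeros((T, H)), np.zeros((T, H))
    h = np.zeros(H)
    for t in range(T):                               # forward direction
        h = gru_step(X[t], h, p_fwd); hf[t] = h
    h = np.zeros(H)
    for t in reversed(range(T)):                     # backward direction
        h = gru_step(X[t], h, p_bwd); hb[t] = h
    return np.tanh(hf @ Wf + hb @ Wb + b)            # h_t = f(W_fwd h_fwd + W_bwd h_bwd + b)

rng = np.random.default_rng(1)
D_in, H, T = 3, 5, 10
make = lambda: {k: rng.normal(scale=0.5, size=s)
                for k, s in [("Wz", (D_in, H)), ("Uz", (H, H)),
                             ("Wr", (D_in, H)), ("Ur", (H, H)),
                             ("Wh", (D_in, H)), ("Uh", (H, H))]}
X = rng.normal(size=(T, D_in))
out = bigru(X, make(), make(), rng.normal(size=(H, H)),
            rng.normal(size=(H, H)), np.zeros(H))
```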

3.3 Random Forest

RF constitutes an ensemble learning methodology whose fundamental principle involves employing bootstrap aggregating (Bagging) to aggregate multiple weak classifiers into a robust ensemble classifier [25]. The ultimate prediction is determined through majority voting or averaging across independently distributed constituent weak classifiers.

The core mechanism of RF entails constructing binary decision trees through recursive binary partitioning of the data, wherein each internal node represents a feature attribute, and each terminal node corresponds to either a class label or a numerical value.
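The Bagging principle can be illustrated with one-node "stump" trees standing in for full decision trees (a deliberate simplification; the paper uses standard random forests). All data, sizes, and the binary labels below are synthetic.

```python
import numpy as np

def fit_stump(X, y):
    """Best single-feature threshold split by misclassification rate (one-node tree)."""
    best = (None, None, None, np.inf)                 # (feature, threshold, flipped, error)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            pred = (X[:, j] > thr).astype(int)
            for flip in (pred, 1 - pred):             # try both label orientations
                err = np.mean(flip != y)
                if err < best[3]:
                    best = (j, thr, flip is not pred, err)
    return best[:3]

def bagged_predict(X, stumps):
    """Majority vote across bootstrap-trained stumps (Bagging principle of RF)."""
    votes = []
    for j, thr, flipped in stumps:
        p = (X[:, j] > thr).astype(int)
        votes.append(1 - p if flipped else p)
    return (np.mean(votes, axis=0) >= 0.5).astype(int)

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(int)   # noisy synthetic labels
stumps = []
for _ in range(25):
    idx = rng.integers(0, len(X), len(X))             # bootstrap resample with replacement
    stumps.append(fit_stump(X[idx], y[idx]))
acc = np.mean(bagged_predict(X, stumps) == y)
```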

3.4 Ensemble Model

The iTBU architecture demonstrates superior capability in capturing patterns across diverse temporal scales, thereby enhancing predictive accuracy and robustness. The Random Forest model, characterized by its expeditious training efficiency and formidable generalization capability, furnishes stable predictive performance on time-frequency domain features. Consequently, this study adopts an ensemble framework integrating the iTBU model and RF model for machining state identification.

Initially, the entire dataset is partitioned into training, validation, and test subsets. The iTBU model and RF model are independently trained on the training set, yielding machining state detection models $M_{\mathrm{iTBU}}$ and $M_{\mathrm{RF}}$, respectively. The validation set data are subsequently fed into models $M_{\mathrm{iTBU}}$ and $M_{\mathrm{RF}}$ to obtain accuracies $A_V^{\mathrm{iTBU}}$ and $A_V^{\mathrm{RF}}$. An optimization algorithm is employed to select optimal model hyperparameters, thereby acquiring the optimal predictive models $M_V^{\mathrm{iTBU}}$ and $M_V^{\mathrm{RF}}$ with corresponding maximum accuracies $A_{\max }^{\mathrm{iTBU}}$ and $A_{\max }^{\mathrm{RF}}$.

The ensemble weights are determined based on the validation set accuracies.

$ \left\{\begin{aligned} W^{\mathrm{iTBU}} & =\frac{A_{\max }^{\mathrm{iTBU}}}{A_{\max }^{\mathrm{iTBU}}+A_{\max }^{R F}} \\ W^{R F} & =\frac{A_{\max }^{R F}}{A_{\max }^{\mathrm{iTBU}}+A_{\max }^{R F}} \end{aligned}\right. $
(8)

When a substantial accuracy discrepancy exists between the two models on the validation set, ensemble integration would compromise overall predictive accuracy. Consequently, an error threshold is established. If the accuracy differential between the two models exceeds the predefined threshold, the superior-performing model is designated as the final detection model. Conversely, if the accuracy disparity falls below the threshold, the models are integrated through weighted combination.
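The weighting rule of Eq. (8) together with the accuracy-gap fallback described above can be sketched as follows; the probability arrays and accuracy values used in the example are hypothetical.

```python
import numpy as np

def ensemble_predict(p_itbu, p_rf, acc_itbu, acc_rf, threshold=0.05):
    """Weighted fusion of two models' class-probability outputs (Eq. (8)),
    falling back to the better single model when the validation-accuracy
    gap exceeds the threshold."""
    if abs(acc_itbu - acc_rf) > threshold:
        # Large accuracy gap: designate the superior model as the detector
        probs = p_itbu if acc_itbu >= acc_rf else p_rf
    else:
        # Eq. (8): accuracy-proportional ensemble weights
        w_itbu = acc_itbu / (acc_itbu + acc_rf)
        w_rf = acc_rf / (acc_itbu + acc_rf)
        probs = w_itbu * p_itbu + w_rf * p_rf
    return probs.argmax(axis=1)

# Hypothetical 3-class probabilities (stable / slight chatter / severe chatter)
p1 = np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]])
p2 = np.array([[0.6, 0.3, 0.1], [0.3, 0.4, 0.3]])
labels = ensemble_predict(p1, p2, acc_itbu=0.96, acc_rf=0.94)  # gap 0.02 -> fused
```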

The detailed procedure is illustrated in Figure 2.

Figure 2. Ensemble model architecture

The validation set accuracy is employed as the fitness function for hyperparameter optimization of the model (Eq. (9)), where $Y_i$ and $Y_{valid\text{,}\ i}$ denote the predicted and true labels of the $i$-th validation sample, and $\mathbb{I}(\cdot)$ is the indicator function.

$ \text{max}\ f=\frac{\sum_{i=1}^N \mathbb{I}\left(Y_i=Y_{valid\text{,}\ i}\right)}{N} \times 100 $
(9)

Performance evaluation metrics including accuracy, precision, and recall are adopted to assess the predictive capabilities of the model, as formulated in Eq. (10).

$ \left\{\begin{array}{l} Accuracy=(TP+TN)/(TP+TN+FP+FN) \\ Precision=TP/(TP+FP) \\ Recall=TP/(TP+FN) \\ F1\ Score=2 \times Precision \times Recall/(Precision+Recall) \\ Specificity=TN/(TN+FP) \end{array}\right. $
(10)
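The metrics of Eq. (10) follow directly from the confusion-matrix counts; the counts in the example below are hypothetical.

```python
def binary_metrics(tp, tn, fp, fn):
    """Eq. (10) evaluation metrics computed from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)
    return dict(accuracy=accuracy, precision=precision, recall=recall,
                f1=f1, specificity=specificity)

# Hypothetical confusion counts for one class treated as "positive"
m = binary_metrics(tp=90, tn=85, fp=15, fn=10)
```

For the three machining states, these metrics would be computed per class in one-vs-rest fashion and averaged.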

4. Experimental Investigation

4.1 Experimental Setup

The machining operations were conducted on a VDL-1000E three-axis CNC machine tool manufactured by Dalian Machine Tool Group. A flat-end milling cutter with four flutes and a diameter of 10 mm was employed for the cutting trials. The workpiece material comprised TC4 titanium alloy with dimensions of 200 × 200 × 5 mm. Acceleration signals during the machining process were acquired using a PCB accelerometer featuring a sensitivity of 10.42 mV/g, coupled with a Donghua DH5922 data acquisition system operating at a sampling frequency of 5000 Hz. The milling configuration consisted of climb milling under dry cutting conditions. The experimental setup is depicted in Figure 3. The simulation experiments were executed on a Windows 10 (64-bit) operating system with a hardware platform comprising an Intel® Core™ i9-12900K CPU, NVIDIA GeForce RTX 3080 graphics processing unit, operating at 3.2 GHz clock frequency with 32 GB RAM. The computational environment was established using MATLAB R2023b.

Figure 3. Experimental setup: (a) Experimental site; (b) Vibration data acquisition
4.2 Parameter Configuration

The radial depth of cut was maintained at 0.5 mm, with a feed per tooth of 0.1 mm/tooth. The remaining cutting parameters, machining conditions, and chatter characteristics are comprehensively documented in Reference [21].

5. Experimental Results and Discussion

5.1 Feature Sensitivity Analysis

The Pearson correlation coefficient constitutes a widely adopted quantitative metric in feature sensitivity analysis, providing intuitive numerical characterization of the correlation strength between features and target variables [26]. However, this metric exclusively reflects the magnitude of association while lacking the capability to assess the statistical reliability of the observed correlations. The significance level, ascertained through rigorous statistical hypothesis testing, enables the discrimination of genuine correlations from those potentially arising from stochastic errors, thereby possessing well-defined statistical interpretation and effectively compensating for this inherent limitation of the Pearson correlation coefficient. Predicated upon these considerations, the present investigation synergistically employs both the Pearson correlation coefficient and significance level to conduct feature sensitivity analysis, thereby simultaneously achieving the dual objectives of correlation strength screening and correlation reliability validation. The resultant correlation coefficients and significance levels are illustrated in Figure 4.

Figure 4. Correlation analysis between extracted features and processing states
Note: STD = Standard deviation, RMS = Root mean square, ABS = Absolute mean, PTP = Peak-to-peak value, KUR = Kurtosis, SKE = Skewness, PAF = Peak factor, PUF = Pulse factor, WAF = Waveform factor, MAF = Mean of frequency spectrum, GFFS = Gravity frequency of frequency spectrum, MSFS = Mean square frequency of frequency spectrum, and WPE1–WPE8 = Wavelet packet energy entropies 1–8

As depicted in Figure 4, the gravity frequency of the frequency spectrum (GFFS) and mean square frequency of the frequency spectrum (MSFS) features exhibit negative correlations with the machining state, whereas the remaining time-frequency domain features demonstrate positive correlations. From the perspective of overall association strength across feature dimensions, time-domain features manifest the most pronounced correlation intensity, followed by frequency-domain features. In accordance with the correlation strength classification criteria established in the literature [27], correlation coefficients residing within the interval of 0.6 to 0.8 are designated as strong correlations. Should the Pearson correlation coefficient threshold be established above 0.7, approximately 41.7% of the strongly correlated features identified in this investigation would be eliminated, thereby incurring substantial loss of efficacious information. Considering both the definitional criteria for strong correlation and the imperative of information preservation, the present study establishes the Pearson correlation coefficient screening threshold at 0.6, concurrently mandating that correlations attain an extremely significant level. Only features simultaneously satisfying these dual criteria are retained; otherwise, they are discarded. Following the aforementioned correlation screening and significance validation procedures, the following features are ultimately identified as sensitive indicators of the machining state: standard deviation (STD), root mean square (RMS), absolute mean (ABS), peak-to-peak value (PTP), gravity frequency of the frequency spectrum (GFFS), wavelet packet energy entropy 3 (WPE3), mean square frequency of the frequency spectrum (MSFS), mean of the frequency spectrum (MAF), and wavelet packet energy entropies 2, 7, 4, and 5 (WPE2, WPE7, WPE4, and WPE5).
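The correlation-based screening step can be sketched as follows. Only the Pearson-coefficient threshold is implemented here; the significance test is omitted for brevity, and the synthetic features, labels, and feature names are illustrative assumptions.

```python
import numpy as np

def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two 1-D arrays."""
    x = x - x.mean()
    y = y - y.mean()
    return float((x @ y) / np.sqrt((x @ x) * (y @ y)))

def screen_features(F, state, names, r_min=0.6):
    """Retain features whose |r| with the machining-state label meets the
    0.6 threshold adopted above (significance validation omitted)."""
    return [n for n, col in zip(names, F.T) if abs(pearson_r(col, state)) >= r_min]

rng = np.random.default_rng(3)
state = np.repeat([0, 1, 2], 50).astype(float)       # stable / slight / severe labels
f_strong = state + 0.3 * rng.normal(size=150)        # chatter-sensitive synthetic feature
f_weak = rng.normal(size=150)                        # uninformative synthetic feature
kept = screen_features(np.column_stack([f_strong, f_weak]), state, ["RMS", "noise"])
```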

5.2 Threshold Analysis of the Ensemble Model

Figure 5 illustrates the mean classification accuracy achieved via ten-fold cross-validation on the test set across varying threshold configurations.

Figure 5. Variation pattern of mean classification accuracy as a function of threshold value

As illustrated in Figure 5, the mean classification accuracy exhibits a progressive ascending trend as the threshold increases from 0.02 to 0.05, culminating in a peak value at the threshold of 0.05, whereupon it subsequently diminishes with further threshold augmentation. Concurrently, the standard deviation demonstrates a gradual reduction from thresholds 0.02 to 0.04, attaining its minimum at 0.04; a marginal increase manifests at the threshold of 0.05, though the magnitude remains at a comparatively modest level. Upon comprehensive evaluation, the threshold of 0.05 corresponds to the maximal mean accuracy among all investigated threshold values, whilst simultaneously maintaining the standard deviation at a consistently low magnitude, thereby ensuring both optimal overall classification accuracy and robust result stability. Consequently, the threshold value of 0.05 is identified as the optimal selection for the ensemble model.

5.3 Analysis of Chatter Detection Results

The detailed specifications of iTBU-RF model parameters optimized by the Ivy algorithm, encompassing the parameter value ranges and optimization outcomes, are presented in Table 1.

Table 1. Optimization results of model hyperparameters

Parameter | Range | Default Value | Optimal Value
Number of position encoding vectors | [1, 6] | 1 | 2
Number of attention heads | [16, 128] | 4 | 4
Number of hidden neurons in GRU | [16, 128] | 32 | 16
Dropout rate | [0.001, 0.5] | 0.01 | 0.216
Learning rate | [0.001, 0.1] | 0.01 | 0.01
Split feature number | [1, 6] | 1 | 2
Decision tree depth | [1, 100] | 30 | 12

As delineated in Table 1, the iTBU model encompasses 1,128 trainable parameters, of which 400 constitute task-specific parameters, whilst the RF model comprises a total of 552 parameters. The dataset employed in the present investigation incorporates 2,880 training samples, 360 validation samples, and 360 testing samples. The ratio of sample size to the parameter count of the iTBU model approximates 7.2:1, whereas the corresponding ratio for the RF model stands at 5.2:1, both of which conform to the generally established requisites for deep learning model training [28].

Under the optimized hyperparameter configuration of the iTBU-RF model, the classification results for the training, validation, and test sets are illustrated in Figure 6.

As illustrated in Figure 6 (a), the classification accuracy achieved on the training set reaches 98.75%. Specifically, 6 instances of stable cutting were erroneously classified as slight chatter, 12 instances of slight chatter were misidentified as severe chatter, 12 instances of slight chatter were incorrectly recognized as stable cutting, and 6 instances of severe chatter were mistakenly categorized as slight chatter. Figure 6 (b) demonstrates that the validation set attains a classification accuracy of 96.67%, wherein 6 stable cutting cases were misclassified as slight chatter, and 6 slight chatter instances were erroneously identified as stable cutting. As depicted in Figure 6 (c), the test set exhibits a classification accuracy of 95%. The misclassification patterns encompass 6 stable cutting samples incorrectly recognized as slight chatter, 6 slight chatter instances misidentified as stable cutting, and 6 severe chatter cases erroneously categorized as slight chatter.

Figure 6. Detection results of machining conditions: (a) Training set; (b) Validation set; (c) Test set
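The per-class error counts reported for the test set can be cross-checked with a small confusion-matrix computation. A balanced split of 120 samples per class is assumed here, since it is consistent with the 360-sample test set and reproduces the reported 95% accuracy.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build a confusion matrix (rows = true class, cols = predicted)."""
    idx = {c: i for i, c in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[idx[t]][idx[p]] += 1
    return m

def accuracy(m):
    """Fraction of samples on the matrix diagonal."""
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total

# (true, predicted) -> count, from the Figure 6(c) description,
# assuming 120 test samples per machining state
counts = {
    ("stable", "stable"): 114, ("stable", "slight"): 6,
    ("slight", "slight"): 114, ("slight", "stable"): 6,
    ("severe", "severe"): 114, ("severe", "slight"): 6,
}
y_true, y_pred = [], []
for (t, p), n in counts.items():
    y_true += [t] * n
    y_pred += [p] * n

m = confusion_matrix(y_true, y_pred, ["stable", "slight", "severe"])
print(accuracy(m))  # 0.95, matching the reported test-set accuracy
```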

The ten-fold cross-validation results for machining state detection prior to and subsequent to hyperparameter optimization are presented in Table 2.

Table 2. Detection results before and after hyperparameter optimization (ten-fold cross-validation)

| Category | Training Set | Validation Set | Test Set |
|---|---|---|---|
| Default values | 96.45% | 94.25% | 93.49% |
| Optimized values | 98.23% | 96.65% | 95.27% |

As evident from Table 2, the model detection performance is comprehensively enhanced following hyperparameter optimization. The accuracies of the training, validation, and test sets improve by 1.78, 2.40, and 1.78 percentage points, respectively. The optimized model also demonstrates more balanced performance across different datasets, with the accuracy disparity between the training and test sets reduced from 2.94 percentage points prior to optimization to 2.40 percentage points thereafter.
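The ten-fold protocol can be reproduced, for the RF branch at least, with standard tooling. The sketch below uses scikit-learn on synthetic data as a stand-in for the milling feature set, with the split feature number and tree depth set to the optimal values from Table 1; the dataset shape and scores are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic 3-class data standing in for the 12 selected milling features
X, y = make_classification(n_samples=600, n_features=12, n_informative=8,
                           n_classes=3, random_state=0)

# RF branch of the ensemble with Table 1's optimal values:
# split feature number = 2, decision tree depth = 12
rf = RandomForestClassifier(max_features=2, max_depth=12, random_state=0)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(rf, X, y, cv=cv, scoring="accuracy")
print(f"mean={scores.mean():.3f}  std={scores.std():.3f}")
```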

The ten-fold cross-validation results for machining state detection before and after feature selection are presented in Table 3.

Table 3. Detection results before and after feature optimization

| Number of Features | Training Set | Validation Set | Test Set | Training Time (s) |
|---|---|---|---|---|
| 12 | 98.76% | 96.63% | 95.16% | 116.74 |
| 20 | 98.69% | 96.61% | 95.15% | 153.83 |

As demonstrated in Table 3, the comparative analysis before and after feature optimization reveals that when the feature dimensionality is reduced from 20 to 12, the model exhibits marginal performance enhancement across all datasets. Specifically, the training set accuracy increases slightly from 98.69% to 98.76%, the validation set accuracy improves from 96.61% to 96.63%, while the test set accuracy demonstrates a modest increase from 95.15% to 95.16%. Notably, following feature selection, the model training time is substantially reduced from 153.83 seconds to 116.74 seconds, representing an approximate reduction of 24.11%. These findings indicate that the 12 critical features identified through the feature selection algorithm effectively capture the intrinsic data structure and informational content, thereby achieving substantial computational efficiency improvements while maintaining classification performance.
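A minimal sketch of correlation-based screening of the kind used in this study — ranking the 20 candidate features by the absolute Pearson coefficient against the class label and retaining the top 12 — is shown below. The data are synthetic and the informative column is contrived for illustration; the paper's actual procedure also applies a significance-level test, which is omitted here.

```python
import numpy as np

def select_by_pearson(X, y, k):
    """Rank features by |Pearson r| against the (numeric) class label
    and keep the indices of the top k."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    keep = np.argsort(-np.abs(r))[:k]
    return np.sort(keep), r

rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=200)   # three machining states
X = rng.normal(size=(200, 20))     # 20 candidate features
X[:, 3] += 2.0 * y                 # make feature 3 informative (contrived)
keep, r = select_by_pearson(X, y, k=12)
print(keep.tolist())               # indices of the 12 retained features
```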

To validate the reliability and superiority of the proposed model, comparative evaluations are conducted against RF, iTransformer, iTransformer-BiLSTM (iTBM), and iTBU. To mitigate potential errors, the mean results from ten-fold cross-validation are adopted as the final evaluation metrics. The comparative results for the training set are illustrated in Figure 7, while the test set comparison outcomes are depicted in Figure 8.

Figure 7. Detection results on the training set
Note: iTBM = iTransformer-BiLSTM, iTBU = iTransformer-BiGRU, PM = proposed method

As illustrated in Figure 7, the proposed methodology demonstrates superior performance across all evaluated metrics. Specifically, the accuracy attains 0.984, precision reaches 0.983, recall achieves 0.985, F1-score stands at 0.983, and specificity measures 0.989. Relative to the benchmark models, the proposed approach realizes improvements of up to 2.9% in accuracy, 2.9% in precision, 3.0% in recall, 2.7% in F1-score, and 1.5% in specificity.

Figure 8. Detection results on the test set

Figure 8 reveals that the proposed methodology sustains optimal performance on the test set, wherein the accuracy attains 0.951, precision reaches 0.952, recall achieves 0.955, F1-score stands at 0.970, and specificity measures 0.986. Compared with the baseline models, enhancements of up to 3.9% in accuracy, 3.8% in precision, 4.0% in recall, 4.2% in F1-score, and 1.2% in specificity are achieved.
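The five indicators can be computed per class and macro-averaged from a confusion matrix. This is a generic sketch rather than the paper's exact evaluation code (the averaging scheme used in the study is not stated); the example matrix is assembled from the Figure 6(c) error counts under an assumed 120-samples-per-class test set.

```python
def macro_metrics(m):
    """Macro-averaged precision, recall, F1, and specificity from a
    confusion matrix m (rows = true class, cols = predicted class)."""
    n = len(m)
    total = sum(sum(row) for row in m)
    prec = rec = f1 = spec = 0.0
    for i in range(n):
        tp = m[i][i]
        fp = sum(m[r][i] for r in range(n)) - tp   # predicted i, wrong
        fn = sum(m[i]) - tp                        # true i, missed
        tn = total - tp - fp - fn
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        prec += p
        rec += r
        f1 += 2 * p * r / (p + r) if p + r else 0.0
        spec += tn / (tn + fp) if tn + fp else 0.0
    return {"precision": prec / n, "recall": rec / n,
            "f1": f1 / n, "specificity": spec / n}

# Confusion matrix from the Figure 6(c) counts (120 per class assumed)
m = [[114, 6, 0], [6, 114, 0], [0, 6, 114]]
print(macro_metrics(m))
```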

The proposed methodology exhibits minimal performance degradation between the training and test sets, with an average decline of approximately 2.19% across the five performance indicators, substantially lower than that observed for alternative comparative methods. This phenomenon substantiates the exceptional generalization capability of the proposed approach and its efficacy in mitigating overfitting phenomena. Notably, the performance decrements in F1-score and specificity are particularly modest (1.32% and 0.30%, respectively), thereby corroborating the methodology's capacity to maintain consistently high-level performance across disparate datasets.

6. Conclusions

(1) The proposed milling chatter detection methodology based on the optimized iTBU-RF ensemble model demonstrates efficacy in identifying three distinct machining states during the milling process: stable cutting, slight chatter, and severe chatter, thereby furnishing reliable technical support for chatter prevention in practical machining operations.

(2) Analysis employing Pearson correlation coefficients and significance levels reveals that the features STD, RMS, ABS, PTP, GFFS, WPE 3, MSFS, MAF, WPE 2, WPE 7, WPE 4, and WPE 5 exhibit robust correlations with machining states.

(3) Following hyperparameter optimization, the model detection performance demonstrates comprehensive enhancement. The accuracies of the training, validation, and test sets improve by 1.78%, 2.40%, and 1.78%, respectively. Concurrently, the optimized model exhibits more balanced performance across diverse datasets.

(4) Upon feature selection, the feature dimensionality is reduced from 20 to 12, achieving a reduction in training time from 153.83 seconds to 116.74 seconds (approximately 24.11% decrease) while maintaining classification performance.

(5) In comparison with baseline methodologies, the proposed model achieves optimal performance across all evaluation metrics. Notably, while maintaining high accuracy (95.1%), it attains superior F1-score (0.970) and specificity (0.986), demonstrating significant advantages in balanced identification of diverse chatter states.

Author Contributions

Methodology, H.D.S. and H.N.G.; software, H.D.S. and H.N.G.; validation, H.D.S. and L.Y.; data curation, H.N.G. and R.Y.L.; writing—original draft preparation, H.D.S., H.N.G. and Y.Y.; writing—review and editing, H.N.G., S.L.X., H.D.S., L.Y. and R.Y.L.; funding acquisition, H.N.G., Y.Y. and S.L.X. All authors have read and agreed to the published version of the manuscript.

Funding
This work is funded by the Henan Province Young Backbone Teachers Support Program in Higher Education (Grant No.: 2023GGJS157), the Science and Technology Planning Project in Henan Province (Grant No.: 252102220089, 242102240132, 252102520060), the Program for Innovative Research Team (in Science and Technology) in University of Henan Province (Grant No.: 24IRTSTHN020), and the Key Research and Development Program of Henan Province (Grant No.: 241111111600).
Data Availability

The data used to support the research findings are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References
1.
F. Tehranizadeh, K. R. Berenji, and E. Budak, “Dynamics and chatter stability of crest-cut end mills,” Int. J. Mach. Tools Manuf., vol. 171, p. 103813, 2021. [Google Scholar] [Crossref]
2.
H. Gao, J. Wang, H. Shen, H. Shen, S. Xing, Y. Yang, and M. Ali, “Detection framework for milling chatter based on improved Coati Optimization Algorithm,” IEEE Sens. J., vol. 25, no. 17, pp. 32217–32234, 2025. [Google Scholar] [Crossref]
3.
C. A. K. A. Kounta, L. Arnaud, B. Kamsu-Foguem, and F. Tangara, “Deep learning for the detection of machining vibration chatter,” Adv. Eng. Softw., vol. 180, p. 103445, 2023. [Google Scholar] [Crossref]
4.
P. Bakhshandeh, Y. Mohammadi, Y. Altintas, and F. Bleicher, “Digital twin assisted intelligent machining process monitoring and control,” CIRP J. Manuf. Sci. Technol., vol. 49, pp. 180–190, 2024. [Google Scholar] [Crossref]
5.
M. Q. Tran, M. K. Liu, and M. Elsisi, “Effective multi-sensor data fusion for chatter detection in milling process,” ISA Trans., vol. 125, pp. 514–527, 2022. [Google Scholar] [Crossref]
6.
G. S. Sestito, G. S. Venter, K. S. B. Ribeiro, A. Rodrigues, and M. Silva, “In-process chatter detection in micro-milling using acoustic emission via machine learning classifiers,” Int. J. Adv. Manuf. Technol., vol. 120, no. 11, pp. 7293–7303, 2022. [Google Scholar] [Crossref]
7.
W. K. Wang, M. Wan, W. H. Zhang, and Y. Yang, “Chatter detection methods in the machining processes: A review,” J. Manuf. Process., vol. 77, pp. 240–259, 2022. [Google Scholar] [Crossref]
8.
D. E. Matthew, J. Shi, M. Hou, and H. Cao, “Improved STFT analysis using time-frequency masking for chatter detection in the milling process,” Measurement, vol. 225, p. 113899, 2024. [Google Scholar] [Crossref]
9.
H. Gao, H. Shen, L. Yu, Y. Wang, R. Li, and B. Nazir, “Milling chatter detection system based on multi-sensor signal fusion,” IEEE Sens. J., vol. 21, no. 22, pp. 25243–25251, 2021. [Google Scholar] [Crossref]
10.
S. Yan and Y. Sun, “Early chatter detection in thin-walled workpiece milling process based on multi-synchrosqueezing transform and feature selection,” Mech. Syst. Signal Process., vol. 169, p. 108622, 2022. [Google Scholar] [Crossref]
11.
B. Sener, M. U. Gudelek, A. M. Ozbayoglu, and H. Unver, “A novel chatter detection method for milling using deep convolution neural networks,” Measurement, vol. 182, p. 109689, 2021. [Google Scholar] [Crossref]
12.
Y. Song, J. Cao, and Y. Hu, “In-process feature extraction of milling chatter based on second-order synchroextracting transform and fast kutrogram,” Mech. Syst. Signal Process., vol. 208, p. 111018, 2024. [Google Scholar] [Crossref]
13.
K. Jauhari, A. Z. Rahman, M. Al Huda, and T. Prahasto, “Building digital-twin virtual machining for milling chatter detection based on VMD, synchro-squeeze wavelet, and pre-trained network CNNs with vibration signals,” J. Intell. Manuf., vol. 35, no. 7, pp. 3083–3114, 2024. [Google Scholar] [Crossref]
14.
S. Wan, X. Li, Y. Yin, and J. Hong, “Milling chatter detection by multi-feature fusion and Adaboost-SVM,” Mech. Syst. Signal Process., vol. 156, p. 107671, 2021. [Google Scholar] [Crossref]
15.
M. C. Yesilli, F. A. Khasawneh, and A. Otto, “On transfer learning for chatter detection in turning using wavelet packet transform and ensemble empirical mode decomposition,” CIRP J. Manuf. Sci. Technol., vol. 28, pp. 118–135, 2020. [Google Scholar] [Crossref]
16.
H. Gao, H. Wang, H. Shen, S. Xing, Y. Yang, and Y. Wang, “Multi-modal denoised data-driven milling chatter detection using an optimized hybrid neural network architecture,” Sci. Rep., vol. 15, no. 1, p. 3953, 2025. [Google Scholar] [Crossref]
17.
H. O. Unver and B. Sener, “A novel transfer learning framework for chatter detection using convolutional neural networks,” J. Intell. Manuf., vol. 34, no. 3, pp. 1105–1124, 2023. [Google Scholar] [Crossref]
18.
K. Jauhari, A. Z. Rahman, M. Al Huda, A. Widodo, and T. Prahasto, “A hybrid deep learning-based approach for on-line chatter detection in milling using deep stem-inception networks and residual channel-spatial attention mechanisms,” Mech. Syst. Signal Process., vol. 226, p. 112357, 2025. [Google Scholar] [Crossref]
19.
K. Attouri, K. Dhibi, M. Mansouri, M. Hajji, K. Bouzrara, and M. Nounou, “Effective uncertain fault diagnosis technique for wind conversion systems using improved ensemble learning algorithm,” Energy Rep., vol. 10, pp. 3113–3124, 2023. [Google Scholar] [Crossref]
20.
F. Pacheco, A. Drimus, L. Duggen, C. Mariela, C. Diego, and S. René-Vinicio, “Deep ensemble-based classifier for transfer learning in rotating machinery fault diagnosis,” IEEE Access, vol. 10, pp. 29778–29787, 2022. [Google Scholar] [Crossref]
21.
H. Gao, H. Shen, C. Yue, R. Li, S. Y. Liang, and Y. Wang, “A monitoring method of milling chatter based on optimized hybrid neural network with attention mechanism,” Facta Univ. Ser. Mech., vol. 23, no. 2, pp. 227–250, 2025. [Google Scholar] [Crossref]
22.
M. Ghasemi, M. Zare, P. Trojovský, R. Rao, T. Eva, and V. Kandasamy, “Optimization based on the smart behavior of plants with its engineering applications: Ivy algorithm,” Knowl.-Based Syst., vol. 295, p. 111850, 2024. [Google Scholar] [Crossref]
23.
Y. Liu, T. Hu, H. Zhang, H. Wu, S. Wang, L. Ma, and M. Long, “iTransformer: Inverted transformers are effective for time series forecasting,” arXiv preprint arXiv:2310.06625, 2023. [Google Scholar] [Crossref]
24.
E. K. Ganesh and G. Diwakar, “Prediction of Remaining useful life of Rolling Bearing using Hybrid DCNN-BiGRU Model,” J. Vib. Eng. Technol., vol. 11, no. 3, pp. 997–1010, 2023. [Google Scholar] [Crossref]
25.
Y. Cheng, S. Zhou, J. Xue, M. Lu, X. Gai, and R. Guan, “Research on tool wear prediction based on the RF optimized by NGO algorithm,” Mach. Sci. Technol., vol. 28, no. 4, pp. 523–546, 2024. [Google Scholar] [Crossref]
26.
D. Domingo, A. B. Kareem, C. N. Okwuosa, P. M. Custodio, and J. W. Hur, “Transformer Core Fault Diagnosis via Current Signal Analysis with Pearson Correlation Feature Selection,” Electronics, vol. 13, no. 5, p. 926, 2024. [Google Scholar] [Crossref]
27.
J. Spierings, V. H. Ong, P. M. J. Welsing, M. Hughes, J. D. Pauling, F. D. Galdo, A. L. Herrick, and C. P. Denton, “P139 Self-assessment of scleroderma skin: Validation of the PASTUL questionnaire,” Rheumatology, vol. 63, no. 63, p. 178, 2024. [Google Scholar] [Crossref]
28.
S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, 2009. [Google Scholar] [Crossref]

Cite this:
Shen, H. D., Gao, H. N., Yang, L., Li, R. Y., Yang, Y., & Xing, S. L. (2025). Milling chatter detection based on an optimized iTransformer-BiGRU-random forest ensemble model. Precis. Mech. Digit. Fabr., 2(4), 222-233. https://doi.org/10.56578/pmdf020402
©2025 by the author(s). Published by Acadlore Publishing Services Limited, Hong Kong. This article is available for free download and can be reused and cited, provided that the original published version is credited, under the CC BY 4.0 license.