Ensemble Learning Applications in Multiple Industries: A Review
This study presents a systematic review of the application of ensemble learning (EL) in multiple industries. It aims to review prevailing applications across industries in order to guide future practical deployment. It also proposes a research method based on the Systematic Literature Review (SLR) to organize the EL literature and advance our understanding of EL for future optimization. The literature is divided into three categories following the National Bureau of Statistics of China (NBSC): the primary industry, the secondary industry and the tertiary industry. Among existing problems in industrial management systems, the most frequently discussed are quality control, prediction, detection, efficiency and satisfaction. In addition, given the large potential of EL in various fields, gaps and further directions are suggested. This study is valuable to industry managers and cross-disciplinary scholars as a guideline for solving issues in practical work, as it provides a panorama of application domains and current problems. This is the first review of the application of EL in multiple industries in the literature. The paper has the potential to broaden the application area of EL, and it proposes a novel SLR-based research method for sorting the literature.
1. Introduction
Ensemble learning (EL) is a branch of machine learning (ML) that combines weak learners so that they perform better than a single learner. Over the past decade, many scholars have applied various ensemble learning approaches, which has led to the exploration and innovation of these models in multiple fields such as medical care, insurance, supply chain, the industrial Internet of Things and other areas. In short, ensemble learning approaches integrate classical methods, general methods and optimization fusion strategies to improve model performance. In particular, decision fusion and integration technologies in industrial engineering aim to solve multiple problems by combining ensemble learning with process optimization. For efficient, human-machine interactive, sustainable and machine-learning-driven production, it is essential to ensure low loss costs and high returns for products and machine performance. This paper gives a comprehensive review of ensemble models. To the best of our knowledge, this is the first comprehensive review paper on ensemble models in multiple industries.
Today, the fourth industrial revolution is sweeping the world, improving existing production line processes by using novel technologies efficiently and exploiting their potential fully. With increasing demand for product diversity and reliability, large-batch production can no longer meet demand. This poses a new challenge: creating a self-adjusting production line, which involves optimizing not only the process but also the human-machine interaction mode. Continuous optimization in industrial management, together with data-driven approaches, is the way to reach this strategic goal. Machine learning technologies such as federated learning and deep learning have been systematically reviewed for industrial systems, as have machine learning applications in production lines and artificial intelligence in safety-critical systems. However, no review has focused on the use of ensemble learning across industrial fields, which this paper discusses in detail.
Industrial management faces new challenges and opportunities. The problems of industrial production lines include the following. First, quality control. Quality has always been a core concern of production lines, and it is dynamic, since quality control standards and strategic decisions change from stage to stage. Machine learning and quality management reinforce each other when integrated; e.g., transfer learning is used to capture the relatedness between the process data of two similar products, and AdaBoost ensemble learning is used to reinforce the learning performance of the LSTM. Second, fault detection. A high fault rate leads to waste and unnecessary resource consumption. As people increasingly focus on high-quality and low-waste products, expectations for the accuracy of fault detection are rising day by day. Fault detection and diagnosis by ensemble learning play a key role in production lines; e.g., a diverse variable weighted model ensemble (DVWME) method has been proposed for industrial fault diagnosis. Third, prediction. To avoid unnecessary loss and keep the production system sustainable and continuous, it is important to locate precisely where an anomaly will happen; e.g., a system using machine learning methods aims to detect signals of potential failures before they occur. In industries with missing data or high requirements for forecasting accuracy, forecasting is a strong prerequisite for decision making, enterprise competitiveness, and policy orientation. Lee et al. demonstrate the use of ensemble learning to improve prediction performance under the influence of policy and environmental factors. Finally, efficiency and satisfaction. Solving the above problems improves either the satisfaction of downstream enterprises' requirements or the efficiency of the system.
Of course, efficiency and satisfaction problems that are not addressed by quality, prediction and detection methods, such as production scheduling and other optimization problems, can also be tackled by modeling to improve the efficiency of the whole process; e.g., processing time has been estimated using machine learning and real factory data to optimize a parallel machine scheduling problem. With today's prevailing industrial Internet of Things, meta-computing and cloud service platforms, many enterprises and relevant authorities rely on ensemble learning to build their own platforms that provide better services, improve work efficiency, support the development of disciplines and achieve strategic development. Given the existing difficulties and the demand for efficient development, integrating digitally driven technology into manufacturing is urgent. This review therefore serves as a reference for the application of ensemble learning in various industries and lays a foundation for subsequent research.
The application of ensemble learning in industrial management has expanded to many fields. From the primary industry to the secondary industry and then to the tertiary industry, ensemble learning has penetrated almost all fields, from the distribution of grain products, semiconductor manufacturing and sensor design to commercial software development and testing services. This research aims to summarize existing studies to help industrial engineers fill gaps and pursue breakthroughs and innovation. The study concludes with some constructive comments and opinions. This paper is organized as follows: Section 2 presents the research methodology. Section 3 describes the results and application. Section 4 presents the discussion, and Section 5 discusses the gap and further study. Finally, Section 6 presents the conclusions.
2. Research Methodology
This study follows the Systematic Literature Review (SLR) approach to formulate the research process of this section. We describe the research questions, production problems, search process and study selection criteria. Notably, this review differs from typical SLR papers because we systematically summarize the content of the papers, grouped and categorized on the basis of SLR, to answer the specific research questions identified.
The manufacturing industry refers to the industry that, in the era of mechanized industry, transforms resources through manufacturing processes and according to market requirements into large-scale tools, industrial products and consumer products that people can use. Manufacturing is one of the largest secondary industries in China and belongs to category C, shown in Table 1, following the division of industries by the National Bureau of Statistics of China (NBSC).
EL has penetrated not only manufacturing but also services and other industries. These industry categories are collated in Table 2.
Driven by the achievements of EL in multi-task and high-performance settings, it is logical for industrial management to follow with applications of EL. Novelty is a criterion for judging whether research has development potential. The innovation can be a new strategy, an innovative application field or a method. Few studies apply a single classical EL method or a single fusion strategy on its own; method innovation, however, builds on these individual methods. This study therefore examines the relationship between the methods used in each study and the problems they are intended to solve. This question can be expanded into four aspects: first, which classical EL methods are used; second, which general EL methods are used; third, which EL fusion strategies are used; finally, which problems can be solved with these approaches.
Table 1. Subcategories of manufacturing (category C) in the NBSC industry classification.
(13) farm and sideline food
(15) wine, beverage and refined tea
(24) stationery and sporting goods
(25) fuel processing
(27) pharmaceutical products
(28) chemical fiber
(29) plastic rubber
(30) non-metallic mineral
(31) ferrous metal
(32) nonferrous metals
(34) flexible unit
(35) dedicated device
(37) transportation equipment
(38) electrical machinery
(39) electronic equipment
(41) other, not otherwise mentioned
(42) waste resource
Table 2. Industry categories in the NBSC industry classification.
(A) Agriculture, forestry, animal husbandry, fishery
(B) The mining industry
(D) Production and supply of electricity, heat, gas and water
(E) The construction industry
(F) Wholesale and retail
(G) Transportation, warehousing and postal services
(H) Accommodation and catering
(I) Information transmission, software and information technology services
(J) The financial sector
(K) The real estate industry
(L) Leasing and business services
(M) Scientific research and technical services
(N) Water conservancy, environment and public facilities management
(O) Residential service, repair and other services
(Q) Health and social work
(R) Culture, sports and entertainment
(S) Public administration, social security and social organization
(T) The international organization
Modern enterprise production management pays increasing attention to improving the intelligent management level of enterprises and achieving high-quality products at lower costs. The problems that can be solved by EL are described below.
Since quality is a key determinant of success in modern industries, it is reasonable for enterprise managers to invest more in quality control. However, traditional methods, such as increasing labor costs, are not conducive to the long-term development of enterprises in today's diverse and rapidly changing markets. Quality control refers to the operational techniques and activities adopted to meet quality requirements. It aims to eliminate the factors that cause nonconforming or unsatisfactory results at all stages of the quality loop by monitoring the quality formation process. At the government level, national policies also set requirements for quality management and control. Products should conform to the Quality Management System (QMS), the management system that directs and controls an organization with regard to quality.
This study divides fault diagnosis into three basic problems:
• Fault detection: detecting a fault in any functional unit that results in abnormal behavior of the whole system;
• Fault isolation: locating and classifying different faults;
• Fault identification: determining the type, size, and cause of a fault.
When a fault occurs, it may damage devices and equipment. Currently, fault detection and diagnosis (FDD) for high-speed trains is receiving rapidly increasing attention as a way to improve reliability and safety. In summary, data-driven technology is widely used in the process industry for process monitoring and diagnosis.
With the increasing need for production system reliability, it is expected not only that faults can be detected, isolated and identified when they occur, but also that they can be forecast before they occur. In other words, predicting in advance gives maintenance personnel or self-healing components time to react early and avoid a failure. In recent years, an increasing number of enterprises and scholars have focused on building effective prediction models with ensemble learning, and a large body of material has accumulated for this research. The choice of method for evaluating a forecasting model deserves attention: the evaluation results indicate whether the predictive model succeeds in identifying the indicators of potential failures and whether it can help prevent production stops.
Applying EL to the service industry often involves software development, where an important criterion is customer satisfaction. In the era of Industry 4.0, customization will replace mass production, an obvious shift from the pursuit of quantity to the pursuit of quality. Customized cloud service platforms, database sharing, network information security and other topics have gradually become directions for improving the core competitiveness of enterprises. On this basis, this study provides an overview of responses to downstream customer demand in order to give direction to subsequent research.
The studies in this review were retrieved from the ScienceDirect (SD) and IEEE Xplore databases. The search used the keywords ensemble learning, ensemble machine learning, machine ensemble learning, production, and manufacture. Different combinations of keywords were used to refine the scope of the search. A large number of studies use EL as a research method: 7741 records were initially retrieved from SD and 2612 records from IEEE Xplore. The articles were screened on the basis of title and abstract, followed by screening of the full text. After this preliminary retrieval, the contents of the retrieved articles were analyzed and screened against the following selection criteria.
This study defined the set of exclusion criteria shown in Table 3 to obtain a comprehensive overview of various industries.
The result of applying the selection criteria is presented in Figure 1.
Table 3. Exclusion criteria.
• The full text is not accessible
• The paper is not written in English
• The paper is a review or not primary research
• The paper does not explain EL approaches in detail
• The paper does not apply EL in production
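The exclusion criteria above can be read as a simple filter over the retrieved records. The following is a minimal sketch of that reading, not the authors' actual screening tool: the `Record` fields, the rule names and the example records are all hypothetical and were chosen only to mirror the five criteria.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical metadata fields, one per exclusion criterion.
    title: str
    full_text_accessible: bool
    language: str
    is_primary_research: bool
    explains_el_in_detail: bool
    applies_el_in_production: bool


# Each rule returns True when the record must be excluded.
EXCLUSION_CRITERIA = [
    ("full text not accessible", lambda r: not r.full_text_accessible),
    ("not written in English", lambda r: r.language != "English"),
    ("review or not primary research", lambda r: not r.is_primary_research),
    ("EL approach not explained in detail", lambda r: not r.explains_el_in_detail),
    ("EL not applied in production", lambda r: not r.applies_el_in_production),
]


def screen(records):
    """Split records into included ones and excluded ones with their reasons."""
    included, excluded = [], []
    for record in records:
        reasons = [name for name, rule in EXCLUSION_CRITERIA if rule(record)]
        if reasons:
            excluded.append((record, reasons))
        else:
            included.append(record)
    return included, excluded


# Hypothetical records, for illustration only.
records = [
    Record("Hypothetical study A", True, "English", True, True, True),
    Record("Hypothetical review B", True, "English", False, True, True),
    Record("Hypothetical study C", False, "German", True, True, False),
]
included, excluded = screen(records)
print([r.title for r in included])
for record, reasons in excluded:
    print(record.title, "->", reasons)
```

Recording the reasons for each exclusion, rather than only a yes/no decision, makes it straightforward to report how many records each criterion removed.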
3. Results and Application
This section describes the collected articles to facilitate the descriptive analysis in Sections 4 and 5. ScienceDirect and IEEE Xplore were chosen as the databases because SD includes more than 2500 journals, most of which are indexed by SCI, SSCI and EI and have high impact factors, which helps ensure the quality of the reviewed articles. We therefore searched these two databases and defined the scope of the articles through the research questions. After screening the literature from the two databases, 59 papers were retained for review and analysis in this study, as shown in Figure 2.
Figure 3 shows that most of the articles study the integration of ensemble learning in category C and category M. For traditional industrial engineering (IE), the manufacturing industry has always offered large room for improvement as technology is continually updated.
The application of ensemble learning in the agricultural planting industry retrieved in this study was proposed by Tang et al., who combined it with map drawing to produce crop distribution maps, which serve as a practical reference for crop planting. In addition, ensemble learning plays a large role in the detection and prediction of soil substances; e.g., soil substances, especially carbon content, are detected and pollutants are predicted by Stacking and XGBoost combined with data innovation technology, which provides good data support for environmental protection in sustainable development strategies.
The growth in human population results in an increasing demand for raw materials. The raw materials are processed independently of their source. For the mining industry, the largest source of raw materials, ensemble learning has also shown excellent performance; e.g., Lawley et al. combine the available datasets to significantly reduce the search space for mineral exploration targeting, as demonstrated with gradient boosting machines (GBM), XGBoost, generalized linear models (GLM), distributed random forest (DRF), extremely randomized trees (XRT), deep neural networks (DNN) and stacked ensembles. In industrial manufacturing, the structural materials of industrial products tend to be complex, so the mining of raw materials and the protection of critical raw materials are of great importance. Ensemble learning thus offers the mining industry a good method for building such systems.
Table 4. Applications of EL in manufacturing.

Application in 3D printing (41)
• Contribution: prediction with moderate to high accuracy using rheological measurements and printing parameters as inputs. Limitation or future work: the practical utility of 3D food printing is not expanded.
• Contribution: saves a large amount of training time and the costs associated with many prints for each design. Limitation or future work: the study is expected to lead to progress in AM quality control and management.
• Contribution: presents the development of a robust surface roughness prediction model based on ensemble learning with a genetic algorithm. Limitation or future work: more processing parameters are not considered to improve prediction accuracy.
• Contribution: the application scenarios of six ML methods for predicting the dynamic strength of 3D printed parts are analyzed. Limitation or future work: the prediction results of the ML models are based on limited experimental data.

Application in non-metallic mineral (30)
• Contribution: helps the building sector by enabling economical and fast approaches for calculating material properties. Limitation or future work: no data was used for the research described in the article.
• Contribution: the models created might aid in creating novel UHPC dosages. Limitation or future work: the training time of random forest is long, which conflicts with the high timeliness required of the experiments.
• Contribution: the predictions of the RF model fit the experimental results well. Limitation or future work: laboratory aging and machine learning studies with advanced applications of optimization algorithms remain a gap.
• Contribution: gives engineering researchers a better understanding of how to select input parameters and regressors. Limitation or future work: applying more ensemble algorithms would be more effective.
• Application: strength and durability prediction. Contribution: made available online for users to conduct their own prediction studies. Limitation or future work: failed to form a generalized model that can be extended to other fields.

Application in metal (33)
• Contribution: found two key elements in the dynamic plasma for accurate seam strength measurement. Limitation or future work: can be extended to other types of materials and other fusion welding processes.
• Method: soft voting / hard voting / CNN. Contribution: the ensemble outperforms the seven other strategies considered in a comparison on several metrics. Limitation or future work: developing such a framework will extend the use of the proposed approach to other classification tasks.
• Contribution: can be expanded to more complicated UMW production scenarios. Limitation or future work: scalability may become a major challenge in these complicated scenarios.

Application in electronic equipment (39)
• Application: stencil printing process. Method: RF-based EWMA / AdaBoost. Contribution: prevents solder paste printing defects and reduces the high reworking costs for large-scale production. Limitation or future work: can be further extended by suggesting the appropriate machine adjustment.
• Contribution: improves the performance of the diagnosis model and achieves the highest diagnosis accuracy. Limitation or future work: more complicated compound faults.
• Contribution: soft sensor modeling that requires almost no fine-tuning of parameters to achieve excellent performance. Limitation or future work: the lagged effect of causality is not considered.

Application in waste resource (42)
• Application: wastewater treatment plant. Method: Bayesian fusion strategy. Contribution: improves monitoring performance and generalization ability for process monitoring. Limitation or future work: the study is limited to linear non-Gaussian processes.
• Application: iron ore tailings (IOTs). Contribution: provides some suggestions for the application of IOTs in cement-based materials. Limitation or future work: more experiments on determining the strength activity index of IOTs are needed.
• Method: XGBoost / CatBoost / LightGBM. Contribution: improved accuracy at the cost of increased computation. Limitation or future work: future work will focus on building an ensemble learning model with low computational intensity.
The methods applied in industrial engineering are gradually being enriched as technology is updated. Ensemble learning plays a significant role here, since most of the reviewed research applies ensemble learning in the manufacturing industry. Within manufacturing, the application fields of EL differ. The subdivisions of category C were described in Section 2, and the distribution of articles across manufacturing subcategories is shown in Figure 4.
To the best of our knowledge, with the rise and maturation of EL, it has wide prospects for popularization and application in manufacturing. In the food material industry, 3D printing technology is used and random forest (RF) is integrated to improve performance. In the food processing industry, RF and gradient boosting regression (GBR) algorithms are used to optimize pellet quality. In addition to these fields, in the paper industry, chemical industry, pharmaceutical manufacturing, plastic products, ferrous metals, non-ferrous metals, automobile manufacturing and the transportation industry, methods based on EL innovation, data innovation and domain innovation have enhanced and improved existing manufacturing processes with clear benefits. Table 4 summarizes these applications of EL, which have grown by leaps and bounds.
Energy regeneration has always been a hot topic in environmental research. In 2022, extreme high temperatures and long-term drought occurred in many places in China, which led to large trans-regional transmission of energy. B. Wang et al. also analyzed energy poverty alleviation strategies in energy-poor and energy-rich regions. Research has also contributed to some of the industry's problems. For example, a more accurate and explanatory model is proposed for natural gas storage deliverability using support vector regression (SVR), artificial neural network (ANN), and random forest (RF) algorithms. XGBoost, LightGBM, and multi-layer perceptron (MLP) are applied to support both flexible and constrained modes of operation. Ensemble methods such as stacking can produce accurate predictions using various prediction sources. Suitable wind power predictions are achieved with ensemble learning models, including boosted trees (BT), random forest (RF) and generalized random forest (GRF), considering lagged data. The supply and production forecasting of clean and renewable energy has become a research hotspot in power systems. In the future, it may be possible to solve the problems of other energy systems by generalizing these models.
In the public transportation industry, including urban traffic and postal logistics, transportation scheduling problems are often at the core of industrial engineering optimization. City governments should improve availability and accessibility to promote public transport ridership in the long run. Bagging, boosting, and stacking methods are used to predict bus travel times, which provides a predictive model for optimal bus scheduling. In addition to scheduling, fault diagnosis in the transportation industry is a problem that threatens the convenience of daily life. A diverse variable weighted model ensemble (DVWME) method, which uses Bootstrap sampling and soft voting, has been proposed for industrial fault diagnosis, and it was analyzed with the example of the Guangzhou subway to demonstrate its effectiveness in diagnosing transportation faults.
In the context of Industry 4.0, Internet information services, such as big data analytics (BDA), the Internet of Things (IoT), system development and framework building, allow service providers to expand and explore these sub-issues on the basis of demand and downstream customer satisfaction. Sariyer et al. propose a model, the Clustering Based Classifier Ensemble Method for Cost of Defect Prediction (CBCEM-CoD), incorporating clustering, classification, prediction, and learning techniques of BDA for quality management in the manufacturing industry. Ayvaz and Alpay utilize the data generated from IoT sensors in real time; their system aims to detect signals of potential failures before they occur by using machine learning methods. Rousopoulou et al. present a cognitive analytics platform for anomaly detection to support the emerging and growing needs of the manufacturing industry. To achieve demand satisfaction and scheduling planning for downstream service providers, Khan et al. and Kiangala and Wang respectively built platforms using gradient boosting decision trees (GBDT), XGBoost and RF.
Over the past five years, the sales area of both commercial housing and residential housing has increased year by year. In 2021, the sales area of commercial housing was 1.794 billion square meters and that of residential housing was 1.565 billion square meters, which indicates that the Chinese real estate market has been active in recent years. A building information modeling (BIM) and machine learning (ML) integration framework for automated property valuation has been proposed. RF, XGBoost and CatBoost have been used to predict the presence of hazardous materials in buildings. The real estate industry plays a large role in the country's economic stability and financial soundness. Ensemble learning can also contribute in areas such as information management for real estate.
With the high-quality development strategy, the science and technology service industry has strong development momentum. The ensemble learning approach is also applied to management innovation, e.g., policymaking, socio-economic factor analysis, and green business management innovation. In biology and medicine, it is used, e.g., to predict metal oxide nanoparticle toxicity in immune cells and to detect COVID-19. For nanocomposite materials, gradient boosted trees are applied. Bagging and majority voting are applied to fault detection in electronics, and CatBoost and XGBoost in electrical applications. In computer science, various ensemble learning methods, such as a machine learning-based intrusion detection system (ML-IDS), are also used to address local explanation and cyber-attacks. In the monitoring of the environmental pollutant PM2.5, existing studies integrate EL with satellite high-dimensional visualization methods to monitor and predict pollutants. An ensemble approach is also used to forecast risk in the financial domain. Driven by new technologies, new industries and new models, enterprises have increased investment in research and development in cutting-edge technologies and emerging fields.
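To make the two ingredients named above concrete, the following is a minimal sketch of Bootstrap sampling combined with soft voting. It is not the DVWME method itself: the base learner (a toy nearest-centroid classifier on one feature), the sensor readings and the equal model weights are all illustrative assumptions.

```python
import math
import random
from collections import Counter

CLASSES = ["normal", "fault"]

# Hypothetical 1-D sensor readings with labels, for illustration only.
readings = [
    (0.8, "normal"), (1.0, "normal"), (1.1, "normal"),
    (1.3, "normal"), (0.9, "normal"), (1.2, "normal"),
    (4.6, "fault"), (5.0, "fault"), (5.3, "fault"),
    (4.9, "fault"), (5.1, "fault"), (4.7, "fault"),
]


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement (Bootstrap sampling)."""
    return [rng.choice(data) for _ in data]


def fit_centroids(sample):
    """Toy base learner: the per-class mean of the feature."""
    sums, counts = Counter(), Counter()
    for x, label in sample:
        sums[label] += x
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in counts}


def predict_proba(centroids, x):
    """Map distances to centroids to class probabilities."""
    scores = {
        c: math.exp(-abs(x - centroids[c])) if c in centroids else 0.0
        for c in CLASSES
    }
    total = sum(scores.values())
    return {c: scores[c] / total for c in CLASSES}


def soft_vote(models, x, weights=None):
    """Weighted average of class probabilities; the highest average wins."""
    weights = weights or [1.0] * len(models)
    avg = {c: 0.0 for c in CLASSES}
    for w, model in zip(weights, models):
        proba = predict_proba(model, x)
        for c in CLASSES:
            avg[c] += w * proba[c]
    norm = sum(weights)
    avg = {c: p / norm for c, p in avg.items()}
    return max(avg, key=avg.get), avg


rng = random.Random(0)
models = [fit_centroids(bootstrap_sample(readings, rng)) for _ in range(5)]
label, proba = soft_vote(models, 1.2)
print(label, proba)
```

Soft voting averages probabilities rather than counting hard labels, so a confident model can outweigh several uncertain ones; passing unequal `weights` is one simple way to express a weighted model ensemble.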
4. Discussion
As shown in Figure 5, prediction accounts for a large part of the problems addressed in studies applying EL. The basic idea of a regression problem is to split the existing data into a training set and a test set and to fit a model whose estimates approach the true values as closely as possible. One of the issues discussed in this research is that prediction also plays a maintenance role in the manufacturing industry. To decrease maintenance costs and attain sustainable operational management, Predictive Maintenance (PdM) has become important in industry. Many ensemble learning methods are used in regression prediction, so predictive maintenance is also a promising direction in intelligent manufacturing.
The rapid development of the industrial Internet of Things has brought many opportunities and challenges to the real economy. While serving production efficiency and downstream demand, quality control is often the embodiment of core competitiveness. According to the statistics of existing studies collected here, prediction and detection are the problems most frequently addressed. Therefore, emerging technologies on the one hand, and cutting-edge methods and theories on the other, are the two major paths to ensuring efficiency, satisfaction, quality, detection and prediction.
This study finds that a large number of studies use not just one learner but several different learning methods, and adopt certain fusion strategies and evaluation measures to obtain the best-performing model. Both homogeneous and heterogeneous ensemble methods are applied. Notably, stacking is frequently used to train a model that generalizes well. However, each model is tested only within a particular industry, and there is no cross-domain verification of model effectiveness. Future research may consider applying the same model to different fields to evaluate its generalization.
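The train/test idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a method from the reviewed studies: the data (wear as a function of operating hours) is synthetic, the model is a plain least-squares line, and the names `hold_out_split`, `fit_line` and `evaluate` are hypothetical.

```python
import math
import random


def hold_out_split(rows, test_ratio, seed):
    """Shuffle a copy of the rows and hold out a fraction as the test set."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_ratio))
    return rows[n_test:], rows[:n_test]


def fit_line(train):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def evaluate(model, test):
    """Mean absolute error and root mean squared error on unseen rows."""
    slope, intercept = model
    errors = [(slope * x + intercept) - y for x, y in test]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Synthetic data: wear grows linearly with operating hours, plus small noise.
rows = [(x, 2.0 * x + 1.0 + 0.1 * (-1) ** x) for x in range(20)]
train, test = hold_out_split(rows, test_ratio=0.25, seed=0)
model = fit_line(train)
mae, rmse = evaluate(model, test)
print(len(train), len(test), mae, rmse)
```

Evaluating only on the held-out rows is what indicates whether the estimates approach the true values on data the model has not seen, which is the property a predictive maintenance model needs.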
This study considers why stacking is often used to train industrial management optimization models. Stacking, a heterogeneous ensemble fusion strategy, trains the primary learners on the initial dataset and then generates a new dataset from their outputs to train the secondary learner. To avoid overfitting, this new dataset is generated by cross-validation or leave-one-out. Stacking requires that the base learners be relatively uncorrelated and that none of them perform too poorly; from an industrial management perspective, a well-performing model can meet this requirement in any industry.
Many scholars integrate multiple methods, algorithms and decisions on the basis of boosting, bagging and random forest, which are classical methods. A large number of studies go beyond applying a method: they integrate technologies according to the needs of the field and commit to method innovation. However, innovation is not limited to the special application of methods and models; further research can consider interdisciplinary and cross-domain applications of ensemble learning to achieve domain innovation and promote the development of the field.
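The stacking procedure described above can be sketched as follows. This is a minimal illustration, assuming two toy primary learners (a least-squares line and a k-nearest-neighbour regressor) and a linear secondary learner without intercept; these choices, the synthetic data and all function names are assumptions made for the example, not taken from the reviewed studies.

```python
def fit_line(train):
    """Primary learner 1: ordinary least-squares line."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in train) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_knn(train, k=3):
    """Primary learner 2: mean target of the k nearest training points."""
    def predict(x):
        nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict


def out_of_fold_predictions(data, fitters, n_folds):
    """Build the new dataset: each row is predicted by models that never saw it."""
    meta = [None] * len(data)
    for fold in range(n_folds):
        train = [row for i, row in enumerate(data) if i % n_folds != fold]
        models = [fit(train) for fit in fitters]
        for i, (x, _) in enumerate(data):
            if i % n_folds == fold:
                meta[i] = [model(x) for model in models]
    return meta


def fit_meta(meta, targets):
    """Secondary learner: least-squares weights for exactly two primary outputs."""
    a11 = sum(f[0] * f[0] for f in meta) + 1e-9  # tiny ridge term for stability
    a12 = sum(f[0] * f[1] for f in meta)
    a22 = sum(f[1] * f[1] for f in meta) + 1e-9
    b1 = sum(f[0] * t for f, t in zip(meta, targets))
    b2 = sum(f[1] * t for f, t in zip(meta, targets))
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det


def fit_stacking(data, fitters, n_folds=5):
    meta = out_of_fold_predictions(data, fitters, n_folds)
    weights = fit_meta(meta, [y for _, y in data])
    final_models = [fit(data) for fit in fitters]  # refit on all data
    def predict(x):
        return sum(w * model(x) for w, model in zip(weights, final_models))
    return predict, weights


# Synthetic nonlinear data, for illustration only.
data = [(i / 2, (i / 2) ** 2) for i in range(20)]
FITTERS = [fit_line, fit_knn]
predict, weights = fit_stacking(data, FITTERS)
print(weights, predict(5.0))
```

The secondary learner is trained only on out-of-fold predictions, which is what keeps it from simply rewarding the primary learner that memorized the training data best.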
5. The Gap and Further Study
As far as we know, the smart city is the ideal form after the in-depth development of Industry 4.0. Digital twins have penetrated various industries involved in building smart cities, such as construction, manufacturing, transportation and information technology. One advantage of digital twins is that they save money and time by improving the planning and scheduling of activities through twin modeling. At present, a small number of studies have begun to apply artificial intelligence to smart cities and digital twins, but applying ensemble learning in a single field cannot meet people's pursuit of intelligent life. Further research could consider how to combine ensemble learning with other modules of artificial intelligence to develop an intelligent system that builds a small part of a digital smart city.
According to the statistics of this study, the application fields are mainly concentrated in the manufacturing industry, scientific research and the information service industry, but also include materials, mining, real estate, environmental resource monitoring, electronic equipment fault testing and so on. All of these industries, however, face problems of capacity, efficiency and intelligent production. Ensemble learning can be extended to them thanks to its high performance and its ability to handle multiple requirements. Further research may consider integrating ensemble learning into more aspects to support more industry needs.
6. Conclusions
This study summarizes applications of EL in industrial management and computer science and reviews EL beyond its applications. To the best of our knowledge, this work is the first to summarize the development prospects of EL in various fields. From a large body of literature, this study has summarized the application fields of EL, their pros and cons, and the remaining challenges. Further, this study explains potential opportunities and existing gaps in the research on EL, mainly concerning industry problems and method application. We also put forward some applications of EL and some areas with great development potential. As a burgeoning technology, EL is attracting increasing attention. This work helps researchers overcome the remaining challenges of EL.
The data used to support the findings of this study are available from the corresponding author upon request.
The authors declare that they have no conflicts of interest.