A Data Driven Approach to Measure Evolution Trends of City Information Modeling
Abstract:
This work aims to reveal the current status of the city information modeling (CIM) from massive patent data, using the latent Dirichlet allocation (LDA) model, and quantify the evolution trends of future topics by the Hidden Markov Model (HMM). The results show that the CIM technologies can be divided into 17 topics. At the present stage, the technologies related to the Internet of things (IOT), big data and data management are the focus of the research and development (R&D) of CIM patents. Compared with the software technology, further development is needed for the hardware technology supporting CIM, particularly in terms of information acquisition (cameras and sensors), storage, and information transmitters. This study deepens the understanding of the CIM-related technical categories, and clarifies the direction of the development and evolution of CIM technology, providing a strong support to decision-makers in urban management.
1. Introduction
Cities now account for 54% of the world's population, consume 80% of natural resources, and emit 80% of greenhouse gases [1], [2], [3]. Rising urbanization promotes economic and social development while posing significant challenges to long-term urban development [4]. The topic of smart cities, which are distinguished by smart governance and smart growth, has gradually piqued the interest of researchers seeking to promote sustainable urban development [5].
In the last decade, the centralized outbreak of networks, big data, artificial intelligence, modeling, and other technologies, particularly the development of sensors and low-power wide-area network technology, enabled the accurate and real-time representation of the physical world's dynamics in digital format. City Information Modeling (CIM) emerged in response to the concept of digital twins [6], [7].
A digital twin is a virtual entity and subsystem that characterizes a physical device in virtual space using the data from the physical device. CIM is a specific application of digital twin technology in urban management that lays the groundwork for smart city construction [8]. CIM enables data granularity down to a single module within a city building, transforming the traditional static digital city into a perceptible, dynamic, interactive, and intelligent real-world city. As a result, CIM provides critical data support for comprehensive urban management and fine governance [9], becoming an essential cornerstone for smart city operations.
CIM research typically focuses on CIM application scenarios in urban management [8], [10], [11], with few studies of CIM-related technology development and even fewer studies of patent literature research on CIM technology development. Using a scientific analytical framework to efficiently analyze patent documents, gaining technical knowledge, and connecting problems with potential solutions all play important roles in encouraging more effective innovation in this field [12].
The ultimate goal of smart cities is to adhere to sustainable urban development principles and accomplish effective urban management (e.g., urban planning, infrastructure, transportation, energy, services, education, health, and public safety) while meeting public needs [3]. CIM, as a smart city management digital twin system, leverages information and communication technology to make critical components of urban infrastructure and services more interactive, accessible, and effective [13]. As a result, in order to support the development of smart cities, it is vital to understand the technological development trends in CIM.
Patent documents are a valuable source of technical information since their contents are accurate, thorough, and cutting-edge [14]. Patent data is increasingly being used by academics to investigate technological development trends. Pellicer et al. [15] discovered, for example, that basic data gathering, processing, and transmission technologies are required for smart city applications. Furthermore, Wang et al. [16] discovered that network communication and administration technologies are an important development path in the digital twin area by researching digital twin patent data.
However, owing to methodological limitations, previous studies did not investigate the distribution of topics and the links between topics from a comprehensive standpoint. The building of CIM by municipal managers, in particular, is not restricted to offering services autonomously. Rather, it necessitates the deployment of a comprehensive infrastructure for urban data collection, transfer, storage, and analysis in order to provide public services [17]. In this context, this study aims to close the gap by shedding light on the technical scope of CIM and investigating its development trend. Specifically, this study seeks answers to the following research questions:
I. What is the most important technology in the field of CIM?
II. What are the most essential CIM-related technologies?
III. What is the goal and direction of future CIM-related technology development?
The work classifies patent literature data by using latent Dirichlet allocation (LDA) and the Hidden Markov Model (HMM) to discover the most relevant CIM technologies. From the standpoint of technology change, the changes in content and co-occurrence of technology topics are explored, technology trends are forecasted, and visual displays are developed. CIM-related businesses can shorten the research and development (R&D) cycle, reduce R&D expenses, comprehend the present technological environment, and gain insight into market trends by researching the evolution of CIM technology themes [18]. Additionally, it assists in defining the course of CIM technology development and evolution and directs industries to occupy the next technological highlands [19].
2. Literature Review
The "smart growth" movement of the 1990s is where the idea of a "smart city" first emerged [5]. Since then, it has expanded to include nearly every type of cutting-edge technological application for urban planning, construction, operation, and management, including traffic improvement [20], [21], environmental sustainability [22], and urban governance [1], [23]. Despite the idea of smart cities becoming more and more popular, no consensus exists on what constitutes a smart city. According to studies by Lara et al. [24] and Mora et al. [25], the most widely used definition of a smart city is a community that systematically promotes the overall well-being of all its members and has sufficient flexibility to adapt and become a better place to live, work, and play in a sustainable manner.
CIM is a sophisticated synthesis of urban information sources and three-dimensional spatial modeling [26], [27]. In a limited sense, it consists of large-scale GIS and BIM data from the development of smart cities [27]. Based on the integration of BIM and GIS technologies, CIM enables data granularity precise to a single module inside the city building, creating an intelligent city and providing crucial data support for thorough urban administration and excellent governance [9].
Unified data standards, urban information models, urban operation data, common supporting platforms, and digital twin applications are just a few of the numerous components that make up a smart city. CIM serves as the key connection in this process by creating a digital twin of a real city model. Digital twin construction emerges as a significant driving force to modernize the capacity of urban governance. It relies on the three-dimensional digital urban backplane of the CIM platform and is highly integrated with real-time perception, simulation, deep learning, and other information technology to carry out omni-dimensional and multi-dimensional smart city application.
To achieve cross-system application integration and cross-departmental information sharing and support the decision-making analysis of smart cities, CIM's extensibility promotes accessing the information resources of many urban public systems (such as population, housing, household water, electricity and gas information, security police data, traffic information, tourism resource information, and public health care).
Traditional techniques evaluate exemplary patents and extract technical topics from patent documents using expert knowledge [18]. The lack of expert resources, the challenge of determining patent representativeness, and the inability to study a large number of patents limit the usefulness of this strategy. Albino et al. [14] used the categorization properties of patents, such as the International Patent Classification number, as their technical topics to examine the evolution features of a particular field, trying to avoid relying on expert experience.
The limited types of patent features, however, compromise the precision of evolution trend analysis. Researchers used the patent co-occurrence network and citation relationship as a strategy to increase the accuracy of patent topics extraction [28]. The timeliness of topic evolution trend analysis cannot be guaranteed by this method due to the delays it produces. Therefore, when mining technical topics from patents and other scientific or technological documents, researchers use the Subject-Action-Object (SAO) structures in semantic similarity recognition, topic modeling, or topic clustering to account for the diversity of technical topics and the timeliness of the analysis [29], [30], [31].
A flexible probabilistic model called LDA was created specifically for topic modeling [32]. In order to represent the process of creating documents, this model uses probabilistic latent semantic analysis to introduce Dirichlet prior distribution and maximize word co-occurrence probability to look for word clustering. Consequently, it clusters documents and effectively extracts hidden topics. However, LDA's dimensionality reduction effect and recognition rate are compromised by its inability to keep the local structural information adequately.
To ensure effective LDA performance, the number of topics must be chosen in a principled manner. Blei [33] proposed in 2012 that the number of topics be selected according to perplexity. Perplexity can be used to assess the performance of a probabilistic language model and to tune its parameters. Grounded in information theory, it is derived from the geometric mean of the per-word likelihood over the document set and reflects the information entropy of the probability distribution. Therefore, this work uses perplexity to evaluate and determine the optimal number of topics in the sample literature.
Grey prediction [34] and time series analysis are common methods used by researchers to predict the topics' evolution tendency [35]. However, these analysis techniques typically disregard the random characteristics of the innovation process, despite the fact that randomness is a crucial element of innovation [36], [37]. Ignoring randomness in the quantitative forecast of future technological trends causes an overestimation of the endurance of current technical themes and an underestimation of the exponential emergence of new technologies.
Topic evolution may be influenced by two different processes. The first is the inspiration that researchers draw from reading about past research breakthroughs and from encountering fresh concepts as the literature changes. Because this process leaves few records, it is treated as a hidden sequence that cannot be observed directly. Second, motivated by the first process, researchers document their research findings in scientific literature, producing observable sequences. The latter process serves as the former's micro-foundation, and the former serves as the latter's macro-performance. Therefore, topic evolution can be seen as the superposition of the two processes.
Baum and Petrie's HMM, a probability and statistical model, was developed in 1966 as a method to capture such superposition [38]. The concept was initially employed in language processing before becoming well-known and being adopted in other disciplines.
3. Methodology
The frequency approach is employed in this study to choose keywords. To create a trustworthy keyword search list, 48 CIM-related articles from 2018 to 2020 were first chosen, and their topics, titles, abstracts, and keywords were examined. Second, additional data was used to extract keyword synonyms and assess the significance of the keywords (using sources like Wikipedia). Four keywords were selected after screening the frequency of CIM-related terms.
On February 20, 2021, Patentscope, the World Intellectual Property Organization (WIPO) patent database, was searched for patents relating to CIM using the following search terms: TS="city information model," "urban information model," "urban digital twins," and "digital city." As a result, 2764 CIM-related patents published between April 1992 and February 2021 were found. Figure 1 displays the number of registered CIM patents by application period. As depicted in the figure, CIM-related patents began to grow rapidly around 2015 and have continued to do so ever since.
LDA is used in this study to model the data from the patent literature. In order to avoid the inefficiency and errors associated with human labeling, the model is first trained to create topics. The perplexity and transition matrices between topics in the evolution of CIM topics are then computed using the state transition matrix in HMM and the probability distribution of the initial state. Finally, it is established how CIM topics have evolved over time and what the future evolution trend will be (Figure 2).
The LDA algorithm's assumptions and requirements are followed in this study. We therefore suppose that each topic follows the hyper parametric Dirichlet prior distribution:
where, $\theta_{d k}$ is the distribution of scientific documents d on topic k; α is the distribution of subject words.
The topic-term distribution $\phi_k \sim \operatorname{Dir}(\beta)$ is generated for each topic $k$, and the document-topic distribution $\theta_d \sim \operatorname{Dir}(\alpha)$ is generated for each patent document $d$. Furthermore, for the $n$-th term in each document, the topic $Z_{d n} \sim \operatorname{Multinomial}\left(\theta_d\right)$ is drawn, and the term itself follows $w_{d n} \sim \operatorname{Multinomial}\left(\phi_{Z_{d n}}\right)$. Therefore, the LDA likelihood model can be established as:
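Perplexity can be calculated as:
$$\operatorname{perplexity}(D)=\exp \left\{-\frac{\sum_{d=1}^M \log p\left(w_d\right)}{\sum_{d=1}^M N_d}\right\}$$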
where, $D$ is the test set in the corpus; $M$ is the number of documents; $N_d$ is the number of words in document $d$; $p\left(w_d\right)$ is the probability of the occurrence of $w_d$.
To avoid overfitting, the number of topics must be selected carefully on the basis of perplexity. The number of topics associated with the lowest perplexity is considered the optimal value in LDA model training.
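The selection rule above can be illustrated with a minimal sketch. It assumes the standard held-out definition of perplexity, exp(-Σ log p(w_d) / Σ N_d), using the symbols defined in the text. The function names and the toy scores are hypothetical and are not the study's implementation or data.

```python
import math


def perplexity(doc_log_probs, doc_lengths):
    """Perplexity of a test set: exp(-sum_d log p(w_d) / sum_d N_d)."""
    if len(doc_log_probs) != len(doc_lengths):
        raise ValueError("one log-probability and one length per document")
    total_words = sum(doc_lengths)
    if total_words <= 0:
        raise ValueError("the test set must contain at least one word")
    return math.exp(-sum(doc_log_probs) / total_words)


def best_topic_count(perplexity_by_k):
    """Return the candidate topic count with the lowest perplexity."""
    return min(perplexity_by_k, key=perplexity_by_k.get)


# Sanity check: a model that gives every word probability 1/8
# should have a perplexity of 8.
uniform_doc = 4 * math.log(1 / 8)  # log p(w_d) for a 4-word document
uniform_score = perplexity([uniform_doc, uniform_doc], [4, 4])

# Illustrative (made-up) perplexity scores for three candidate topic counts.
toy_scores = {2: 120.0, 3: 95.5, 4: 101.2}
chosen_k = best_topic_count(toy_scores)
```

In practice the per-document log-probabilities would come from the trained topic model evaluated on held-out documents, and the dictionary would hold one score per candidate number of topics.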
Following Heinrich's parameter estimation, this work sets $\alpha=50 / k$ and $\beta=0.1$. Furthermore, the Gibbs Sampling is used to derive the topic set $K=\left\{k_1, \ldots, k_h\right\}$, and the topic attribution set of each document $D_k=$ $\left\{j_1, \ldots, j_n\right\}$.
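For readers unfamiliar with Gibbs sampling for LDA, the following is a minimal collapsed Gibbs sampler in plain Python. It is a sketch under stated assumptions, not the implementation used in this study: the function name, the toy documents, and the fixed iteration count are hypothetical, and only the settings $\alpha=50/k$ and $\beta=0.1$ are taken from the text.

```python
import random


def gibbs_lda(docs, num_topics, beta=0.1, iterations=50, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents."""
    rng = random.Random(seed)
    alpha = 50.0 / num_topics  # setting reported in the text
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                # n_{d,k}
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n_{k,w}
    topic_total = [0] * num_topics                              # n_k
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v = word_id[w]
                k_old = assign[d][n]
                # Remove the token from the counts before resampling it.
                doc_topic[d][k_old] -= 1
                topic_word[k_old][v] -= 1
                topic_total[k_old] -= 1
                # Unnormalized full conditional p(z = k | all other z, w).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                k_new = rng.choices(range(num_topics), weights=weights, k=1)[0]
                assign[d][n] = k_new
                doc_topic[d][k_new] += 1
                topic_word[k_new][v] += 1
                topic_total[k_new] += 1
    return assign, doc_topic, topic_word, vocab


# Toy tokenized documents (hypothetical, for illustration only).
toy_docs = [
    ["sensor", "temperature", "sensor", "garbage"],
    ["vehicle", "traffic", "road", "parking"],
    ["sensor", "road", "traffic"],
]
assign, doc_topic, topic_word, vocab = gibbs_lda(toy_docs, num_topics=2, iterations=20)
```

The returned `doc_topic` counts give each document's topic attribution, and `topic_word` gives the word counts from which topic keywords can be ranked.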
A complete HMM model can be described as a tuple $\gamma=(S, \pi, A, B, O)$. The random transition sequence of hidden states can be expressed as:
Let $Q=\left\{q_1, \ldots, q_t\right\}$ denote the randomly generated hidden state sequence, where $q_t \in S$, and $t$ is the number of topics in the hidden state. The change of the hidden state represents the topic state change. Thus, $Q$ represents the set of all possible topic states.
Then, the initial state of the system follows the probability distribution below:
where, $\pi_i$ is the occurrence probability of state $S_i$. The probability distribution of transitions of the research topic from state $S_i$ to $S_j$ can be described as follows:
where, $\alpha_{i j}=P\left\{\left(q_{t+1}=S_j \mid q_t=S_i\right)\right\}, 1 \leq i, j \leq N$, and satisfies $\alpha_{i j} \geq 0, \sum_{j=1}^N \alpha_{i j}=1$. For state $S_i$, the probability distribution of the observed variables can be expressed as:
where, $Q_t$ is the $t$-th observed variable. Thus, the observation sequence can be expressed as:
where, the observed state at time $t$ can be described as a vector $O_t=(\alpha(t), \beta(t))$, with $\alpha(t)$ and $\beta(t)$ being the inflow and outflow frequencies at time $t$, respectively. Therefore, the vector sequence formed by the inflow and outflow of all observed samples is the observation state $O$. Figure 3 shows the relationship between hidden state transition sequence and observation sequence.
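The hidden-state part of the model described above can be sketched as follows. The example assumes an initial distribution $\pi$ and a row-stochastic transition matrix $A$ given as plain Python lists. The function names and the three-state toy chain are hypothetical and serve only to show the constraints (non-negative entries, rows summing to one) and how a hidden sequence $Q$ is generated.

```python
import random


def is_row_stochastic(matrix, tol=1e-9):
    """True if every entry is non-negative and every row sums to 1."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )


def sample_hidden_path(pi, A, length, seed=0):
    """Draw q_1 from pi, then each q_{t+1} from row q_t of A."""
    rng = random.Random(seed)
    states = range(len(pi))
    path = [rng.choices(states, weights=pi, k=1)[0]]
    for _ in range(length - 1):
        path.append(rng.choices(states, weights=A[path[-1]], k=1)[0])
    return path


# Toy chain (hypothetical): always start in state 0 and cycle 0 -> 1 -> 2 -> 0.
pi = [1.0, 0.0, 0.0]
A = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
path = sample_hidden_path(pi, A, length=5)
```

With non-degenerate probabilities the sampled path varies with the seed. The deterministic toy chain is used here only so that the behavior is easy to verify by hand.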
The initial training value of the model is set as $O=Q=\left\{p t_1, \ldots, p t_2\right\}$. Then, the Baum-Welch algorithm is used to estimate the model parameters, yielding a single optimal state sequence. The structure of the research topic after $k$ years is obtained by $\hat{O}_{t k}=\sum_{j=1}^N A^k(i, j) E\left(b_j(v)\right)$.
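The $k$-step forecast can be read as follows: raise the transition matrix to the $k$-th power, take the row of the current state $i$, and weight each state's expected observation by the resulting probabilities. The sketch below implements that reading in plain Python. The helper names and the two-state toy values are hypothetical, and scalar expected emissions are assumed for simplicity.

```python
def mat_mul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(X[i][m] * Y[m][j] for m in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def mat_pow(A, k):
    """k-th power of a square matrix (k >= 0) by repeated multiplication."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, A)
    return result


def forecast(A, expected_emission, current_state, k):
    """Expected observation k steps ahead: sum_j A^k(i, j) * E(b_j)."""
    Ak = mat_pow(A, k)
    return sum(
        Ak[current_state][j] * expected_emission[j] for j in range(len(A))
    )


# Toy two-state example (hypothetical values).
A = [[0.5, 0.5], [0.0, 1.0]]
expected_emission = [10.0, 20.0]
one_step = forecast(A, expected_emission, current_state=0, k=1)
two_step = forecast(A, expected_emission, current_state=0, k=2)
```

Here `one_step` is 0.5·10 + 0.5·20 = 15 and `two_step` is 0.25·10 + 0.75·20 = 17.5, showing how the forecast drifts toward the absorbing state's expected observation as $k$ grows.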
4. Results
The learning effect of the LDA model, a machine learning technique, is highly correlated with the number of iterations. In this study, beyond 70 iterations, each additional iteration contributes almost nothing to the increase in log-likelihood (Figure 4). To limit computation time, the number of iterations is therefore set to 70. The number of topics was also varied from six to seventy, and perplexity scores were computed. The perplexity reaches its minimum of 835.8 at 17 topics (Figure 5). Consequently, 17 topics were chosen.
The LDA library (gensim) in Python is used in this study to calculate the topic information and provide keywords for each topic. For each topic, 40 keywords are extracted. After the words are sorted by their estimated probability, the five keywords with the highest probability are chosen to represent the topic. Each topic is labeled and categorized according to the type of technology it represents for ease of reference (Table 1).
No. | KW1 | KW2 | KW3 | KW4 | KW5 | Topic label | Category |
1 | Module | Control | Communication | Signal | Wireless | Communication control module | Network communication technology |
2 | Information | Management | Monitor | Terminal | Server | Information terminal | Network communication technology |
3 | Datum | Model | City | Information | Urban | Datum model | Heterogeneous application integration technology |
4 | Network | Model | Information | Prediction | Invention | Network model | Heterogeneous application integration technology |
5 | Information | Model | Accord | Feature | Target | Target recognition | IOT technology |
6 | Power | Electric | Control | Circuit | Energy | Circuit control | Network communication technology |
7 | Layer | Storage | Mechanism | Plurality | Computer | Plurality storage | Big data and data management technology |
8 | Card | Camera | Ring | Utility | Sign | Camera signal | IOT technology |
9 | Plate | End | Fix | Rod | Top | Plate fixed | IOT technology |
10 | Case | Computing | Electronic | Equip | Utility | Utility Computing | Cloud computing technology |
11 | Display | Screen | Board | Bus | Information | Screen board | IOT technology |
12 | Vehicle | Traffic | City | Parking | Road | Vehicle road | IOT technology |
13 | Body | Machine | Box | Equip | Drive | Body equip | IOT technology |
14 | Sensor | Temperature | Utility | Body | Garbage | Temperature sensor | IOT technology |
15 | Computer | Computing | Automatic | Message | Drive | Autonomic Computing | Cloud computing technology |
16 | Lamp | Street | Light | City | Part | Lamp part | IOT technology |
17 | Water | Pipe | Gas | Detection | Pipeline | Pipeline detection | IOT technology |
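The keyword selection behind Table 1 (sort each topic's words by probability and keep the top five) can be sketched as below. The function name and the probabilities are hypothetical. With gensim, `(word, probability)` pairs of this kind can be obtained from `LdaModel.show_topic(topicid, topn=40)`, although the study does not state exactly which call was used.

```python
def top_keywords(word_probs, n=5):
    """Return the n most probable words of one topic.

    word_probs is an iterable of (word, probability) pairs. Ties are
    broken alphabetically so the result is deterministic.
    """
    ranked = sorted(word_probs, key=lambda pair: (-pair[1], pair[0]))
    return [word for word, _ in ranked[:n]]


# Made-up probabilities for a single topic, for illustration only.
toy_topic = [
    ("pipe", 0.09),
    ("water", 0.12),
    ("valve", 0.01),
    ("gas", 0.07),
    ("pipeline", 0.04),
    ("detection", 0.05),
    ("city", 0.02),
]
keywords = top_keywords(toy_topic)
```

For the toy topic this yields `["water", "pipe", "gas", "detection", "pipeline"]`, the same form as one row of keyword columns in Table 1.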
Based on these results, a heat map of the number of topics and the trend in each topic's weight is created (Figure 6). In Figure 6b, the annual change trend and initial occurrence time of the topics show how the topics are continually subdivided as the research progresses. The first CIM application dates back to the 1994 Amsterdam experiment De Digitale Stad [39]. At this point, network communication data utilization and urban data collection had been investigated, and technological topics such as network model, information terminal, and camera signal were developing [40].
With the advancement of network communication technologies from wired to wireless and mobile wireless networks, mobile information communication capabilities have been significantly improved. The widespread adoption of 2.5G, 3G, 4G, and 5G technologies, which provide network infrastructure for the quick development of new technologies, has had a significant impact on these advancements. Circuit control, cloud computing, and autonomous computing are topics in network communication.
Additionally, the IOT technological topic known as lamp part has been created [8], [41]. The IOT has enabled the CIM information interaction mode, which primarily relies on human-computer interaction, to progress into the ubiquitous computing stage of multi-source real-time information acquisition and intelligent control [42]. As a result, further discussions evolved on the themes of target recognition, temperature sensors, and communication control module.
Parallel to this, the growth of the IOT and its applications has created a number of issues, including the administration, processing, fusion, integration, and mining analysis of huge data from several sources, giving rise to the topic of plurality storage [43], [44], [45]. The storage, administration, processing, sharing, integration, and application of multi-source enormous data present a considerable demand on computer resources with the development of a digital city or smart city. This problem is solved by cloud computing technology, which also gave rise to key topics like utility computing and autonomous computing.
The likelihood of perplexity or transition between topics increases with the similarity of the topics. This paper bases its method of topic word co-occurrence analysis on this idea. We use the co-occurrences of the top 40 topic words across the 17 topics as a proxy for the degree of similarity between topics. To this end, a 17 × 17 symmetric word co-occurrence matrix was built, and the related heat map is shown in Figure 7. The diagonal is the most strongly highlighted because each topic co-occurs most frequently with itself, whereas the off-diagonal cells vary in hue because co-occurrence between different topics is less frequent. The perplexity matrix displays the likelihood that a hidden state is detected as an observable state. This probability is used to measure the threshold of transition between patent topics in the evolution process. The matrix also indicates the likelihood and direction of a topic change. The perplexity matrix's dark squares correspond to the topics that are subject to change (transition) over time. It is apparent from Figure 7 that most CIM topics are well separated, with a high degree of independence of research content and a high threshold for state transition. In other words, there is a high and clear barrier between most topics.
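A symmetric topic-by-topic co-occurrence matrix of the kind described above can be sketched as follows. The text does not specify the exact co-occurrence measure, so this example assumes the simplest one, the number of keywords two topics share. The function name is hypothetical, and the word lists are the five keywords of topics 3, 4, and 17 from Table 1 rather than the full 40-word lists.

```python
def cooccurrence_matrix(topic_words):
    """Count shared keywords for every pair of topics.

    topic_words is a list of keyword lists, one per topic. The result is
    symmetric, and each diagonal entry equals the topic's keyword count.
    """
    sets = [set(words) for words in topic_words]
    n = len(sets)
    return [[len(sets[i] & sets[j]) for j in range(n)] for i in range(n)]


# Keywords of topics 3, 4, and 17 as listed in Table 1.
topics = [
    ["Datum", "Model", "City", "Information", "Urban"],
    ["Network", "Model", "Information", "Prediction", "Invention"],
    ["Water", "Pipe", "Gas", "Detection", "Pipeline"],
]
matrix = cooccurrence_matrix(topics)
```

For these three topics the matrix is `[[5, 2, 0], [2, 5, 0], [0, 0, 5]]`: the two modeling topics share "Model" and "Information", while pipeline detection shares nothing with either, which mirrors the strong diagonal and sparse off-diagonal pattern described for Figure 7.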
To further the analysis, topics with a perplexity probability of more than 10% are shown as a perplexity network diagram (Figure 8). According to Figure 8, CIM R&D topics move more readily from multi-component topics (e.g., drive equipment and information terminal) to technology topics involving multi-scene applications (such as screen board, temperature sensor, and camera signal). Additionally, a number of technology topics serve as crucial links in the development and transformation of other technology topics, acting as significant bridges and intermediaries in the growth of technology. The information terminal, for instance, serves as a node linking the control module and target recognition with vehicle road. Information terminal is a vital node in this transition because control module and target recognition are subdivisions of information terminal application, and vehicle road is the integration of information terminal application. A technical transition from target detection and automated drive to vehicle road is also provided by plurality storage. Important machine learning applications include target recognition and automated driving, and plurality storage offers a technical assurance for machine learning. Camera signal, vehicle road, and automatic driving are further topics with comparable functions. The ambiguity feature illuminates the prospective R&D development course of the CIM technology topics, making it easier for R&D departments to select a technology path systematically when planning research objectives.
The chance of switching between topics in the macrohistorical R&D process is represented by the transition matrix. The diagonal in Figure 7d, however, is highlighted, indicating that the majority of topics continue to exhibit consistency in their research trends. The characteristics of the transition in R&D trend among topics vary when seen from the perspective of a single topic. In this study, a relationship feature diagram is plotted using the topics that have a transition probability of more than 15% (Figure 9).
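Selecting the topic pairs to plot amounts to thresholding the off-diagonal entries of the matrix. A minimal sketch is given below. The function name, the generic topic labels, and the probabilities are hypothetical. The default threshold is the 15% used here for the transition matrix, and passing `threshold=0.10` mirrors the 10% used for the perplexity network.

```python
def transition_edges(matrix, labels, threshold=0.15):
    """List (source, target, probability) for off-diagonal entries above
    the threshold, strongest first."""
    edges = []
    for i, row in enumerate(matrix):
        for j, p in enumerate(row):
            if i != j and p > threshold:
                edges.append((labels[i], labels[j], p))
    return sorted(edges, key=lambda edge: -edge[2])


# Made-up 3-topic transition matrix, for illustration only.
labels = ["topic A", "topic B", "topic C"]
toy_matrix = [
    [0.60, 0.30, 0.10],
    [0.05, 0.90, 0.05],
    [0.20, 0.00, 0.80],
]
edges = transition_edges(toy_matrix, labels)
```

The diagonal is skipped because it represents a topic retaining its own R&D direction rather than a flow to another topic. The resulting edge list can be passed to any graph-drawing tool to produce a relationship diagram.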
Datum modeling, target identification, and screen boards are some of the topics with a great ability to retain their R&D direction. The topic with the largest proportion of transition loss throughout evolution is "screen board," which mainly flows to target recognition, control module, information terminal, and body equipment. These topics have consistently maintained a high degree of co-occurrence and are hot technology nodes in the CIM-related R&D. This behavior is brought on by the ongoing diversification of wireless terminal products, the escalating anti-interference standards for circuit boards, the steady decline in R&D for simple-use screen boards, and the rise in R&D and use of specialized equipment.
Since the current CIM platform has not yet developed a unified spatiotemporal data underlying framework compatible with heterogeneous information systems, the "datum model" has seen the largest increase in popularity, and its R&D strength mainly comes from "camera signal," "information terminal," and "plurality storage." There are many different standard formats for data collecting or design modeling software, and there exist hurdles to data accommodation between formats. Multi-source data includes vector data, raster data, model data, point cloud data, and other data from several sources. The CIM role is constrained if the interdepartmental business system's data format is uncoordinated, the data authority is unclear, and the data docking mechanism is flawed. As a result, the study of the datum model is always expanding. Additionally important is the rise in the "heat" of target recognition. Target identification, a hot topic in the realm of deep learning, is a crucial way of data collecting and human-computer interaction in the CIM application. The foundational technology of an intelligent city is the road target identification algorithm, which employs deep learning to assess the condition of the road. Finding the target's position and size inside an image, as well as detecting particular target classes, are two of the functions of target recognition. Intelligent security, intelligent medical care, unmanned retail, intelligent hardware, and intelligent robotics are some of the related fields.
The 2021 literature had not yet been released in full as of the conclusion of this study. Therefore, this study uses the patents released up until the end of 2020 as the training set to anticipate the CIM evolution trends, and feeds the estimated perplexity matrix and transition matrix into the Matlab HMM module. The resulting HMM prediction of the evolution of the CIM topics from 2018 to 2026 is depicted in Figure 10.
The hidden Markov predictions reveal a sharp rise in the proportion of topics related to temperature sensors, camera signals, electrical equipment, and body equipment. Front-end data collection is necessary for the development of CIM. Real-time data acquisition is encouraged by effective front-end data collection [8]. Artificial intelligence and machine learning technologies provide a better understanding of how cities evolve and cope with drawbacks as sensors, computing cores, and more effective electronic communication systems get installed in urban infrastructure. The Internet of Everything is made possible by the CIM platform [46], [47], and the growth of electrical and body equipment is in line with these developments. Radio frequency identification, infrared sensors, global positioning systems, laser scanners, and other information sensing devices are becoming increasingly embedded in power grids, railways, bridges, tunnels, roads, buildings, water supply systems, dams, oil and gas pipelines, and other objects, thanks to the advancement of front-end data collection systems. Micro-platform integration now replaces fixed technology (plate fixed) requirements for particular platform equipment.
Target recognition is an image recognition technology based on machine learning. The advancement of this technology necessitates increased quantity and quality of visual data as well as the development of more effective algorithms [48]. Target recognition hence entered the period of technology accumulation, after quick early development, when the pre-technology such as camera signal failed to progress considerably. As a result, the degree of R&D has somewhat decreased, calling for an upgrade and optimization of the relevant technology (such as machine learning supporting technology). Similar to this, it became challenging to create technical innovation without significant advances in hardware technology, only based on the early development of technologies related to network and datum models. As a result, related technology development is very sluggish.
5. Discussion
The examination of the current patent topics in this work leads to the identification of technical CIM-related topics. Data collection, data transmission, data storage, and data application are the four main technological domains that the technologies associated with CIM and Smart City fall under [15]. CIM views the city as a whole, or as an organic system capable of intelligent and coordinated operation, from a technological perspective. Hence, the city is transformed into an ever-more-powerful digital system. Through the communication control module and the circuit control (nervous system), sensor signals like camera signal and temperature (sensory system) are embedded into the ubiquitous information terminal and datum model (brain), directing automatic drive, pipeline detection, and other data terminals. CIM, which is an essential component of managing smart cities, is the digital twin of urban entities on digital equipment [8]. However, because management technology is not included in the CIM construction as a concept of digital information technology, it is unable to completely actualize smart management even though it can assist urban intelligence management [3]. Along with digital information technology, management technologies such as policy-driven management [49] and social restrictions [50] should be taken into account when managing smart cities.
From a technological standpoint, the research of CIM now concentrates on IOT-related technologies (such as camera signals, electrical equipment, body equipment, and temperature sensors), big data, and data management technology (plurality storage). Mobile communication has greatly advanced with the adoption of 4G and 5G technologies, supplying the network infrastructure for the quick development of the IOT [42]. IOT technology advancements have resulted in an increase in data volume and data types in cities. This enhances the usefulness of and active development in associated data scheduling and management technologies (such as effective indexing, databases, and distributed storage).
The analysis of the existing patent literature identified two tendencies in the advancement of CIM technology: progress in software technology tends to slow down, while hardware technology advances steadily. These trends do not necessarily reflect the relative importance of hardware and software; rather, they reflect inadequacies in current hardware that prevent it from supporting significant advances in software. Some target recognition applications, for example, require high-resolution imaging. In automated driving systems, target recognition is crucial, which makes lidar and cameras necessary. The topic transition diagram also showed R&D on target recognition technology flowing to other technology sectors, including video signal and communication control module research, which highlights the need for additional study on automated driving technology. Municipal administrators should therefore raise their spending on hardware R&D to further the development and promotion of CIM. Hardware technology must advance continuously to encourage the creation of the related follow-up applications.
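As an illustration of how a topic transition diagram of this kind can be read, the following minimal sketch estimates first-order transition probabilities between topics by counting consecutive topic labels, then projects a topic distribution one period ahead. It is a simplified, fully observed Markov chain rather than the HMM used in this study, and it is not the authors' implementation. The function names (`transition_matrix`, `step`) and the topic sequences are hypothetical and chosen only for illustration; only the three topic labels come from the discussion above.

```python
from collections import Counter


def transition_matrix(sequences, topics):
    """Row-normalised first-order transition probabilities between topics."""
    counts = {src: Counter() for src in topics}
    for seq in sequences:
        # Count every consecutive (source topic, destination topic) pair.
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1
    matrix = {}
    for src in topics:
        total = sum(counts[src].values())
        # A topic with no observed outgoing transition gets an all-zero row.
        matrix[src] = {
            dst: (counts[src][dst] / total if total else 0.0) for dst in topics
        }
    return matrix


def step(distribution, matrix, topics):
    """Propagate a topic distribution one period ahead."""
    return {
        dst: sum(distribution[src] * matrix[src][dst] for src in topics)
        for dst in topics
    }


topics = ["target recognition", "video signal", "communication control module"]

# Hypothetical dominant-topic sequences over successive periods (not real data).
sequences = [
    ["target recognition", "video signal", "video signal"],
    ["target recognition", "communication control module",
     "communication control module"],
    ["target recognition", "target recognition", "video signal"],
]

matrix = transition_matrix(sequences, topics)

# Start with all R&D attention on target recognition and look one period ahead.
current = {
    "target recognition": 1.0,
    "video signal": 0.0,
    "communication control module": 0.0,
}
forecast = step(current, matrix, topics)

for topic in topics:
    print(f"{topic}: {forecast[topic]:.2f}")
```

In this toy setting, a large off-diagonal entry in the row for target recognition corresponds to an arrow in the transition diagram from target recognition to another topic. An HMM adds a layer of hidden states and emission probabilities on top of this transition structure.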
6. Conclusions
The sharp increase in urban population requires city managers to move from traditional urban administration to intelligent city management supported by high-tech tools. Their growing demand for intelligent city management has accelerated the technological advancement of related services and goods, and the development of CIM technology is essential to the creation of the smart city. A reassessment of CIM technology development is therefore urgently needed. This study uses the LDA topic model and the HMM to offer a fresh approach to mining and forecasting the topics of scientific publications. The technique mines hidden topic information quickly, captures the main topic nodes and topic evolution patterns, and clusters literature efficiently and without supervision (i.e., without expert input). It also offers a novel approach to text analysis.
Our method divides the technologies in the field of CIM into 17 topics, with the R&D of CIM patented technology focused on plurality storage, camera signal, electronic equipment, body equipment, and temperature sensor. The three most significant directions of future technological advancement are front-end collection technology, terminal application technology, and data management technology. Compared with software technology, the hardware technology supporting CIM needs further development, particularly information transmission equipment, storage equipment, and information collection equipment (cameras, sensors, etc.). As these fields are still in the early stages of research, their capacity for innovation needs to be strengthened. How to conduct further innovative research on these R&D topics is an issue that scientific research institutes in different nations will need to address.
The limitations of this study may guide future research. First, this article examined only the patent information in the WIPO Patentscope database; future studies should include patent information from other patent databases. Second, this study classifies the related topics using the LDA model; future studies will analyze CIM patent documents using other classification techniques, such as the BERT model [51]. Finally, the present examination of technology development trends does not consider differences between locations. Future research will therefore compare CIM development trends across nations.
Some or all data, models, or code that support the findings of this study are available from the corresponding author upon reasonable request.
The authors declare that they have no conflicts of interest.