Agarwal, A., Dawson, S., McKee, D., Eugster, P., Tancreti, M., & Sundaram, V. (2017). Detecting Abnormalities in IoT Program Executions through Control-Flow-Based Features. Proceedings of the Second International Conference on Internet-of-Things Design and Implementation, 339–340. [Google Scholar] [Crossref]
Aggarwal, C. C. (2013). Outlier Analysis. Springer New York. [Google Scholar] [Crossref]
Aggarwal, S. (2016). Determinants of money demand for India in presence of structural break: An empirical analysis. Business and Economic Horizons (BEH), Prague Development Center (PRADEC), 12(4), 173–177. [Google Scholar] [Crossref]
Aggarwal, S. (2017). Smile curve and its linkages with global value chains. Journal of Economic Bibliography, 4(3). [Google Scholar] [Crossref]
Aggarwal, S. (2017). Sectoral Level Analysis of India’s Bilateral Trade over 2001-2015 [MPRA Paper No 80099, University Library of Munich, Germany]. https://mpra.ub.uni-muenchen.de/80099/ [Google Scholar]
Aggarwal, S. (2020). Determinants of Intra-Industry Trade and Labour Market Adjustment: A Sectoral Analysis for India. Doctoral dissertation, Indian Institute of Foreign Trade. [Google Scholar]
Aggarwal, S. (2023). Machine Learning Algorithms, Perspectives, and Real-World Application: Empirical Evidence from United States Trade Data. International Journal of Science and Research, 12(2), 292–313. [Google Scholar]
Aggarwal, S. (2023). The empirical measurement and determinants of intra-industry trade for a developing country. MPRA Paper No. 117112, University Library of Munich, Germany. [Google Scholar]
Aggarwal, S. & Chakraborty, D. (2017). Determinants of India’s Bilateral Intra-industry Trade over 2001–2015. South Asia Economic Journal, 18(2), 296–313. [Google Scholar] [Crossref]
Aggarwal, S. & Chakraborty, D. (2019). Which factors influence India’s intra-industry trade? Empirical findings for select sectors. Global Business Review, 23(3), 729–755. [Google Scholar] [Crossref]
Aggarwal, S. & Chakraborty, D. (2020). Labour market adjustment and intra-industry trade: Empirical results from Indian manufacturing sectors. Journal of South Asian Development, 15(2), 238–269. [Google Scholar] [Crossref]
Aggarwal, S. & Chakraborty, D. (2020). Determinants of vertical intra-industry trade: Empirical evidence from Indian manufacturing sectors. Prajnan: Journal of Social and Management Sciences, 49(3), 221–252. [Google Scholar]
Aggarwal, S. & Chakraborty, D. (2020). Is there any relationship between Marginal Intra-Industry Trade and Employment Change? Evidence from Indian Industries. Working Paper, No. EC-20-44, Institute of Foreign Trade, Delhi. [Google Scholar]
Aggarwal, S., Chakraborty, D., & Bhattacharyya, R. (2021). Determinants of Domestic Value Added in Exports: Empirical Evidence from India’s Manufacturing Sectors. Global Business Review. [Google Scholar] [Crossref]
Aggarwal, S. & Chakraborty, D. (2021). Which factors influence vertical intra-industry trade in India? Empirical results from panel data analysis. Working Paper, No. EC-21-54, Indian Institute of Foreign Trade, Delhi. http://cc.iift.ac.in/research/Docs/WP/EC-21-54.pdf [Google Scholar]
Aggarwal, S. & Chakraborty, D. (2022). Which Factors Influence India’s Bilateral Intra-Industry Trade? Cross-Country Empirical Estimates. Working Papers 2260, Indian Institute of Foreign Trade, Delhi. [Google Scholar]
Aggarwal, S., Mondal, S., & Chakraborty, D. (2022). Efficiency Gain in Indian Manufacturing Sectors: Evidence from Domestic Value Addition in Exports. Empirical Economics Letters, 21(1), 69–83. [Google Scholar]
Aggarwal, S., Chakraborty, D., & Banik, N. (2022). Does Difference in Environmental Standard Influence India’s Bilateral IIT Flows? Evidence from GMM Results. Journal of Emerging Market Finance, 22(1), 7–30. [Google Scholar] [Crossref]
Ahmed, M., Mahmood, A. N., & Hu, J. (2014). Outlier Detection. In The State of the Art in Intrusion Prevention and Detection (Vol. 44, pp. 3–21). New York, USA: CRC Press. [Google Scholar] [Crossref]
Albelwi, S. & Mahmood, A. (2017). A Framework for Designing the Architectures of Deep Convolutional Neural Networks. Entropy, 19(6), 242. [Google Scholar] [Crossref]
Anderson, J. E. (1979). A theoretical foundation for the gravity equation. The American Economic Review, 69(1), 106–116. [Google Scholar]
Barnett, V. & Lewis, T. (1994). Outliers in Statistical Data. John Wiley and Sons, New York. [Google Scholar] [Crossref]
Batarseh, F., Gopinath, M., Nalluru, G., & Beckman, J. (2019). Application of Machine Learning in Forecasting International Trade Trends. [Google Scholar] [Crossref]
Beckman, R. J. & Cook, R. D. (1983). Outlier..........s. Technometrics, 25(2), 119–149. [Google Scholar] [Crossref]
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation Learning: A Review and New Perspectives. IEEE Trans. Pattern Anal. Mach. Intell., 35(8), 1798–1828. [Google Scholar] [Crossref]
Bhattacharya, K., Mukherjee, G., & Manna, S. S. (2007). The International Trade Network. In A. Chatterjee & B. K. Chakrabarti (Eds.), Econophysics of Markets and Business Networks. New Economic Windows. Springer, Milano. [Google Scholar] [Crossref]
Boukerche, A., Zheng, L., & Alfandi, O. (2020). Outlier Detection: Methods, Models, and Classification. ACM Comput. Surv., 53(3), 1–37. [Google Scholar] [Crossref]
Callado, A., Kamienski, C., Szabo, G., Gero, B. P., Kelner, J., Fernandes, S., & Sadok, D. (2009). A Survey on Internet Traffic Identification. IEEE Commun. Surv. Tutorials, 11(3), 37–52. [Google Scholar] [Crossref]
Chalapathy, R. & Chawla, S. (2019). Deep Learning for Anomaly Detection: A Survey. [Google Scholar] [Crossref]
Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Comput. Surv., 41(3), 1–58. [Google Scholar] [Crossref]
Charitou, C., Garcez, A. D. A., & Dragicevic, S. (2020). Semi-supervised GANs for Fraud Detection. International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 1–8. [Google Scholar] [Crossref]
Teng, H. S., Chen, K., & Lu, S. C. (1990). Adaptive real-time anomaly detection using inductively generated sequential patterns. In Proceedings. 1990 IEEE Computer Society Symposium on Research in Security and Privacy (Vol. 1990, pp. 278–284). https://www.cs.unc.edu/~jeffay/courses/nidsS05/ai/Teng-AdaptiveRTAnomaly-SnP90.pdf [Google Scholar]
Chen, J., Shen, Y., & Ali, R. (2018). Credit card fraud detection using sparse autoencoder and generative adversarial network. In IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON) (pp. 1054–1059). https://ieeexplore.ieee.org/document/8614815 [Google Scholar]
Fiore, U., De Santis, A., Perla, F., Zanetti, P., & Palmieri, F. (2019). Using generative adversarial networks for improving classification effectiveness in credit card fraud detection. Information Sciences, 479, 448–455. [Google Scholar] [Crossref]
Fu, K., Cheng, D., Tu, Y., & Zhang, L. (2016). Credit Card Fraud Detection Using Convolutional Neural Networks. In A. Hirose, S. Ozawa, K. Doya, K. Ikeda, M. Lee, & D. Liu (Eds.), Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science (Vol. 9949). Springer, Cham. [Google Scholar] [Crossref]
García-Teodoro, P., Díaz-Verdejo, J., Maciá-Fernández, G., & Vázquez, E. (2009). Anomaly-based network intrusion detection: Techniques, systems and challenges. Computers & Security, 28(1–2), 18–28. [Google Scholar] [Crossref]
Gers, F. A., Schmidhuber, J., & Cummins, F. (2000). Learning to Forget: Continual Prediction with LSTM. Neural Computation, 12(10), 2451–2471. [Google Scholar] [Crossref]
Ghoddusi, H., Creamer, G. G., & Rafizadeh, N. (2019). Machine learning in energy economics and finance: A review. Energy Economics, 81, 709–727. [Google Scholar] [Crossref]
Ghosh, S. & Reilly, D. L. (1994). Credit card fraud detection with a neural-network. In System Sciences, 1994. Proceedings of the Twenty-Seventh Hawaii International Conference, (Vol. 3, pp. 621–630). IEEE. https://ieeexplore.ieee.org/abstract/document/323314 [Google Scholar]
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., ..., & Bengio, Y. (2014). Generative Adversarial Networks. [Google Scholar] [Crossref]
Gopinath, M., Batarseh, F. A., & Beckman, J. (2020). Machine learning in gravity models: An application to agricultural trade (No. w27151). National Bureau of Economic Research. https://www.nber.org/system/files/ [Google Scholar]
Gupta, M., Gao, J., Aggarwal, C. C., & Han, J. (2013). Outlier detection for temporal data: A survey. Transactions on Knowledge and Data Engineering, 26(9), 2250–2267. [Google Scholar]
Hawkins, D. M. (1980). Identification of Outliers (Vol. 11). Heidelberg, Germany: Springer. [Google Scholar]
He, H., Wang, J., Graco, W., & Hawkins, S. (1997). Application of neural networks to detection of medical fraud. Expert Systems with Applications, 13(4), 329–336. [Google Scholar] [Crossref]
Heryadi, Y. & Warnars, H. L. H. S. (2017). Learning temporal representation of transaction amount for fraudulent transaction recognition using CNN, Stacked LSTM, and CNN-LSTM. In 2017 IEEE International Conference on Cybernetics and Computational Intelligence (CyberneticsCom) (pp. 84–89). https://ieeexplore.ieee.org/document/8311689 [Google Scholar]
Hochreiter, S. & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735–1780. [Google Scholar] [Crossref]
Hodge, V. & Austin, J. (2004). Survey of Outlier Detection Methodologies. Artificial Intelligence Review, 22, 85–126. [Google Scholar] [Crossref]
Javaid, A., Niyaz, Q., Sun, W., & Alam, M. (2016). A deep learning approach for network intrusion detection system. In Proceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies (formerly BIONETICS). [Google Scholar] [Crossref]
Jurgovsky, J., Granitzer, M., Ziegler, K., Calabretto, S., Portier, P. E., He-Guelton, L., & Caelen, O. (2018). Sequence classification for credit-card fraud detection. Expert Systems with Applications, 100, 234–245. [Google Scholar] [Crossref]
Kazemi, Z. & Zarrabi, H. (2017). Using deep networks for fraud detection in the credit card transactions. In IEEE 4th International Conference on Knowledge-based Engineering and Innovation (KBEI), Tehran, Iran (pp. 0630–0633). https://ieeexplore.ieee.org/document/8324876 [Google Scholar]
Khan, A. U. S., Akhtar, N., & Qureshi, M. N. (2014). Real-time credit-card fraud detection using artificial neural network tuned by simulated annealing algorithm. In Proceedings of International Conference on Recent Trends in Information, Telecommunication and Computing, ITC (pp. 113–121). [Google Scholar]
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. In 25th Advances in Neural Information Processing Systems, Grenada, Spain. [Google Scholar] [Crossref]
Kucharski, D., Kleczek, P., Jaworek-Korjakowska, J., Dyduch, G., & Gorgon, M. (2020). Semi-Supervised Nests of Melanocytes Segmentation Method Using Convolutional Autoencoders. Sensors, 20(6), 1546. [Google Scholar] [Crossref]
Markou, M. & Singh, S. (2003). Novelty detection: a review—part 1: statistical approaches. Signal Processing, 83(12), 2481–2497. [Crossref]
Markou, M. & Singh, S. (2003). Novelty detection: a review—part 2: neural network based approaches. Signal Processing, 83(12), 2499–2521. [Crossref]
Michelucci, U. (2018). Feedforward Neural Networks. In Applied Deep Learning (pp. 83–136). Apress. [Crossref]
Mirza, M. & Osindero, S. (2014). Conditional Generative Adversarial Nets (Version 1). arXiv. [Crossref]
Monken, A., Haberkorn, F., Gopinath, M., Freeman, L., & Batarseh, F. A. (2021). Graph Neural Networks for Modeling Causality in International Trade. FLAIRS, 34(1). [Google Scholar] [Crossref]
O’Shea, K. & Nash, R. (2015). An Introduction to Convolutional Neural Networks (Version 2). arXiv. [Crossref]
Patcha, A. & Park, J.-M. (2007). An overview of anomaly detection techniques: Existing solutions and latest technological trends. Computer Networks, 51(12), 3448–3470. [Google Scholar] [Crossref]
Peng, H.-K. & Marculescu, R. (2015). Multi-Scale Compositionality: Identifying the Compositional Structures of Social Dynamics Using Deep Learning. PLoS ONE, 10(4), e0118309. [Google Scholar] [Crossref]
Pimentel, M. A. F., Clifton, D. A., Clifton, L., & Tarassenko, L. (2014). A review of novelty detection. Signal Processing, 99, 215–249. [Google Scholar] [Crossref]
Pourhabibi, T., Ong, K.-L., Kam, B. H., & Boo, Y. L. (2020). Fraud detection: A systematic literature review of graph-based anomaly detection approaches. Decision Support Systems, 133, 113303. [Google Scholar] [Crossref]
Pumsirirat, A. & Yan, L. (2018). Credit Card Fraud Detection using Deep Learning based on Auto-Encoder and Restricted Boltzmann Machine. Ijacsa, 9(1). [Google Scholar] [Crossref]
Rousseeuw, P. J. & Leroy, A. M. (1987). Robust Regression and Outlier Detection. Wiley Series in Probability and Statistics. Wiley. [Crossref]
Sun, B., Wu, K., Xiao, Y., & Wang, R. (2006). Integration of mobility and intrusion detection for wireless ad hoc networks. International Journal of Communication Systems, 20(6), 695–721. [Crossref]
Tanaka, F. H. K. dos S. & Aranha, C. (2019). Data Augmentation Using GANs (Version 1). arXiv. [Crossref]
Wang, X., Du, Y., Lin, S., Cui, P., Shen, Y., & Yang, Y. (2020). adVAE: A self-adversarial variational autoencoder with Gaussian anomaly prior knowledge for anomaly detection. Knowledge-Based Systems, 190, 105187. [Google Scholar] [Crossref]
Yu, Y., Si, X., Hu, C., & Zhang, J. (2019). A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures. Neural Computation, 31(7), 1235–1270. [Crossref]
Zimek, A., Schubert, E., & Kriegel, H.-P. (2012). A survey on unsupervised outlier detection in high-dimensional numerical data. Statistical Analysis and Data Mining, 5(5), 363–387. [Crossref]

Research article

Research on Anomaly Detection in Time Series: Exploring United States Exports and Imports Using Long Short-Term Memory

Sakshi Aggarwal*
Indian Institute of Foreign Trade, New Delhi, India
Journal of Research, Innovation and Technologies | Volume 2, Issue 2, 2023 | Pages 200-225

Abstract:

This survey aims to offer a thorough and organized overview of research on anomaly detection, a significant problem that has been studied in various fields and application areas. Some anomaly detection techniques have been tailored for specific domains, while others are more general. Anomaly detection involves identifying unusual patterns or events in a dataset, which is important for a wide range of applications including fraud detection and medical diagnosis. Little research on anomaly detection techniques has been conducted in the field of economics and international trade. Therefore, this study attempts to analyse the time-series data of United States exports and imports for the period 1992–2022 using an LSTM-based anomaly detection algorithm. Deep learning methods, particularly LSTM networks, are becoming increasingly popular in anomaly detection tasks due to their ability to learn complex patterns in sequential data.

This paper presents a detailed explanation of the LSTM architecture, including the role of the input, forget, and output gates in processing input vectors and hidden states at each timestep. The LSTM-based anomaly detection approach yields promising results by modelling short-term as well as long-term temporal dependencies.

Keywords: Anomaly detection, Long short-term memory, Machine learning, Artificial intelligence, Economic trade

1. Introduction

Anomaly detection is a technique used to identify unusual patterns or events in a dataset that do not conform to the expected behaviour or typical values. It involves automatically discovering rare and interesting patterns in datasets (Ahmed et al., 2014). Anomaly detection has been widely studied in statistics and machine learning. Anomalies are also referred to as outliers, novelties, or deviations. Anomalies are important because they indicate significant but rare events, and they can prompt critical actions in a wide range of application domains, including fraud detection in financial transactions, intrusion detection in network security, fault detection in manufacturing processes, and medical diagnosis in healthcare (Lin et al., 2005; Chandola et al., 2009; Chalapathy and Chawla, 2019).

In the broader field of machine learning, recent years have witnessed a proliferation of deep neural networks, with unprecedented results in diverse application domains. Deep learning is a subset of machine learning that achieves good performance and flexibility by learning to represent data as a nested hierarchy of concepts within the layers of a neural network (Peng and Marculescu, 2015; Javaid et al., 2016). Deep learning outperforms traditional machine learning as the scale of data increases, as illustrated in Figure 1.

Figure 1. Performance comparison of deep learning-based algorithms vs traditional algorithms

Recent trends in machine learning show that deep learning-based anomaly detection algorithms are becoming increasingly popular across a diverse set of tasks. Figure 2 illustrates how deep learning surpasses traditional methods in various domains.

Figure 2. Applications of deep learning-based anomaly detection algorithms

In this paper, we focus on anomaly detection research in the international trade domain. Many sub-fields of economics are moving towards the use of AI in behavioural modelling, e.g., for energy and international economics (Aggarwal and Chakraborty, 2017, 2019, 2020a, 2020b, 2020c; Ghoddusi et al., 2019). However, there is limited work in the literature on tailored and optimized neural network algorithms for economic applications. Neural networks offer the ability to find connections between variables that are difficult to detect using traditional machine learning methods. Even compared to other machine learning approaches, neural networks can provide a better fit in the long term, as found for international trade in Gopinath et al. (2020) and Monken et al. (2021). The 2020 U.S. National Artificial Intelligence Initiative Bill states that “Artificial intelligence is a tool that has the potential to change and possibly transform every sector of the U.S. economy” (S. 1558 – AIIA 2020). Given its obvious influence on sectoral growth, the next sector for AI to advance is international trade, including its causal effects on production, consumption, and prices.

Many authors have proposed varying definitions of an anomaly in their research; however, no definition has been universally adopted. Exact definitions of an anomaly depend on the problem statement, assumptions regarding the structure of the data, and the application under consideration. However, some definitions are considered general to most, if not all, cases, regardless of the setting or application. Of those, the most widely recognized is by Hawkins, who defines an anomaly, or an outlier in this case, as follows: “An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism” (Hawkins, 1980). This definition reflects a statistics-based intuition, where normal data follow a generating mechanism and anomalies are samples or instances that deviate from this mechanism. Thus, anomalies often relay useful information about a system’s abnormal characteristics that are impacting the generating mechanism (Aggarwal, 2013). Figure 3 illustrates anomalies in a simple two-dimensional dataset. The data has two normal regions, N1 and N2, since most observations lie in these two regions, whereas the region O3 and the data points O1 and O2 lie far away from the bulk of the data points and hence are considered anomalies.

Figure 3. Graphical visualization of anomalous data in a two-dimensional representation

Anomalies might be induced in the data for a variety of reasons, such as malicious activity or breakdown of a system, with the common characteristic that these reasons are of interest to the analyst. Such activities include credit card fraud, cyber-intrusion, terrorist activity, breach of computer network, etc.

Types of anomalies

Anomalies can be classified into different categories, and these refer to various types of unexpected or unusual events or data points that do not conform to a well-defined characteristic of normal behaviour. The type of anomaly being dealt with is a crucial aspect of consideration for any anomaly detection technique. The three main categories of anomalies are discussed below:

Point anomalies: These are individual data points that are significantly different from the rest of the data in a dataset. For instance, consider monthly expenditure on grocery shopping. If the usual expenditure on grocery shopping per month is USD 500 but it becomes USD 5,001 in some random month, then that observation is a point anomaly.

Contextual anomalies: These are data points that are not anomalous on their own but are considered unusual when the context or conditions in which they occur are taken into account. For instance, expenditure on credit cards during a festive period, e.g., Christmas or New Year, is usually higher than during the rest of the year. Although the expenditure during a festive month may be high, it may not be anomalous because high expenses are contextually normal at that time.

Collective anomalies: These are groups of data points that are anomalous when analysed together but may not be anomalous when analysed individually. For example, in a human electrocardiogram (ECG) output, the existence of low values for a long period of time indicates an underlying phenomenon corresponding to an abnormal premature contraction (Keogh et al., 2005); however, one low value by itself is not considered anomalous.

The few other categories of anomalies are as follows:

Temporal anomalies: These are data points that exhibit anomalous behaviour over time, such as sudden spikes or drops in value. A common example of a temporal anomaly is credit card fraud detection. Credit card transactions typically occur at regular intervals and have a consistent pattern of behaviour. Any transaction that deviates significantly from this pattern may be considered anomalous and indicative of fraud. For example, if a customer who typically makes small purchases suddenly starts making large transactions at unusual times, it could be flagged as a temporal anomaly.

Spatial anomalies: These are data points that exhibit anomalous behaviour in terms of their location or spatial distribution. A common example of a spatial anomaly is detecting unusual patterns in the distribution of disease outbreaks. Epidemiologists can use spatial analysis techniques to identify clusters of cases that occur near each other, indicating a potential outbreak. Any area that has a significantly higher incidence of a disease compared to its neighbouring regions may be considered a spatial anomaly.

Rare events: These are events that are highly unusual or unexpected, such as natural disasters or cyber-attacks.

Noise anomalies: These are data points that are not actually anomalous but appear to be due to measurement errors or random fluctuations in the data. A common example of a noise anomaly is in sensor data. Sensors can sometimes produce inaccurate or random measurements due to a variety of factors, such as environmental conditions, electrical interference, or malfunctioning equipment. These measurements may not actually reflect any meaningful signal, but they can still be captured and recorded as part of the dataset, leading to false conclusions if not handled correctly. For example, if a temperature sensor in a room suddenly produces a reading that is significantly higher or lower than the surrounding readings, it could be flagged as a noise anomaly.

Anomaly detection is related to but distinct from noise removal (Chen et al., 1990) and noise accommodation (Rousseeuw and Leroy, 1987), both of which deal with unwanted noise in the data. In real-world applications, the data may be affected by a significant amount of noise, which is not of interest to the analyst but acts as a hindrance during the data analysis stage; usually only significant deviations are of interest (Aggarwal, 2013). The detrimental effect of noise on data analysis drives the need for noise removal, which removes unwanted objects before data analysis (Xiong et al., 2006). Another topic related to anomaly detection is novelty detection (Markou and Singh, 2003a, 2003b; Saunders and Gero, 2000), which aims to detect previously unobserved novel patterns in the data, e.g., a new topic of discussion in a newsgroup. The distinction between novel patterns and anomalies is that novel patterns are typically incorporated into the normal model after being detected.

What are novelties?

Novelty detection is the identification of novel or previously unobserved patterns in the data. The novelties detected are not considered anomalous data points; instead, they are incorporated into the normal data model (Miljković, 2010). A novelty score may be assigned to these previously unseen data points using a decision threshold (Pimentel et al., 2014). Points that significantly deviate from this decision threshold may be considered anomalies or outliers. For instance, in Figure 4 the images of white tigers among regular tigers may be considered a novelty, while the images of the horse, panther, lion and cheetah are considered anomalies. The techniques used for anomaly detection are often used for novelty detection and vice versa.

Figure 4. Illustration of novelty in the image dataset

2. Literature Survey

A substantial amount of research on outlier detection was already conducted in statistics in the previous century (Rousseeuw and Leroy, 1987; Barnett and Lewis, 1994; Hawkins, 1980; Beckman and Cook, 1983). An extensive review of novelty detection techniques using deep learning frameworks and advanced statistical methodology is presented in Markou and Singh (2003a, 2003b). A plethora of published research has applied anomaly detection techniques in various applications across different domains, and anomaly detection has emerged as the topic of focus for many surveys and review papers in recent years. Extensive surveys of anomaly detection techniques developed in machine learning and statistics are provided by Hodge and Austin (2004) and Nguyen and Armitage (2008). These surveys provide extensive background on outliers or anomalies and the challenges associated with detecting them, and thereby had a significant impact on further research in various fields. In 2003, Noble and Cook detected unusual patterns within graph-based data using the concept of ‘conditional entropy’ (Noble and Cook, 2003). Chandola et al. (2009) surveyed aspects of anomaly detection techniques that had been proposed in research but not covered by Hodge and Austin, providing more meaningful insight into the real-world applications in which they are employed.

Zimek et al. (2012) published a survey of unsupervised anomaly detection techniques for high-dimensional numerical data, with a detailed focus on the ‘curse of dimensionality’. The analysis compared two categories of specialized algorithms: those that address the presence of irrelevant features or attributes in the model, and those that are more concerned with the efficiency and effectiveness of the model. Temporal data poses another challenge for detecting anomalies in a dataset. With advances in computational capabilities enabling the availability of various forms of temporal data, the need to extensively review anomaly detection techniques for time-series data has arisen. In this respect, Gupta et al. (2013) provided significant insight into various applications of temporal anomaly detection and the associated challenges in each domain.

Other exhaustive survey papers focus more specifically on the techniques and applications of anomaly detection in particular domains (Patcha and Park, 2007; García-Teodoro et al., 2009; Callado et al., 2009; Zhang et al., 2009; Sperotto et al., 2010). Patcha and Park (2007) surveyed anomaly detection techniques used specifically for cyber intrusion detection. Callado et al. (2009) reported the major techniques and problems identified in IP traffic analysis, with an emphasis on application detection. Zhang et al. (2009) presented a survey of anomaly detection methods in network intrusion detection. A review of flow-based intrusion detection was presented by Sperotto et al. (2010), who elaborated on the concepts of flows and classified attacks, and provided a detailed discussion of detection techniques for DoS attacks. Other literature surveys have been reported in the context of wireless networks (Sun et al., 2006; Sun et al., 2007a; Sun et al., 2007b; Sun et al., 2007c). Sun et al. (2007a) presented a survey of intrusion detection techniques for mobile ad-hoc networks (MANET) and wireless sensor networks (WSN); their analysis also highlighted several important research issues and challenges in building IDSs that integrate aspects of mobility. A systematic literature review of graph-based anomaly detection techniques studied in the published literature is provided by Pourhabibi et al. (2020).

Little extensive research on anomaly detection techniques has been conducted in the field of economics and international trade. Understanding, modelling, and predicting international trade is a central domain in economics. Economists have implemented the gravity equation of trade to describe trade flows (Anderson, 1979; Aggarwal et al., 2022; Aggarwal et al., 2023). More recently, researchers across multidisciplinary fields have identified international trade as well suited for machine learning and network analysis (Bhattacharya et al., 2007; Aggarwal, 2016, 2017a, 2017b; Aggarwal et al., 2021). By harnessing machine learning methods, researchers have been able to better predict agricultural export trade patterns using Gradient Tree Boosting, ARIMA, and XGBoost (Batarseh et al., 2019; Aggarwal and Chakraborty, 2021, 2022). The use of shallow neural networks also improves international trade forecasting (Wohl and Kennedy, 2018). Combining neural networks with the network analysis theory of international trade, Panford-Quainoo et al. (2020) and Monken et al. (2021) demonstrate the efficacy of country GDP classification using Graph Neural Networks (GNNs) for a single period. Our contribution to the literature is to detect anomalies in novel ways for United States trade data over the past two decades and to inform public policy for the greater good.

3. Deep Learning Anomaly Detection Techniques

Deep learning anomaly detection techniques have gained prominence in recent years, demonstrating significantly better performance than other techniques in addressing real-world problems. These include neural network (NN) architectures of various types, such as convolutional neural networks (CNN), long short-term memory networks (LSTM), autoencoders (AE) and generative adversarial networks (GANs) (Agarwal et al., 2017; Fiore et al., 2019; Wang et al., 2020). Neural networks have been successfully applied to a wide range of tasks, including image and speech recognition, natural language processing, and predictive analytics. Convolutional neural networks are commonly used for image recognition tasks.

LSTM networks are well-suited for tasks that involve processing sequential data, such as speech recognition, natural language processing, and time series prediction. Autoencoders have many applications, including image and video compression, anomaly detection, and feature extraction for downstream tasks such as classification or clustering. They can also be used for generative modelling by training the decoder network to generate new data points sampled from the compressed representation. Generative adversarial networks are used for generating realistic synthetic data, such as images and videos. This section reviews neural network algorithms and the different types that have been applied in the published literature.

3.1 Neural Networks (NN)

A neural network is a type of machine learning model that is designed to mimic the structure and function of the human brain. In this section, we are primarily concerned with feedforward neural networks, also known as multi-layered perceptron (MLP) networks, which are the simplest type of neural network. A network is made up of interconnected nodes, or neurons, which are organized into layers.

The basic building block of a neural network is the neuron. Each neuron receives one or more inputs, which are multiplied by weights that represent the strength of the connection between the neuron and its inputs. The neuron then applies an activation function to the weighted sum of the inputs, producing an output. The activation function is a non-linear function applied to the weighted sum of the inputs; it introduces non-linearity into the network and enables it to learn complex relationships between inputs and outputs. Common activation functions include sigmoid, tanh, and ReLU (rectified linear unit). A neural network typically consists of an input layer, one or more hidden layers, and an output layer. The input layer receives the input data, which is then passed through the hidden layers before reaching the output layer (Michelucci, 2018). A graphical illustration of a densely connected NN can be seen in Figure 5. Each hidden layer consists of multiple neurons, each of which applies an activation function to the weighted sum of its inputs. During training, the weights of the connections between neurons are adjusted in order to minimize the difference between the network's predicted outputs and the true outputs. This is done by feeding the network a set of training examples, each with a known input and output, and adjusting the weights based on the difference between the predicted and true outputs.

Figure 5. Labelled visual representation of a densely connected multi-layered perceptron

The training algorithm of a neural network can be broken down into several steps:

▪ Initialize the weights of the connections between the neurons in the network to small random values.

▪ Feed the input data into the input layer of the neural network. The data then flows through the network, layer by layer, until it reaches the output layer. At each layer, the weighted sum of the inputs is calculated, and an activation function is applied to produce the output of each neuron.

▪ Compare the output of the neural network to the actual output and calculate the loss or error. This is typically done using a loss function such as mean squared error (MSE) or cross-entropy loss.

▪ Propagate the error backwards through the network, from the output layer to the input layer. The error is used to adjust the weights of the connections between neurons, with the goal of minimizing the loss.

▪ Use an optimization algorithm, such as stochastic gradient descent (SGD) or Adam, to update the weights of the connections between neurons based on the error calculated during backpropagation.

▪ Repeat steps 2-5 for multiple iterations or epochs, until the network has learned to produce accurate outputs for the given inputs.

▪ Use the trained neural network to predict the output for new inputs.

This process is typically repeated multiple times with different training examples until the network can accurately predict outputs for unseen data. Additionally, hyperparameters such as the learning rate, number of layers, and number of neurons in each layer can be tuned to improve the performance of the neural network.
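To make these steps concrete, the following is a minimal, illustrative sketch of a feedforward network in Python. The Keras API and the toy data are assumptions made here for illustration only; the paper does not specify a framework or dataset for this section.

```python
# Minimal feedforward (MLP) sketch; Keras and the toy data are assumptions.
import numpy as np
from tensorflow import keras

# Hypothetical data: 1000 samples with 20 features and binary labels.
X = np.random.rand(1000, 20)
y = (X.sum(axis=1) > 10).astype(int)

model = keras.Sequential([
    keras.layers.Input(shape=(20,)),
    keras.layers.Dense(32, activation="relu"),    # hidden layer 1
    keras.layers.Dense(16, activation="relu"),    # hidden layer 2
    keras.layers.Dense(1, activation="sigmoid"),  # output layer
])

# Cross-entropy loss and the Adam optimizer, as listed in the steps above.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Each epoch runs the forward pass, loss calculation, backpropagation and weight update.
model.fit(X, y, epochs=10, batch_size=32, verbose=0)

# Use the trained network to predict outputs for new inputs.
predictions = model.predict(X[:5])
```

In practice, the learning rate, layer widths, and number of epochs in such a sketch would be tuned as described above.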

3.2 Convolutional Neural Networks (CNN)

CNN is another type of deep learning architecture, first introduced in the 1990s, characterized as a network with many layers categorized into input, convolutional, pooling, fully connected, and output layers (LeCun and Bengio, 1995). CNNs are named after the fact that the convolutional layer performs a mathematical operation known as convolution on the input. The CNN (LeCun et al., 1998) is a subclass of neural networks that takes advantage of the spatial structure of its inputs. CNN models have a standard structure consisting of alternating convolutional layers and pooling layers (often each pooling layer is placed after a convolutional layer). Figure 6 displays a representation of the structure of a CNN's layers. CNNs are usually trained by backpropagation via Stochastic Gradient Descent (SGD) to find weights and biases that minimize a certain loss function, in order to map arbitrary inputs to the targeted outputs as closely as possible. CNNs have been predominantly used for image-driven pattern recognition tasks due to the essential nature of the input, which is usually in matrix form (O’Shea and Nash, 2015). However, they have been demonstrated to be applicable in other fields and domains by manipulating the structure of the input data.

Figure 6. Schematic representation of the layers of a CNN and the associated functions

The various layers of a CNN's architecture are responsible for carrying out different operations. As briefly mentioned, the convolution operation is performed in the convolutional layer, which preserves the spatial relationship of the input to extract derived features. Different filters are used for the convolution operation, acting as feature detectors. The pooling layer reduces the dimensionality of the feature maps produced by the convolutional layer using subsampling techniques, retaining the most critical information and making the network more robust to small variations in the input image. Pooling makes the input more manageable, reducing the number of computations and parameters in the network, which helps to control overfitting (Krizhevsky et al., 2012; Albelwi and Mahmood, 2017). The fully connected and output layer is a traditional densely connected feedforward NN or MLP whose input is the output from the convolutional and pooling layers and which serves to classify the data.

The algorithm of a convolutional neural network (CNN) can be broken down into several steps, including:

▪ Convolution: The input image is convolved with a set of learnable filters, also known as kernels or weights. Each filter is moved across the image, and the dot product of the filter and the image pixels within the filter window is calculated at each position. This produces a feature map that highlights certain patterns or features in the image.

▪ Activation: An activation function such as ReLU (rectified linear unit) is applied to the output of each convolution operation. This introduces non-linearity into the network and enables it to learn complex features.

▪ Pooling: A pooling operation is applied to the output of the activation function. This downsamples the feature map by taking the maximum or average value within a small window, which reduces the dimensionality of the feature map and makes the network more computationally efficient.

▪ Repeat: Steps 1-3 are repeated multiple times, with the output of one layer becoming the input to the next layer. This allows the network to learn increasingly complex features at higher levels of abstraction.

▪ Fully Connected Layers: The output of the last convolutional layer is flattened and passed through one or more fully-connected layers, also known as dense layers. These layers perform a weighted sum of the inputs, followed by an activation function, to produce the final output of the network.

▪ Loss Calculation: Compare the output of the neural network to the actual output and calculate the loss or error. This is typically done using a loss function such as cross-entropy loss.

▪ Backpropagation and Weight Update: Propagate the error backwards through the network using backpropagation and update the weights of the connections between neurons using an optimization algorithm such as stochastic gradient descent (SGD) or Adam.

▪ Repeat: Repeat steps 1-7 for multiple iterations or epochs, until the network has learned to produce accurate outputs for the given inputs.

▪ Prediction: Use the trained CNN to predict the output for new inputs.

This process is typically repeated multiple times with different training examples until the CNN can accurately predict outputs for unseen data. Additionally, hyperparameters such as the size of the filters, the number of filters in each layer, and the size of the pooling windows can be tuned to improve the performance of the CNN.
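As an illustration of this pipeline, the sketch below builds a small image classifier. Keras and the toy 28×28 grayscale data are assumptions made for illustration; they are not details from the paper.

```python
# Minimal CNN sketch: convolution -> ReLU -> pooling -> dense layers, as above.
import numpy as np
from tensorflow import keras

# Hypothetical data: 100 grayscale 28x28 images, 10 classes.
X = np.random.rand(100, 28, 28, 1)
y = np.random.randint(0, 10, size=100)

model = keras.Sequential([
    keras.layers.Input(shape=(28, 28, 1)),
    keras.layers.Conv2D(16, kernel_size=3, activation="relu"),  # convolution + ReLU
    keras.layers.MaxPooling2D(pool_size=2),                     # pooling
    keras.layers.Conv2D(32, kernel_size=3, activation="relu"),
    keras.layers.MaxPooling2D(pool_size=2),
    keras.layers.Flatten(),                                     # flatten for dense layers
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),               # class probabilities
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=16, verbose=0)
```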

3.3 Long Short-Term Memory Networks (LSTM)

Long Short-Term Memory networks (LSTM) are an extension of recurrent neural networks (RNN), a form of deep neural network primarily used for time series data, proposed by Hochreiter and Schmidhuber in 1997 (Hochreiter and Schmidhuber, 1997). Each neuron in an LSTM is a cell with ‘memory’ that can store information and maintain its own state, in contrast to RNNs, which merely take the current input and their previous hidden state to output a new hidden state. LSTM is designed to overcome the vanishing gradient problem in traditional RNNs, which occurs when gradients become too small during backpropagation and cause the network to forget long-term dependencies. LSTM is used for processing sequential data such as time series, natural language text, or speech (Gers et al., 2000; Yu et al., 2019).

LSTM is composed of several memory cells, each containing three gates: input, forget, and output gates. The input gate controls the amount of new information that enters the cell, while the forget gate controls the amount of old information that is retained in the cell. The output gate controls the amount of information that is outputted from the cell. During each timestep, the LSTM receives an input vector and a hidden state vector from the previous timestep. The input vector is first multiplied by a weight matrix and added to the previous hidden state vector. The resulting vector is then fed through the input gate, which applies a sigmoid activation function to the weighted sum of the input and the previous hidden state. The output of the input gate is then multiplied by a candidate vector that contains new information, which is passed through a hyperbolic tangent function to normalize the values between -1 and 1. The forget gate is then applied to the previous hidden state vector and the input vector. It also applies a sigmoid activation function to the weighted sum of the previous hidden state and the input vector. The output of the forget gate is then multiplied by the previous cell state vector, which results in the removal of unnecessary information from the cell. The output gate is then applied to the current input vector and the current hidden state vector. It applies a sigmoid activation function to the weighted sum of the input vector and the current hidden state. The output of the output gate is then multiplied by the hyperbolic tangent of the current cell state vector, resulting in the output of the LSTM cell. The LSTM architecture can be stacked to form multiple layers, which allows the network to learn more complex patterns in the input sequence. The output of the final LSTM cell is typically fed through a dense layer with a SoftMax activation function to predict the next element in the sequence.
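The gate interactions described above are commonly written as the following update equations (standard LSTM notation, supplied here for clarity and not taken from the original text; $\sigma$ is the sigmoid function, $\odot$ denotes elementwise multiplication, and $W$, $U$, $b$ are learned weights and biases):

$f_t=\sigma\left(W_f x_t+U_f h_{t-1}+b_f\right), \quad i_t=\sigma\left(W_i x_t+U_i h_{t-1}+b_i\right), \quad o_t=\sigma\left(W_o x_t+U_o h_{t-1}+b_o\right)$

$\tilde{c}_t=\tanh \left(W_c x_t+U_c h_{t-1}+b_c\right), \quad c_t=f_t \odot c_{t-1}+i_t \odot \tilde{c}_t, \quad h_t=o_t \odot \tanh \left(c_t\right)$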

The algorithm of Long Short-Term Memory (LSTM) networks can be broken down into several steps (a minimal code sketch follows this list):

▪ Input Processing: The input sequence is fed into the LSTM network, with each element of the sequence being processed one at a time.

▪ Forget Gate: The LSTM network decides which information from the previous time step to forget, by using a sigmoid activation function to decide which values to keep or discard.

▪ Input Gate: The LSTM network decides which new information to add to the memory cell, by using a sigmoid activation function and a tanh activation function to compute the values to add.

▪ Update Memory Cell: The LSTM network updates the memory cell with the result of the forget and input gates.

▪ Output Gate: The LSTM network decides which information to output, by using a sigmoid activation function and a tanh activation function to compute the output value.

▪ Output: The LSTM network produces the output value and passes it on to the next time step.

▪ Repeat: Steps 2-6 are repeated for each element of the input sequence.

▪ Loss Calculation: Compare the output of the LSTM network to the actual output, and calculate the loss or error. This is typically done using a loss function such as mean squared error (MSE) or binary cross-entropy.

▪ Backpropagation and Weight Update: Propagate the error backwards through the network using backpropagation, and update the weights of the connections between neurons using an optimization algorithm such as stochastic gradient descent (SGD) or Adam.

▪ Repeat steps 1-9 for multiple iterations or epochs, until the network has learned to predict outputs accurately.
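A compact sketch of this training loop for one-step-ahead prediction is given below; Keras, the toy sequences, and the layer sizes are assumptions for illustration only.

```python
# Compact LSTM sequence model in Keras, following the steps outlined above.
import numpy as np
from tensorflow import keras

# Hypothetical sequences: 200 samples, 12 timesteps, 4 features; one target value each.
X = np.random.rand(200, 12, 4)
y = np.random.rand(200, 1)

model = keras.Sequential([
    keras.layers.Input(shape=(12, 4)),
    keras.layers.LSTM(32),   # input, forget and output gates are handled internally
    keras.layers.Dense(1),   # one-step-ahead prediction
])
model.compile(optimizer="adam", loss="mse")  # MSE loss, as in the loss-calculation step
model.fit(X, y, epochs=10, batch_size=16, verbose=0)
```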

3.4 Autoencoders (AE)

An autoencoder (AE) is a type of unsupervised deep learning network that is symmetric in structure, with fewer nodes in the middle layers. It has a section that encodes inputs into a lower-dimensional representation and another section that decodes or reconstructs the input again (Boukerche et al., 2020).

The basic idea behind an autoencoder is to learn a compressed representation of the input data by reducing its dimensionality and then reconstructing the original data from this compressed representation. Autoencoders are composed of two main components: an encoder network that compresses the input data into a lower-dimensional representation, and a decoder network that reconstructs the original data from this representation. The encoder network takes the input data and applies a series of transformations to reduce its dimensionality. This is typically done using a series of fully connected layers or convolutional layers with non-linear activation functions, such as ReLU or sigmoid. As illustrated in Figure 7, the input layer passes the input data through the hidden layers to the output layer, where it is decoded and reconstructed as closely as possible. The number of hidden layers in an AE is arbitrary, with the condition that, within each part of the network, e.g., the encoder, each subsequent hidden layer must have fewer neurons than the previous layer. The output of the encoder network is a compressed representation of the input data that captures its most important features. The decoder network takes this compressed representation and reconstructs the original data from it. Like the encoder network, the decoder network is typically composed of fully connected or convolutional layers with non-linear activation functions. The output of the decoder network is a reconstruction of the original input data that has been compressed and then decompressed. During training, the autoencoder tries to minimize the difference between the original input data and the reconstructed output data. This is done by optimizing a loss function such as mean squared error or binary cross-entropy. The parameters of the encoder and decoder networks are adjusted using backpropagation to minimize this loss function (Chalapathy and Chawla, 2019; Kucharski et al., 2020).

Figure 7. Schematic of an autoencoder network’s basic, symmetric architecture

The algorithm of an autoencoder in a neural network can be broken down into several steps, including:

▪ Encoding: The input data is fed into the encoder network, which compresses the data into a lower-dimensional representation. This is typically done using one or more fully-connected layers with a non-linear activation function such as ReLU.

▪ Decoding: The encoded data is then fed into the decoder network, which reconstructs the original input data. This is typically done using one or more fully-connected layers with a non-linear activation function such as ReLU.

▪ Loss Calculation: Compare the output of the autoencoder to the actual input data, and calculate the loss or error. This is typically done using a loss function such as mean squared error (MSE) or binary cross-entropy.

▪ Backpropagation and Weight Update: Propagate the error backwards through the network using backpropagation and update the weights of the connections between neurons using an optimization algorithm such as stochastic gradient descent (SGD) or Adam.

▪ Repeat: Repeat steps 1-4 for multiple iterations or epochs, until the network has learned to reconstruct the input data accurately.

▪ Prediction: Use the trained autoencoder to reconstruct the input data or to encode new data into a lower-dimensional representation.

The objective of an autoencoder is to learn a compressed representation of the input data that captures its most important features. The compressed representation can then be used for tasks such as data compression, data visualization, or feature extraction. Autoencoders can be further specialized for specific tasks, such as denoising autoencoders, variational autoencoders, or convolutional autoencoders, depending on the nature of the input data and the desired output.
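The sketch below shows a small encoder-decoder of this kind and how its reconstruction error could be read out as an anomaly score; Keras, the layer sizes, and the toy data are assumptions for illustration rather than details from the paper.

```python
# Minimal autoencoder sketch: 20-dimensional inputs compressed to 4 dimensions.
import numpy as np
from tensorflow import keras

X = np.random.rand(500, 20)  # hypothetical input data scaled to [0, 1]

inputs = keras.layers.Input(shape=(20,))
encoded = keras.layers.Dense(8, activation="relu")(inputs)
encoded = keras.layers.Dense(4, activation="relu")(encoded)      # bottleneck
decoded = keras.layers.Dense(8, activation="relu")(encoded)
decoded = keras.layers.Dense(20, activation="sigmoid")(decoded)  # reconstruction

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")       # reconstruction error
autoencoder.fit(X, X, epochs=20, batch_size=32, verbose=0)  # target equals input

# Reconstruction error can serve as an anomaly score for unseen points.
errors = np.mean((X - autoencoder.predict(X)) ** 2, axis=1)
```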

3.5 Generative Adversarial Networks (GAN)

Generative Adversarial Networks (GANs) were proposed by Goodfellow et al. (2014) and use a minimax game, from a game theory perspective, to train the generative model. GANs are a type of deep learning model that consists of two neural networks: a generator and a discriminator. The generator is responsible for generating new data that is similar to the training data, while the discriminator tries to distinguish between the generated data and the real data. Figure 8 illustrates the structure of a GAN, which includes two networks. One is the generator G, whose goal is to transform the noise variable z into the generated sample G(z) and thereby learn the distribution of the real data x. The other is the discriminator D, whose goal is to distinguish whether a sample is real or generated. Both G and D implement non-linear mappings using network structures such as the multi-layer perceptron (Bengio et al., 2013; Goodfellow et al., 2014).

Figure 8. Schematic of a Generative Adversarial Network (GAN) model

The basic idea behind GANs is to have the generator learn from the feedback provided by the discriminator. The generator starts by producing random noise as input, and then generates a sample of data that is similar to the training data. The discriminator is then fed both the generated sample and a sample of real data, and it is trained to classify which sample is real and which is fake. The generator is then updated based on the feedback provided by the discriminator, with the goal of generating data that is increasingly difficult for the discriminator to distinguish from the real data (Mirza and Osindero, 2014).

The algorithm of Generative Adversarial Networks (GANs) can be broken down into several steps, including:

▪ The generator network is trained to generate synthetic data from random noise input, typically using one or more fully-connected layers and a non-linear activation function such as ReLU or tanh.

▪ The discriminator network is trained to distinguish between real data and synthetic data generated by the generator network, typically using one or more fully-connected layers and a sigmoid activation function.

▪ The generator and discriminator networks are trained together in an adversarial process, where the generator network tries to generate synthetic data that can fool the discriminator network, and the discriminator network tries to accurately distinguish between real and synthetic data.

▪ Compare the output of the discriminator network to the actual labels (real or fake), and calculate the loss or error. This is typically done using a loss function such as binary cross-entropy.

▪ Propagate the error backwards through the discriminator network using backpropagation, and update the weights of the connections between neurons using an optimization algorithm such as stochastic gradient descent (SGD) or Adam. Then, propagate the error backwards through the generator network using backpropagation, and update the weights of the connections between neurons using the same optimization algorithm.

▪ Steps 3-5 are repeated for multiple iterations or epochs, until the generator network has learned to generate synthetic data that can fool the discriminator network.

▪ Use the trained generator network to generate new synthetic data.

The objective of GANs is to learn a probability distribution over the input data, so that the generator network can generate synthetic data that is similar to the real data. GANs can be used for a variety of tasks, such as image and video generation, data augmentation, and style transfer. However, training GANs can be challenging due to the instability of the adversarial training process, and several techniques such as batch normalization, gradient penalty, and Wasserstein distance have been proposed to improve the stability and convergence of GANs.
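The following is a deliberately simplified sketch of the adversarial training loop on two-dimensional toy data. Keras/TensorFlow, the network sizes, and the training schedule are assumptions made for illustration only.

```python
# Highly simplified GAN training loop: alternating discriminator and generator updates.
import numpy as np
import tensorflow as tf
from tensorflow import keras

latent_dim, data_dim = 8, 2
real_data = np.random.normal(loc=3.0, scale=0.5, size=(1000, data_dim)).astype("float32")

generator = keras.Sequential([
    keras.layers.Input(shape=(latent_dim,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(data_dim),
])
discriminator = keras.Sequential([
    keras.layers.Input(shape=(data_dim,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

bce = keras.losses.BinaryCrossentropy()
g_opt = keras.optimizers.Adam(1e-3)
d_opt = keras.optimizers.Adam(1e-3)
batch = 64

for step in range(1000):
    # Discriminator update: real samples labelled 1, generated samples labelled 0.
    idx = np.random.randint(0, len(real_data), batch)
    real = real_data[idx]
    noise = tf.random.normal((batch, latent_dim))
    fake = generator(noise, training=False)
    with tf.GradientTape() as tape:
        d_real = discriminator(real, training=True)
        d_fake = discriminator(fake, training=True)
        d_loss = bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)
    grads = tape.gradient(d_loss, discriminator.trainable_variables)
    d_opt.apply_gradients(zip(grads, discriminator.trainable_variables))

    # Generator update: try to make the discriminator output 1 for generated samples.
    noise = tf.random.normal((batch, latent_dim))
    with tf.GradientTape() as tape:
        d_fake = discriminator(generator(noise, training=True), training=True)
        g_loss = bce(tf.ones_like(d_fake), d_fake)
    grads = tape.gradient(g_loss, generator.trainable_variables)
    g_opt.apply_gradients(zip(grads, generator.trainable_variables))
```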

Recent research that has deployed neural network algorithms in the published literature is summarized in Table 1.

Table 1. Summary of published literature using different types of Neural Network algorithms

Year | Reference | Method | Comments
1994 | Ghosh and Reilly | MLP | MLP resulted in a 20% to 40% decrease in economic losses.
1997 | He et al. | MLP | MLP had poor accuracy on its own; results improved when clustering was implemented prior to training.
2002 | Maes et al. | MLP | MLP trained on a pre-processed dataset; an adaptive learning rate proved to be beneficial.
2009 | Wiese and Omlin | LSTM | LSTM outperformed SVMs as well as the MLP proposed by Maes et al.
2014 | Khan et al. | MLP | Model achieves a high detection rate at the cost of increased false positives and is computationally expensive.
2016 | Paula et al. | AE | AE was able to detect the fraudulent cases.
2016 | Fu et al. | CNN | CNN achieved an F-score of 0.33 and outperformed MLPs.
2017 | Heryadi and Warnars | CNN-LSTM | CNN's short-term and LSTM's long-term abilities combined to capture temporal relations; AUC score of 77% achieved.
2017 | Kazemi and Zarrabi | AE | AE accuracy of 81.6% achieved in the model.
2018 | Wang et al. | MLP | Model addressed the problem of NN overfitting, with an F-score of 98.4%.
2018 | Jurgovsky et al. | LSTM | LSTM with a feature aggregation strategy detected fraud behaviour.
2018 | Pumsirirat and Yan | AE | AE performed poorly with a small dataset.
2018 | Chen et al. | SAE-GAN | GAN trained on SAE-learned features from the majority class improved F-score and precision, but with a decrease in recall.
2019 | Tanaka and Aranha | GAN-DT | DT trained with minority-class GAN-based oversampling had slightly higher precision, but lower recall than when using SMOTE or ADASYN.
2019 | Fiore et al. | GAN-MLP | MLP trained with GAN-based oversampling had improved recall; the proposed model also outperformed SMOTE in terms of recall but had slightly lower specificity.
2020 | Charitou et al. | SAE-GAN | SAE features extracted from the entire train set, then used to train the generator of a GAN to produce complementary samples.

4. Data and Methodology

The present analysis is based on the exhibit history of United States international trade in goods and services. The foreign trade data (export of goods, export of services, import of goods and import of services) for the purpose of the analysis are drawn for the period 1992-2022 from the United States Census Bureau (undated).

First, it is observed that, at the composite level, U.S. exports and imports have witnessed an increasing trend over the period 1992-2022. Trade flows registered fairly strong growth over 1992-2008, accompanied by rising commodity prices. Following the financial crisis of 2008, trade fell steeply before rebounding strongly during 2010-2011 (WTO, 2015; Aggarwal, 2020; Nag et al., 2021; Aggarwal, 2023a, 2023b). After moderate growth over 2012-2014, an economic downturn is observed in 2015-2016, and thereafter a strong rebound is marked in 2017-2018 (UNCTAD, 2019; Aggarwal, 2020). Following the COVID-19 crisis, a major slowdown drastically impacted the U.S. economy in 2020 (UNCTAD, 2022); finally, remarkable progress in U.S. trade with the world economy was observed in 2021. Figure 9 reports the year-wise trend of U.S. trade during the period 1992-2022. Between 1992 and 2022, exports of goods grew from US \$439 billion to US \$2,085 billion, imports of goods grew even more strongly from US \$536 billion to US \$3,277 billion, exports of services increased from US \$177 billion to US \$924 billion, and imports of services increased from US \$119 billion to US \$680 billion.

Second, the input data are standardized (z-score normalized) using the ‘scikit-learn’ Python library before applying the machine learning algorithm. To standardize the variables, the mean of each variable is subtracted from each observation, and the difference is then divided by the standard deviation of that variable (Annexure 1). This centers the variables around zero and scales them to unit variance. The standardized variables can be represented as:

$Z=\frac{X-\text{mean}(X)}{\text{std dev}(X)}$

where: X is the original variable, mean(X) is the mean of X, and std dev(X) is the standard deviation of X. The resulting scores have a mean of 0 and a standard deviation of 1.
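In practice this step reduces to a call to scikit-learn's StandardScaler; a minimal sketch is given below, where the random array standing in for the four trade series is an illustrative placeholder.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# X holds the four monthly series as columns (illustrative random values here):
# export of goods, export of services, import of goods, import of services.
X = np.random.rand(372, 4)

scaler = StandardScaler()      # subtracts the column mean and divides by the column std dev
Z = scaler.fit_transform(X)    # Z = (X - mean(X)) / std dev(X)

print(Z.mean(axis=0))          # approximately 0 for each column
print(Z.std(axis=0))           # approximately 1 for each column
```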

Third, LSTM-based anomaly detection is applied to the dataset. In this method, we consider a time series $X=\left\{x^{(1)}, x^{(2)}, \ldots, x^{(n)}\right\}$, where each point $x^{(t)} \in \mathbb{R}^m$ in the time series is an m-dimensional vector $\left\{x_1^{(t)}, x_2^{(t)}, \ldots, x_m^{(t)}\right\}$ whose elements correspond to the input variables. A prediction model learns to predict the next $l$ values for $d$ of the input variables such that $1 \leq d \leq m$. The normal sequence(s) are divided into four sets: normal train $\left(s_N\right)$, normal validation-1 $\left(v_{N1}\right)$, normal validation-2 $\left(v_{N2}\right)$, and normal test $\left(t_N\right)$. We first learn a prediction model using stacked LSTM networks and then compute the prediction error distribution for detecting anomalies in the trade dataset.
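A minimal sketch of how the normal portion of the series can be partitioned into the four sets is given below; the contiguous split proportions are assumptions chosen purely for illustration, not the splits used in the reported experiments.

```python
import numpy as np

Z = np.random.rand(372, 4)             # standardized multivariate series (illustrative placeholder)
n = len(Z)

# Contiguous splits of the normal portion of the series (proportions are assumptions).
s_N  = Z[: int(0.6 * n)]               # normal train: fit the LSTM prediction model
v_N1 = Z[int(0.6 * n): int(0.75 * n)]  # normal validation-1: early stopping
v_N2 = Z[int(0.75 * n): int(0.9 * n)]  # normal validation-2: fit the error distribution / threshold
t_N  = Z[int(0.9 * n):]                # normal test
```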

Figure 9. United States trade with ROW (1992-2022)
4.1 Long Short-Term Memory (LSTM) based Prediction Model

We have considered the following LSTM network architecture. We take one unit in the input layer for each of the $m$ dimensions, and $d \times l$ units in the output layer such that there is one unit for each of the $l$ future predictions for each of the $d$ dimensions. The LSTM units in the hidden layer are fully connected through recurrent connections. We have stacked LSTM layers such that each unit in a lower LSTM hidden layer is fully connected to each unit in the LSTM layer above it through feedforward connections. The prediction model is learned using the sequence(s) in $s_N$. The set $v_{N1}$ is used for early stopping while learning the network weights (Malhotra et al., 2015).
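The following PyTorch sketch illustrates such a stacked LSTM predictor with $m$ input units and $d \times l$ output units; the hidden size, number of layers, and the choice $d = m = 4$, $l = 1$ are illustrative assumptions, while the window size of 3 and the weight decay of 0.0001 follow Section 4.

```python
import torch
import torch.nn as nn

class StackedLSTMPredictor(nn.Module):
    """Predicts the next l values for each of d dimensions from a window of m-dimensional points."""
    def __init__(self, m=4, d=4, l=1, hidden=64, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=m, hidden_size=hidden,
                            num_layers=layers, batch_first=True)  # stacked LSTM layers
        self.head = nn.Linear(hidden, d * l)                      # d*l output units

    def forward(self, x):                 # x: (batch, window, m)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])   # prediction from the last time step

model = StackedLSTMPredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.MSELoss()

# One illustrative training step on random data (window size 3, as in Section 4).
x = torch.randn(32, 3, 4)      # batch of input windows
y = torch.randn(32, 4)         # next-step targets (d = 4, l = 1)
opt.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
opt.step()
```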

Anomaly detection using the prediction error distribution: With a prediction length of $l$, each of the selected dimensions of $x^{(t)} \in X$ for $l < t \leq n-l$ is predicted $l$ times. We compute an error vector $e^{(t)}$ for point $x^{(t)}$ as $e^{(t)}=\left[e_{11}^{(t)}, \ldots, e_{1l}^{(t)}, \ldots, e_{d1}^{(t)}, \ldots, e_{dl}^{(t)}\right]$, where $e_{ij}^{(t)}$ is the difference between $x_i^{(t)}$ and its value as predicted at time $t-j$.

Anomaly score from the Multivariate Gaussian Distribution model is calculated as:

Anomaly Score $=(\mathrm{e}-\mu)\, \Sigma^{-1}(\mathrm{e}-\mu)^T$.
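A numpy sketch of this scoring step is shown below: $\mu$ and $\Sigma$ are estimated from prediction errors on normal validation data, and each test error vector receives the score $(e-\mu)\,\Sigma^{-1}(e-\mu)^T$; the error arrays are illustrative placeholders, while the threshold of 60 anticipates the value reported in Section 5.

```python
import numpy as np

# Prediction errors on normal validation data and on the test set (illustrative placeholders).
val_errors  = np.random.randn(100, 4)   # rows are error vectors e^(t)
test_errors = np.random.randn(372, 4)

mu        = val_errors.mean(axis=0)            # mean of the validation error vectors
sigma     = np.cov(val_errors, rowvar=False)   # sample covariance of the validation errors
sigma_inv = np.linalg.inv(sigma)

diff   = test_errors - mu
scores = np.einsum('ij,jk,ik->i', diff, sigma_inv, diff)  # (e - mu) Sigma^-1 (e - mu)^T per row

threshold = 60                                 # threshold used in Section 5
anomalies = np.where(scores > threshold)[0]    # indices of flagged data points
```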

Precision is calculated as the ratio of true positives (the number of correctly identified anomalies) to the total number of positives (the number of all data points identified as anomalies, including false positives). A high precision score indicates that the algorithm is accurately identifying a high proportion of true anomalies while minimizing false positives. Precision is defined as:

Precision $=\frac{\text{True Positives }(TP)}{\text{True Positives }(TP)+\text{False Positives }(FP)}$.

Recall is the ratio of true positives to the sum of true positives and false negatives. In the context of anomaly detection, a true positive refers to an actual anomaly that has been correctly identified by the anomaly detection algorithm, while a false negative refers to an anomaly that has been missed by the algorithm and classified as normal. Mathematically, recall is defined as:

Recall $=\frac{\text{True Positives }(TP)}{\text{True Positives }(TP)+\text{False Negatives }(FN)}$.

High recall is desirable in anomaly detection because it indicates that the algorithm is correctly identifying most of the anomalies in the dataset. Anomalies can be important events that require immediate attention or investigation, such as a fraudulent transaction or a network intrusion, and missing them can have serious consequences (A​g​a​r​w​a​l​ ​e​t​ ​a​l​.​,​ ​2​0​1​7). Therefore, it is crucial to ensure that the anomaly detection algorithm has a high recall to minimize the risk of missing important anomalies.

The F1 score is a commonly used performance metric in anomaly detection that balances the trade-off between precision and recall. It is the harmonic mean of precision and recall and is calculated as:

F1 score $=\frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision}+\text{Recall}}$.

The F1 score ranges between 0 and 1, where a score of 1 indicates perfect precision and recall, and a score of 0 indicates poor performance. In anomaly detection, a high F1 score indicates that the algorithm is both accurately identifying true anomalies and minimizing the number of false positives. However, it is important to note that optimizing for the F1 score alone may not always be the best approach, as it depends on the specific use case and the consequences of missing or misclassifying anomalies. Sometimes, optimizing for high recall or high precision may be more important than optimizing for the F1 score. Therefore, the performance of an anomaly detection algorithm should be evaluated using multiple metrics, including precision, recall and the F1 score, and the best approach should be selected based on the specific requirements and constraints of the application.
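The three metrics can be computed directly with scikit-learn once the flagged points are compared against labelled anomaly periods; in the sketch below the labelled ranges follow the data-points [197,232] and [337,350] mentioned in Section 5, while the anomaly scores themselves are random placeholders.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# Ground-truth anomaly labels (1 = anomaly, 0 = normal) for the 372 monthly points.
y_true = np.zeros(372, dtype=int)
y_true[197:233] = 1     # 2008-2009 downturn (points 197-232)
y_true[337:351] = 1     # COVID-19 downturn (points 337-350)

scores = np.random.rand(372) * 100      # placeholder anomaly scores
y_pred = (scores > 60).astype(int)      # flag points above the threshold of 60

print("Precision:", precision_score(y_true, y_pred, zero_division=0))
print("Recall   :", recall_score(y_true, y_pred, zero_division=0))
print("F1 score :", f1_score(y_true, y_pred, zero_division=0))
```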

The current analysis uses monthly time series data for each of the variables (i.e., export of goods, export of services, import of goods and import of services) for the period 1992-2022, where each point in the time series is an m-dimensional vector whose elements correspond to the input variables. This implies that a total of 372 points exist in the dataset. Thereafter, the LSTM-based prediction model is applied: we first train the model on a train set that contains no anomalies, and then use the trained model to detect anomalies in a test set in which anomalies are included. The data highlight the steep fall in trade values, i.e., the economic slowdown witnessed primarily in 2008-2009 and in 2020, due to the onset of the economic recession and COVID-19 respectively. The model is trained with 100 epochs, a weight decay of 0.0001 and a window size of 3.
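A minimal sketch of how the 372 monthly points can be converted into training windows of size 3 is given below, assuming one-step-ahead prediction; the random array stands in for the standardized trade series.

```python
import numpy as np

def make_windows(series, window=3):
    """Turn an (n, m) multivariate series into (n - window) input windows and next-step targets."""
    X, y = [], []
    for t in range(len(series) - window):
        X.append(series[t:t + window])   # window of m-dimensional points
        y.append(series[t + window])     # the point to predict
    return np.array(X), np.array(y)

Z = np.random.rand(372, 4)               # standardized monthly trade series (illustrative placeholder)
X, y = make_windows(Z, window=3)
print(X.shape, y.shape)                  # (369, 3, 4) (369, 4)
```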

5. Research Results

The LSTM-based prediction model is trained for 100 epochs. When the model started to learn in epoch 1, the validation loss was 1.8277. At epoch 20, the validation loss was 0.7056, and at epoch 100 it was reduced to 0.46. This can also be visualized from Figure 10, Figure 11 and Figure 12. Figure 10 depicts that the model has not yet learned the trend of export of goods, export of services, import of goods and import of services, and therefore the prediction (marked by the green line) is not accurate. Figure 11 depicts that the model has learned some aspects of the growth trend of the trade data, leading to better predictions. Figure 12 depicts that the model has fully learned the trend of export of goods, export of services, import of goods and import of services, resulting in accurate prediction of the trade data.

Once the model has been trained (with 100 epochs) on the time series data for export and import of goods and services without anomalies (excluding data-points [197,232] and [337,350]), the next step is to use this trained model to detect anomalies in the test set for export of goods, export of services, import of goods and import of services. The threshold value of the anomaly score detected by the algorithm is 60. The first anomaly is detected near point 200 in the graphs below (Figure 13 to Figure 16), which corresponds to the 2008 economic recession; the second anomaly is detected near point 340, which corresponds to the economic downturn posed by COVID-19. Interestingly, it can be observed that anomalies are detected accurately for export of goods in Figure 13, export of services in Figure 14, import of goods in Figure 15 and import of services in Figure 16. Therefore, the stacked LSTM network architecture has successfully captured the anomalies in the trade dataset.

Figure 10. Prediction after Epoch 1
Figure 11. Prediction after Epoch 20
Figure 12. Prediction after Epoch 100
Figure 13. Anomaly detection for export of goods
Figure 14. Anomaly detection for export of services
Figure 15. Anomaly detection for import of goods
Figure 16. Anomaly detection for import of services

In anomaly detection, a marginal decrease in recall can be worse than a greater increase in precision because it can mean missing a significant number of true anomalies. As mentioned earlier, recall is the proportion of true anomalies that the anomaly detection algorithm identifies correctly. If the algorithm's recall is already high, even a small decrease can result in missing a considerable number of anomalies. Figure 17 depicts a recall value of 0.8 at the threshold anomaly score of 60.

Figure 17. Precision and Recall trade-off for threshold anomaly score

6. Conclusion

The analysis demonstrates that a stacked LSTM network is able to learn higher-level temporal patterns without prior knowledge of the pattern duration, and so stacked LSTM networks may be a viable technique for modelling normal time series behaviour, which can then be used to detect anomalies. The LSTM-AD approach deployed in the current analysis yields promising results on real-world international trade datasets, which involve modelling short-term as well as long-term temporal dependencies. LSTM-AD provides better or similar results when compared with RNN-AD, suggesting that LSTM-based prediction models may be more robust than RNN-based models, especially when the long-term dependencies of the normal time-series behaviour are unknown.

In this paper, we have deployed an LSTM network architecture to detect anomalies in the United States trade dataset. The model is trained with 100 epochs on the time series dataset, where each point $x^{(t)} \in \mathbb{R}^m$ in the time series is an m-dimensional vector (export and import of goods and services) whose elements correspond to the input variables. The learned prediction model, built using stacked LSTM networks, computes the prediction error distribution for detecting anomalies in the trade dataset and hence calculates the anomaly score that marks unusual data-points relative to normal data. The algorithm flagged the data points with anomaly scores beyond the threshold level, and on investigation the anomalous data-points could be attributed to periods of economic downturn (i.e., the financial crisis in 2008 and COVID-19 in 2020).

Future research may focus on introducing more trade-specific dimensions into the LSTM framework to detect anomalies in real-world datasets. The robustness of the model can also be increased by incorporating data from other advanced economies.

Informed Consent Statement

The author contributed to formulating the research goals, and collected and compiled data from secondary sources to investigate it in a meaningful way. The methodology and formal analysis of the research were developed later by the author, followed by data investigation and validation through ML visualization techniques.

Acknowledgments

The author would like to thank the eminent researchers across the globe who have worked in the domain of anomaly detection. The work of these professionals has helped the author in framing the current research idea.

Conflicts of Interest

The author declares that there are no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References
Agarwal, A., Dawson, S., McKee, D., Eugster, P., Tancreti, M., & Sundaram, V. (2017). Detecting Abnormalities in IoT Program Executions through Control-Flow-Based Features. Proceedings of the Second International Conference on Internet-of-Things Design and Implementation, 339–340. [Google Scholar] [Crossref]
Aggarwal, C. C. (2013). Outlier Analysis. Springer New York. [Google Scholar] [Crossref]
Aggarwal, S. (2016). Determinants of money demand for India in presence of structural break: An empirical analysis. Business and Economic Horizons (BEH), Prague Development Center (PRADEC), 12(4), 173–177. [Google Scholar] [Crossref]
Aggarwal, S. (2017). Smile curve and its linkages with global value chains. Journal of Economic Bibliography, 4(3). [Google Scholar] [Crossref]
Aggarwal, S. (2017). Sectoral Level Analysis of India’s Bilateral Trade over 2001-2015 [MPRA Paper No 80099, University Library of Munich, Germany]. https://mpra.ub.uni-muenchen.de/80099/ [Google Scholar]
Aggarwal, S. (2020). Determinants of Intra-Industry Trade and Labour Market Adjustment: A Sectoral Analysis for India. Doctoral dissertation, Indian Institute of Foreign Trade. [Google Scholar]
Aggarwal, S. (2023). Machine Learning Algorithms, Perspectives, and Real-World Application: Empirical Evidence from United States Trade Data. International Journal of Science and Research, 12(2), 292–313. [Google Scholar]
Aggarwal, S. (2023). The empirical measurement and determinants of intra-industry trade for a developing country. MPRA Paper No. 117112, University Library of Munich, Germany. [Google Scholar]
Aggarwal, S. & Chakraborty, D. (2017). Determinants of India’s Bilateral Intra-industry Trade over 2001–2015. South Asia Economic Journal, 18(2), 296–313. [Google Scholar] [Crossref]
Aggarwal, S. & Chakraborty, D. (2019). Which factors influence India’s intra-industry trade? Empirical findings for select sectors. Global Business Review, 23(3), 729–755. [Google Scholar] [Crossref]
Aggarwal, S. & Chakraborty, D. (2020). Labour market adjustment and intra-industry trade: Empirical results from Indian manufacturing sectors. Journal of South Asian Development, 15(2), 238–269. [Google Scholar] [Crossref]
Aggarwal, S. & Chakraborty, D. (2020). Determinants of vertical intra-industry trade: Empirical evidence from Indian manufacturing sectors. Prajnan: Journal of Social and Management Sciences, 49(3), 221–252. [Google Scholar]
Aggarwal, S. & Chakraborty, D. (2020). Is there any relationship between Marginal Intra-Industry Trade and Employment Change? Evidence from Indian Industries. Working Paper, No. EC-20-44, Institute of Foreign Trade, Delhi. [Google Scholar]
Aggarwal, S., Chakraborty, D., & Bhattacharyya, R. (2021). Determinants of Domestic Value Added in Exports: Empirical Evidence from India’s Manufacturing Sectors. Global Business Review. [Google Scholar] [Crossref]
Aggarwal, S. & Chakraborty, D. (2021). Which factors influence vertical intra-industry trade in India? Empirical results from panel data analysis. Working Paper, No. EC-21-54, Indian Institute of Foreign Trade, Delhi. http://cc.iift.ac.in/research/Docs/WP/EC-21-54.pdf [Google Scholar]
Aggarwal, S. & Chakraborty, D. (2022). Which Factors Influence India’s Bilateral Intra-Industry Trade? Cross-Country Empirical Estimates. Working Papers 2260, Indian Institute of Foreign Trade, Delhi. [Google Scholar]
Aggarwal, S., Mondal, S., & Chakraborty, D. (2022). Efficiency Gain in Indian Manufacturing Sectors: Evidence from Domestic Value Addition in Exports. Empirical Economics Letters, 21(1), 69–83. [Google Scholar]
Aggarwal, S., Chakraborty, D., & Banik, N. (2022). Does Difference in Environmental Standard Influence India’s Bilateral IIT Flows? Evidence from GMM Results. Journal of Emerging Market Finance, 22(1), 7–30. [Google Scholar] [Crossref]
Ahmed, M., Mahmood, A. N., & Hu, J. (2014). Outlier Detection. In The State of the Art in Intrusion Prevention and Detection (Vol. 44, pp. 3–21). New York, USA: CRC Press. [Google Scholar] [Crossref]
Albelwi, S. & Mahmood, A. (2017). A Framework for Designing the Architectures of Deep Convolutional Neural Networks. Entropy, 19(6), 242. [Google Scholar] [Crossref]
Anderson, J. E. (1979). A theoretical foundation for the gravity equation. The American Economic Review, 69(1), 106–116. [Google Scholar]
Barnett, V. & Lewis, T. (n.d.). Outliers in Statistical Data. John Wiley and sons, New York. [Google Scholar] [Crossref]
Batarseh, F., Gopinath, M., Nalluru, G., & Beckman, J. (2019). Application of Machine Learning in Forecasting International Trade Trends. [Google Scholar] [Crossref]
Beckman, R. J. & Cook, R. D. (1983). Outlier..........s. Technometrics, 25(2), 119–149. [Google Scholar] [Crossref]
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation Learning: A Review and New Perspectives. IEEE Trans. Pattern Anal. Mach. Intell., 35(8), 1798–1828. [Google Scholar] [Crossref]
Bhattacharya, K., Mukherjee, G., & Manna, S. S. (2007). The International Trade Network. In A. Chatterjee & B. K. Chakrabarti (Eds.), Econophysics of Markets and Business Networks. New Economic Windows. Springer, Milano. [Google Scholar] [Crossref]
Boukerche, A., Zheng, L., & Alfandi, O. (2020). Outlier Detection. ACM Comput. Surv., 53(3), 1–37. [Google Scholar] [Crossref]
Callado, A., Kamienski, C., Szabo, G., Gero, B. P., Kelner, J., Fernandes, S., & Sadok, D. (2009). A Survey on Internet Traffic Identification. IEEE Commun. Surv. Tutorials, 11(3), 37–52. [Google Scholar] [Crossref]
Chalapathy, R. & Chawla, S. (2019). Deep Learning for Anomaly Detection: A Survey. [Google Scholar] [Crossref]
Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection. ACM Comput. Surv., 41(3), 1–58. [Google Scholar] [Crossref]
Charitou, C., Garcez, A. D. A., & Dragicevic, S. (2020). Semi-supervised GANs for Fraud Detection. International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 1–8. [Google Scholar] [Crossref]
Teng, H. S., Chen, K., & Lu, S. C. (1990). Adaptive real-time anomaly detection using inductively generated sequential patterns. In Proceedings. 1990 IEEE Computer Society Symposium on Research in Security and Privacy (Vol. 1990, pp. 278–284). https://www.cs.unc.edu/~jeffay/courses/nidsS05/ai/Teng-AdaptiveRTAnomaly-SnP90.pdf [Google Scholar]
Chen, J., Shen, Y., & Ali, R. (2018). Credit card fraud detection using sparse autoencoder and generative adversarial network. In IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON) (pp. 1054–1059). https://ieeexplore.ieee.org/document/8614815 [Google Scholar]
Fiore, U., De Santis, A., Perla, F., Zanetti, P., & Palmieri, F. (2019). Using generative adversarial networks for improving classification effectiveness in credit card fraud detection. Information Sciences, 479, 448–455. [Google Scholar] [Crossref]
Fu, K., Cheng, D., Tu, Y., & Zhang, L. (2016). Credit Card Fraud Detection Using Convolutional Neural Networks. In A. Hirose, S. Ozawa, K. Doya, K. Ikeda, M. Lee, & D. Liu (Eds.), Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science (Vol. 9949). Springer, Cham. [Google Scholar] [Crossref]
García-Teodoro, P., Díaz-Verdejo, J., Maciá-Fernández, G., & Vázquez, E. (2009). Anomaly-based network intrusion detection: Techniques, systems and challenges. Computers & Security, 28(1–2), 18–28. [Google Scholar] [Crossref]
Gers, F. A., Schmidhuber, J., & Cummins, F. (2000). Learning to Forget: Continual Prediction with LSTM. Neural Computation, 12(10), 2451–2471. [Google Scholar] [Crossref]
Ghoddusi, H., Creamer, G. G., & Rafizadeh, N. (2019). Machine learning in energy economics and finance: A review. Energy Economics, 81, 709–727. [Google Scholar] [Crossref]
Ghosh, S. & Reilly, D. L. (1994). Credit card fraud detection with a neural-network. In System Sciences, 1994. Proceedings of the Twenty-Seventh Hawaii International Conference, (Vol. 3, pp. 621–630). IEEE. https://ieeexplore.ieee.org/abstract/document/323314 [Google Scholar]
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., ..., & Bengio, Y. (2014). Generative Adversarial Networks. [Google Scholar] [Crossref]
Gopinath, M., Batarseh, F. A., & Beckman, J. (2020). Machine learning in gravity models: An application to agricultural trade (No. w27151). National Bureau of Economic Research. https://www.nber.org/system/files/ [Google Scholar]
Gupta, M., Gao, J., Aggarwal, C. C., & Han, J. (2013). Outlier detection for temporal data: A survey. Transactions on Knowledge and Data Engineering, 26(9), 2250–2267. [Google Scholar]
Hawkins, D. M. (1980). Identification of Outliers (Vol. 11). Heidelberg, Germany: Springer. [Google Scholar]
He, H., Wang, J., Graco, W., & Hawkins, S. (1997). Application of neural networks to detection of medical fraud. Expert Systems with Applications, 13(4), 329–336. [Google Scholar] [Crossref]
Heryadi, Y. & Warnars, H. L. H. S. (2017). Learning temporal representation of transaction amount for fraudulent transaction recognition using CNN, Stacked LSTM, and CNN-LSTM. In 2017 IEEE International Conference on Cybernetics and Computational Intelligence (CyberneticsCom) (pp. 84–89). https://ieeexplore.ieee.org/document/8311689 [Google Scholar]
Hochreiter, S. & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735–1780. [Google Scholar] [Crossref]
Hodge, V. & Austin, J. (2004). Survey of Outlier Detection Methodologies. Artificial Intelligence Review, 22, 85–126. [Google Scholar] [Crossref]
Javaid, A., Niyaz, Q., Sun, W., & Alam, M. (2016). A deep learning approach for network intrusion detection system. In Proceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies (formerly BIONETICS). [Google Scholar] [Crossref]
Jurgovsky, J., Granitzer, M., Ziegler, K., Calabretto, S., Portier, P. E., He-Guelton, L., & Caelen, O. (2018). Sequence classification for credit-card fraud detection. Expert Systems with Applications, 100, 234–245. [Google Scholar] [Crossref]
Kazemi, Z. & Zarrabi, H. (2017). Using deep networks for fraud detection in the credit card transactions. In IEEE 4th International Conference on Knowledge-based Engineering and Innovation (KBEI), Tehran, Iran (pp. 0630–0633). https://ieeexplore.ieee.org/document/8324876 [Google Scholar]
Khan, A. U. S., Akhtar, N., & Qureshi, M. N. (2014). Real-time credit-card fraud detection using artificial neural network tuned by simulated annealing algorithm. In Proceedings of International Conference on Recent Trends in Information, Telecommunication and Computing, ITC (pp. 113–121). [Google Scholar]
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. In 25th Advances in Neural Information Processing Systems, Grenada, Spain. [Google Scholar] [Crossref]
Kucharski, D., Kleczek, P., Jaworek-Korjakowska, J., Dyduch, G., & Gorgon, M. (2020). Semi-Supervised Nests of Melanocytes Segmentation Method Using Convolutional Autoencoders. Sensors, 20(6), 1546. [Google Scholar] [Crossref]
Markou, M. & Singh, S. (2003). Novelty detection: a review—part 1: statistical approaches. Signal Processing, 83(12), 2481–2497. [Crossref]
Markou, M. & Singh, S. (2003). Novelty detection: a review—part 2: neural network based approaches. Signal Processing, 83(12), 2499–2521. [Crossref]
Michelucci, U. (2018). Feedforward Neural Networks. In Applied Deep Learning (pp. 83–136). Apress. [Crossref]
Mirza, M. & Osindero, S. (2014). Conditional Generative Adversarial Nets (Version 1). arXiv. [Crossref]
Monken, A., Haberkorn, F., Gopinath, M., Freeman, L., & Batarseh, F. A. (2021). Graph Neural Networks for Modeling Causality in International Trade. FLAIRS, 34(1). [Google Scholar] [Crossref]
O’Shea, K. & Nash, R. (2015). An Introduction to Convolutional Neural Networks (Version 2). arXiv. [Crossref]
Patcha, A. & Park, J.-M. (2007). An overview of anomaly detection techniques: Existing solutions and latest technological trends. Computer Networks, 51(12), 3448–3470. [Google Scholar] [Crossref]
Peng, H.-K. & Marculescu, R. (2015). Multi-Scale Compositionality: Identifying the Compositional Structures of Social Dynamics Using Deep Learning. PLoS ONE, 10(4), e0118309. [Google Scholar] [Crossref]
Pimentel, M. A. F., Clifton, D. A., Clifton, L., & Tarassenko, L. (2014). A review of novelty detection. Signal Processing, 99, 215–249. [Google Scholar] [Crossref]
Pourhabibi, T., Ong, K.-L., Kam, B. H., & Boo, Y. L. (2020). Fraud detection: A systematic literature review of graph-based anomaly detection approaches. Decision Support Systems, 133, 113303. [Google Scholar] [Crossref]
Pumsirirat, A. & Yan, L. (2018). Credit Card Fraud Detection using Deep Learning based on Auto-Encoder and Restricted Boltzmann Machine. Ijacsa, 9(1). [Google Scholar] [Crossref]
Frontmatter. (1987). In Wiley Series in Probability and Statistics. Wiley. [Crossref]
Sun, B., Wu, K., Xiao, Y., & Wang, R. (2006). Integration of mobility and intrusion detection for wireless ad hoc networks. Int J Communication, 20(6), 695–721. [Crossref]
Tanaka, F. H. K. dos S. & Aranha, C. (2019). Data Augmentation Using GANs (Version 1). arXiv. [Crossref]
Wang, X., Du, Y., Lin, S., Cui, P., Shen, Y., & Yang, Y. (2020). adVAE: A self-adversarial variational autoencoder with Gaussian anomaly prior knowledge for anomaly detection. Knowledge-Based Systems, 190, 105187. [Google Scholar] [Crossref]
Yu, Y., Si, X., Hu, C., & Zhang, J. (2019). A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures. Neural Computation, 31(7), 1235–1270. [Crossref]
Zimek, A., Schubert, E., & Kriegel, H. (2012). A survey on unsupervised outlier detection in high‐dimensional numerical data. Statistical Analysis and Data Mining, 5(5), 363–387. [Crossref]
