In the fast-paced digital economy, finding useful information in massive and complex data is crucial. Recommendation systems have emerged as effective tools to overcome information overload and are widely used in online platforms. Traditional recommendation systems use collaborative filtering to mine users' interests from their historical behaviors and then predict the products they might like. However, long-term historical behaviors may overshadow users' current interests during a session, resulting in a poor user experience. To address this issue, session-based recommendation systems have been developed, which aim to predict users' behavior during the current session. Session-based recommendation systems have been widely adopted by various e-commerce platforms, such as Tmall and Amazon.
In the field of recommendation systems, there are two main categories of traditional session-based recommendation algorithms: collaborative filtering and Markov chain-based algorithms. Collaborative filtering [1] typically utilizes user or item similarity to make recommendations. In contrast, the Markov chain-based recommendation algorithm [2] treats session-based recommendation as a sequence prediction task and predicts the next item based on the sequence relationship between items. However, it is important to note that the Markov chain-based approach faces the challenge of dimension explosion, which makes it difficult to apply in actual production scenarios. This limitation arises because the number of possible item combinations can increase exponentially as the sequence length grows, leading to a computational explosion. As a result, it becomes impractical to compute the probabilities for all possible combinations of items.
With the development of deep learning technology, many researchers have applied neural network technology in recommendation systems. Several session-based recommendation models based on deep learning have been proposed, such as GRU4REC [3], Tan et al. [4], NARM [5], STAMP [6], SR-GNN [7], HA-GNN [8], TPA-GNN [9], GCE-GNN [10], DIDN [11], and CORE [12]. These models have achieved good performance in recommendations; however, they have some limitations that can be improved.
GRU4REC uses multi-layer GRU to model the whole session and capture the continuous preferences of users. However, it only captures the session relationship between one-way adjacent items and may not capture long-term dependencies between items.
NARM combines the attention mechanism with RNN to capture users' interests. However, it also suffers from the limitation of capturing long-term dependencies between items.
STAMP captures users' long-term interests by using the multi-layer perceptron and the attention mechanism, and combines users' short-term preferences for recommendations. However, it still lacks the ability to capture the influence of popularity on user interest.
SR-GNN uses graph neural networks to learn the transformation relationship between items and capture the high-order relationship in graphs. However, it still uses the attention mechanism to filter out noise and may not effectively capture the influence of popularity on user interest.
HA-GNN captures the high-order relationship in graphs by using the attention mechanism. However, it does not explicitly consider the influence of popularity on user interest.
TPA-GNN combines time series with graph neural networks to capture users' interests. However, it still lacks the ability to effectively capture the influence of popularity on user interest.
GCE-GNN uses the features of the session graph and the global graph to learn items. However, it also suffers from the limitation of capturing long-term dependencies between items.
DIDN incorporates dynamic intent-aware and iterative denoising modules to learn dynamic item embeddings and filter out noisy clicks within sessions. However, it still lacks the ability to capture the influence of popularity on user interest.
CORE unifies the representation space for both the encoding and decoding processes in session-based recommendation to address the issue of inconsistent predictions. However, it also does not explicitly consider the influence of popularity on user interest.
To address the aforementioned issues, we propose a novel model called the Popularity-Aware Machine for Session-Based Recommendations (PASR). Figure 1 illustrates the workflow of the proposed PASR method. Firstly, PASR constructs a popularity-aware item graph, which effectively captures users' preferences for popular items. Secondly, PASR aggregates the features of neighboring nodes based on the type and frequency of edges in the graph, which enhances the model's ability to capture the dependencies between items. Finally, popularity embeddings are integrated into the attention mechanism to learn users' interests and improve the accuracy of recommendations. The main contributions of this work are as follows:
(1) The PASR model uses the number of edge occurrences to learn item features, which captures dependencies between items for the first time.
(2) The PASR model learns the user's interest by considering the popularity of items to reflect their importance in the session.
(3) Experiments were conducted on two widely-used datasets (Tmall dataset and Nowplaying dataset). The results of the experiments demonstrate that our model exhibits excellent performance.
A. Problem description
In a session-based recommendation system, we define the item set $V=\left\{v_1, v_2, \ldots, v_m\right\}$, where $m$ is the total number of items, and the session sequence $s=\left[v_1, v_2, \ldots, v_t\right]$, where $t$ is the length of the session. The primary objective of a session-based recommendation system is to predict the user's next click based on the item sequence. That is, given $s=\left[v_1, v_2, \ldots, v_t\right]$, the model predicts the next clicked item $v_{t+1}$ by outputting $n$ candidate items that the user may interact with next.
B. Graph construction
To aggregate the representations of nodes with a graph neural network, each session sequence is first transformed into a graph. For any given session sequence $s$, a directed graph $G_s=\left\{V_s, E_s\right\}$ can be constructed, where $V_s$ represents the node set of session $s$ and $E_s$ represents its edge set. Each pair of consecutively clicked items $v_{n-1}$ and $v_n$ forms an edge, i.e., $\left(v_{n-1}, v_n\right) \in E_s$.
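As an illustration of this construction, the following is a minimal Python sketch, assuming a session is given as a plain list of item identifiers. The function name `build_session_graph` and the choice to store edge occurrence counts in a `Counter` are illustrative assumptions, not part of the original method description.

```python
from collections import Counter


def build_session_graph(session):
    """Return (nodes, edge_counts) for a session given as a list of item ids."""
    # Nodes are the unique items, kept in order of first appearance.
    nodes = list(dict.fromkeys(session))
    # Each pair of consecutive clicks (v_{n-1}, v_n) contributes one directed edge;
    # the Counter records how often each edge occurs within this session.
    edge_counts = Counter(zip(session, session[1:]))
    return nodes, edge_counts


nodes, edge_counts = build_session_graph(["a", "b", "c", "b", "c"])
print(nodes)
print(dict(edge_counts))
```

In this sketch, repeated transitions such as `("b", "c")` are counted rather than duplicated, which keeps the edge set small while retaining the frequency information.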
C. Graph aggregator
For session $s$, we define the embedding of each item as $H=\left\{h_{v_1}, h_{v_2}, \ldots, h_{v_m}\right\}$, where $h_{v_i}$ refers to the one-hot encoding of item $v_i$ $(1 \leq i \leq m)$ and $m$ is the number of unique items in session $s$. Many works [13], [14] have shown that self-loops in graphs are beneficial to feature learning, so we add a self-connection for each node in the graph. For any item $v_n$, there are four different types of edges:
$e_{\text{self}}$: It represents the self-connection of the item.
$e_{\text{out}}$: It represents the edge from item $v_n$ to item $v_{n+1}$.
$e_{\text{in}}$: It represents the edge from item $v_n$ to item $v_{n-1}$.
$e_{\text{in-out}}$: It indicates that there are edges from item $v_{n+1}$ to item $v_n$ and from item $v_n$ to item $v_{n-1}$.
Constructing a graph-based model with these edges allows the model to capture both the local and global structure of the session and learn features that are specific to each item, as well as the relationships between items. This, in turn, can lead to more accurate and relevant recommendations for the user.
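To make the four edge types concrete, the following is a minimal Python sketch. It assumes that the type of the relation between a target item and a neighbor is decided by which directed transitions were observed between them; this reading, the function name `edge_type`, and the string labels are assumptions made for illustration only.

```python
def edge_type(target, neighbor, observed_edges):
    """Classify the relation between a target item and a neighbor item.

    observed_edges is a set of directed (source, destination) transitions.
    """
    if target == neighbor:
        return "self"
    forward = (target, neighbor) in observed_edges   # target -> neighbor
    backward = (neighbor, target) in observed_edges  # neighbor -> target
    if forward and backward:
        return "in-out"
    if forward:
        return "out"
    if backward:
        return "in"
    return None  # the two items are not adjacent in the session graph


observed = {("a", "b"), ("b", "c"), ("c", "b")}
types = {
    ("a", "a"): edge_type("a", "a", observed),
    ("a", "b"): edge_type("a", "b", observed),
    ("b", "a"): edge_type("b", "a", observed),
    ("b", "c"): edge_type("b", "c", observed),
    ("a", "c"): edge_type("a", "c", observed),
}
print(types)
```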
Moreover, different edges appear in the dataset with varying frequencies, and frequently occurring edges often indicate common browsing habits. By partitioning the number of occurrences of edges into intervals and training a different weight vector for each interval, we can capture the influence of edge frequency on feature aggregation in graph modeling. Specifically, we partition the number of occurrences of an edge into intervals of width 10, and place all edges with more than 100 occurrences into a single interval, since such edges are rare. By incorporating edge frequency into the weight vector training process, we can improve the accuracy and effectiveness of the PASR model in session-based recommendation tasks.
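The interval assignment described above can be sketched as follows. The exact boundary convention is an assumption (counts 1 to 10 map to interval 0, 11 to 20 to interval 1, and so on, with all counts above 100 sharing one final interval), as are the function name `frequency_interval` and its parameters.

```python
def frequency_interval(count, width=10, cap=100):
    """Map an edge's occurrence count (>= 1) to an interval index."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if count > cap:
        # All very frequent edges share one final interval.
        return cap // width
    # 1-10 -> 0, 11-20 -> 1, ..., 91-100 -> 9
    return (count - 1) // width


intervals = [frequency_interval(c) for c in (1, 10, 11, 100, 101, 5000)]
print(intervals)
```

Under these assumptions there are 11 intervals in total, so the corresponding embedding matrix would have 11 rows.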
In the graph, the importance of item neighbors is reflected in the weight of edges. We believe that the weight is influenced by the type and number of occurrences of edges. Therefore, we define the following function to aggregate the features of neighbors:
where $h_{v_i}$ represents the features of the target node, $h_{v_j}$ represents the features of the neighbor, $e^{\text {type }}$ indicates the type of the edge, and $e^{\text {num }}$ represents the interval into which the number of occurrences of the edge falls.
To capture the different types and frequencies of edges in the graph, we use a learnable embedding matrix $\mathrm{E}^{\text {type }}=\left[\mathrm{a}_1^{\text {type }}, \mathrm{a}_2^{\text {type }}, \mathrm{a}_3^{\text {type }}, \mathrm{a}_4^{\text {type }}\right]$, whose elements represent the features of $\mathrm{e}_{\text {self }}, \mathrm{e}_{\text {out }}, \mathrm{e}_{\mathrm{in}}, \mathrm{e}_{\text {in-out }}$, respectively. In addition, we use the learnable embedding matrix $\mathrm{E}^{\text {num }}=\left[\mathrm{a}_1^{\text {num }}, \mathrm{a}_2^{\text {num }}, \ldots, \mathrm{a}_n^{\text {num }}\right]$, where $\mathrm{a}_i^{\text {num }}$ represents the features of the edges in interval $i$. To capture the dependencies between items and improve the accuracy of recommendations, the PASR model uses these embeddings to assign an appropriate weight to each edge based on its type and frequency. We use the following method to learn the edge weights:
where $\mathrm{W}_1, \mathrm{~W}_2 \in \mathrm{R}^{\mathrm{d}}$ are weight vectors, $\mathrm{b}_1 \in \mathrm{R}^{\mathrm{d}}$ is a bias, $a^{\text {type }}, a^{\text {num }} \in \mathrm{R}^{\mathrm{d}}$ are the features of the edge, $\gamma$ is the sigmoid or tanh activation function, and $\mathrm{e}_{\mathrm{ij}}$ is the learned weight of the edge. $N_{v_i}^s=\left\{\mathrm{v}_{\mathrm{j}} \mid \mathrm{v}_{\mathrm{i}}, \mathrm{v}_{\mathrm{j}} \in \mathrm{V}_{\mathrm{s}} ; \mathrm{j}=\mathrm{i} \pm 1\right\}$ represents the neighbor set of item $v_i$.
Finally, the features of each node can be updated to:
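Because the exact scoring and update formulas are not reproduced in the text here, the following Python sketch only illustrates the general idea of weighting neighbors by edge features and normalizing over the neighbor set. The additive scoring form `w2 · tanh(w1 ⊙ (a_type + a_num) + b1)`, the softmax normalization, and the function name `aggregate` are all assumptions and should not be read as the model's actual equations.

```python
import math


def aggregate(neighbors, w1, w2, b1):
    """Weighted neighbor aggregation driven by edge features.

    neighbors: list of (h_j, a_type, a_num) tuples, each a list of d floats.
    Returns (weights, updated_features).
    """
    scores = []
    for _, a_type, a_num in neighbors:
        # Assumed scoring form: w2 . tanh(w1 * (a_type + a_num) + b1)
        hidden = [
            math.tanh(w * (t + n) + b)
            for w, t, n, b in zip(w1, a_type, a_num, b1)
        ]
        scores.append(sum(w * x for w, x in zip(w2, hidden)))
    # Softmax over the neighbor set, shifted by the max for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The updated feature is the weighted sum of the neighbor features.
    d = len(neighbors[0][0])
    updated = [
        sum(a * h[k] for a, (h, _, _) in zip(weights, neighbors))
        for k in range(d)
    ]
    return weights, updated


demo_neighbors = [
    ([1.0, 0.0], [0.1, 0.2], [0.3, 0.4]),
    ([0.0, 1.0], [0.1, 0.2], [0.3, 0.4]),
]
weights, updated = aggregate(demo_neighbors, [1.0, 1.0], [1.0, 1.0], [0.0, 0.0])
print(weights, updated)
```

In the demo, both neighbors carry identical edge features, so they receive equal weights and the updated feature is simply the mean of the two neighbor features.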
D. Interest encoder
To learn the user's interest based on the item representations learned by the graph aggregator, we use an attention mechanism. Unlike previous models, we focus on the popularity of each item to better understand the contribution of each item to the user's interest. To achieve this, we incorporate a learnable popularity embedding matrix $P_e=\left[p_1, p_2, \ldots, p_n\right]$, where $p_i \in R^d$ represents the popularity embedding of item $i$, based on its number of occurrences.
By incorporating popularity into the attention mechanism, the model can better capture the diversity of user interests and make more accurate recommendations. We integrate the popularity vector into the calculation process of the attention mechanism, as follows:
where $Q \in R^d$, $\sigma$ is an activation function, $W_2 \in R^{d \times d}$ and $W_3 \in R^{d \times 2 d}$ are weight matrices, and $h_a$ is the average embedding of the items in the session.
E. Prediction layer
Once the user's interest $H^s$ has been obtained, the model calculates the dot product between the interest and the embedding of each candidate item. Softmax normalization is then applied to obtain the probability that the user clicks on each item next.
To learn the parameters of the model, the cross-entropy loss function is used, and the backpropagation algorithm is applied to train the model.
where $y_i$ is the one-hot encoding of the label item of the session.
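The prediction and loss steps can be illustrated with a short Python sketch. It assumes the standard multi-class form of the cross-entropy loss, in which a one-hot label reduces the loss to the negative log-probability of the label item; the function names `predict` and `cross_entropy` and the toy vectors are illustrative assumptions.

```python
import math


def predict(interest, item_embeddings):
    """Softmax over dot products between the session interest and each candidate."""
    logits = [sum(a * b for a, b in zip(interest, emb)) for emb in item_embeddings]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label_index):
    """With a one-hot label, cross-entropy is -log of the label item's probability."""
    return -math.log(probs[label_index])


probs = predict([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
loss = cross_entropy(probs, 0)
print(probs, loss)
```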
The experiment aims to demonstrate the effectiveness of the PASR model by addressing the following two questions:
Q1: Does the PASR model outperform the latest baselines?
Q2: Is the popularity-aware mechanism of the PASR model effective?
A. Experimental setup
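Table 1. Statistics of the datasets after preprocessing.

| Dataset | Tmall | Nowplaying |
| --- | --- | --- |
| # clicks | 818,479 | 1,367,963 |
| # training sessions | 351,268 | 825,304 |
| # test sessions | 25,898 | 89,824 |
| # items | 40,728 | 60,417 |
| Avg. length | 6.69 | 7.42 |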
Dataset: We employ two widely used public datasets, namely the Tmall (https://tianchi.aliyun.com/dataset/dataDetail?dataId=42) and Nowplaying (http://dbis-nowplaying.uibk.ac.at/#nowplaying) datasets. To ensure a fair comparison, we adopt the same processing method as [6], [7], [10]. Prior to the experiment, we filter out sessions with a length of 2 and items that appear fewer than 5 times. We also filter out sessions with more than 20 positions counted in reverse order, as distant items have little impact on the prediction results of the model but slow down its running speed.
For the Tmall dataset, we use click data from the last 100 seconds as the test set, and the remaining data as the training set. For the Nowplaying dataset, we use data from the last two months as the test set, and the remaining data as the training set. Additionally, we augment the data by generating a series of prefix sequences and corresponding labels from each session $s=\left[v_1, v_2, \ldots, v_l\right]$, namely $\left(\left[v_1, v_2, \ldots, v_{l-1}\right], v_l\right)$, $\left(\left[v_1, v_2, \ldots, v_{l-2}\right], v_{l-1}\right)$, $\ldots$, $\left(\left[v_1\right], v_2\right)$. This expands the training data and helps ensure that the model's parameters are sufficiently trained.
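The prefix-based augmentation can be sketched in a few lines of Python. The function name `augment` and the list-of-identifiers representation of a session are assumptions made for illustration.

```python
def augment(session):
    """Return (prefix, label) pairs, from the longest prefix down to ([v_1], v_2)."""
    return [(session[:i], session[i]) for i in range(len(session) - 1, 0, -1)]


pairs = augment(["v1", "v2", "v3", "v4"])
for prefix, label in pairs:
    print(prefix, "->", label)
```

A session of length $l$ therefore yields $l-1$ training examples.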
The statistics of the datasets after preprocessing are summarized in Table 1.
Consistent with previous work, we adopt two commonly used evaluation indicators: P@20 and MRR@20.
P@20 measures the proportion of recommended items that are predicted correctly among the top 20 recommendation results. Its calculation formula is:
where $M$ is the number of items that the user is actually interested in among the top 20 recommended items.
MRR@20 is defined as the reciprocal of the position of the highest-ranked correct item in the top 20 recommendation results. It is calculated using the following formula:
where $\operatorname{Rank}(i)$ represents the rank of the label item in the recommendation list for session $i$, and $\operatorname{Reci}(i)$ represents the reciprocal of that rank. $\operatorname{Reci}(i)$ is assigned a value of 0 if the rank is greater than 20.
These two indicators reflect, respectively, the accuracy of the model and the ranking of the label item among the candidate items. The larger the values of the two indicators, the better the recommendation performance.
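As a worked illustration of the two metrics, the following Python sketch computes them over a set of test sessions. It assumes the convention common in session-based recommendation, in which each test session has exactly one label item, P@20 is the fraction of sessions whose label appears in the top 20, and both metrics are reported as percentages; the function name `evaluate` is hypothetical.

```python
def evaluate(ranked_lists, labels, k=20):
    """Return (P@k, MRR@k) in percent, averaged over all test sessions."""
    hits = 0
    reciprocal_sum = 0.0
    for ranked, label in zip(ranked_lists, labels):
        top_k = ranked[:k]
        if label in top_k:
            hits += 1
            # Ranks are 1-based; a label outside the top k contributes 0.
            reciprocal_sum += 1.0 / (top_k.index(label) + 1)
    n = len(labels)
    return 100.0 * hits / n, 100.0 * reciprocal_sum / n


p_at_k, mrr_at_k = evaluate([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [1, 5, 0])
print(p_at_k, mrr_at_k)
```

In the toy example, two of the three labels are found (at ranks 1 and 2), so the reciprocal ranks are 1, 0.5, and 0.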
Parameters: We set the dimension of the hidden layer to 100 and the number of attention heads to 5. The mini-batch size is set to 100. All weight matrices and embedding layers are initialized with a Gaussian distribution with a mean of 0 and a variance of 0.01. The initial value of all biases is set to 0.
To optimize the model, we use the Adam optimizer with an initial learning rate of 0.001 and a decay value of 1 every three epochs. The L2 penalty term is set to $10^{-5}$. These parameters are chosen based on previous studies and empirical experiments to achieve the best performance of the PASR model on the given datasets.
B. Baseline models
To evaluate the performance of the proposed PASR model, we compare it with the following latest baseline models:
(1) POP: This model recommends the most popular items in the training set.
(2) Item-KNN [15]: This model uses cosine similarity between items for recommendation.
(3) FPMC [16]: In this model, the Markov chain is used for recommendation.
(4) GRU4REC [17]: This model uses GRU to learn the user's final interest.
(5) NARM [18]: This model combines GRU and attention mechanism for recommendations.
(6) STAMP [6]: This model uses the soft attention mechanism to learn the user's long-term interest and then combines the user's short-term interest for recommendations.
(7) SR-GNN [7]: This model uses a graph neural network to capture the transformation relationship between items and then uses the soft attention mechanism to learn the user's interest.
(8) GCE-GNN [10]: This model is an improvement of SR-GNN. It uses session graphs and a global graph to learn the representation of items.
(9) DIDN [11]: This model incorporates user behavior patterns hidden behind items in the click process.
(10) CORE [12]: This model unifies the representation space for both encoding and decoding processes to address the inconsistent prediction issue while recommending items. It serves as the most recent baseline.
These baseline models are chosen based on their popularity and effectiveness in previous studies.
C. Comparison with baselines
Table 2. Performance comparison between PASR and the baseline models.

| Method | Tmall P@20 | Tmall MRR@20 | Nowplaying P@20 | Nowplaying MRR@20 |
| --- | --- | --- | --- | --- |
| POP | 2.00 | 0.90 | 2.28 | 0.86 |
| Item-KNN | 9.15 | 3.31 | 15.94 | 4.91 |
| FPMC | 16.06 | 7.32 | 7.36 | 2.82 |
| GRU4REC | 10.93 | 5.89 | 7.92 | 4.48 |
| NARM | 23.30 | 10.70 | 18.59 | 6.93 |
| STAMP | 26.47 | 13.36 | 17.66 | 6.88 |
| SR-GNN | 27.57 | 13.72 | 18.87 | 7.47 |
| GCE-GNN | 33.42 | 15.42 | 23.11 | 7.55 |
| DIDN | 34.25 | 15.01 | 23.16 | 7.59 |
| CORE | 34.67 | 15.56 | 22.09 | 7.41 |
| PASR | 35.91 | 15.83 | 23.32 | 7.83 |
To address Q1, we compared the performance of the proposed PASR model with that of the commonly used baseline models listed in Section 3.B. In the experiment, we conducted 10 runs with different random seeds and recorded the average results. The experimental results are presented in Table 2, where the best-performing baseline and PASR model in each column are underlined and boldfaced, respectively. It can be observed from Table 2 that the PASR model achieves the best prediction performance on both datasets, outperforming the baseline models.
Among the traditional recommendation algorithms, POP is a very simple method that ignores the differences among users and has poor recommendation performance. Item-KNN and FPMC have improved to some extent, but they still have limitations. Item-KNN does not consider the sequence information of items, while the dimension explosion problem of FPMC makes it difficult to apply in actual production.
Recommendation algorithms based on deep learning, such as GRU4REC, NARM, and STAMP, have achieved better performance than traditional recommendation algorithms. However, GRU4REC and NARM, which are based on GRU, still suffer from the vanishing gradient problem. STAMP relies only on the attention mechanism to learn the user's interest and ignores the dependencies between items.
Recommendation algorithms based on graph neural networks have further improved the performance of session recommendation because they can capture the transformation relationship between long-distance items. SR-GNN constructs the session into a graph, uses a gated graph neural network to learn the representations of the items, and then uses the soft attention mechanism to learn the user's interest. GCE-GNN uses the attention mechanism to evaluate the importance of each item and generates the final item representation. DIDN incorporates item-aware, user-aware, and temporal-aware information to learn dynamic item embeddings and filter out noisy clicks within sessions. CORE applies a weighted sum of item embeddings to encode sessions and robust distance measuring techniques to prevent overfitting.
Although these models have achieved good recommendation performance, the PASR model still outperforms them. Unlike previous models, the PASR model constructs the item graph based on popularity, effectively capturing the preference of users for popular items. This ensures that the recommendations are tailored to the interests of the majority of users. The use of learnable embedding matrices E^{type} and E^{num} enables PASR to capture the different types and frequencies of edges in the graph. This allows the model to assign appropriate weights to each edge based on its type and frequency, which can better capture the dependencies between items and improve the accuracy of recommendations. Finally, the integration of popularity embeddings into the attention mechanism helps to better learn the user's interest and the contribution of each item to it. All of these factors together make PASR a powerful model for session-based recommendation, outperforming other state-of-the-art models.
D. Influence of the popularity-aware mechanism on recommendation performance
To address Q2 and demonstrate the effectiveness of the popularity-aware mechanism, we conducted the following comparative experiments:
(1) PASR-NP1: The number of occurrences of edges in the graph aggregator is not considered.
(2) PASR-NP2: The number of occurrences of items in the interest encoder is not considered.
(3) PASR-NP12: Neither the number of occurrences of edges in the graph aggregator nor the number of occurrences of items in the interest encoder is considered.
We tested the above variants and the full PASR model on both datasets, and the experimental results are shown in Figure 2. The results show that the full PASR model outperforms all variants, which supports the importance of the popularity-aware mechanism. This is because the PASR model considers both edge occurrences in the graph aggregator and item popularity in the interest encoder.
The performance of PASR-NP1 and PASR-NP2 is lower than that of the full PASR model. The former ignores the number of occurrences of edges in the graph aggregator, making it difficult to accurately learn the weights of edges. The latter does not consider the number of occurrences of items in the interest encoder, leading to interest bias.
The PASR-NP12 model has the worst performance because it ignores both the number of occurrences of edges in the graph aggregator and the number of occurrences of items in the interest encoder. This makes the model unable to capture the impact of item popularity on user interests.
Overall, these comparative experiments demonstrate the effectiveness of the popularity-aware mechanism in the PASR model, which can accurately learn the contribution of each item and improve the accuracy of recommendations.
In this study, we proposed a popularity-aware graph neural network model for session-based recommendation systems. Our model investigates the impact of the number of occurrences of edges and items on recommendation performance. Through comprehensive experiments on two different datasets, we demonstrated that our proposed PASR model outperforms ten baseline models.
In future work, we plan to investigate the applicability of the PASR model in other recommendation systems, such as conversational recommendation systems. We also aim to explore the integration of other factors such as novelty and diversity into our model to further improve the quality of recommendations.
In conclusion, our proposed PASR model provides a powerful solution for session-based recommendation systems, which effectively captures the dependencies between items and improves the accuracy of recommendations. The success of our model also suggests that incorporating popularity-aware mechanisms can be a promising direction for improving recommendation performance in various domains.
The data used to support the findings of this study are available from the corresponding author upon request.
The authors declare that they have no conflicts of interest.