References

[1] J. A. Benítez-Andrades, Á. González-Jiménez, Á. López-Brea, J. Aveleira-Mata, J. M. Alija-Pérez, and M. T. García-Ordás, “Detecting racism and xenophobia using deep learning models on Twitter data: CNN, LSTM and BERT,” PeerJ Comput. Sci., vol. 8, p. e906, 2022.
[2] S. Sadiq, A. Mehmood, S. Ullah, M. Ahmad, G. S. Choi, and B. W. On, “Aggression detection through deep neural model on Twitter,” Future Gener. Comput. Syst., vol. 114, pp. 120–129, 2021.
[3] J. A. Benitez-Andrades, Á. González-Jiménez, Á. López-Brea, B. C., J. Aveleira-Mata, J. M. Alija-Pérez, and M. T. García-Ordás, “BERT model-based approach for detecting racism and xenophobia on Twitter data,” in Research Conference on Metadata and Semantics Research, 2021, pp. 148–158.
[4] E. Lee, F. Rustam, P. B. Washington, F. El Barakaz, W. Aljedaani, and I. Ashraf, “Racism detection by analyzing differential opinions through sentiment analysis of tweets using stacked ensemble GCR-NN model,” IEEE Access, vol. 10, pp. 9717–9728, 2022.
[5] H. Macherla, G. Kotapati, M. T. Sunitha, K. R. Chittipireddy, B. Attuluri, and R. Vatambeti, “Deep learning framework-based chaotic hunger games search optimization algorithm for prediction of air quality index,” Ing. Syst. Inf., vol. 28, no. 2, pp. 433–441, 2023.
[6] N. Alnazzawi, “Using Twitter to detect hate crimes and their motivations: The HateMotiv corpus,” Data, vol. 7, no. 6, p. 69, 2022.
[7] G. A. De Souza and M. Da Costa-Abreu, “Automatic offensive language detection from Twitter data using machine learning and feature selection of metadata,” in 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 2020, pp. 1–6.
[8] N. Vanetik and E. Mimoun, “Detection of racist language in French tweets,” Information, vol. 13, no. 7, p. 318, 2022.
[9] C. Arcila-Calderón, J. J. Amores, P. Sánchez-Holgado, and D. Blanco-Herrero, “Using shallow and deep learning to automatically detect hate motivated by gender and sexual orientation on Twitter in Spanish,” Multimodal Technol. Interact., vol. 5, no. 10, p. 63, 2021.
[10] N. V. R. S. Reddy, C. Chitteti, S. Yesupadam, V. Subbaiah, S. S. V. Desanamukula, and N. J. Bommagani, “Enhanced speckle noise reduction in breast cancer ultrasound imagery using a hybrid deep learning model,” Ing. Syst. Inf., vol. 28, no. 4, pp. 1063–1071, 2023.
[11] B. Jia, D. Dzitac, S. Shrestha, K. Turdaliev, and N. Seidaliev, “An ensemble machine learning approach to understanding the effect of a global pandemic on Twitter users’ attitudes,” Int. J. Comput. Commun. Control, vol. 16, no. 2, pp. 243–264, 2021.
[12] A. Bisht, A. Singh, H. S. Bhadauria, J. Virmani, and Kriti, “Detection of hate speech and offensive language in Twitter data using LSTM model,” in Recent Trends in Image and Signal Processing in Computer Vision, 2020, pp. 243–264.
[13] A. Toliyat, S. I. Levitan, Z. Peng, and R. Etemadpour, “Asian hate speech detection on Twitter during COVID-19,” Front. Artif. Intell., vol. 5, p. 932381, 2022.
[14] H. Herodotou, D. Chatzakou, and N. Kourtellis, “A streaming machine learning framework for online aggression detection on Twitter,” in 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 2020, pp. 5056–5067.
[15] O. Istaiteh, R. Al-Omoush, and S. Tedmori, “Racist and sexist hate speech detection: Literature review,” in 2020 International Conference on Intelligent Data Science Technologies and Applications (IDSTA), Valencia, Spain, 2020, pp. 95–99.
[16] S. A. Kokatnoor and B. Krishnan, “Twitter hate speech detection using stacked weighted ensemble (SWE) model,” in 2020 Fifth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN), Bangalore, India, 2020, pp. 87–92.
[17] S. Kaya and B. Alatas, “A new hybrid LSTM-RNN deep learning based racism, xenomy, and genderism detection model in online social network,” Int. J. Adv. Netw. Appl., vol. 14, no. 2, pp. 5318–5328, 2022.
[18] F. Rodríguez-Sánchez, J. Carrillo-de Albornoz, and L. Plaza, “Automatic classification of sexism in social networks: An empirical study on Twitter data,” IEEE Access, vol. 8, pp. 219563–219576, 2020.
[19] M. Mozafari, R. Farahbakhsh, and N. Crespi, “Hate speech detection and racial bias mitigation in social media based on BERT model,” PLoS One, vol. 15, no. 8, p. e0237861, 2020.
[20] N. Pitropakis, K. Kokot, D. Gkatzia, R. Ludwiniak, A. Mylonas, and M. Kandias, “Monitoring users’ behavior: Anti-immigration speech detection on Twitter,” Mach. Learn. Knowl. Extr., vol. 2, no. 3, p. 11, 2020.
[21] J. Peng, J. S. Fung, M. Murtaza, A. Rahman, P. Walia, D. Obande, and A. R. Verma, “A sentiment analysis of the Black Lives Matter movement using Twitter,” STEM Fellow. J., vol. 8, no. 1, pp. 56–66, 2023.
[22] S. Ghosal and A. Jain, “HateCircle and unsupervised hate speech detection incorporating emotion and contextual semantics,” ACM Trans. Asian Low-Resour. Lang. Inf. Process., vol. 22, no. 4, pp. 1–28, 2023.
[23] M. Ali, M. Hassan, K. Kifayat, J. Y. Kim, S. Hakak, and M. K. Khan, “Social media content classification and community detection using deep learning and graph analytics,” Technol. Forecast. Soc. Change, vol. 188, p. 122252, 2023.
[24] S. Agarwal, A. Sonawane, and C. R. Chowdary, “Accelerating automatic hate speech detection using parallelized ensemble learning models,” Expert Syst. Appl., vol. 230, p. 120564, 2023.
[25] J. H. Joloudari, S. Hussain, M. A. Nematollahi, R. Bagheri, F. Fazl, R. Alizadehsani, R. Lashgari, and A. Talukder, “BERT-deep CNN: State of the art for sentiment analysis of COVID-19 tweets,” Soc. Netw. Anal. Min., vol. 13, no. 1, p. 99, 2023.
[26] H. Saleh, A. Alhothali, and K. Moria, “Detection of hate speech using BERT and hate speech word embedding with deep model,” Appl. Artif. Intell., vol. 37, no. 1, p. 2166719, 2023.
[27] S. Nagar, F. A. Barbhuiya, and K. Dey, “Towards more robust hate speech detection: Using social context and user data,” Soc. Netw. Anal. Min., vol. 13, no. 1, p. 47, 2023.
[28] Y. Liu, Z. Tan, H. Wang, S. Feng, Q. Zheng, and M. Luo, “BotMoE: Twitter bot detection with community-aware mixtures of modal-specific experts,” arXiv preprint arXiv:2304.06280, 2023. Available: https://arxiv.org/abs/2304.06280
[29] K. Mnassri, P. Rajapaksha, R. Farahbakhsh, and N. Crespi, “Hate speech and offensive language detection using an emotion-aware shared encoder,” arXiv preprint arXiv:2302.08777, 2023. Available: https://arxiv.org/abs/2302.08777
[30] M. Almaliki, A. M. Almars, I. Gad, and E. S. Atlam, “ABMM: Arabic BERT-Mini model for hate-speech detection on social media,” Electronics, vol. 12, no. 4, p. 1048, 2023.
[31] S. Gite, S. Patil, D. Dharrao, M. Yadav, S. Basak, A. Rajendran, and K. Kotecha, “Textual feature extraction using ant colony optimization for hate speech classification,” Big Data Cogn. Comput., vol. 7, no. 1, p. 45, 2023.
[32] M. Fazil, S. Khan, B. M. Albahlal, R. M. Alotaibi, T. Siddiqui, and M. A. Shah, “Attentional multi-channel convolution with bidirectional LSTM cell toward hate speech prediction,” IEEE Access, vol. 11, pp. 16801–16811, 2023.
[33] P. Burnap and M. L. Williams, “Us and them: Identifying cyber hate on Twitter across multiple protected characteristics,” EPJ Data Sci., vol. 5, no. 1, 2016.
[34] “Search and find the best Twitter hashtags.” https://hashtagify.me/
[35] “Training data for AI, ML with human empowered automation.” https://www.cogitotech.com/about-us
[36] K. Srikanth, L. K. Panwar, B. K. Panigrahi, E. Herrera-Viedma, A. K. Sangaiah, and G. G. Wang, “Meta-heuristic framework: Quantum inspired binary grey wolf optimizer for unit commitment problem,” Comput. Electr. Eng., vol. 70, pp. 243–260, 2018.
[37] W. Zhao, L. Wang, and S. Mirjalili, “Artificial hummingbird algorithm: A new bio-inspired optimizer with its engineering applications,” Comput. Meth. Appl. Mech. Eng., vol. 388, p. 114194, 2022.
Open Access
Research article

Racism and Hate Speech Detection on Twitter: A QAHA-Based Hybrid Deep Learning Approach Using LSTM-CNN

Praveen Kumar Jayapal 1, Kumar Raja Depa Ramachandraiah 2, Kranthi Kumar Lella 3*

1 DiSTAP, Singapore-MIT Alliance for Research and Technology, 138602 Singapore, Singapore
2 Faculty of Information and Communications Technology, Universiti Teknikal Malaysia Melaka, 76100 Melaka, Malaysia
3 School of Computer Science and Engineering, VIT-AP University, 522237 Vijayawada, India
International Journal of Knowledge and Innovation Studies | Volume 1, Issue 2, 2023 | Pages 89-102
Received: 10-11-2023 | Revised: 11-16-2023 | Accepted: 11-22-2023 | Available online: 12-07-2023

Abstract:

Twitter, a predominant platform for instantaneous communication and idea dissemination, is often exploited by cybercriminals for victim harassment through sexism, racism, hate speech, and trolling using pseudonymous accounts. The propagation of racially charged online discourse poses significant threats to the social, political, and cultural fabric of many societies. Monitoring and prompt eradication of such content from social media, a breeding ground for racist ideologies, are imperative. This study introduces an advanced hybrid forecasting model, utilizing convolutional neural networks (CNNs) and long short-term memory (LSTM) neural networks, for the efficient and accurate detection of racist and hate speech in English on Twitter. Unlabelled tweets, collated via the Twitter API, formed the basis of the initial investigation. Feature vectors were extracted from these tweets using the TF-IDF (Term Frequency-Inverse Document Frequency) feature extraction technique. This research contrasts the proposed model with existing intelligent classification algorithms in supervised learning. The HateMotiv corpus, a publicly available dataset annotated with types of hate crimes and ideological motivations, was employed, emphasizing Twitter as the primary social media context. A novel aspect of this study is the introduction of a revised artificial hummingbird algorithm (AHA), supplemented by quantum-based optimization (QBO). This quantum-based artificial hummingbird algorithm (QAHA) aims to augment exploration capabilities and reveal potential solution spaces. Employing QAHA resulted in a detection accuracy of approximately 98%, compared to 95.97% without its application. The study's principal contribution lies in the significant advancements achieved in racism and hate speech detection in English through the application of hybrid deep learning methodologies.

Keywords: Cyberstalkers, Artificial hummingbird algorithm, Quantum-based optimization, Long short-term memory, Racism detection

1. Introduction

In the digital age, the prevalence of social media has revolutionized the way individuals communicate and express themselves. A notable trend observed is the uninhibited expression of thoughts and opinions by users, often leading to the oversharing of personal information [1]. The anonymity provided by social networks emboldens many users to post their emotions and thoughts without filters, sometimes disregarding the potential harm to others [2]. Particularly, individuals with racist ideologies exploit social media to disseminate their beliefs, asserting their right to free expression [3]. This unfettered expression is not confined to fanatical religious, racial, or political views; it extends to extreme bigotry, including sexist behavior that transgresses the bounds of hate speech [4]. The voluminous nature of interactions on social media platforms renders manual monitoring and response to the myriad of comments, messages, and data virtually unfeasible [5]. Furthermore, the scarcity of official data on hate crimes underscores the prevailing issues in accurately addressing such content on these platforms [6]. Despite these challenges, the rich data available on social media are pivotal for user data processing. Data mining plays a critical role in this context, uncovering hitherto unknown patterns within datasets and enabling rapid, informed research and decision-making processes [7]. In the realm of Natural Language Processing (NLP), LSTM and CNN are prominent neural network architectures. LSTM excels in handling sequential data, while CNN is adept at detecting patterns and features in text data. The amalgamation of these two architectures suggests a hybrid model that capitalizes on the strengths of both LSTM and CNN for effective hate speech detection.
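The hybrid LSTM-CNN idea can be made concrete with a minimal Keras sketch, assuming TensorFlow is installed. All layer sizes and the single sigmoid output below are illustrative placeholders, not the tuned hyper-parameters reported in this study:

```python
# Minimal sketch of a hybrid CNN-LSTM text classifier (illustrative sizes only).
from tensorflow.keras import layers, models

def build_cnn_lstm(vocab_size=10000, max_len=50, embed_dim=64):
    model = models.Sequential([
        layers.Input(shape=(max_len,)),            # padded token-id sequences
        layers.Embedding(vocab_size, embed_dim),   # dense word vectors
        layers.Conv1D(128, 5, activation="relu"),  # CNN: local n-gram patterns
        layers.MaxPooling1D(pool_size=2),
        layers.LSTM(64),                           # LSTM: long-range dependencies
        layers.Dense(1, activation="sigmoid"),     # racist vs. non-racist score
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

The convolutional stage condenses each tweet into local feature maps before the LSTM models their order, which is the division of labour the paragraph above describes.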

Social media platforms, notably Facebook and Twitter, have raised significant concerns regarding the prevalence of inappropriate language in user posts [8]. The manifestation of racism on these platforms is multifaceted, encompassing both overt and covert forms [9]. Instances include the utilization of counterfeit profiles for disseminating racist remarks. Historically linked to ethnicity, racism now proliferates based on skin tone, country of origin, language, cultural background, and predominantly religious beliefs. Online remarks and posts inciting racial tensions have threatened the social, political, and cultural equilibrium of various nations [10]. The rapid dissemination of racist ideologies via social media underscores the urgency of identifying and eliminating such content [11].

Exposure to racist comments and tweets on social media has been associated with various mental and physical health conditions, leading to adverse health outcomes [12]. Three distinct forms of racism identified in online interactions include institutional racism, personally mediated racism, and internalized racism [13]. Personally mediated racism occurs when an individual experiences or witnesses prejudice based on race [14]. Consequently, racism in society inflicts psychological stress on individuals, heightening the risk of chronic diseases [15]. Racist groups and individuals are increasingly employing sophisticated methods to promote cyber racism [16]. Sentiment analysis has gained prominence for its application in analyzing social media content for purposes like hate speech detection and racism identification [17]. Recent advancements in automatic detection methods aim to address the issue of abusive content [18]. Machine and deep learning approaches have proven their efficacy in various domains, including sentiment analysis [19], [20].

This study, therefore, employs a hybrid deep learning model to analyze racist tweets, with the following contributions: first, implementation of pre-processing techniques, such as TF-IDF and Bag of Words, for enhanced classification accuracy; second, utilization of the LSTM model within CNN-LSTM for effective detection of racist and offensive language; third, application of the QAHA to refine AHA performance, thereby optimizing the hyper-parameters of CNN-LSTM; finally, deployment of multiple methods for the detection of racist and offensive speech on a publicly available dataset. The subsequent sections of this study are structured as follows: Section 2 reviews related works on Twitter data analysis. Section 3 elucidates the proposed model, while Section 4 presents the experimental analysis. Section 5 concludes the study and outlines future research directions.
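The third contribution, tuning CNN-LSTM hyper-parameters with QAHA, amounts to searching a configuration space for a high-scoring candidate. As a heavily simplified stand-in, the sketch below uses plain random search with a toy scoring function; the search space, the `evaluate_model` objective, and the loop itself are illustrative assumptions, not the authors' QAHA implementation:

```python
import random

SEARCH_SPACE = {                 # illustrative CNN-LSTM hyper-parameters
    "filters": [64, 128, 256],
    "kernel_size": [3, 5, 7],
    "lstm_units": [32, 64, 128],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}

def evaluate_model(params):
    # Toy objective peaking at filters=128, lstm_units=64; in the paper this
    # role is played by the validation accuracy of a trained CNN-LSTM.
    return 1.0 / (1.0 + abs(params["filters"] - 128)
                      + abs(params["lstm_units"] - 64))

def search(n_iter=50, seed=0):
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_iter):
        candidate = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        score = evaluate_model(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

QAHA replaces the blind `rng.choice` proposals with hummingbird-inspired, quantum-perturbed update rules, but the outer propose-evaluate-keep-best loop is the same.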

2. Related Works

The pervasive nature of hate speech on social media has catalyzed significant research efforts. Lee et al. [4] explored sentiment analysis to detect tweets laden with racist content, employing Gated Convolutional Recurrent Neural Networks (GCR-NNs). This stacked ensemble model, integrating Gated Recurrent Units (GRUs), CNNs, and Recurrent Neural Networks (RNNs), demonstrates a synergistic improvement over individual components. In GCR-NNs, GRUs serve to extract salient features from raw text, which are then processed by CNNs to assist RNNs in making accurate predictions. Comparative analyses with existing models underscored the GCR-NN's efficacy, achieving an accuracy of 98% and a 97% detection rate for racist tweets.

Peng et al. [21] employed a sophisticated Bidirectional Encoder Representations from Transformers (BERT) model, specifically optimized for Twitter sentiment classification, to analyze the tone of approximately one million tweets related to the Black Lives Matter (BLM) movement from July 2013 to March 2021. This model, tested on the Sentiment 140 dataset, achieved unparalleled results among machine learning models, registering an accuracy of 0.94 in the testing phase. The study utilized metrics such as retweet counts and word counts in tweets to visualize key concepts and milestones of the BLM movement. Public opinion analysis revealed varied degrees of support for issues like social justice and police brutality. The implications of this research extend to the promotion and analysis of social and political movements.

Ghosal and Jain [22] introduced the first unsupervised detection system, which encompasses the HateCircle algorithm, hate tweet classification, and code-switch data preparation techniques. The HateCircle method, employing word co-occurrence analysis, determines the hate orientation of each phrase. A multi-class system for hate tweet categorization was developed using part-of-speech tagging, Euclidean distance, and geometric median methods. The system proved more effective in identifying hate content in local scripts than in Roman script, supporting its use in code-switch data preparation. Utilizing an enhanced hate lexicon in conjunction with various dictionaries, the system achieved a maximum F1-score of 0.74 on the Hindi dataset and 0.88 on the Bengali dataset. A comparative evaluation of the proposed part-of-speech tagging and geometric median detection strategies against the HateCircle method and the hate tweet identification framework showed that HateCircle attained maximum accuracies of 0.73 and 0.78 on the Hindi and Bengali datasets, respectively. This study demonstrates the efficacy of contextual detection research incorporating a language-independent component in combating the spread of subtly harmful content on social media.

Ali et al. [23] proposed novel graph-based algorithms for identifying hate material on social media. Utilizing Twitter, a dataset was created for testing and validation purposes, involving the extraction and annotation of tweets by language experts. The authors introduced a custom LSTM-GRU model to categorize hate speech into distinct classes. Applied to the compiled dataset, the model achieved an accuracy of 98.14 percent. The Girvan-Newman method was employed to identify key individuals and intraclass communities on Twitter. This approach is significant for monitoring social media to detect potential disruptions, including the identification of hate tweets and groups.

In a separate study, Agarwal et al. [24] explored enhancing the efficiency of automatic hate speech detection on Social Media Platforms (SMPs) by parallelizing traditional ensemble learning techniques. The research parallelized three popular hate speech detection algorithms, namely bagging, A-stacking, and random sub-space, and compared their performance with their serial counterparts across various high-dimensional datasets covering diverse topics such as the COVID-19 pandemic and the 2020–2021 farmers' agitation in India. The parallel models demonstrated a considerable increase in speed and efficiency, validating their suitability for the intended applications. This study underscored the importance of generalization by testing the models in a cross-dataset environment, finding that the parallelized algorithms maintained accuracy comparable to their serial versions.

Joloudari et al. [25] proposed future research directions for the development of a BERT-based model tailored for sentiment analysis. In this approach, a deep CNN architecture captures the hierarchical structure of tweet embeddings, while the BERT model accumulates contextual representations of words, efficiently delineating the intricate semantics of tweets related to COVID-19. Comparative analysis with existing sentiment analysis techniques demonstrated that the BERT-deep CNN models excel in real-time classification of sentiments in COVID-19-related tweets. The study's findings contribute significantly to understanding public sentiment, offering insights that are crucial for policymakers in discerning public opinion, identifying misinformation, and formulating emergency response strategies. This research sets a new benchmark for future studies in sentiment analysis of social media data in crisis contexts and furthers the development of sentiment analysis methodologies.

Saleh et al. [26] explored the effectiveness of using domain-specific word embedding for the automatic identification of hate speech. This method assigns negative connotations to specific terms to detect coded words effectively. Additionally, the application of the transfer learning language model (BERT), known for its proficiency in various NLP tasks, was examined for hate speech classification. Experimental results revealed that a bidirectional LSTM-based model, employing domain-specific word embedding, achieved an F1-score of 93% on a balanced dataset comprising existing hate speech datasets. In contrast, the BERT model attained a 96% F1-score. The performance of pre-trained models was found to be influenced by the volume of training data. Despite the disparity in corpus size, the first method, focusing on domain-specific data during training, outperformed the BERT model, trained on a larger corpus. The study highlighted the advantage of creating large pre-trained models from rich domain-specific content for contemporary social media platforms.

Nagar et al. [27] introduced a novel methodology for the detection of hate dialogue on Twitter. This method integrates the author's content, social context, and linguistic characteristics to enhance the accuracy of hate speech detection. Incorporating textual content and the surrounding social environment, the approach employs an encoder to assimilate the unified features of the authors. The adaptability of this framework allows the use of various text encoders to capture the textual properties of the material, rendering it suitable for a broad spectrum of existing and future language models. The efficacy of this method was validated on two distinct Twitter datasets, demonstrating significant improvements over current state-of-the-art approaches. The results highlighted the importance of considering social context in enhancing the identification of hate speech on Twitter.

Liu et al. [28] proposed BotMoE, a system designed to detect fraudulent bots on Twitter by using multiple modalities of user information, including metadata, textual content, and network structure. The system incorporates a Mixture-of-Experts (MoE) layer, which considers the Twitter communities to augment domain generalization and adaptability. BotMoE constructs modal-specific encoders for metadata attributes and textual structure, subsequently employing a MoE layer that categorizes users into appropriate groups based on community expertise. The final stage involves an expert fusion layer that amalgamates user representations from metadata, text, and graph perspectives, ensuring consistency across all modalities. Extensive trials indicated that BotMoE significantly surpassed existing methods in identifying sophisticated and stealthy bots, demonstrating reduced reliance on training data and enhanced generalization capabilities for new and unknown user populations.

Mnassri et al. [29] addressed the challenge of unbalanced and sparsely labeled datasets by introducing a learning strategy that incorporates external emotional variables from diverse corpora. This study utilized BERT and mBERT, the latter focusing on cross-lingual identification of abusive material. Leveraging the shared encoder of transformers, the model concurrently recognizes abusive content and incorporates emotional aspects. This approach facilitates rapid learning through the use of auxiliary information and enhances data efficiency by minimizing overfitting through shared representations. The research indicated that incorporating emotional intelligence significantly improved the accuracy of databases in recognizing hate speech and abusive language. Notably, multi-task models exhibited fewer errors than single-task models in both hate speech identification and aggressive language detection tasks, presenting an intriguing development in this field.

Almaliki et al. [30] introduced a method for the precise identification of Arabic anti-Semitism on Twitter. This study implemented the Arabic BERT-Mini Model (ABMM) for detecting online bigotry. Twitter data were analyzed using bidirectional encoder representations from the model, categorizing the findings into typical, abusive, and hateful classes. Comparative tests were conducted against current state-of-the-art methods, and the results demonstrated that the ABMM model excelled in detecting Arabic hate speech, achieving a highly encouraging score of 0.986.

Gite et al. [31] explored the application of Ant Colony Optimization (ACO) as an optimization strategy, integrating it with four machine learning models utilizing various feature selection and extraction methods on K-Nearest Neighbour (KNN) and Logistic Regression (LR). The objective was to demonstrate the differences between findings using comparative analysis. The proposed feature selection and extraction methods facilitated the improvement of the machine learning models' efficiency. This study considered both numerical datasets for stroke prediction and textual datasets for hate speech detection. The text dataset, compiled from Twitter API data encompassing tweets with positive, negative, and neutral emotions, utilized the TF-IDF method in conjunction with ACO. The application of ACO to the Random Forest model resulted in a significant accuracy enhancement, reaching up to 10.07 percent.

Fazil et al. [32] presented a Bidirectional LSTM (BiLSTM) network for identifying xenophobic content. The model employed a multi-channel setup using contemporary word representation techniques to capture semantic relationships across various time frames using multiple filters of different kernel widths. The network processed encoded representations from several channels, with the output of a stacked 2-layer BiLSTM being combined and transmitted through a dense layer, subsequently weighted by an attention layer. The classification was conducted using a sigmoid function in the output layer. The performance of this model was evaluated on three Twitter datasets using four assessment metrics. Comparative analysis with five state-of-the-art models and an aggregate of baseline models indicated superior performance of the proposed model. The ablation study revealed that the removal of channels and the attention mechanism significantly impacted the model's performance. An empirical study was conducted to determine the optimal settings for the model's word representation methods, optimization algorithms, activation functions, and batch size.

3. Material and Methods

For the automatic detection of online hate crimes, access to annotated corpora is indispensable. In the absence of a standardized benchmark, researchers have been compelled to collect and categorize data on hate crimes independently. This research aims to fill the gap in literature focusing specifically on the identification and motivations of hate crimes, which has been previously unexplored, thus hampering the understanding of prevalent hate crime causes and their mitigation.

3.1 Corpus Construction

The HateMotiv corpus was developed from Twitter posts published between 1 January 2010 and 30 December 2019, collected using the TweetScraper tool [6]. It is important to note that the presence of terms such as “hate crime” or “hate crimes” in a tweet does not necessarily imply the endorsement or incitement of violence against a specific group. Twitter users commonly employ hashtags to associate their posts with specific events or topics [33]. Consequently, prevalent hashtags related to hate crimes were identified using the “Hashtagify” application (https://hashtagify.me/) [34]. Hashtags including “hate crime”, “racist”, “racism”, and “Islamophobia” were among those selected for compiling relevant tweets. These hashtags, found to align closely with the FBI’s categorization of hate crimes, were employed as keywords to extract suitable tweets. An English instructor with extensive annotation experience selected these keywords, resulting in a query that returned 23,179 tweets containing the specified hashtags. To optimize the resources for manual annotation, a subset of 5,000 tweets was randomly selected for further analysis.
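The hashtag-based selection step can be pictured as a simple keyword filter. The hashtag set below is an illustrative subset of those named above (written without spaces, since hashtags cannot contain them), not the full keyword list used to build the corpus:

```python
import re

# Illustrative subset of the hate-crime-related hashtags used as keywords.
HASHTAGS = {"#hatecrime", "#racist", "#racism", "#islamophobia"}

def matches_hate_hashtags(tweet: str) -> bool:
    """Return True when the tweet carries at least one target hashtag."""
    tags = {t.lower() for t in re.findall(r"#\w+", tweet)}
    return bool(tags & HASHTAGS)

matches_hate_hashtags("Another #HateCrime reported today")  # True
matches_hate_hashtags("Beautiful sunset tonight")           # False
```

Matching on lowercased hashtags makes the filter insensitive to the mixed capitalization common on Twitter.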

3.2 Annotation Process

Each tweet was annotated by two native English-speaking annotators [35]. Uniform standards were applied in the annotation process, focusing on identifying the type of hate crime and the motivation behind it. The corpus was annotated for four categories and causes of hate crimes, as outlined in Table 1. Regular discussions were held between the annotators and the judge overseeing the annotation process to address any arising inconsistencies or challenges.

Table 1. Glossed entity classes for HateMotiv corpus

Class Category     Explanation
Hate crime type    Categories identified by the FBI, including physical assault, verbal abuse, and incitement to hatred
Motivation         The underlying motive for the hate crime, such as bias related to racism, religion, or disability, or an unknown motive

The HateMotiv corpus analysis revealed that physical assault constitutes the most common type of hate crime, while verbal abuse is the least frequently recorded category on Twitter. A notable observation from the data is the primary role played by the inability to accept diversity in terms of skin color and nationality in the perpetration of various forms of hate crimes. Conversely, hate crimes attributed to disability and negative attitudes towards disabled individuals constituted a minor percentage of the overall causes.

Furthermore, sexism or gender-based discrimination emerged as the second most prevalent justification for hate crimes, following racial prejudice. Notably, a proportion of hate crimes were committed with no apparent motive, reflecting the perpetrators' inherent biases or mental health issues. This is captured in the corpus under the term “unknown motive”. The data indicated that assaults were the most frequent type of hate crime committed for reasons classified as unknown. However, the incidence of crimes committed for indeterminate reasons remains relatively low, accounting for approximately 0.011% of all categorized hate crimes.

3.3 Cleaning and Visualizing Data

The analysis of emojis within tweets was employed as a preliminary method to gauge the tone of the message. However, the primary focus was on textual data, which required extensive cleaning and preprocessing. This process involved the following four steps:

Step 1: Filtering. The first step involved the removal of hypertext links (e.g., http://google.com) and user handles typically starting with the “@” symbol on Twitter. This was crucial to eliminate irrelevant data and focus on the content of the tweets.

Step 2: Tokenization. In the second step, a Bag of Words representation was created. This involved the exclusion of punctuation and question marks to facilitate the appropriate representation of large datasets.

Step 3: Stop-word removal. Commonly occurring words such as “a”, “an”, and “the”, which do not contribute to the analysis, were eliminated in the third step.

Step 4: N-gram creation. This step involved generating n-grams, defined as sequences of ‘n' words or characters extracted from the text. While unigrams and bigrams have distinct utilities, this study utilized unigram tokens for tweet preparation. The decision to focus on unigrams was based on their comprehensive data coverage.
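The four preprocessing steps can be sketched as follows. This is an illustrative implementation, not the study's code: the stop-word list is a small stand-in subset, and the regular expressions are one reasonable way to realize the filtering described above.

```python
import re

# Illustrative subset only; the study would use a full stop-word list (e.g., NLTK's)
STOP_WORDS = {"a", "an", "the", "is", "to", "of"}

def preprocess(tweet: str) -> list[str]:
    """Apply the four cleaning steps to one tweet, returning unigram tokens."""
    # Step 1: filtering -- remove hypertext links and @user handles
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet)
    # Step 2: tokenization -- drop punctuation and question marks, keep word tokens
    tokens = re.findall(r"[a-z']+", tweet.lower())
    # Step 3: stop-word removal -- discard commonly occurring function words
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Step 4: n-gram creation -- with n = 1, each surviving token is one unigram
    return tokens

print(preprocess("The match was great! @fan1 see http://google.com"))
# → ['match', 'was', 'great', 'see']
```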

The selection between unigrams and bigrams should be guided by the specific objectives of the study. Bigrams, such as “not good”, convey emotions succinctly, making them particularly suitable for sentiment analysis and product reviews, whereas unigrams offer broader data coverage. In this research, unigram tokens were used for tweet preparation, and the efficacy of various stemmers and lemmatizers was evaluated. It was found that while lemmatizers could deconstruct compound words into their elements, this process did not significantly enhance accuracy for the categorization models applied. After the text documents were cleaned, tokenization was employed for more detailed analysis, necessitating the transformation of tokens into feature vectors, which serve as the representation used to train the classification algorithms.

Two transformation techniques were compared: the Bag of Words method and the TF-IDF method. The Bag of Words approach, a straightforward transformation strategy, utilizes the corpus's diverse words as features, with each column indicating the frequency of a specific term's occurrence. Despite its computational simplicity, this method provides limited insights beyond word frequency. The TF-IDF method, on the other hand, combines the frequency of a term's occurrence in the text with its distribution across different document types to assign a weight to each word. This implies that commonly occurring words across various text types are assigned a lower weight. The feature vectors generated using these methods were successfully prepared for use in training classification models.
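The contrast between the two transformations can be made concrete with a toy corpus. The sketch below implements both by hand under the common TF-IDF definition (term frequency times the logarithm of inverse document frequency); the three example documents are invented for illustration.

```python
import math
from collections import Counter

# Toy corpus of already-tokenized documents (illustrative only)
docs = [["racism", "is", "bad"],
        ["twitter", "is", "social"],
        ["racism", "on", "twitter"]]

vocab = sorted({w for d in docs for w in d})

def bow_vector(doc):
    """Bag of Words: each column counts occurrences of one vocabulary term."""
    counts = Counter(doc)
    return [counts[w] for w in vocab]

def tfidf_vector(doc):
    """TF-IDF: term frequency weighted down for terms spread across documents."""
    counts = Counter(doc)
    n = len(docs)
    vec = []
    for w in vocab:
        tf = counts[w] / len(doc)
        df = sum(1 for d in docs if w in d)        # documents containing w
        vec.append(tf * math.log(n / df))          # common terms get lower weight
    return vec
```

With this weighting, a term like "is" that appears in two of the three documents receives a lower weight than "bad", which appears in only one, exactly the behavior the TF-IDF method is chosen for.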

3.4 Classification Using the Proposed Architectures

This study introduced three distinct deep learning models for hate speech classification: an LSTM network, a CNN, and a hybrid model combining both. The performance of each model in executing classification tasks was evaluated comprehensively. The CNN model, characterized by higher efficiency and a manageable number of trainable parameters, was ultimately selected for deployment on System-on-Chip Field Programmable Gate Arrays (SoC-FPGAs). The choice was influenced by hardware limitations, as SoC-FPGAs, known for their high performance and low power consumption, are suitable for edge computing applications, but only the CNN model was compatible with this hardware.

3.4.1 LSTM architecture

As depicted in Figure 1, the employed LSTM architecture incorporates two LSTM layers, each consisting of 100 LSTM cells. These layers are designed to accurately represent the sequential relationships between the features and the labels of hate speech data. The LSTM layers process the incoming data, discerning complex connections between characteristics and labels. Subsequently, two fully connected layers receive the output from the LSTM layers and generate a final prediction based on the processed data.

Figure 1. Proposed LSTM architecture
3.4.2 CNN architecture

Figure 2 presents the CNN model structure, comprising several CNN layers followed by pooling layers, originally developed for image classification tasks. For the purpose of hate speech classification, data arrays, represented as images, were input into the model. The CNN was then trained to identify pertinent features and make predictions about the input data concerning the classification of racist content.

Figure 2. Proposed CNN architecture
3.4.3 CNN-LSTM architecture

The hybrid architecture, depicted in Figure 3, amalgamates the strengths of CNN and LSTM networks to enhance outcomes for complex deep learning tasks. This model harnesses the LSTM's ability to model temporal correlations among features in conjunction with the CNN's capacity to extract pivotal information from the data. The CNN simplifies the feature extraction process by identifying key elements, while the LSTM maintains the temporal relationships within the data.

The CNN model was ultimately chosen for hardware implementation due to its superior performance and efficiency compared to the LSTM and CNN-LSTM models. The selection was also influenced by the model's simplicity in terms of understanding and implementation on Field Programmable Gate Arrays (FPGAs). Hardware limitations were a contributing factor, as the FPGA was compatible only with the CNN architecture.

In summary, the study presented three distinct deep learning models - CNN, LSTM, and their hybrid - for the classification of racist content. Extensive testing led to the selection of the CNN model for implementation on SoC-FPGAs due to its high performance and computational efficiency. The CNN model demonstrated superiority over LSTM models in terms of accuracy and processing efficiency, with all models capable of analyzing temporal aspects of data.

Figure 3. Proposed CNN-LSTM architecture
3.4.4 Model implementation details

Three models were implemented for comparative analysis: an LSTM model, a CNN, and a CNN-LSTM hybrid. The LSTM model consists of two layers, each with 100 units, following an input layer that processes 1200 features sequentially. The initial LSTM layer receives data in the format of (1, 1200) and outputs in the form of (1, 100), which is then fed into the subsequent LSTM layer. A series of fully connected layers compute the probability distribution for each class after the network learns the association between features and labels.

The proposed CNN architecture comprises six CNN blocks, each containing a CNN layer followed by a max pooling layer. After the CNN blocks, a flatten layer converts the aggregated features into a one-dimensional array. A dropout layer with a 50% rate is introduced to prevent overfitting, randomly eliminating neurons to reduce dependency on training data. The output from the CNN layers is passed to fully connected layers, which produce the probability distribution for each class.

In the CNN-LSTM hybrid:

Each of the four CNN processing units includes a CNN layer for feature extraction and a max pooling layer to capture the most salient features.

A reshape layer then transforms the output from the CNN blocks, converting 3D CNN output into 2D LSTM input.

The CNN block's output is fed into an LSTM layer to learn features that evolve over time.

Finally, a flatten layer followed by two fully connected layers generates a probability distribution for the classes.
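Given the layer shapes stated above, the trainable-parameter counts of the two LSTM layers can be checked by hand, assuming the standard LSTM parameterization (four gates, each with an input weight matrix, a recurrent weight matrix, and a bias vector). The helpers below are illustrative, not the study's code:

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Parameters of one LSTM layer: four gates, each with input weights,
    recurrent weights, and biases."""
    return 4 * (input_dim * units + units * units + units)

def dense_params(input_dim: int, units: int) -> int:
    """Parameters of one fully connected layer: weights plus biases."""
    return input_dim * units + units

# First LSTM layer: input shape (1, 1200), output shape (1, 100)
p1 = lstm_params(1200, 100)   # 520400
# Second LSTM layer: 100-dimensional input from the first layer
p2 = lstm_params(100, 100)    # 80400
print(p1, p2)
```

This kind of bookkeeping is what makes the relative parameter budgets of the CNN, LSTM, and hybrid variants directly comparable when selecting a model for resource-constrained SoC-FPGA deployment.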

3.4.5 Parameter tuning

The performance of the implemented deep learning models can be significantly influenced by the tuning of hyperparameters. These models necessitate meticulous adjustment of several hyperparameters, which affect memory and compute complexity. This section outlines the additional hyperparameters that facilitate the selection of a specific approach for given scenarios. It is observed that superior outcomes often require extensive tuning of these hyperparameters.

Hyperparameter optimization can be mathematically defined as:

$x^*=\arg \min _{x \in X} f(x)$
(1)

where, $f(x)$ represents the objective function to minimize the error measured on the validation set, and $x^*$ is the set of hyperparameters within the domain $X$ that yields the lowest error. The goal is to identify the hyperparameter values that result in optimal performance on the validation set metric. The selection between manual and automatic hyperparameter tuning establishes a balance between the maximal computational cost of automated models and the in-depth knowledge required for manual selection. In this study, a QBO model was utilized to fine-tune the CNN-LSTM model's hyperparameters.
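As a minimal illustration of Eq. (1), the sketch below minimizes a validation-error objective by random search. The objective surface, hyperparameter ranges, and trial budget are hypothetical stand-ins: in practice $f(x)$ would train the model with hyperparameters $x$ and return its error on the validation set.

```python
import random

random.seed(0)

def validation_error(x):
    """Stand-in for f(x); a real run would train and validate the model here."""
    lr, units = x
    # Hypothetical smooth surface with a minimum near lr=0.01, units=128
    return (lr - 0.01) ** 2 + (units - 128) ** 2 / 1e4

def random_search(n_trials=200):
    """Draw candidates from the domain X and keep the lowest-error one (x*)."""
    best_x, best_err = None, float("inf")
    for _ in range(n_trials):
        x = (random.uniform(1e-4, 0.1), random.randint(32, 256))
        err = validation_error(x)
        if err < best_err:
            best_x, best_err = x, err
    return best_x, best_err

best_x, best_err = random_search()
print(best_x, best_err)
```

Automatic schemes such as this trade computation for the expert knowledge that manual tuning requires, which is the balance the paragraph above describes.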

QBO

This section delineates the principles of QBO, an approach utilized for feature selection in this study. In QBO, a binary representation is used, where ‘1s' indicate features to be retained and ‘0s' denote features to be discarded. Each feature is represented as a quantum bit (Q-bit) in superposition, characterized by a complex number. This is mathematically expressed as:

$q=\alpha+i \beta=e^{i \theta}, \quad|\alpha|^2+|\beta|^2=1$
(2)

where, $\alpha$ and $\beta$ correspond to the two potential states of the Q-bit, namely ‘0' and ‘1'. The angle of $q$ is adjusted using the arctan function.

The primary objective of QBO is to determine the change in the value of $q$. This process is conducted using a calculation method $\Delta \theta$.

$q(t+1)=q(t) \times R(\Delta \theta)=[\alpha(t) \; \beta(t)] \times R(\Delta \theta)$
(3)

where, $R(\Delta \theta)$ stands for the rotation matrix associated with a change of $\Delta \theta$ in the angle, defined as:

$R(\Delta \theta)=\left[\begin{array}{cc} \cos (\Delta \theta) & -\sin (\Delta \theta) \\ \sin (\Delta \theta) & \cos (\Delta \theta) \end{array}\right]$
(4)

The optimal solution, denoted as $X_b$, is predetermined to set the value of the parameters influencing $q$. The binary representation of a solution $X_i$ is represented by its $j$-th bit $X_{i j}$, while the $j$-th bit of $X_b$ at time $t$ is denoted as $X_{b j}$. As reported in the study [36], the angle vector in QBO is capable of assuming one of eight distinct values, allowing for varied adjustments in the Q-bit representation.
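The rotation update can be demonstrated numerically. The sketch below applies the standard 2-D rotation matrix to a single Q-bit and checks that the normalization constraint of Eq. (2) is preserved; the initial angle and the step $\Delta\theta = 0.05\pi$ are arbitrary illustrative choices.

```python
import math

def rotate(alpha, beta, dtheta):
    """Apply the rotation matrix R(dtheta) to the Q-bit amplitudes [alpha, beta]."""
    a = math.cos(dtheta) * alpha - math.sin(dtheta) * beta
    b = math.sin(dtheta) * alpha + math.cos(dtheta) * beta
    return a, b

# A Q-bit with theta = pi/4, i.e., equal probability of measuring '0' or '1'
alpha, beta = math.cos(math.pi / 4), math.sin(math.pi / 4)
alpha, beta = rotate(alpha, beta, 0.05 * math.pi)
# Rotation preserves |alpha|^2 + |beta|^2 = 1, so q remains a valid Q-bit
print(round(alpha ** 2 + beta ** 2, 10))  # → 1.0
```

Because rotation only moves probability mass between the ‘0' and ‘1' states, the eight permitted angle values reported in the study [36] give QBO a discrete set of such adjustments to choose from.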

The primary objective of QBO is to balance exploration and exploitation. Initially, the data is split into a 70% training set and a 30% testing set. Random numbers are then used to determine the fitness of each agent. The agent with the lowest fitness score is selected as the best agent. The artificial hummingbird algorithm (AHA) [37] is employed for exploitation, and the solution is updated iteratively until the termination criteria are met. Following this, the implemented QAHA is evaluated on the reduced dimensionality of the test set based on the optimal solution. The QAHA will be expounded in the subsequent sections.

A. First stage

Initial agents representing the population are generated, with each solution comprising $D$ Q-bits. Consequently, each solution $X_i$ can be expressed as:

$X_i=\left[q_{i 1}\left|q_{i 2}\right| \ldots \mid q_{i D}\right]=\left[\theta_{i 1}\left|\theta_{i 2}\right| \ldots \mid \theta_{i D}\right], \quad i=1,2, \ldots, N$
(5)

where, $X_i$ refers to the superposition of probabilities of selecting or not selecting features.

B. Second stage

The process of updating agents until a specified criterion is met constitutes a critical phase in the application of the QAHA. Initially, the binary representation of each solution $X_i$ is determined through Eq. (6), which involves a random number rand $\in[0,1]$ and the parameter $\beta$ defined in Eq. (2).

$B X_{i j}= \begin{cases}1 & \text { if }|\beta|^2>\text { rand } \\ 0 & \text { otherwise }\end{cases}$
(6)

Subsequently, the fitness value for each agent is computed. This computation is achieved by training a CNN-LSTM classifier, with the features derived from $B X_{i j}$ serving as the model's hyperparameters. The fitness value is formulated as:

$\text { Fit }_i=\rho \times \gamma+(1-\rho) \times\left(\frac{\left|B X_i\right|}{D}\right)$
(7)

where, $\gamma$ represents the error rate in classifying features using the CNN-LSTM classifier, $\left|B X_i\right|$ denotes the number of selected features, and $D$ denotes the total number of features. The normalization factor $\rho \in[ 0,1]$ ensures parity in fitness levels across different agents. The LSTM model is preferred due to its simplicity, efficiency, and reliance on a singular tuning parameter. Its ability to retain information from the training set contributes to its effectiveness, particularly when other classifiers might not yield desired results.

The subsequent stage involves identifying the most optimal agent $X_b$, characterized by the minimum fitness value $Fit _b$. This step is pivotal in the QAHA process as it determines the most suitable set of features for the classification task at hand.
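The second stage can be sketched end to end: binarize each agent's angle vector via Eq. (6), then score it with Eq. (7). In this illustration the classifier's error rate is a fixed placeholder value, since a real evaluation would train the CNN-LSTM on the features kept by $BX_i$; the dimension $D=10$ and $\rho=0.99$ are likewise assumptions.

```python
import math
import random

random.seed(1)

def binarize(thetas):
    """Eq. (6): keep feature j when |beta|^2 = sin^2(theta_j) exceeds a random draw."""
    return [1 if math.sin(t) ** 2 > random.random() else 0 for t in thetas]

def fitness(bx, error_rate, rho=0.99):
    """Eq. (7): trade classification error against the fraction of kept features."""
    d = len(bx)
    return rho * error_rate + (1 - rho) * (sum(bx) / d)

# One agent's angle vector over D = 10 features (random for illustration)
thetas = [random.uniform(0, math.pi / 2) for _ in range(10)]
bx = binarize(thetas)
# error_rate would come from training the CNN-LSTM on the kept features;
# 0.12 is a purely illustrative placeholder
print(bx, fitness(bx, error_rate=0.12))
```

The agent whose binarized vector yields the minimum fitness becomes $X_b$, the solution carried into the third stage.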

C. Third stage

The test set is narrowed down to the features selected in the binary representation of $X_b$. The reduced-dimension test set is then used to apply the trained classifier for predictions. Subsequently, the output quality is thoroughly evaluated. The computational cost of QAHA is determined by the population size $N$, the cost of a fitness evaluation $N_{Fit}$, the problem dimension $D$, and the maximum iteration count $T$.

$O(Q A H A)=O\left(T \times N_{F i t} \times N+T \times N \times D+T \times D / 2\right)+O(N \times D)$
(8)

In summary, the complexity of QAHA is given by:

$O(Q A H A)=O\left(T \times N_{F i t} \times N+T \times N \times D+T \times D / 2\right)$
(9)

4. Results and Discussion

4.1 Hardware and Software Used for the Experiments

For preprocessing the tweets and applying deep learning methodologies, a Jupyter notebook, scripted in Python 3.6, was employed. The computational tasks were executed on a workstation equipped with the following hardware specifications: an Intel Core i7-9700K processor operating at 3.60 GHz, 32.0 GB of RAM, and a 6 GB NVIDIA GeForce graphics card. For text preprocessing, the spaCy and NLTK libraries were utilized, facilitating the efficient processing of the datasets.

4.2 Performance Metrics

The performance of the model was evaluated using the confusion matrix, a tool that provides four distinct outcomes: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). The effectiveness of the model was determined through the calculation of various metrics derived from the confusion matrix.

$\text { Accuracy }=(T N+T P) /(T N+T P+F N+F P)$
(10)
$\text { Sensitivity }=\left(\frac{T P}{T P+F N}\right)$
(11)
$\text { Specificity }=\left(\frac{T N}{T N+F P}\right)$
(12)
$\text { Precision }=\frac{T P}{T P+F P}$
(13)
$F-\text { Measure }=\frac{2 T P}{2 T P+F P+F N}$
(14)
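Eqs. (10)-(14) can be computed directly from the four confusion-matrix counts. The sketch below uses hypothetical counts chosen purely for illustration; they are not results from the study.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Eqs. (10)-(14): the five evaluation metrics from confusion-matrix counts."""
    return {
        "accuracy":    (tn + tp) / (tn + tp + fn + fp),
        "sensitivity": tp / (tp + fn),           # also called recall
        "specificity": tn / (tn + fp),
        "precision":   tp / (tp + fp),
        "f_measure":   2 * tp / (2 * tp + fp + fn),
    }

# Hypothetical counts for a 200-tweet test split
m = confusion_metrics(tp=90, tn=85, fp=15, fn=10)
print(m["accuracy"])  # → 0.875
```

Note that the F-measure in Eq. (14) is algebraically the harmonic mean of precision and recall, so all five metrics follow from the same four counts.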
4.3 Analysis of the Proposed Classifier

The efficacy of the proposed model was compared with that of generic deep learning models. The uniqueness of the dataset used in this study, which had not been previously employed for validation analysis, necessitated this comparison. The results of the analysis are presented in Table 2 and Table 3.

The analysis of the hybrid model's performance is detailed in Table 2. The Autoencoder (AE) model demonstrated an Area Under the Curve (AUC) of 0.758 and achieved an accuracy of 87.72%. Precision was recorded at 78.14%, recall at 89.92%, and the F-measure at 88.67%. Subsequently, the Deep Belief Network (DBN) model exhibited an AUC of 0.854, accuracy of 89.17%, precision of 70.91%, recall of 85.69%, and an F-measure of 82.33%. The RNN model registered an AUC of 0.687, with an accuracy of 90.28%. The precision was noted at 64.17%, recall at 86.66%, and F-measure at 80.24%. Following this, the CNN model showed an AUC of 0.947, an accuracy of 92.78%, precision of 91.94%, recall of 90.61%, and an F-measure of 86.86%. The LSTM model recorded an AUC of 0.957 and an accuracy of 94.34%. Its precision was 92.45%, recall 93.78%, and F-measure 91.36%. Finally, the combined CNN-LSTM model achieved the highest performance with an AUC of 0.967, accuracy of 95.97%, precision of 96.84%, recall of 97.24%, and an F-measure of 94.13%.

Table 2. Analysis of the proposed hybrid model without QAHA

| Classification | AUC | Accuracy (%) | Precision (%) | Recall (%) | F-Measure (%) |
| --- | --- | --- | --- | --- | --- |
| AE | 0.758 | 87.72 | 78.14 | 89.92 | 88.67 |
| DBN | 0.854 | 89.17 | 70.91 | 85.69 | 82.33 |
| RNN | 0.687 | 90.28 | 64.17 | 86.66 | 80.24 |
| CNN | 0.947 | 92.78 | 91.94 | 90.61 | 86.86 |
| LSTM | 0.957 | 94.34 | 92.45 | 93.78 | 91.36 |
| CNN-LSTM | 0.967 | 95.97 | 96.84 | 97.24 | 94.13 |

Table 3. Analysis of the proposed hybrid model with QAHA

| Classification | AUC | Accuracy (%) | Precision (%) | Recall (%) | F-Measure (%) |
| --- | --- | --- | --- | --- | --- |
| AE | 0.9343 | 91.56 | 84.56 | 90.66 | 90.67 |
| DBN | 0.8923 | 90.12 | 85.2 | 92.57 | 86.54 |
| RNN | 0.9082 | 93.22 | 78.7 | 90.67 | 88.67 |
| CNN | 0.9135 | 94.43 | 93.6 | 94.01 | 91.54 |
| LSTM | 0.9544 | 93.23 | 94.7 | 96.62 | 95.09 |
| CNN-LSTM | 0.9829 | 98.45 | 98.8 | 99.56 | 97.70 |

Table 3 presents the outcomes from the assessment of the hybrid model integrated with QAHA. The AE model exhibited an AUC of 0.9343, achieving an accuracy of 91.56%. Precision was recorded at 84.56%, recall at 90.66%, and the F-measure at 90.67%. The DBN model attained an AUC of 0.8923, accuracy of 90.12%, precision of 85.2%, recall of 92.57%, and an F-measure of 86.54%. The RNN model demonstrated an AUC of 0.9082 and achieved an accuracy of 93.22%. Its precision was noted at 78.7%, recall at 90.67%, and F-measure at 88.67%. The CNN model showed an AUC of 0.9135, an accuracy of 94.43%, precision of 93.6%, recall of 94.01%, and an F-measure of 91.54%. Further, the LSTM model registered an AUC of 0.9544, with an accuracy of 93.23%, precision of 94.7%, recall of 96.62%, and an F-measure of 95.09%. Lastly, the CNN-LSTM model, representing the pinnacle of this research, achieved an AUC of 0.9829, an exceptional accuracy of 98.45%, precision of 98.8%, recall of 99.56%, and an F-measure of 97.70%.

Figure 4. AUC comparison
Figure 5. Accuracy analysis with and without the optimization model
Figure 6. Graphical representation of various deep learning models
Figure 7. Recall analysis
Figure 8. Validation analysis of QAHA

Figure 4, Figure 5, Figure 6, Figure 7 and Figure 8 provide graphical representations of these results, including AUC comparison (Figure 4), accuracy analysis with and without the optimization model (Figure 5), and recall analysis (Figure 7), among others. The validation analysis of QAHA is depicted in Figure 8. The analysis revealed that the implementation of QAHA significantly enhanced the performance of the deep learning models. The CNN-LSTM model with QAHA outperformed the other models, demonstrating the effectiveness of the proposed hybrid approach in the context of hate speech classification.

5. Conclusions and Future Work

The emergence of racist content on social media platforms, particularly Twitter, has necessitated the development of automated detection and removal mechanisms. This study has adopted a sentiment analysis approach to identify racist tweets, focusing on specific phrases and words. Following data preprocessing, neural network classification was conducted using LSTM, CNN, and a hybrid CNN-LSTM model. The experimental results demonstrated that the CNN and hybrid models significantly outperformed the LSTM model in both phases of the analysis. It was found that, despite its lower execution time, LSTM's complexity rendered it less suitable for SoC-FPGAs compared to the CNN model. The CNN's simpler architecture and high accuracy underscored its appropriateness for SoC-FPGA implementation. Furthermore, the QAHA was employed to optimize hyperparameters, enhancing the classification accuracy of the proposed model.

The dataset used in this study is publicly available, offering a valuable resource for future research into the automatic detection and prediction of hate crimes and their underlying motivations, including racism. This accessibility to the scientific community could spur further investigations into this domain. The experimental study revealed that the proposed model achieved superior performance compared to baseline models, with accuracy and recall rates exceeding 95% and 96%, respectively. Understanding the factors contributing to online hate crimes through advanced deep-learning techniques can be instrumental in curbing detrimental biases and reducing the incidence of crimes driven by such biases. Future work in this area aims to refine and employ sophisticated deep-learning methods to train models in recognizing the root causes of hate crimes shared online.

Data Availability

The data used to support the research findings are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References
1. J. A. Benítez-Andrades, Á. González-Jiménez, Á. López-Brea, J. Aveleira-Mata, J. M. Alija-Pérez, and M. T. García-Ordás, “Detecting racism and xenophobia using deep learning models on Twitter data: CNN, LSTM and BERT,” PeerJ Comput. Sci., vol. 8, p. e906, 2022.
2. S. Sadiq, A. Mehmood, S. Ullah, M. Ahmad, G. S. Choi, and B. W. On, “Aggression detection through deep neural model on Twitter,” Future Gener. Comput. Syst., vol. 114, pp. 120–129, 2021.
3. J. A. Benitez-Andrades, Á. González-Jiménez, Á. López-Brea, B. C., J. Aveleira-Mata, J. M. Alija-Pérez, and M. T. García-Ordás, “BERT model-based approach for detecting racism and xenophobia on Twitter data,” in Research Conference on Metadata and Semantics Research, 2021, pp. 148–158.
4. E. Lee, F. Rustam, P. B. Washington, F. El Barakaz, W. Aljedaani, and I. Ashraf, “Racism detection by analyzing differential opinions through sentiment analysis of tweets using stacked ensemble GCR-NN model,” IEEE Access, vol. 10, pp. 9717–9728, 2022.
5. H. Macherla, G. Kotapati, M. T. Sunitha, K. R. Chittipireddy, B. Attuluri, and R. Vatambeti, “Deep learning framework-based chaotic hunger games search optimization algorithm for prediction of air quality index,” Ing. Syst. Inf., vol. 28, no. 2, pp. 433–441, 2023.
6. N. Alnazzawi, “Using Twitter to detect hate crimes and their motivations: The HateMotiv corpus,” Data, vol. 7, no. 6, p. 69, 2022.
7. G. A. De Souza and M. Da Costa-Abreu, “Automatic offensive language detection from Twitter data using machine learning and feature selection of metadata,” in 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 2020, pp. 1–6.
8. N. Vanetik and E. Mimoun, “Detection of racist language in French tweets,” Inf., vol. 13, no. 7, p. 318, 2022.
9. C. Arcila-Calderón, J. J. Amores, P. Sánchez-Holgado, and D. Blanco-Herrero, “Using shallow and deep learning to automatically detect hate motivated by gender and sexual orientation on Twitter in Spanish,” Multimodal Technol. Interact., vol. 5, no. 10, p. 63, 2021.
10. N. V. R. S. Reddy, C. Chitteti, S. Yesupadam, V. Subbaiah, S. S. V. Desanamukula, and N. J. Bommagani, “Enhanced speckle noise reduction in breast cancer ultrasound imagery using a hybrid deep learning model,” Ing. Syst. Inf., vol. 28, no. 4, pp. 1063–1071, 2023.
11. B. Jia, D. Dzitac, S. Shrestha, K. Turdaliev, and N. Seidaliev, “An ensemble machine learning approach to understanding the effect of a global pandemic on Twitter users’ attitudes,” Int. J. Comput. Commun. Control., vol. 16, no. 2, pp. 243–264, 2021.
12. A. Bisht, A. Singh, H. S. Bhadauria, J. Virmani, and Kriti, “Detection of hate speech and offensive language in Twitter data using LSTM model,” in Recent Trends in Image and Signal Processing in Computer Vision, 2020, pp. 243–264.
13. A. Toliyat, I. Sarah Levitan, Z. Peng, and R. Etemadpour, “Asian hate speech detection on Twitter during COVID-19,” Front. Artif. Intell., vol. 5, p. 932381, 2022.
14. H. Herodotou, D. Chatzakou, and N. Kourtellis, “A streaming machine learning framework for online aggression detection on Twitter,” in 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 2020, pp. 5056–5067.
15. O. Istaiteh, R. Al-Omoush, and S. Tedmori, “Racist and sexist hate speech detection: Literature review,” in 2020 International Conference on Intelligent Data Science Technologies and Applications (IDSTA), Valencia, Spain, 2020, pp. 95–99.
16. S. A. Kokatnoor and B. Krishnan, “Twitter hate speech detection using stacked weighted ensemble (SWE) model,” in 2020 Fifth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN), Bangalore, India, 2020, pp. 87–92.
17. S. Kaya and B. Alatas, “A new hybrid LSTM-RNN deep learning based racism, xenomy, and genderism detection model in online social network,” Int. J. Adv. Netw. Appl., vol. 14, no. 2, pp. 5318–5328, 2022.
18. F. Rodríguez-Sánchez, J. Carrillo-de Albornoz, and L. Plaza, “Automatic classification of sexism in social networks: An empirical study on Twitter data,” IEEE Access, vol. 8, pp. 219563–219576, 2020.
19. M. Mozafari, R. Farahbakhsh, and N. Crespi, “Hate speech detection and racial bias mitigation in social media based on BERT model,” PloS One, vol. 15, no. 8, p. e0237861, 2020.
20. N. Pitropakis, K. Kokot, D. Gkatzia, R. Ludwiniak, A. Mylonas, and M. Kandias, “Monitoring users’ behavior: Anti-immigration speech detection on Twitter,” Mach. Learn. Knowl. Extr., vol. 2, no. 3, p. 11, 2020.
21. J. Peng, J. S. Fung, M. Murtaza, A. Rahman, P. Walia, D. Obande, and A. R. Verma, “A sentiment analysis of the Black Lives Matter movement using Twitter,” STEM Fellow. J., vol. 8, no. 1, pp. 56–66, 2023.
22. S. Ghosal and A. Jain, “HateCircle and unsupervised hate speech detection incorporating emotion and contextual semantics,” ACM Trans. Asian Low-Resour. Lang. Inf. Process., vol. 22, no. 4, pp. 1–28, 2023.
23. M. Ali, M. Hassan, K. Kifayat, J. Y. Kim, S. Hakak, and M. K. Khan, “Social media content classification and community detection using deep learning and graph analytics,” Technol. Forecast. Social Change, vol. 188, p. 122252, 2023.
24. S. Agarwal, A. Sonawane, and C. R. Chowdary, “Accelerating automatic hate speech detection using parallelized ensemble learning models,” Expert Syst. Appl., vol. 230, p. 120564, 2023.
25. J. H. Joloudari, S. Hussain, M. A. Nematollahi, R. Bagheri, F. Fazl, R. Alizadehsani, R. Lashgari, and A. Talukder, “BERT-deep CNN: State of the art for sentiment analysis of COVID-19 tweets,” Social Netw. Anal. Min., vol. 13, no. 1, p. 99, 2023.
26. H. Saleh, A. Alhothali, and K. Moria, “Detection of hate speech using BERT and hate speech word embedding with deep model,” Appl. Artif. Intell., vol. 37, no. 1, p. 2166719, 2023.
27. S. Nagar, F. A. Barbhuiya, and K. Dey, “Towards more robust hate speech detection: Using social context and user data,” Soc. Netw. Anal. Min., vol. 13, no. 1, p. 47, 2023.
28. Y. Liu, Z. Tan, H. Wang, S. Feng, Q. Zheng, and M. Luo, “BotMoE: Twitter bot detection with community-aware mixtures of modal-specific experts,” arXiv preprint arXiv:2304.06280, 2023. Available: https://arxiv.org/abs/2304.06280
29. K. Mnassri, P. Rajapaksha, R. Farahbakhsh, and N. Crespi, “Hate speech and offensive language detection using an emotion-aware shared encoder,” arXiv preprint, 2023. Available: https://arxiv.org/abs/2302.08777
30. M. Almaliki, A. M. Almars, I. Gad, and E. S. Atlam, “ABMM: Arabic BERT-Mini model for hate-speech detection on social media,” Electronics, vol. 12, no. 4, p. 1048, 2023.
31. S. Gite, S. Patil, D. Dharrao, M. Yadav, S. Basak, A. Rajendran, and K. Kotecha, “Textual feature extraction using ant colony optimization for hate speech classification,” Big Data Cognitive Comput., vol. 7, no. 1, p. 45, 2023.
32. M. Fazil, S. Khan, B. M. Albahlal, R. M. Alotaibi, T. Siddiqui, and M. A. Shah, “Attentional multi-channel convolution with bidirectional LSTM cell toward hate speech prediction,” IEEE Access, vol. 11, pp. 16801–16811, 2023.
33. P. Burnap and M. L. Williams, “Us and them: Identifying cyber hate on Twitter across multiple protected characteristics,” EPJ Data Sci., vol. 5, no. 1, 2016.
34. “Search and find the best Twitter hashtags.” https://hashtagify.me/
35. “Training data for AI, ML with human empowered automation.” https://www.cogitotech.com/about-us
36. K. Srikanth, L. K. Panwar, B. K. Panigrahi, E. Herrera-Viedma, A. K. Sangaiah, and G. G. Wang, “Meta-heuristic framework: Quantum inspired binary grey wolf optimizer for unit commitment problem,” Comput. Electr. Eng., vol. 70, pp. 243–260, 2018.
37. W. Zhao, L. Wang, and S. Mirjalili, “Artificial hummingbird algorithm: A new bio-inspired optimizer with its engineering applications,” Comput. Meth. Appl. Mech. Eng., vol. 388, p. 114194, 2022.

Cite this:
Jayapal, P. K., Ramachandraiah, K. R. D., & Lella, K. K. (2023). Racism and Hate Speech Detection on Twitter: A QAHA-Based Hybrid Deep Learning Approach Using LSTM-CNN. Int J. Knowl. Innov Stud., 1(2), 89-102. https://doi.org/10.56578/ijkis010202
©2023 by the author(s). Published by Acadlore Publishing Services Limited, Hong Kong. This article is available for free download and can be reused and cited, provided that the original published version is credited, under the CC BY 4.0 license.