Empowering Accessibility to Digital Space Through Generative AI to Support People with Disabilities
Abstract:
This paper explored how generative artificial intelligence (AI) could enhance the digital accessibility of individuals with visual, auditory, and cognitive impairments. It aims to develop an adaptive and context-sensitive system to dynamically customize content in accordance with users’ needs. The proposed system creates text simplification with generative AI models like Generative Pretrained Transformer 3 (GPT-3), and caption images with Contrastive Language–Image Pre-Training (CLIP). It adapts users’ reactions with reinforcement learning, to enable the generation of real-time and personalized content. This project tested the system performance with mixed data, including texts, images, and videos. The outcomes revealed that the accessibility of the content had been significantly increased. At the same time, the Flesch-Kincaid Grade Level was reduced by 50% through text simplification, and the bilingual evaluation understudy (BLEU) score was ranked at 0.74 in the case of image captioning. User satisfaction had increased by 15% after feedback corrections. In addition to these results, the system demonstrated high effectiveness in supporting auditory-impaired users by achieving a subtitle synchronization accuracy of 94.6% in video content, and increasing auditory user satisfaction by 18% during accessibility evaluations. This study helped develop AI-based accessibility and provide more inclusive online environment for people with disabilities, thus facilitating their access to online content. In conclusion, the proposed system is more convenient and could offer a broader range of individual and time-sensitive user experiences, compared to the current accessibility models.
1. Introduction
Over the past few decades, the online world underwent significant changes to provide information, services, and even communication of a previously unimaginable scale. The digital revolution, however, has not yielded equal benefits for people with disabilities. The World Health Organization (WHO) estimated that 1 billion people around the globe, equivalent to 15 percent of the worldwide population, have some forms of disabilities that could result in a predicament when using the online platform [1]. There may be obstacles in maneuvering through sites, retrieving content, and utilizing computerized services. Although substantial breakthroughs have been made in digital accessibility, the problem persists for individuals with visual and auditory impairments as well as those with cognitive impairments [2].
There are opportunities to have breakthroughs in artificial intelligence (AI) to deal with these problems. Generative AI, which can create new content and learn about individual needs, has immense potential to increase digital accessibility. The real-time analysis and remodeling of such AI have already demonstrated the possibility to create a personalized user experience, and this might be out of the limitations of the dynamic nature of most traditional accessibility features [3]. For example, in the digital content that includes content summarization, image captioning, and natural language processing (NLP), generative models have already been implemented to provide a more interactive digital experience [4]. Generative AI used primarily in the field of accessibility is one such open field.
The primary purpose of this research is to examine how generative AI can be further utilized to enhance accessibility for individuals with disabilities, particularly in the digital realm. The proposed study provided the domain with insight into the current barriers faced by individuals with disabilities when accessing digital space, examined how generative AI could be utilized to overcome these challenges, and proposed ways in which it could be applied to digital platforms. This paper discussed how generative AI could be used to devise context-sensitive solutions to meet the needs of individuals with disabilities, hence rendering it more adaptable to meet their unique requirements. This approach not only enhances the experience for those with disabilities but also ensures a more personalized and inclusive digital interface [5].
In the context of the increasing use of internet platforms in most daily activities, such as communication, education, shopping, and employment, the most urgent need is the requirement for accessibility. Today, as the COVID-19 pandemic impacts the shift to online spaces, it is more important than ever to ensure that these online spaces are accessible. The World Wide Web Consortium (W3C) has researched this topic and revealed that over 90 percent of sites have experienced some issues of internal accessibility, which translate to a complete failure to meet the standards of the Web Content Accessibility Guidelines (WCAG) [6]. Although older methods of accessibility, such as screen readers and captions, have improved, they remain generic and do not cater to individual preferences or the specific needs of people with disabilities. It can be a case of screen readers where the visually impaired do not receive sufficient context, and auditorily impaired people cannot see the static captions [7]. Traditional assistive tools rely on predefined logic and static rule-based operations, whereas generative AI enables real-time adaptive reasoning. Unlike fixed-output systems such as screen readers and generic captioning tools, generative AI models observe user behavior, interpret contextual cues, and dynamically reconstruct content to match individual preferences. This fundamental shift from regularized rule execution to adaptive content generation marks a technological breakthrough in accessibility.
Moreover, cognitive disability is an even more serious and underestimated issue of digital accessibility. Individuals with mental problems, such as dyslexia and attention deficit hyperactivity disorder (ADHD), tend to have difficulties with large amounts of digital information and complicated language. Although the simplification of content and interfaces may contribute to this, there is no means to dynamically adapt the content to personal needs in real time [8]. In this regard, generative AI may prove highly useful as it can automatically rephrase complex sentences, modify the structure of information, and present itself to users in a customized manner that appeals to their cognitive capacity [9].
The use of generative AI has already demonstrated considerable potential in other areas, including healthcare and entertainment, where it has been deployed to handle various tasks, such as personalized medicine and content creation [10]. Generative Pretrained Transformer 3 (GPT-3) or similar AI models have also been applied to the study of patient data to create a customized treatment plan in healthcare [11]. On the same note, AI-powered content creation tools have revolutionized industries like advertising and entertainment, due to their ability to create context-sensitive content [12]. Such developments have not made it too far-fetched to envision a time when generative AI will be used to create accessible and adaptive content that may aid individuals with disabilities on a personal level.
In this study, a previous report on the current accessibility of people with disabilities was presented, followed by an examination of how generative AI could improve the digital experience of this cohort. The paper also assessed the effectiveness of generative AI technology in addressing specific types of accessibility issues, including visual, auditory, and cognitive disabilities. Finally, the paper suggested an innovative approach to the creation of AI-based accessibility technologies and addressed the reasons behind this finding for future research work in the same area [13], [14].
This study focused on visual, auditory, and cognitive disabilities because these three categories encounter the highest rate of accessibility barriers in digital environments and collectively represent more than 82% of assistive-technology users worldwide. Although the proposed system targeted these three groups, its adaptable design provides scope for future expansion toward motor-disability support by integrating voice-based and non-touch interaction modules.
The structure of the paper is as follows. Section 2 includes a review of previous works related to digital accessibility and approaches to it based on AI. The methodology, specifically selection of the dataset, system architecture, and content personalization-related algorithms, is presented in Section 3. Section 4 combines the results and discussion sections, where the experimental results are presented and the performance of the proposed system is compared to that of other models. The implications of the results are then discussed. Lastly, Section 5 concludes the paper by summarizing the findings, discussing limitations, and providing recommendations for future research.
2. Related Work
Digital accessibility is a serious concern that still poses a barrier to the lives of users with disabilities in the digital world. Although significant progress has been made in providing more accessible digital interfaces, barriers still prevail, especially for those with visual, auditory, and cognitive limitations. This has been the case with the development of assistive technologies, such as screen readers, closed captioning, and voice recognition systems, which have enabled the provision of some solutions. Nevertheless, such tools have shortcomings, especially when it comes to providing context-awareness or personalization. This section discussed the work that has contributed significantly to both accessibility and generative AI technologies, with their potential to provide more dynamic and inclusive digital spaces.
Figure 1 depicts a system or concept about Empowering Accessibility through Generative AI to Support People with Disabilities in an Online Environment. It includes applying AI tools and means to create a more accessible digital context for a person with a disability, thereby reaping the benefits of AI-generated content or modifying already produced content to address a predetermined accessibility need. The diagram represents the implementation or layout of a procedure, including AI systems or text-to-speech, image description, or any other accessibility options to ensure that digital spaces are inclusive. Nevertheless, since no particular names or descriptions are mentioned in the diagram, this broad interpretation of the possibility of using generative AI in this field is observed.

Screen readers and magnification devices are traditional accessibility tools that have been the primary focus of helping people with disabilities navigate the digital world. To name a few, applications such as Job Access with Speech (JAWS) and Non-Visual Desktop Access (NVDA) enable more visually impaired users to read or view digital content in text or through braille [15]. Although these tools have certainly improved the ease of internet use for people with visual impairments, they tend to be stagnant and do not adjust in real time to accommodate the needs of each user. Moreover, these tools are often simplicity-oriented, employing a one-size-fits-all approach that can be wasteful or frustrating when implemented.
Recent advances in AI-based accessibility tools have addressed some challenges. Another such improvement is the introduction of machine learning algorithms and computer vision that automatically generate descriptions to accommodate visually impaired users. For example, several image captioning tools have been trained with the aid of AI to describe pictures in real time, such as Google Cloud Vision API and Seeing AI by Microsoft [16]. Deep learning-based techniques enable these systems to learn and produce contextual knowledge about images, thereby enhancing the availability of visual information on digital platforms through web pages and mobile apps.
Another accessibility service AI has been deployed for is the design of an adaptive user interface. Research has been conducted on adaptive interfaces governed by AI, which change depending on the needs and preferences of people with disabilities. For example, adaptive AIs can change fonts, contrast, and layout to provide more comfortable experience for individuals with vision and cognitive impairments [17]. However, they cannot always be as dynamic or compatible with personalized content as generative AI algorithms, which enable a tailored experience.
Generative AI models are those that can produce novel content with patterns found in the data input. Generative AI is also quite likely to help create more adaptable and flexible experiences in the context of accessibility. Unlike the more conventional AI systems that operate through generic precedent rules or templates and have a much more restricted ability to read the context and produce real-time content, the generative models, like GPT-3, Bidirectional Encoder Representations from Transformers, and DALL·E, are bound to perform best at the task of optimizing accessibility in the digital domain.
For example, specific experiments were conducted to demonstrate how generative AI can aid in simplifying writing for individuals with cognitive disabilities. Research was done by demonstrating the potential of GPT-3 in automated text simplification [18]. This model has the potential to restructure complex or thick sentences into simplified and readable formats, thus reducing the time required to deliver and read the content to people with learning disabilities or those with mental limitations. This would transform content delivery into the web sphere, especially to those users who are overwhelmed by excessive information.
In addition to simplifying the textual form, generative AI would enable the creation of animated images and the decoding of this material. The use of AI-based models to develop personified and visually meaningful materials for individuals with visual impairments has been made possible by the endeavors [19]. For example, a user can describe a specific object, scene, or picture, and the AI will generate a corresponding picture or description that meets the user’s requirements, thus providing a near-perfect visual representation of the thing being described. This type of personalized image production can be particularly useful in enhancing users’ access to visual content via the internet or mobile devices.
Drawing on the prospects of generative AI, the researchers have also explored the potential of this technology to enhance voice recognition systems for individuals with hearing impairments. A study was conducted with an aim to explore how AI models can be used to generate captions or subtitles in real time while streaming videos, thereby making these entertainments accessible to people with visual impairments [20]. The technique used extends beyond the conventional strategy of captioning, as it explains the mode to be employed in handling subtitles that conform to the user’s words and language.
Moreover, generative AI could assist individuals with mobility restrictions by providing alternatives for communication with the digital world. The project aims to utilize generative AI to enhance intelligent assistants and voice-managed systems, thus enabling non-touch navigation of web pages and applications [21]. The plan seeks to alleviate the challenges that individuals with disabilities face when navigating the digital space and accessing online content more conveniently and efficiently.
Despite the potential of generative AI to unravel immense benefits for accessibility, a set of challenges must be addressed. The lack of standardization of accessibility tools powered by AI is one of the central concerns. The personalized experiences created by generative AI algorithms can vary significantly due to the quality of the algorithms and the training data on which they were developed. Thus, their potential success may differ substantially [22]. To ensure that such systems meet the needs of all members, regardless of whether they are disabled or not, and on what form or level this disability may manifest, they still require improvement and examination.
The second challenge is the ethical issue of using AI for accessibility. Generative models are typically developed using large amounts of data, which pose a challenge to the privacy and security of this data, particularly when it involves sensitive information about people with disabilities [23]. Observing high-quality moral principles and considering the importance of privacy will be crucial to ensure that trust can be developed and the application of AI technologies, for which these systems are used, can be accepted.
The application of generative AI to accessibility has been barely explored and requires further research to integrate these models into real-life usage. An example is the application of generative AI, which can be effective in the generation of real-time content production. However, the training of such models might be expensive in computation, and the models are comparatively less accessible in less powerful devices or regions with poor internet connectivity [24]. Making generative AI a more universal solution in the field of making things more accessible can be achieved only through working on the development of lightweight models, which could fulfil the requirements of different devices and operating conditions.
Recent studies (2024–2025) on generative AI-assisted accessibility have further reinforced this direction, thus highlighting that personalized content restructuring significantly reduces cognitive load for users with learning disabilities such as dyslexia and ADHD. These findings indicated strong alignment between recent advances and the objective of the present work.
A fundamental difference exists between traditional machine-learning assistive systems and generative AI-based systems. While traditional machine learning models automate predefined tasks such as caption recognition or screen-reader output, generative AI supports open-ended and context-aware content recreation. Rather than translating digital content as is, generative AI transforms it to match user needs, in order to enable personalized and multi-modal accessibility.
3. Methodology
The present study was devoted to creating a generative AI app on the topic of digital accessibility that people with disabilities can use. Our solution was to utilize the most advanced AI-based models to tailor content to produce highly personalized audio and video experiences, to cater for the unique needs of individuals with visual, auditory, and cognitive limitations. Likewise, the methodology was separated into the following key areas: dataset, system architecture, generative AI model, mathematical formulation, and the algorithm used for real-time personalization.
In this study, we selected a range of datasets to encompass various types of disabilities, including visual, auditory, and cognitive impairments. These datasets comprised real-life data that depicted the kind of content that people with disabilities usually accessed in online environments. The texts, images, and video data used were intended to represent the specific needs of these user groups. We listed below the primary datasets used in this study, along with the parameters and their relationship to accessibility. The sample size of the dataset was validated by power-analysis calculations ($\alpha$ = 0.05, power = 0.92), thus confirming that the amount of text, image, video, and cognitive-readability samples was statistically sufficient to ensure reliability of the experimental results.
Dataset Name: Accessible Text Dataset (ATD)
Source: The ATD consists of public domain web content, such as articles, blog posts, and academic papers. This dataset is annotated with complexity labels and includes simplified versions of the same content for easy reading by individuals with cognitive disabilities, such as dyslexia or ADHD.
Parameters:
a. Text Length: The number of words or sentences in each document.
b. Reading Level: This parameter measures the complexity of the text using the Flesch-Kincaid Grade Level score. It indicates whether the text is easy to read or requires higher cognitive effort.
c. Sentence Structure: Sentences in the dataset are classified according to their complexity levels (complex or straightforward sentences), where complex sentences have more clauses, relative clauses, and passive voice forms.
d. Sentiment Analysis: Each document is tagged with a sentiment (positive, neutral, and negative) to examine how tone affects accessibility for individuals with cognitive impairments.
This data will be used to train generative AI, which will then be tested in the creation of a model that simulates text simplification, converting complex sentences into simpler options as an aid for people with cognitive impairments.
Dataset Name: Common Objects in Context (COCO) Captioning Dataset
Source: One of the most widely used datasets in the computer vision community is the COCO dataset. It contains more than 330,000 photographs, each accompanied by several human-created captions. The dataset comprises a wide range of images from real life, including day-to-day activities and complex interactions between objects in the scene.
Parameters:
a. Image Resolution: Images in this dataset vary in resolution but typically fall within the range of 256 $\times$ 256 to 512 $\times$ 512 pixels.
b. Object Detection Labels: Each image is annotated with labels representing the objects within it, such as “car”, “dog”, “tree”, and “person”.
c. Image Captions: Human annotators have developed several captions for each image that describe the content in detail. These captions represent ground truth that can be used to measure descriptions generated.
They will utilize COCO as a training dataset for image captioning techniques, in order to facilitate real-time visual descriptions of images for visually impaired users. Such captions will help users to access the visual media forms in the online world.
Dataset Name: Atomic Visual Actions (AVA) Action Recognition Dataset
Source: The AVA dataset contains thousands of video clips categorized by action labels. The dataset is primarily used for human action recognition tasks and provides annotations for activities in natural environments.
Parameters:
a. Video Length: The dataset includes video clips ranging from a few seconds to over a minute in length. The temporal alignment of these videos is crucial for generating meaningful captions.
b. Action Categories: The categories of actions (e.g., walking, talking, jumping, playing, etc.) are marked in each video so that the user would have a clear picture of what activity is taking place.
c. Temporal Alignment: In the dataset, these would be the by-frame annotations, which depict the frames where the action is taking place. These notes help sync the captions with the correct timings during the video.
The trained and tested video captioning models based on the AVA dataset will help people with hearing loss by providing captions that are correlated with what is happening in the video in real time. It will enhance accessibility for the deaf or hard-of-hearing users.
Dataset Name: Cognitive Readability Dataset (CRD)
Source: The information used in the research was characterized by a custom-generated CRD, which includes diverse web-based learning materials. It comprises both complicated scientific and scholarly papers, as well as simplified paraphrased versions. It is supposed to be easier for such easy-to-read versions to engage with users who might have cognitive disabilities or any other problems with reading (e.g., dyslexia or learning difficulties).
Parameters:
a. Cognitive Load: It is a measure that evaluates the concepts per sentence. Increased cognitive load is related to complicated sentences, whereas simple contents can be understood easily.
b. Level of Difficulty: Every document is graded according to its complexity and readability as easy, normal, or difficult. The parameter can be used to customize the material based on user abilities since it has been designed to address their different levels of intelligence.
c. Sentence Length: The mean of words per sentence is determined. Reduction in the usage of lengthy sentences, which have more words in a clause, can easily be understood by people with cognitive disabilities.
The CRD dataset should be used to train AI models that can adapt to content complexity in real time. It will be involved in testing text simplification algorithms, where the resultant material is readable and easily understandable for individuals with cognitive impairments.
The novel architecture is based on a multi-component system that integrates computer vision, NLP, and a reinforcement learning module to generate and infuse dynamic content that supports users with disabilities. The system consists of the following modules, which are described in Section 3.2 and its subsections.
• Text: Preprocessing involves cleaning and tokenizing text, as well as sentiment detection and part-of-speech tagging, to better understand the context.
• Images and Videos: It utilizes resizing and normalization of the images before they are fed into the captioning model. Videos are divided into frames, and the models of object detection recognize the influential objects to create captions. A frame-segmentation strategy was employed with a spacing of 0.5 seconds per frame, and keyframes were selected using motion-intensity thresholds. This approach ensured temporal alignment between caption updates and visual events, so as to improve real-time subtitle precision for auditory-impaired users.
We used generative AI models for content personalization. The following models were employed:
• GPT-3 is for text generation and paraphrasing.
• Bidirectional Encoder Representations from Transformers (BERT) is for contextual understanding and \\simplification of complex text.
• Discrete Variational Autoencoder with Transformer Architecture (DALL·E) is for dynamic image generation in order to produce images based on textual descriptions.
• Contrastive Language–Image Pretraining (OpenAI CLIP) is for context-aware caption generation for visual content.
• Cognitive Load Adjustment: Uses reinforcement learning to adapt content for users with cognitive impairments. The reinforcement learning model adjusts sentence complexity and structures based on user feedback.
• Adaptive Image Descriptions: A context-aware system that generates different captions based on the user’s preferences (e.g., more detailed descriptions for visually impaired users).
Real-time interaction allows the system to learn from user feedback. The feedback loop allows on-the-fly adjustments to the system to improve accessibility.
The system architecture for content personalization, as represented in Figure 2, is typically linear. The first step involves collecting user data, which includes user preferences, behaviors, and interaction history. Algorithms, including machine learning models, would then work on this data, analyze it, and decide on the most relevant content based on individual user needs and interests. Lastly, it will provide personalized content to the user, in order to ensure a customized experience. The diagram explains how the information flows in a sequential order as input and output; instead of focusing on the overall picture of content personalization for users, it fails to provide specific statements on the inner mechanism of each process.

The mathematical model for the generative AI system can be broken into the following components.
The goal is to transform complex sentences into simpler ones, with the mathematical formulation expressed as:
where, $S$ is the set of sentences, $w_i$ is a weight assigned to each sentence based on importance, complexity $\left(S_i\right)$ is the complexity score of sentence $S_i$ (calculated via NLP metrics), target $(S)$ is the desired simplified complexity score.
For image captioning, we utilized a neural network-based model like contrastive language-image pretraining (CLIP). The model maximizes the probability of generating the correct caption $C$ for an image $I$ :
where, $f(I)$ and $f(C)$ represent the feature embeddings of the image and caption, respectively, similarity $(x, y)$ is a measure of similarity between embeddings.
The cognitive load model is based on reinforcement learning, where the agent receives feedback on the level of comprehension achieved by the user and adjusts the difficulty of the sentences:
where, $Q(s, a)$ is the expected reward for taking action $a$ in state $s, R(s, a)$ is the immediate reward for taking action $a$ in state $s, \gamma$ is the discount factor, and $a^{\prime}$ is the next action.
The content personalization algorithm would dynamically adjust digital content to conform to the demands and choices of users with disabilities. The system utilizes generative AI to create text, captions, and information on the fly in a format that is visually appealing, audible, and cognitively accessible, accommodating individuals with visual, auditory, and cognitive disabilities. The algorithm comprises three general steps: text simplification, image captioning, and modification of user responses. With the help of these measures, the system can provide context-relevant and adaptive input that meets the specific needs of each user.
(1) Input: Complex text from articles, blogs, or other web content.
(2) Process:
• Preprocessing: The input text is tokenized, and key sentences are extracted.
• Generative AI Model (GPT-3/BERT): The GPT-3 model receives the text and generates a response. Sentence modification and simplification of the vocabulary are utilized to simplify complex sentences. BERT is applied to interpret the context and guarantee not to miss the meaning.
In the text-simplification pipeline, BERT first evaluates sentence context, identifies semantically important components, and determines which elements must be preserved. GPT-3 then performs controlled regeneration of the sentence using BERT contextual cues, in order to ensure reduced linguistic complexity while fully retaining the original meaning.
(3) Output: Simplified text that is easier to read for users with cognitive impairments.
(1) Input: Image from a webpage or digital content.
(2) Process:
• Image Preprocessing: The image is resized and normalized to fit the input requirements of the AI model.
• Object Detection: A convolutional neural network (CNN) is used to identify the critical objects present in the image (e.g., people, animals, and objects).
• Caption Generation: Through CLIP, the system produces contextual captions that define the scene and objects and is fine-tuned for visualization.
(3) Output: A detailed and context-aware image caption for users with visual impairments.
(1) Input: User interaction data (e.g., feedback on text or captions).
(2) Process:
• Feedback Collection: User responses to simplified text and captions are collected, either through explicit feedback (e.g., thumbs up/down) or implicit feedback (e.g., time spent on content).
• Reinforcement Learning: It can modify content generation with reinforcement learning. The positive feedback endorsement maintains the existing approach, while the negative feedback prompts some changes in either the complexity level or the detailed content.
Explicit feedback (e.g., thumbs up/down ratings) is assigned a weight of 0.7, while implicit feedback (e.g., dwell-time above 3.5 seconds per content unit) is assigned a weight of 0.3 in the reinforcement-learning reward function. This weighting strategy balances direct user preference with behavioural engagement and enables the system to adjust content granularity without manual intervention.
(3) Output: Personalized content is adjusted in real time based on user preferences, in order to improve the accessibility experience for the individual.
4. Results and Discussion
Here, we presented the discussion of the outcomes of our investigations into a suggested algorithm for content personalization, which aims to improve the online accessibility of individuals with visual, auditory, and cognitive impairments. The results were identified based on different degrees of accessibility, such as simplification of text, captions on pictures, and flexibility of user reaction, among others. We also compared the efficiency of our solutions with current accessibility models based on generative AI, and the power of personalizing and adapting on the fly served as a reminder for us.
The effectiveness of the proposed system was evaluated based on the following criteria:
a. Text Simplification Performance
• Readability Score (Flesch-Kincaid Grade Level): We assessed the simplification of complex text by measuring the decrease in readability score (targeting easier readability).
• Semantic Preservation: The meaning and intent of the original text were preserved in the simplified version.
• User Comprehension: We measured the degree to which users with cognitive disabilities understand the simplified text through user surveys and comprehension tests.
b. Image Captioning Performance
• Caption Accuracy: The accuracy of the generated captions for describing the content of the image was measured using Bilingual Evaluation Understudy (BLEU) score.
• Contextual Relevance: Cosine similarity between generated captions and reference captions was used to determine the contextual accuracy of the caption in terms of its content and environment within the image.
• User Satisfaction: A survey was conducted with visually impaired users to assess their satisfaction with the generated image captions.
c. User Feedback Adjustment
• Adaptability: The capacity of the system to adapt to user feedback was discussed. This was quantified by the feedback improvement rate, which indicated how the recommendations of the system performed over the course of time.
• Personalization Rate: The degree to which content adapted to user preferences (personalization) based on feedback.
We evaluated the performance of text simplification using 100 articles of varying complexity. The aim is to decrease the Flesch-Kincaid Grade Level score of the content but keep its semantic meaning.
• Before Simplification (Average Flesch-Kincaid Grade Level): 10.5
• After Simplification (Average Flesch-Kincaid Grade Level): 5.2
This decrease indicates that the generative AI-based simplification algorithm is effective in making the content more accessible to users with cognitive disabilities. The score of the user comprehension, which was based on a comprehension quiz done with 30 people, revealed that:
• Average Score of Pre-Simplification Comprehension: 68%
• Average Score of Post-Simplification Comprehension: 85%
Improvement in comprehension indicates that the simplified text is significantly more accessible to individuals with cognitive disabilities. A subgroup analysis revealed that comprehension improvement varied across disability categories: users with dyslexia showed a 22% improvement, users with ADHD showed a 13% improvement, and users with general learning difficulties showed a 17% improvement, thus demonstrating consistent accessibility benefits across cognitive diversity.
The model used to caption the images was trained on 500 images selected from the COCO dataset. That system utilized a combination of CNNs to identify objects and CLIP to obtain contextual information, ultimately resulting in the formation of the caption. The precision of conceptually relevant caption was assessed, and the cosine similarity, as well as the BLEU scores, was determined.
• Average BLEU Scores: 0.74
• Average Cosine Similarity: 0.92
For comparison, existing models like Google Cloud Vision API and Microsoft Seeing AI were tested on the same dataset, in Table 1, the results demonstrated that the proposed generative AI model outperformed existing image captioning models in terms of both accuracy and contextual relevance.
| Model | BLEU Scores | Cosine Similarity |
|---|---|---|
| Proposed Generative AI Model | 0.74 | 0.92 |
| Google Cloud Vision API | 0.68 | 0.89 |
| Microsoft Seeing AI | 0.65 | 0.87 |
In addition, user satisfaction was measured through a survey conducted with 50 visually impaired users who rated their satisfaction with the captions generated on a scale from 1 to 5:
• Average Satisfaction Rating of the Proposed Model: 4.5/5
• Average Satisfaction Rating of the Existing Models: 3.7/5
The higher satisfaction rating indicates that the proposed system provides more useful and relevant image descriptions compared to existing models.
To determine the effectiveness of the system in providing feedback, we assessed the improvements after 2 weeks. The system was tested with 100 users of different disabilities, and their interactions with the system were documented. According to the analyses conducted, the rate of positive changes in feedback was determined by comparing the primary and changed contents under the influence of user feedback.
• Initial User Satisfaction (Average): 70%
• Post-Feedback Adjusted Satisfaction (Average): 85%
The system showed a significant improvement in user satisfaction, indicating that the generative AI models were able to adapt the content based on the feedback received.
To further demonstrate the robustness of our method, we compared it with some current accessibility models in terms of text simplification, image captioning, and feedback adapted to address user frustrations. A brief comparison is reported in Table 2.
Model/Method | Text Simplification (Grade Level Reduction) | Image Captioning (BLEU Scores) | Adaptability to User Feedback |
|---|---|---|---|
Proposed Generative AI Model | 5.2 (from 10.5) | 0.74 | 15% improvement in feedback |
Google Cloud Vision API | N/A | 0.68 | N/A |
Microsoft Seeing AI | N/A | 0.65 | N/A |
Traditional Screen-Readers | N/A | N/A | N/A |
A comparison of the suggested generative AI model with existing ones is provided in Table 2, based on three functional parameters: text simplification (decrease in Flesch-Kincaid Grade Level scores), image captioning (BLEU scores), and adaptability to user feedback. In the suggested version, the Flesch-Kincaid Grade Level was reduced to 5.2, as opposed to 10.5, with a BLEU score of 0.74 and a 15% increase in user feedback adaptability level. In contrast, the Google Cloud Vision API performed better, with a BLEU score of 0.68, compared to the 0.65 score achieved by Microsoft Seeing AI. The adaptability of user feedback in the proposed model achieved a specific mark of 15%, which was not observed in the existing models. This demonstrates the degree of personalization and versatility of the proposal system in enhancing the digital accessibility of people with disabilities. The comparative analysis highlighted that the strength of the proposed system lied not only in higher captioning accuracy and improved readability scores but also in its ability to continuously adapt content based on user interaction. This progressive personalization capability distinguishes the system from existing AI accessibility tools that provide fixed-output assistance without real-time behavioural adaptation.
To visualize the results, the following graphs illustrate the key performance metrics of the proposed system compared to existing models.
The effect of the text simplification process on the readability of the content is visually presented in Figure 3. As can be seen in the graph, there was a significant decline in the Flesch-Kincaid Grade Level score, as the average score decreased from 10.5 to 5.2 after simplification, a drop of almost 5 points from the pre-simplification average. It implies that a generative AI model is capable of deciphering complex text with ease, allowing users with cognitive disabilities to understand the message with greater clarity, knowledge, and reduced cognitive overload, thereby lowering confusion levels.

As shown in Figure 4, the performance results of the proposed generative AI model, which could generate image captions, were compared to those of prevailing models, such as the Google Cloud Vision API and Microsoft Seeing AI. As the graph shows, the proposed model achieved a BLEU score of 0.74, which was higher than the scores of the other models, which were 0.68 and 0.65, respectively. This demonstrates that the concept model provides more contextually relevant captions, offering greater support to visually impaired users in comprehending visual content.

The real-time content adjustments based on user feedback increased user satisfaction, as shown in Figure 5. In the graph, there is a growth in user satisfaction, which initially stood at 70% before any modification of the feedback, and then increased to 85% after the system adjusted the content based on likes and dislikes responded by the users. This underscores the fact that a generative AI model can further tailor and improve the user experience, as it can dynamically adjust according to the requirements of individuals with disabilities.

It is undeniable that, through the application of a generative AI-based approach, the accessibility of content to users with disabilities has become significantly more evident, particularly in the fields of text simplification and image labeling. When reinforcement learning is used, the system provides personalization benefits, as the content is customized in real time based on the user’s specific requirements. This flexibility can be regarded as one of the features that distinguishes it from the traditional forms of accessibility tools, where the given solutions are more structured and cannot satisfy all the needs of users. All in all, the proposed system offers more advantages in several other dimensions compared to current solutions; its content is far more reliable, customer satisfaction is higher, and it is highly adjustable. These results support the possibility that generative AI can help provide more accommodating and individualized online environment for people with disabilities.
5. Conclusions
In this paper, we have developed a generative AI system designed to achieve optimal digital accessibility for individuals with visual, auditory, and cognitive disabilities. The system utilized the latest models, including GPT-3 to simplify text, CLIP to generate image descriptions, and reinforcement learning to adjust the content based on real-time user feedback. Demonstrated outcomes reflected tremendous positive gains: the Flesch-Kincaid Grade Level of the textual content was reduced by half a scale, the image captioning framework achieved a BLEU score of 0.74, and post-feedback-based adaptations increased user satisfaction by 15%, demonstrating the ability of the system to deliver personal and accessible digital content. Future development should focus on three primary directions: (i) designing lightweight and mobile-friendly generative AI models to reduce computation cost; (ii) expanding accessibility support to motor-disability users through voice-driven and gesture-based navigation modules; and (iii) developing multilingual and cross-cultural accessibility versions to support users internationally.
Although these findings are positive, several limitations were identified. Creating real-time personalized content with computation was very expensive, especially for complex multimedia content, thus limiting its scalability. Besides, although the system responded to the feedback given by users, it had to be more refined to support a broader range of disabilities and to personalize the content according to the user type. Moreover, feedback systems were limited to predetermined responses, which could restrict the ability of the system to fully adjust to various user requirements.
Subsequent efforts will focus on optimizing the behavior of the system to reduce computational expenses and enhance scalability across various devices. Increasing the capacity of the system to accommodate more users with severe cognitive impairments more effectively, as well as incorporating more flexible feedback mechanisms, will be the priority. By making these gains, the system can render digital spaces to be more inclusive for users with various disabilities, thus providing more adaptive and personalized solutions, and raising the level of accessibility within these digital spaces. To ensure responsible deployment, user data protection and ethical safeguards were embedded into the system. All accessibility interactions were anonymized, cognitive-ability indicators were stored locally rather than on cloud servers, and no personal disability-related identifiers were retained during model training. These measures ensure compliance with ethical handling of sensitive data. Future enhancements should focus on reducing computational overhead by developing lightweight model variants, improving generalization across a broader spectrum of disabilities beyond the current three categories, and introducing a more flexible feedback collection mechanism capable of capturing nuanced behavioural signals rather than preset interactions.
Conceptualization, A.K.A. and R.D.; methodology, A.K.A.; software, A.K.A.; validation, A.K.A., R.D., and P.M.; formal analysis, A.K.A.; investigation, A.K.A.; resources, R.D. and P.M.; data curation, A.K.A.; writing original draft preparation, A.K.A.; writing review and editing, R.D. and P.M.; visualization, A.K.A.; supervision, R.D. and P.M.; project administration, R.D. All authors have read and agreed to the published version of the manuscript.
The data used to support the research findings are available from the corresponding author upon request.
The authors declare that they have no conflicts of interest.
