Enhanced SSD Algorithm-Based Object Detection and Depth Estimation for Autonomous Vehicle Navigation
Abstract:
Autonomous vehicles necessitate robust stability and safety mechanisms for effective navigation, relying heavily upon advanced perception and precise environmental awareness. This study addresses the object detection challenge intrinsic to autonomous navigation, with a focus on the system architecture and the integration of cutting-edge hardware and software technologies. The efficacy of various object recognition algorithms, notably the Single Shot Detector (SSD) and You Only Look Once (YOLO), is rigorously compared. Prior research has indicated that SSD, when augmented with depth estimation techniques, demonstrates superior performance in real-time applications within complex environments. Consequently, this research proposes an optimized SSD algorithm paired with a ZED camera system. Through this integration, a notable improvement in detection accuracy is achieved, with a precision increase to 87%. This advancement marks a significant step towards resolving the critical challenges faced by autonomous vehicles in object detection and distance estimation, thereby enhancing their operational safety and reliability.
1. Introduction
Recent advancements in autonomous vehicles (AVs) have garnered noteworthy attention, with a commensurate increase in research dedicated to this domain [1]. A critical component of AV technology is the object detection mechanism, which incorporates artificial intelligence and sensor-based methodologies to ensure driver safety [2]. Autonomous vehicles promise to enhance driving comfort and to reduce incidents resulting from vehicle collisions. These vehicles are engineered to sense their environment and navigate highways autonomously, without human intervention [3, 4].
The suite of sensors distributed throughout the vehicle is integral to its functionality. An array of sensors, including LiDARs, radars, and cameras, is employed to survey and interpret the surrounding milieu [5]. The process of environmental sensing, or perception, encompasses several sub-tasks: object detection, object classification, 3D position estimation, and simultaneous localization and mapping (SLAM). Object detection itself involves localization—determining an object's position within an image—and classification—assigning a category to each detected object, such as a traffic light, vehicle, or pedestrian [6].
In autonomous driving systems, object detection is deemed one of the most crucial processes for safe navigation, as it enables the vehicle's controller to anticipate and maneuver around potential obstacles [7]. The employment of precise object detection algorithms is therefore imperative, and the system architecture must be complex enough to process the multitude of features handled within the vehicle [8].
In the present study, the objective is to refine object detection accuracy using robust tools such as the ZED camera in conjunction with algorithms like SSD, which have demonstrated superior performance in real-time scenarios. The ZED camera, in particular, has proven to be an invaluable sensor for the collection of depth data, especially in challenging and dynamic environments. A robust perception system that integrates multiple sensors and sophisticated algorithms, such as the proposed SSD algorithm, is requisite for AVs to achieve accurate object recognition and informed decision-making. To enhance vehicles' perceptual capabilities, reliability, and safety, researchers in the field of autonomous vehicles often pursue a synthesis of various sensors, such as the ZED camera, with such algorithms.
2. Related Work
Object detection is one of the most researched topics in computer vision and self-driving vehicles. The process of object detection often begins with the extraction of features from the input image using algorithms such as R-CNN, SSD, and YOLO. During the training phase, the convolutional neural network (CNN) learns the features that characterize each object and uses them to detect it. Object detection comprises two sub-tasks: localization, which involves locating an item inside an image, and classification, which entails assigning the object a class (such as "pedestrian," "vehicle," or "obstacle").
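To make the two sub-tasks concrete, the following minimal sketch runs a pretrained detector and reads out the localization (bounding boxes) and classification (class labels and scores) for each detection. It uses torchvision's off-the-shelf SSD300 purely as a stand-in for the models discussed here; the input file name and confidence threshold are illustrative.

```python
# Minimal sketch: running a pretrained SSD and reading out the two
# sub-task outputs (localization boxes + classification labels).
# torchvision's SSD300 is a stand-in, not this paper's custom model.
import torch
from torchvision.models.detection import ssd300_vgg16, SSD300_VGG16_Weights
from torchvision.io import read_image

weights = SSD300_VGG16_Weights.DEFAULT
model = ssd300_vgg16(weights=weights).eval()
preprocess = weights.transforms()

image = read_image("frame.jpg")            # CHW uint8 tensor (file name illustrative)
batch = [preprocess(image)]

with torch.no_grad():
    detections = model(batch)[0]           # dict with "boxes", "labels", "scores"

for box, label, score in zip(detections["boxes"],
                             detections["labels"],
                             detections["scores"]):
    if score < 0.5:                        # confidence threshold (tunable)
        continue
    name = weights.meta["categories"][int(label)]   # classification
    x1, y1, x2, y2 = box.tolist()                   # localization (pixels)
    print(f"{name} ({score:.2f}) at [{x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}]")
```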
Carranza-García et al. [19] contrasted single-stage detectors, such as YOLOv3, with two-stage detectors, such as Faster R-CNN. Before delving into object detection algorithms, the taxonomy underlying the process is explained in the next section.
3. Methodology
Object detection is an essential capability of an advanced autonomous vehicle: multiple objects must be recognized within a single image. Detecting multiple items in an image and estimating their distances are difficult problems, but with our approach it is possible to do both accurately and in real time [32]. We implemented an improved SSD (Single Shot Detector) in our algorithm model to obtain accurate and reliable results. SSD is a popular object detection method known for its accuracy and real-time speed. By combining the algorithm's object detection abilities with the camera's precise depth data, we fused the SSD with stereo depth information from the ZED camera, potentially improving object detection capabilities.
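The fusion described above can be sketched as follows. This is a minimal illustration, assuming the ZED SDK Python bindings (pyzed); the detect() wrapper and its output format are hypothetical placeholders standing in for the improved SSD, not the exact implementation used in this work.

```python
# Sketch of SSD + stereo-depth fusion: detect objects on the left color
# frame, then query the ZED depth map at each bounding-box centre.
import pyzed.sl as sl

def detect(bgra_image):
    """Hypothetical wrapper around the improved SSD model.

    Should return a list of (label, score, (x1, y1, x2, y2)) tuples;
    stubbed out here so the sketch runs end to end.
    """
    return []

zed = sl.Camera()
init = sl.InitParameters()
init.depth_mode = sl.DEPTH_MODE.ULTRA     # dense depth for distance queries
init.coordinate_units = sl.UNIT.METER
if zed.open(init) != sl.ERROR_CODE.SUCCESS:
    raise RuntimeError("ZED camera failed to open")

image, depth = sl.Mat(), sl.Mat()
runtime = sl.RuntimeParameters()

while zed.grab(runtime) == sl.ERROR_CODE.SUCCESS:
    zed.retrieve_image(image, sl.VIEW.LEFT)        # color frame for the SSD
    zed.retrieve_measure(depth, sl.MEASURE.DEPTH)  # per-pixel depth map (m)
    frame = image.get_data()                       # numpy BGRA array

    for label, score, (x1, y1, x2, y2) in detect(frame):
        cx, cy = int((x1 + x2) / 2), int((y1 + y2) / 2)
        err, dist = depth.get_value(cx, cy)        # depth at box centre
        if err == sl.ERROR_CODE.SUCCESS:
            print(f"{label} ({score:.2f}) at ~{dist:.1f} m")

zed.close()
```

Querying the depth map at the box centre is the simplest choice; a more robust statistic over the box interior is sketched in the next section.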
4. Results and Discussion
In this section, we examine the implementation and the results derived from the analysis. The implementation can be divided into the categories defined below:
Input Data:
The main task in deep learning is constructing an algorithm that can learn from data and make predictions on it; the SSD algorithm is used here for such data-driven predictions. For our implementation, a ZED camera was installed at the front of the vehicle so that it captures the images appearing ahead. Both color and depth images were taken for this application, as shown in Figures 7 and 8. The advantage of the depth image is that it allows the distance of an object from the vehicle to be calculated.
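One simple way to turn the depth image into a per-object distance is to take a robust statistic over the depth pixels inside each detected bounding box. The sketch below (illustrative, not the authors' exact procedure) uses the median of the valid depth values, which is less sensitive than a single centre pixel to stereo-matching holes and to background bleeding in at the box edges.

```python
import numpy as np

def object_distance(depth_map, box):
    """Estimate an object's distance as the median valid depth inside its box.

    depth_map: 2-D float array of per-pixel distances in metres (e.g. the
               ZED depth image); NaN/inf mark invalid stereo matches.
    box:       (x1, y1, x2, y2) bounding box in pixel coordinates.
    """
    x1, y1, x2, y2 = (int(v) for v in box)
    patch = depth_map[y1:y2, x1:x2]
    valid = patch[np.isfinite(patch)]       # drop invalid stereo pixels
    return float(np.median(valid)) if valid.size else None

# Hypothetical usage: a detected vehicle with box (410, 220, 640, 380)
# distance = object_distance(depth, (410, 220, 640, 380))
```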
5. Conclusion
In this research, we have studied the autonomous vehicle and its system architecture. The architecture has two parts: the hardware, which includes sensors such as cameras, LiDAR, and RADAR that perceive the environment, and the software, into which the sensed information is fed. The software architecture is the core of the entire system; it comprises the operating system and the algorithms that take input data from the different sensors and apply logic for decision-making. The output of this logic is then consumed by the control modules, which regulate the acceleration and motion of the vehicle. Advanced technologies such as machine learning and computer vision are applied throughout this process.

Various algorithms are available, such as Convolutional Neural Networks (CNN), R-CNN, and YOLO, but our customized SSD model is preferable for real-time prediction: it has considerably fewer localization errors, is computationally inexpensive, and requires less storage and processing power for obstacle detection. The object distance estimation algorithm was created using the mono-depth technique; the overall model was trained on stereo data and draws inferences from monocular views. We also tested the proposed software model and algorithm in a real-time environment with the ZED camera mounted on the vehicle, which yielded outstanding results with an accuracy of 87%. The object detection technique may be combined with distance estimation so that the two share feature extraction layers, thereby improving efficiency. The potential benefits of integrating the SSD algorithm with the ZED camera in self-driving vehicles are demonstrated by applications such as autonomous golf buggies on golf courses, load vehicles on construction sites, and other autonomous industries. Such applications allow for improved perception, increased safety, and effective navigation in a variety of dynamic environments. Autonomous vehicles will become far more reliable as their algorithms adapt to varied lighting conditions, diverse surroundings, and different object orientations.
The data used to support the findings of this study are available from the corresponding author upon request.
The authors declare that they have no conflicts of interest.