Final year thesis / capstone project

Developed a system to detect road hazards such as potholes and speed breakers, collecting 1,213 real-world images of challenging road conditions in Bangladesh. Explored and evaluated multiple neural network architectures, including VGG16, Xception, ResNet50, YOLOv8, Vision Transformer, and Swin Transformer, using Precision, Recall, F1-score, and Accuracy as metrics. Achieved the best performance with the Swin Transformer (Accuracy 0.98, Precision 1.00, Recall 0.98, F1-score 0.99), demonstrating a reliable system for real-time road hazard detection.
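As a reminder of how the four reported metrics relate to each other, here is a minimal sketch in plain Python. The function names and the confusion counts are hypothetical and chosen only for illustration; they are not taken from the thesis, which does not state how the per-class scores were averaged. The last line only checks that the reported F1-score is consistent with the reported Precision and Recall.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Return (precision, recall, f1, accuracy) from binary confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1_score(precision, recall), accuracy


# Made-up counts for illustration only; they are NOT results from the thesis.
example = classification_metrics(tp=8, fp=2, fn=2, tn=8)

# Consistency check on the reported Swin Transformer scores:
# Precision 1.00 and Recall 0.98 give an F1-score that rounds to 0.99.
reported_f1 = round(f1_score(1.00, 0.98), 2)
```

With Precision 1.00 and Recall 0.98, the harmonic mean is 1.96 / 1.98, which rounds to the reported 0.99.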

Figure: Block diagram of the proposed system

Figure: Modified head structure of Swin Transformer

A patch-splitting module first divides the input image into non-overlapping patches, as the Vision Transformer does (Figure 13). Each patch is treated as a “token” whose feature vector is the concatenation of the raw RGB values of its pixels. A linear embedding layer then projects this feature vector to an arbitrary dimension, denoted C.
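The patch partition and linear embedding described above can be sketched in plain Python as follows. This is an illustrative sketch, not the thesis implementation: the function names, the toy 8 x 8 image, the patch size of 4, and the embedding dimension C = 6 are all assumptions made for the example, and the random weights stand in for parameters that would normally be learned.

```python
import random


def partition_patches(image, patch_size):
    """Split an H x W x 3 image (nested lists) into non-overlapping patches.

    Each patch becomes one token: the RGB values of its pixels, flattened
    row by row into a list of length patch_size * patch_size * 3.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    tokens = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            token = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    token.extend(image[row][col])  # append R, G, B
            tokens.append(token)
    return tokens


def linear_embed(tokens, weight, bias):
    """Project every token to C dimensions: out[j] = sum_i x[i] * W[i][j] + b[j]."""
    return [
        [
            sum(x * w_row[j] for x, w_row in zip(token, weight)) + bias[j]
            for j in range(len(bias))
        ]
        for token in tokens
    ]


# Assumed toy settings (not from the thesis).
PATCH_SIZE = 4
EMBED_DIM = 6  # the dimension called C in the text

# Toy 8 x 8 RGB image whose pixel values encode their own position.
image = [[[r, c, r + c] for c in range(8)] for r in range(8)]

tokens = partition_patches(image, PATCH_SIZE)
in_features = PATCH_SIZE * PATCH_SIZE * 3  # 48 raw RGB values per token

# Random placeholder weights; in a real model these are learned.
rng = random.Random(0)
weight = [[rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)] for _ in range(in_features)]
bias = [0.0] * EMBED_DIM

embedded = linear_embed(tokens, weight, bias)
```

Under these assumptions the 8 x 8 image yields 4 tokens of 48 raw features each, and the embedding maps each token to a vector of length C. In practice this step is usually implemented as a single strided convolution or a reshape followed by a dense layer, which computes the same result more efficiently.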

Figure: Network architecture diagram of Swin Transformer