International Journal of Advanced Trends in Engineering and Management (IJATEM)
Paper ID: IJATEM_ICRICC 23_017 | DOI: https://doi.org/10.59544/hozg9877/icricc23p17

Yolo Algorithm Based Multimodal Sentiment Analysis on Image and Text Data

J. Sindhuja A, C. Brintha

The objective of sentiment analysis is to examine public sentiment in a way that supports
corporate growth. It emphasizes emotions as well as polarity (positive, negative, and neutral).
It makes use of a variety of Natural Language Processing approaches, including automatic,
hybrid, and rule-based methods. Users are increasingly accustomed to uploading text and photographs
on social networks to express their feelings or ideas. As a result, multimodal sentiment analysis
has drawn more attention as a research area in recent years. An image usually contains affective
regions that trigger human emotion, and these are typically expressed by corresponding words in
comments. Similarly, when writing visual descriptions, people frequently describe the affective
regions of an image. For multimodal sentiment analysis, therefore, the association between
affective image regions and the accompanying text is extremely important. This paper presents
You Only Look Once (YOLO), one of the leading CNN-based detectors, which breaks with the
CNN family's tradition and solves object detection in a simple and highly efficient way.
Its name derives from the fact that, unlike earlier two-stage object detectors such as
Faster R-CNN, it requires an image or video frame to pass through its network only once.
The performance of YOLO is compared with that of Faster R-CNN in terms of accuracy and
F1 measure, and YOLO's results substantially surpass those of Faster R-CNN.
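
As an illustration of the two evaluation measures named above, the following minimal Python sketch computes accuracy and the F1 measure for a binary sentiment task. The function name `accuracy_and_f1` and the toy labels are assumptions made for this example only; they are not taken from the paper's experiments and do not reproduce its results.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels; `positive` marks the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Hypothetical toy labels (1 = positive sentiment, 0 = negative sentiment).
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, F1={f1:.3f}")
```

Accuracy counts every correct prediction equally, whereas F1 balances precision and recall on the positive class, so reporting both gives a fairer picture when the sentiment classes are imbalanced.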