REVIEW PAPER
Advanced emotion analysis: harnessing facial image processing and speech recognition through deep learning
1 WSEI University
2 Lublin University of Technology
 
Submission date: 2024-05-28
 
 
Acceptance date: 2024-07-13
 
 
Publication date: 2024-08-20
 
 
Corresponding author
Michał Maj   

WSEI University
 
 
JoMS 2024;57(Special Issue 3):388-401
 
ABSTRACT
The human face is one of the most expressive human features and carries a great deal of information, including cues about a person's personality. Given this fundamental role, appropriate deep-learning solutions are needed to analyze facial data. Facial analysis technology is becoming increasingly common in industries such as online retail, advertising testing, and virtual makeovers. For example, it now allows online shoppers to virtually apply makeup and try on jewelry or new glasses to get an accurate picture of how these products will look. Human hearing is likewise a rich source of information about the current environment and about the location and properties of sound-producing objects. For instance, we effortlessly take in the sounds of birds singing outside the window, traffic passing in the distance, or the lyrics of a song on the radio. The human auditory system can process the intricate mix of sounds reaching our ears and create high-level abstractions of the environment by analyzing and grouping the measured sensory signals. The process of segregating and identifying the sources of a received complex acoustic signal, known as sound scene analysis, is a domain well suited to deep learning. A machine implementation of this functionality (separation and classification of sound sources) is pivotal in applications such as speech recognition in noise, automatic music transcription, searching and retrieving multimedia data, and recognizing emotions in spoken utterances.
 
eISSN:2391-789X
ISSN:1734-2031