Advanced emotion analysis: harnessing facial image processing and speech recognition through deep learning

Special Issue 3/2024, vol. 57

REVIEW PAPER


Magdalena Hałas¹, Michał Maj¹, Ewa Guz¹, Marcin Stencel¹, Tomasz Cieplak²

¹ WSEI University

² Lublin University of Technology

Submission date: 2024-05-28

Acceptance date: 2024-07-13

Publication date: 2024-08-20

Corresponding author: Michał Maj, WSEI University

JoMS 2024;57(Special Issue 3):388-401

DOI: https://doi.org/10.13166/jms/191163


KEYWORDS

Multi-task learning, Computer Vision, Person Classifier, Emotions Classifier

TOPICS

Psychology

ABSTRACT

The human face hides many secrets and is one of the most expressive human features; faces may even carry hidden information about a person's personality. Given this central role, dedicated deep-learning solutions for analyzing facial data are needed. Facial analysis technology is becoming increasingly common in industries such as online retail, advertisement testing, and virtual makeovers. For example, it now allows online shoppers to virtually apply makeup and try on jewelry or new glasses to get an accurate picture of how these products will look. The human sense of hearing is likewise a rich source of information about the surrounding environment and about the location and properties of sound-producing objects. For instance, we effortlessly take in the sound of birds singing outside the window, traffic passing in the distance, or the lyrics of a song on the radio. The human auditory system processes the intricate mix of sounds reaching our ears and builds high-level abstractions of the environment by analyzing and grouping the sensory signals it receives. Separating a received complex acoustic signal into its components and identifying their sources, a process known as sound scene analysis, is a domain in which deep learning has proven particularly effective. A machine implementation of this capability (separation and classification of sound sources) is central to applications such as speech recognition in noise, automatic music transcription, multimedia search and retrieval, and recognition of emotions in speech.

License

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA) license.



eISSN: 2391-789X
ISSN: 1734-2031
