Advanced emotion analysis: harnessing facial image processing and speech recognition through deep learning

Special Issue 3/2024, vol. 57

REVIEW PAPER

Magdalena Hałas ¹, Michał Maj ¹, Ewa Guz ¹, Marcin Stencel ¹, Tomasz Cieplak ²

¹ WSEI University

² Lublin University of Technology

Submission date: 28-05-2024

Acceptance date: 13-07-2024

Publication date: 20-08-2024

Corresponding author: Michał Maj, WSEI University

JoMS 2024;57(Special Issue 3):388-401

DOI: https://doi.org/10.13166/jms/191163

KEYWORDS

Multi-task learning

Computer Vision

Person Classifier

Emotions Classifier

ABSTRACT

The human face is one of the most expressive human features and may even carry hidden information about a person's personality. Given this fundamental role, appropriate deep-learning solutions are needed to analyze facial data. Facial analysis technology is becoming increasingly common in industries such as online retail, advertising testing, and virtual makeovers. For example, it now allows online shoppers to virtually apply makeup or try on jewelry or new glasses to see how these products will look on them. Human hearing, in turn, is a rich source of information about the surrounding environment and about the location and properties of sound-producing objects. For instance, we effortlessly take in birds singing outside the window, traffic passing in the distance, or the lyrics of a song on the radio. The human auditory system processes the intricate mix of sounds reaching our ears and builds high-level abstractions of the environment by analyzing and grouping the sensory signals it receives. Separating a received complex acoustic signal into its sources and identifying them, known as sound scene analysis, is a domain where deep learning is particularly effective. A machine implementation of this functionality (separation and classification of sound sources) is pivotal in applications such as speech recognition in noise, automatic music transcription, multimedia search and retrieval, and emotion recognition in spoken utterances.

License

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported license (CC BY-SA)

eISSN:	2391-789X
ISSN:	1734-2031