                | racfor_wiki:fdd:identifikacija_govornika [2022/06/06 18:15] vmuzevic [Methods of FSR]
 | racfor_wiki:fdd:identifikacija_govornika [2024/12/05 12:24] (current) 
 | 
        
|  |  | 
| ==== Automated FSR ==== | ==== Automated FSR ==== | 
| Advances in technology have produced new techniques, and more will follow. These include artificial intelligence, machine learning algorithms, deep learning, and natural language processing (NLP). One project recognizes Sepedi home-language speakers using four classifier models: Support Vector Machines, K-Nearest Neighbors, Multilayer Perceptrons (MLP), and Random Forest (RF) [1]. In another article [2], speaker recognition is performed by a deep learning model based on a convolutional neural network (CNN). That model is text-independent, meaning it does not take the meaning of the spoken text into account; a text-dependent model would be much more complex. The model works with spectrograms extracted from speech. Deep learning models can also outperform human analysts at recognizing speakers from short, so-called "trivial events" such as sneezes, coughs, and "hmmm" sounds. Datasets for training such models exist, such as this one on [[https://www.kaggle.com/datasets/kongaevans/speaker-recognition-dataset|Kaggle]], which features 1500 samples from five prominent world leaders, along with background noise that can be mixed into the training data. [[https://www.robots.ox.ac.uk/~vgg/data/voxceleb/|VoxCeleb]] is much larger, with 7000 speakers, but at the time of writing it could not be downloaded from the site, and it is unclear whether it will become available again. At the time of writing there were 203 public [[https://github.com/topics/speaker-recognition|GitHub]] repositories tagged with the topic "speaker-recognition". | Advances in technology have produced new techniques, and more will follow. These include artificial intelligence, machine learning algorithms, deep learning, and natural language processing (NLP). 
One project recognizes Sepedi home-language speakers using four classifier models: Support Vector Machines, K-Nearest Neighbors, Multilayer Perceptrons (MLP), and Random Forest (RF) [1]. In another article [2], speaker recognition is performed by a deep learning model based on a convolutional neural network (CNN). That model is text-independent, meaning it does not take the meaning of the spoken text into account; a text-dependent model would be much more complex. The model works with spectrograms extracted from speech. Deep learning models can also outperform human analysts at recognizing speakers from short, so-called "trivial events" such as sneezes, coughs, and "hmmm" sounds [9]. Datasets for training such models exist, such as this one on [[https://www.kaggle.com/datasets/kongaevans/speaker-recognition-dataset|Kaggle]], which features 1500 samples from five prominent world leaders, along with background noise that can be mixed into the training data. [[https://www.robots.ox.ac.uk/~vgg/data/voxceleb/|VoxCeleb]] is much larger, with 7000 speakers, but at the time of writing it could not be downloaded from the site, and it is unclear whether it will become available again. At the time of writing there were 203 public [[https://github.com/topics/speaker-recognition|GitHub]] repositories tagged with the topic "speaker-recognition". | 
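The paragraph above mentions that the CNN-based model works with spectrograms extracted from speech. As a rough illustration of what such an input looks like, here is a minimal sketch in plain Python. It is not the pipeline from [2]: the function names, the frame length, the hop size, and the synthetic test tone are all assumptions chosen for the example, and a real system would likely use an FFT library rather than this naive DFT.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n (reduces spectral leakage per frame)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(samples, frame_len=64, hop=32):
    """Magnitude spectrogram: one row per frame, one column per frequency bin.

    Uses a naive O(n^2) DFT per frame, which is fine for a small illustration.
    """
    window = hann(frame_len)
    rows = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):  # keep non-negative frequencies only
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(abs(acc))
        rows.append(row)
    return rows


def synthetic_tone(freq_hz, sample_rate=8000, num_samples=400):
    """Hypothetical stand-in for recorded speech: a pure sine tone."""
    return [
        math.sin(2 * math.pi * freq_hz * t / sample_rate)
        for t in range(num_samples)
    ]


def dominant_bin(row):
    """Index of the strongest frequency bin in one spectrogram row."""
    return max(range(len(row)), key=row.__getitem__)


if __name__ == "__main__":
    spec = spectrogram(synthetic_tone(1000.0))
    print(len(spec), "frames x", len(spec[0]), "bins")
    print("dominant bin per frame:", [dominant_bin(r) for r in spec])
```

With an 8000 Hz sample rate and 64-sample frames, each bin spans 125 Hz, so the 1000 Hz test tone should concentrate its energy in one bin across all frames. A CNN would take the resulting two-dimensional array (time by frequency) as its input image.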
|  |  | 
| ==== Tools for FSR ==== | ==== Tools for FSR ==== | 
| * Articulation | * Articulation | 
| * Acoustic patterns | * Acoustic patterns | 
| Among others. These are closely examined to determine whether differences are due to a different pronunciation or a different speaker. With sufficient evidence, a positive identification or elimination is reached. With insufficient evidence, a probable identification or elimination is reached. If the audio is of too poor quality or contains too little information to compare, the conclusion is described as unresolved [6]. | Among others. These are closely examined to determine whether differences are due to a different pronunciation or a different speaker. With sufficient evidence, a positive identification or elimination is reached. With insufficient evidence, a probable identification or elimination is reached. If the audio is of too poor quality or contains too little information to compare, the conclusion is described as unresolved [6][8]. | 
|  |  | 
|  |  | 
|  |  | 
| [7] [[https://voicefoundation.org/health-science/voice-disorders/anatomy-physiology-of-voice-production/#:~:text=Resonance%3A%20Voice%20sound%20is%20amplified,lips)%20modify%20the%20voiced%20sound|“Voice Anatomy & Physiology.” THE VOICE FOUNDATION, 30 July 2015.]] | [7] [[https://voicefoundation.org/health-science/voice-disorders/anatomy-physiology-of-voice-production/#:~:text=Resonance%3A%20Voice%20sound%20is%20amplified,lips)%20modify%20the%20voiced%20sound|“Voice Anatomy & Physiology.” THE VOICE FOUNDATION, 30 July 2015.]] | 
|  |  | 
|  | [8] [[https://ieeexplore.ieee.org/document/8093505|M. M. Karakoç and A. Varol, "Visual and auditory analysis methods for speaker recognition in digital forensic," 2017 International Conference on Computer Science and Engineering (UBMK), 2017, pp. 1113-1116, doi: 10.1109/UBMK.2017.8093505.]] | 
|  |  | 
|  | [9] [[https://ieeexplore.ieee.org/document/8462027|M. Zhang et al., "Human and Machine Speaker Recognition Based on Short Trivial Events," 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 5009-5013, doi: 10.1109/ICASSP.2018.8462027.]] | 
|  |  | 
| ~~DISCUSSION~~ | ~~DISCUSSION~~ | 
|  |  |