Differences

This shows you the differences between two versions of the page.

--- racfor_wiki:fdd:identifikacija_govornika [2022/06/06 18:20]
vmuzevic [Literature]
+++ racfor_wiki:fdd:identifikacija_govornika [2024/12/05 12:24] (current)
@@ Line 37: / Line 37: @@
 ==== Automated FSR ====
-With advances in technology, new techniques have been developed, and many more will follow. These include artificial intelligence, machine learning algorithms, deep learning, natural language processing (NLP), etc. One project recognizes home-language speakers of Sepedi using four classifier models: Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Multilayer Perceptrons (MLP) and Random Forest (RF) [1]. In another article [2], speaker recognition is performed by a deep learning model based on a convolutional neural network (CNN). That model is text-independent, meaning it does not rely on what is actually being said; a text-dependent model would be much more complex. The model works with spectrograms extracted from speech. Deep learning models can also outperform human analysts at recognizing speakers from short, so-called "trivial events", such as sneezes, coughs, "hmmm" sounds and so on. Datasets for training such models exist, such as this one on [[https://www.kaggle.com/datasets/kongaevans/speaker-recognition-dataset|Kaggle]], which features 1500 samples from five prominent world leaders, as well as background noise that can be mixed into the training data. [[https://www.robots.ox.ac.uk/~vgg/data/voxceleb/|VoxCeleb]] is much larger, with 7000 speakers, but at the time of writing the dataset cannot be downloaded from the site, and it is unclear whether it will ever be available again. Currently there are 203 public [[https://github.com/topics/speaker-recognition|GitHub]] repositories with the topic "speaker-recognition".
+With advances in technology, new techniques have been developed, and many more will follow. These include artificial intelligence, machine learning algorithms, deep learning, natural language processing (NLP), etc. One project recognizes home-language speakers of Sepedi using four classifier models: Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Multilayer Perceptrons (MLP) and Random Forest (RF) [1]. In another article [2], speaker recognition is performed by a deep learning model based on a convolutional neural network (CNN). That model is text-independent, meaning it does not rely on what is actually being said; a text-dependent model would be much more complex. The model works with spectrograms extracted from speech. Deep learning models can also outperform human analysts at recognizing speakers from short, so-called "trivial events", such as sneezes, coughs, "hmmm" sounds and so on [9]. Datasets for training such models exist, such as this one on [[https://www.kaggle.com/datasets/kongaevans/speaker-recognition-dataset|Kaggle]], which features 1500 samples from five prominent world leaders, as well as background noise that can be mixed into the training data. [[https://www.robots.ox.ac.uk/~vgg/data/voxceleb/|VoxCeleb]] is much larger, with 7000 speakers, but at the time of writing the dataset cannot be downloaded from the site, and it is unclear whether it will ever be available again. Currently there are 203 public [[https://github.com/topics/speaker-recognition|GitHub]] repositories with the topic "speaker-recognition".
 ==== Tools for FSR ====
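Reference [2] describes a CNN that works on spectrograms extracted from speech. As a rough illustration of that input representation only (not the preprocessing actually used in [2]), the sketch below computes a magnitude spectrogram with a short-time Fourier transform. It assumes a mono signal given as a list of floats; the Hann window, the 256-sample frame and the 128-sample hop are hypothetical choices, and a naive DFT is used for readability rather than speed.

```python
import math
from typing import List


def hann_window(n: int) -> List[float]:
    """Periodic Hann window of length n (hypothetical window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def magnitude_spectrogram(signal: List[float], frame_len: int = 256, hop: int = 128) -> List[List[float]]:
    """Magnitude STFT: one list of frame_len // 2 + 1 bin magnitudes per frame."""
    window = hann_window(frame_len)
    frames = []
    # Slide the window over the signal; an incomplete trailing frame is dropped.
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + i] * window[i] for i in range(frame_len)]
        bins = []
        # Naive DFT, keeping only non-negative frequencies since the input is real.
        for k in range(frame_len // 2 + 1):
            re = sum(x * math.cos(2 * math.pi * k * i / frame_len) for i, x in enumerate(segment))
            im = -sum(x * math.sin(2 * math.pi * k * i / frame_len) for i, x in enumerate(segment))
            bins.append(math.hypot(re, im))
        frames.append(bins)
    return frames


# Usage: a 1 kHz tone sampled at 8 kHz; 256-sample frames give 31.25 Hz bins,
# so the energy should peak in bin 1000 / 31.25 = 32.
sr = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(1024)]
spec = magnitude_spectrogram(tone)
print(len(spec), len(spec[0]))  # 7 129
```

In practice the frames would be stacked into a 2-D time-frequency image (often on a log or mel scale) and fed to the CNN; a library FFT would replace the naive DFT.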
@@ Line 51: / Line 51: @@
    * Articulation
    * Acoustic patterns
-Among other features, these are closely examined to determine whether observed differences stem from a different pronunciation or a different speaker. With sufficient evidence, a positive identification or elimination is reached. With insufficient evidence, a probable identification or elimination is reached. If the audio is of too poor quality or contains too little material for comparison, the conclusion is described as unresolved [6].
+Among other features, these are closely examined to determine whether observed differences stem from a different pronunciation or a different speaker. With sufficient evidence, a positive identification or elimination is reached. With insufficient evidence, a probable identification or elimination is reached. If the audio is of too poor quality or contains too little material for comparison, the conclusion is described as unresolved [6][8].
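The conclusion scale described in [6] can be summarized as a small decision function. This is only a sketch under assumptions: the analyst's judgement is reduced to two hypothetical inputs, an `Evidence` level and a `same_speaker` flag, neither of which is defined in the source; how that judgement is reached is left to the examiner.

```python
from enum import Enum


class Evidence(Enum):
    """Hypothetical summary of how strongly the compared material supports a conclusion."""
    SUFFICIENT = "sufficient"
    INSUFFICIENT = "insufficient"
    UNUSABLE = "unusable"  # audio too poor or too little material to compare


def fsr_conclusion(evidence: Evidence, same_speaker: bool) -> str:
    """Map the (hypothetical) assessment onto the conclusion scale from the text."""
    if evidence is Evidence.UNUSABLE:
        # Nothing reliable to compare, so neither identification nor elimination is made.
        return "unresolved"
    outcome = "identification" if same_speaker else "elimination"
    if evidence is Evidence.SUFFICIENT:
        return "positive " + outcome
    return "probable " + outcome


print(fsr_conclusion(Evidence.SUFFICIENT, same_speaker=True))     # positive identification
print(fsr_conclusion(Evidence.INSUFFICIENT, same_speaker=False))  # probable elimination
print(fsr_conclusion(Evidence.UNUSABLE, same_speaker=True))       # unresolved
```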

racfor_wiki/fdd/identifikacija_govornika.1654539611.txt.gz · Last modified: 2024/12/05 12:23 (external edit)