The following are the differences between two versions of the page.
racfor_wiki:fdd:identifikacija_govornika [2022/05/26 11:23] vmuzevic [Speech Analysis] |
racfor_wiki:fdd:identifikacija_govornika [2024/12/05 12:24] (current) |
Line 1: | Line 1: | ||
- | ====== Identification of speech and speakers in a sound recording ====== | + | ====== Identification of speech and speakers in an audio recording ====== |
===== Abstract ===== | ===== Abstract ===== | ||
Line 12: | Line 12: | ||
- | IMAGE HERE | + | {{ racfor_wiki: |
This filter lowers all frequencies below approximately 80 Hz and above 300 Hz by 27 dB, while boosting frequencies between 80 Hz and 300 Hz by 12 dB. | This filter lowers all frequencies below approximately 80 Hz and above 300 Hz by 27 dB, while boosting frequencies between 80 Hz and 300 Hz by 12 dB. | ||
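The decibel figures above translate into linear amplitude factors via 10^(dB/20). The following is a minimal Python sketch that assumes an idealised brick-wall response. The real filter's transitions around 80 Hz and 300 Hz are gradual, as "approximately" suggests. The constant and function names are illustrative only:

```python
# Hypothetical constants mirroring the filter described above,
# idealised as a hard step (a real EQ has sloped transitions).
LOW_CUT_HZ = 80.0
HIGH_CUT_HZ = 300.0
CUT_DB = -27.0
BOOST_DB = 12.0


def db_to_amplitude(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def filter_gain_db(freq_hz: float) -> float:
    """Gain in dB that the idealised filter applies at a given frequency."""
    if LOW_CUT_HZ <= freq_hz <= HIGH_CUT_HZ:
        return BOOST_DB  # speech fundamental band is boosted
    return CUT_DB        # everything outside it is attenuated


if __name__ == "__main__":
    for f in (50.0, 120.0, 250.0, 1000.0):
        g = filter_gain_db(f)
        print(f"{f:7.1f} Hz: {g:+.0f} dB -> x{db_to_amplitude(g):.3f}")
```

Under this idealisation a 27 dB cut scales amplitude by roughly 0.045 and a 12 dB boost by roughly 3.98. The boosted band therefore ends up 39 dB (about 89 times in amplitude) above the rest of the spectrum.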
- | This filter, however, does not always produce ideal results, as volume levels can still be too low or imbalanced. To resolve this, compression is applied to further balance the volume levels of different parts of the audio file, and in the case of excessive background noise a gate can be used to suppress that noise. Compressors are a type of amplifier whose gain depends on the signal passing through. For example, a compressor can be set to boost quieter sounds more than louder ones, or to reduce the gain of any sound whose amplitude is above a certain threshold. Noise gates reduce unwanted sounds in a similar fashion; however, they completely remove any signal below the set amplitude threshold, meaning any quiet distractions below that threshold are removed entirely. The final audio enhancement step is de-reverb and, once again, noise reduction. Audio recordings to be used as evidence are seldom recorded in ideal conditions and even with the aforementioned enhancements, | + | This filter, however, does not always produce ideal results, as volume levels can still be too low or imbalanced. To resolve this, compression is applied to further balance the volume levels of different parts of the audio file, and in the case of excessive background noise a gate can be used to suppress that noise. Compressors are a type of amplifier whose gain depends on the signal passing through. For example, a compressor can be set to boost quieter sounds more than louder ones, or to reduce the gain of any sound whose amplitude is above a certain threshold. Noise gates reduce unwanted sounds in a similar fashion; however, they completely remove any signal below the set amplitude threshold, meaning any quiet distractions below that threshold are removed entirely |
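Compressors and noise gates can both be illustrated through their static transfer curves. The sketch below is simplified: it processes each sample independently and omits the attack/release smoothing and make-up gain that real compressors and gates use. The threshold and ratio defaults are hypothetical:

```python
import math


def compress_sample(x: float, threshold_db: float = -20.0, ratio: float = 4.0) -> float:
    """Static downward compression of a single sample.

    Above threshold_db, every `ratio` dB of input level produces only
    1 dB of output level; below it the sample passes unchanged.
    """
    if x == 0.0:
        return 0.0
    level_db = 20.0 * math.log10(abs(x))
    if level_db <= threshold_db:
        return x  # quiet enough: no gain reduction
    out_db = threshold_db + (level_db - threshold_db) / ratio
    gain = 10.0 ** ((out_db - level_db) / 20.0)  # linear gain reduction
    return x * gain


def noise_gate(samples: list[float], threshold: float = 0.01) -> list[float]:
    """Silence every sample whose absolute amplitude is below threshold."""
    return [x if abs(x) >= threshold else 0.0 for x in samples]


if __name__ == "__main__":
    print(noise_gate([0.001, 0.5, -0.002, -0.6]))
    print(compress_sample(1.0), compress_sample(0.05))
```

With the defaults, a full-scale sample (0 dB) sits 20 dB over the threshold and comes out 5 dB over it, at -15 dB (about 0.178). A sample of 0.05 (about -26 dB) passes unchanged. The gate, by contrast, zeroes anything quieter than its threshold instead of merely scaling it down.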
Line 25: | Line 25: | ||
===== Speech Analysis ===== | ===== Speech Analysis ===== | ||
- | Once the audio is enhanced to an acceptable level, the next problem is recognizing the speaker, a task known as Forensic Speaker Recognition (FSR). FSR is based on the theory that every human’s voice is unique, like fingerprints and DNA. The human voice is determined by a multitude of factors which, because each person’s DNA is unique, are considered to make each voice unique. Speech production relies on three main mechanisms, namely respiration, | + | Once the audio is enhanced to an acceptable level, the next problem is recognizing the speaker, a task known as Forensic Speaker Recognition (FSR). |
+ | ==== The Human Voice ==== | ||
+ | FSR is based on the theory that every human’s voice is unique, like fingerprints and DNA. The human voice is determined by a multitude of factors which, because each person’s DNA is unique, are considered to make each voice unique. Speech production relies on three main mechanisms, namely respiration, | ||
In most cases, the result of audio enhancement is an audio file with amplified speech and muffled noise. Speech analysis takes that file as input and uses various techniques to classify the result as a positive, negative or unresolved identification. Even before computers as we know them today existed, people tried to identify speakers using only those characteristics of the human voice that can be considered distinguishing factors. Those factors are: | In most cases, the result of audio enhancement is an audio file with amplified speech and muffled noise. Speech analysis takes that file as input and uses various techniques to classify the result as a positive, negative or unresolved identification. Even before computers as we know them today existed, people tried to identify speakers using only those characteristics of the human voice that can be considered distinguishing factors. Those factors are: | ||
Line 34: | Line 36: | ||
From these factors it can be determined whether the person is male or female, their approximate age, where they are from, and in some cases even what emotions they are feeling. The problem with this approach is that the people attempting to identify speakers need excellent hearing and considerable knowledge, experience and training, especially when the audio files are of low quality. | From these factors it can be determined whether the person is male or female, their approximate age, where they are from, and in some cases even what emotions they are feeling. The problem with this approach is that the people attempting to identify speakers need excellent hearing and considerable knowledge, experience and training, especially when the audio files are of low quality. | ||
- | With advances in technology, new techniques have been developed, and more will follow. Some of those techniques include artificial intelligence, | + | ==== Automated FSR ==== |
+ | With advances in technology, new techniques have been developed, and more will follow. Some of those techniques include artificial intelligence, | ||
- | Spectrograms are useful tools for speaker recognition. One application that can be used to analyze spectrograms of audio files is Audacity. | + | ==== Tools for FSR ==== |
- + | Spectrograms are useful tools for speaker recognition. One application that can be used to analyze spectrograms of audio files is Audacity. |
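Spectrogram views such as Audacity's are generally computed with a short-time Fourier transform. The signal is cut into overlapping windowed frames, and each frame's magnitude spectrum becomes one column of the image. The following is a minimal NumPy sketch of that idea, not Audacity's own code. The 256-sample frame, 128-sample hop and Hann window are assumptions:

```python
import numpy as np


def magnitude_spectrogram(signal: np.ndarray, n_fft: int = 256, hop: int = 128) -> np.ndarray:
    """Return STFT magnitudes with shape (n_frames, n_fft // 2 + 1)."""
    if len(signal) < n_fft:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(n_fft)  # taper each frame to reduce spectral leakage
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # one spectrum per frame (time x frequency)


def bin_to_hz(bin_index: int, sample_rate: float, n_fft: int = 256) -> float:
    """Centre frequency of an FFT bin."""
    return bin_index * sample_rate / n_fft


if __name__ == "__main__":
    sr = 8000
    t = np.arange(sr) / sr               # one second of sample times
    tone = np.sin(2 * np.pi * 1000 * t)  # 1 kHz test tone
    spec = magnitude_spectrogram(tone)
    peak = int(spec.argmax(axis=1)[0])
    print(spec.shape, bin_to_hz(peak, sr))
```

For a one-second 1 kHz tone sampled at 8 kHz, this yields 61 frames of 129 bins, each peaking at bin 32 (32 * 8000 / 256 = 1000 Hz). Displaying 20 * log10 of the magnitudes gives the usual decibel-scaled picture.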
- | In practice, both aural and spectrographic methods are used in forensic speaker recognition. The analyst is typically sent multiple recordings that need to be identified, along with multiple recordings of the suspect repeating the same words. However, if the second voice sample has to be obtained without the suspect’s knowledge, one method is to discreetly record a conversation, | + | ==== Methods of FSR ==== |
+ | In practice, both aural and spectrographic methods are used in forensic speaker recognition. The analyst is typically sent multiple recordings that need to be identified, along with multiple recordings of the suspect repeating the same words. However, if the second voice sample has to be obtained without the suspect’s knowledge, one method is to discreetly record a conversation, | ||
* Bandwidth | * Bandwidth | ||
* Mean frequency | * Mean frequency | ||
Line 47: | Line 51: | ||
* Articulation | * Articulation | ||
* Acoustic patterns | * Acoustic patterns | ||
- | These, among other features, are closely examined to determine whether any differences are due to a different pronunciation or to a different speaker. If the evidence is sufficient, a positive identification or elimination is reached. If the evidence is insufficient, a probable identification or elimination is reached. If the audio is of too poor quality or contains too little information for comparison, the conclusion is described as unresolved. | + | These, among other features, are closely examined to determine whether any differences are due to a different pronunciation or to a different speaker. If the evidence is sufficient, a positive identification or elimination is reached. If the evidence is insufficient, a probable identification or elimination is reached. If the audio is of too poor quality or contains too little information for comparison, the conclusion is described as unresolved |
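The conclusion scale above can be written as a small decision function. In practice each input is the analyst's judgement rather than a computed value. The boolean parameters and the names below are hypothetical stand-ins for that judgement:

```python
from enum import Enum


class Conclusion(Enum):
    POSITIVE_IDENTIFICATION = "positive identification"
    PROBABLE_IDENTIFICATION = "probable identification"
    PROBABLE_ELIMINATION = "probable elimination"
    POSITIVE_ELIMINATION = "positive elimination"
    UNRESOLVED = "unresolved"


def conclude(audio_usable: bool, same_speaker: bool, evidence_sufficient: bool) -> Conclusion:
    """Map hypothetical examiner findings onto the conclusion scale.

    audio_usable: False if the recording is too poor or too short to compare.
    same_speaker: whether the examined features point to the same speaker.
    evidence_sufficient: whether there is enough evidence to be definite.
    """
    if not audio_usable:
        return Conclusion.UNRESOLVED
    if same_speaker:
        return (Conclusion.POSITIVE_IDENTIFICATION if evidence_sufficient
                else Conclusion.PROBABLE_IDENTIFICATION)
    return (Conclusion.POSITIVE_ELIMINATION if evidence_sufficient
            else Conclusion.PROBABLE_ELIMINATION)


if __name__ == "__main__":
    print(conclude(audio_usable=True, same_speaker=True, evidence_sufficient=False).value)
```

The usability check comes first because, as described above, poor or insufficient audio leads to an unresolved conclusion regardless of how similar the voices sound.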
Line 58: | Line 62: | ||
===== Conclusion ===== | ===== Conclusion ===== | ||
- | Forensic Speaker Recognition is more important than ever in the digital age. Communication through audio channels is at an all-time high thanks to modern phone networks and internet voice-call applications. These communication methods are used by all kinds of people, including criminals, so being able to prove who said what is important, even when calls are anonymous or routed through VPNs. Forensic Speaker Recognition has been used to help solve cases dating as far back as 1923, and will surely continue to be used in the future. | + | Forensic Speaker Recognition is more important than ever in the digital age. Communication through audio channels is at an all-time high thanks to modern phone networks and internet voice-call applications. These communication methods are used by all kinds of people, including criminals, so being able to prove who said what is important, even when calls are anonymous or routed through VPNs. Forensic Speaker Recognition has been used to help solve cases dating as far back as 1923 [6], and will surely continue to be used in the future. |
===== Literature ===== | ===== Literature ===== | ||
Line 74: | Line 78: | ||
[7] [[https:// | [7] [[https:// | ||
+ | |||
+ | [8] [[https:// | ||
+ | |||
+ | [9] [[https:// | ||
+ | |||
+ | ~~DISCUSSION~~ | ||