Analysis of the effects of signal distance on spectrograms


SGHA, 8/19/2014

Contents

Introduction
Scope
Data Comparisons
Results
Recommendations
References

Introduction

This article summarizes background data that was collected to measure the effects on a spectrogram of speech as the distance between the speaker and the microphone was increased. In effect, we are decreasing the signal-to-noise ratio by making the signal weaker. The phrase "The quick brown fox jumped over the lazy dog" was spoken next to the microphone, and the same phrase was repeated at 10-foot intervals up to a final distance of 60 feet from the recorder. A DMR-40 four-channel recorder was used with a Shure microphone to record the audio. The spectrograms were analyzed with three programs: PRAAT, Voice Analyzer and Sonic Visualiser.

Scope

A spectrogram is a visual representation of the frequency content of a signal: it shows how the amount of energy in different frequency regions varies as a function of time. The signal is divided into many short time sections, and each section is analyzed in terms of the frequency components present in it. This analysis is called spectral analysis because the spectrum of each section is calculated and the quantity of each frequency component (that is, each sinusoid) is measured from the spectrum. The quantity of each component is then converted to a grey level in which (normally) low-energy components are rendered white and high-energy components black. These grey levels are plotted on a vertical strip corresponding to the time at which the original signal segment occurred, with the height of each element on the strip representing the frequency of the component. A spectrogram is thus a three-dimensional analysis of a signal: the horizontal dimension is time, the vertical dimension is frequency, and the grey scale shows the amount of energy occurring in the signal at each time and frequency.
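The short-time analysis described above can be sketched in a few lines of Python. This is an illustrative example only (not part of the original study's workflow), using a synthetic 440 Hz test tone in place of the recorded speech:

```python
import numpy as np
from scipy.signal import spectrogram

fs = 16000                                # sample rate (Hz)
t = np.arange(fs) / fs                    # one second of samples
x = np.sin(2 * np.pi * 440 * t)           # a 440 Hz test tone

# Divide the signal into short overlapping sections and compute the
# spectrum of each one; this is the analysis a spectrogram displays.
f, times, Sxx = spectrogram(x, fs=fs, nperseg=512, noverlap=384)

# Convert energy to decibels; a display would map these values to a
# grey scale (low energy -> white, high energy -> black).
Sxx_db = 10 * np.log10(Sxx + 1e-12)

# The strongest component in each time slice should lie near 440 Hz.
peak_hz = f[np.argmax(Sxx_db, axis=0)]
print(float(np.median(peak_hz)))
```

Each column of `Sxx` is the spectrum of one short time section, so plotting `Sxx_db` against `times` and `f` reproduces the time/frequency/energy picture described in the text.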
If you study a wide-band spectrogram of, say, a couple of words of speech, you should be able to see some of the following events:

Larynx excitation pulses: these appear as vertical dark lines at intervals of roughly 5 to 10 ms, and are also called striations. Each one is caused by the sudden pressure change that arises above the larynx when the vocal folds close suddenly, cutting off the flow of air from the lungs. The change is so sudden that it creates a kind of pressure "pulse" containing energy across a wide range of frequencies, commonly up to 4 kHz or more.

Formant vibrations: between the striations you will see dark regions that occur only at particular frequencies. These regions, which often appear several hundred hertz across because of the limitations of the wide-band frequency analysis of the spectrograph, are caused by the ringing of the vocal tract resonances (or formants) as each pulse from the larynx excites them. If you look carefully, you may see that these vibrations are stronger (darker) just after each pulse and become paler as the energy in the vibrations is lost from the vocal tract.

Changes in formant frequency: the dark regions caused by formant vibration change in frequency through the utterance. The resonances have a kind of continuity in time, rising and falling slowly in frequency through a syllable and from one syllable to the next. These slow, smooth changes occur because the formant frequencies are set by the shape of the vocal tract tube, which in turn is controlled by the position of the articulators; since the articulators move relatively slowly (a few syllables per second), the formant frequencies appear to move slowly too.

Turbulent sounds: in regions where there is no larynx vibration, and hence no striations, you should see "speckled", noisy, unstructured regions of dark color, often towards the high-frequency end of the picture. These are "noise" sounds caused by turbulence in the vocal tract, for instance bursts, aspiration and frication. Bursts are often short vertical bars, much like a striation, caused by the sudden pressure change in the vocal tract when a stop articulation is released. Aspiration is turbulence that occurs in the larynx, caused by a narrowing of the airway from the lungs as the vocal folds come close together. Frication is turbulence that occurs at other points of narrowing in the vocal tract, made with the tongue or the lips. If you look carefully you may see differences in the frequency content of bursts or fricatives originating from different places of articulation; this is because the different articulator configurations shape the turbulence noise in different ways, depending on the size and shape of the vocal tract tube in front of the constriction.

In the vowels, F1 can vary from about 300 Hz to 1000 Hz; the lower it is, the closer the tongue is to the roof of the mouth. The vowel /i:/ as in the word 'beet' has one of the lowest F1 values, about 300 Hz; in contrast, the vowel /A/ as in the word 'bought' (or 'Bob' in speakers who distinguish the vowels in the two words) has the highest F1 value, about 950 Hz. Pronounce these two vowels and try to determine how your tongue is configured for each. F2 can vary from about 850 Hz to 2500 Hz; its value corresponds to the frontness or backness of the highest part of the tongue during the production of the vowel.

Data Comparisons

The first comparisons of the data were made using the Voice Analyzer software.

Spectrogram with the speaker next to the recorder.
Spectrogram with the speaker 10' from the recorder.
Spectrogram with the speaker 20' from the recorder.

Spectrogram with the speaker 30' from the recorder.
Spectrogram with the speaker 40' from the recorder.
Spectrogram with the speaker 50' from the recorder.
Spectrogram with the speaker 60' from the recorder.

The higher and mid-range frequencies are the first to be affected as the signal weakens with the speaker's distance. The next image compares the 0' sample and the 60' sample in Sonic Visualiser; some formant vibrations are still visible in the weaker signal.
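The weakening with distance can be put in rough numbers. In a free field, sound pressure level falls by 6 dB per doubling of distance; the recordings here were presumably made indoors, where reflections make the real drop smaller, so the sketch below is only an idealized upper bound, not a measurement from the study:

```python
import math

def free_field_drop_db(d_near_ft, d_far_ft):
    """Level drop between two distances under the inverse-square law
    (6 dB per doubling of distance); distances in any common unit."""
    return 20 * math.log10(d_far_ft / d_near_ft)

# Level lost moving from 10' out to each subsequent measurement distance.
for d in (20, 30, 40, 50, 60):
    print(d, round(free_field_drop_db(10, d), 1))
```

Going from 10' to 60' costs about 15.6 dB in a free field, which is consistent with the higher, weaker speech components sinking into the noise floor first.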

The next analysis was performed using the PRAAT software. The samples compared were those recorded at 0' and 60' from the recorder.

When speech is corrupted by stationary noise, features go missing from the spectrogram. The first things to vanish are most of the higher frequencies, followed by the mid-range frequencies at lower decibel levels. The next test replicated the standard analysis technique used by most ghost hunters and paranormal researchers: the sample recorded 60' from the recorder was amplified by 60 dB, a noise profile was captured, and noise reduction was applied to the audio file. The spectrogram of the new file was then analyzed in PRAAT.
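The article does not name the noise-reduction tool, so the sketch below uses simple spectral subtraction as a stand-in for the "capture a noise profile, then reduce" workflow, with synthetic signals in place of the recordings. It illustrates why the technique is destructive: amplification raises signal and noise by exactly the same factor, and subtraction then removes any speech energy that sits near the noise level.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
# Stand-ins for the recordings: a weak "voice" tone buried in noise,
# and a noise-only stretch used to capture the noise profile.
weak_voice = 0.001 * np.sin(2 * np.pi * 300 * t)
noise = 0.0005 * rng.standard_normal(fs)
recording = weak_voice + noise

# Step 1: amplify by 60 dB, i.e. a factor of 10**(60/20) = 1000.
# The signal-to-noise ratio is unchanged by this step.
gain = 10 ** (60 / 20)
amplified = recording * gain

# Step 2: simple spectral subtraction against the captured noise profile.
noise_profile = np.abs(np.fft.rfft(noise * gain))
spectrum = np.fft.rfft(amplified)
mag = np.abs(spectrum)
cleaned_mag = np.maximum(mag - noise_profile, 0.0)  # subtract, floor at zero
cleaned = np.fft.irfft(cleaned_mag * np.exp(1j * np.angle(spectrum)),
                       n=len(amplified))
```

Any speech component whose magnitude is comparable to the noise profile is floored to zero by the subtraction, which is exactly the loss of speech segments the PRAAT comparison below shows.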

PRAAT analysis showing the missing segments of speech in the 60' sample (bottom) after it was amplified and had noise reduction applied; the sample recorded at the microphone (0') is on top. Large amounts of speech have been lost through the application of noise reduction. Below is the same comparison, but with the 60' sample neither amplified nor noise-reduced: more segments of speech are visible, which would increase the accuracy of determining what was spoken.

Results

The study clearly shows that as the signal becomes weaker and embedded in the noise floor, segments of speech are lost. The techniques used by ghost hunters and other paranormal enthusiasts compound the problem by applying noise reduction in an attempt to hear the voice more clearly. This process also destroys essential elements of the speech, which increases the probability of pareidolia when attempting to identify words.

Recommendations

By using software designed for the analysis of spectrograms, it is possible to identify vowels and other features of speech without applying noise reduction. This is done by selecting a section in the middle of a formant and measuring the frequencies of the F1 and F2 formants. These frequencies are then plotted against a vowel triangle to determine the most probable vowel in the audio file.
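The vowel-triangle matching described above amounts to a nearest-neighbor lookup in F1/F2 space. The reference values in this sketch are approximate textbook averages for a few English vowels, not measurements from this study:

```python
import math

# Approximate average (F1, F2) values in Hz for a few vowels.
# These are illustrative textbook figures, not data from this study.
VOWEL_TRIANGLE = {
    "/i/ (beet)":   (300, 2300),
    "/u/ (boot)":   (350, 900),
    "/A/ (bought)": (750, 1100),
    "/ae/ (bat)":   (700, 1800),
}

def closest_vowel(f1, f2):
    """Return the vowel whose reference point is nearest the measured
    (F1, F2) pair, by Euclidean distance in the formant plane."""
    return min(VOWEL_TRIANGLE,
               key=lambda v: math.dist((f1, f2), VOWEL_TRIANGLE[v]))

# A measurement of F1 = 320 Hz, F2 = 2200 Hz lands nearest /i/.
print(closest_vowel(320, 2200))
```

A measured formant pair that falls within the spread typically assigned to a vowel region will match the same vowel even when F1 and F2 shift slightly, which is why the 0' and 60' samples can still agree.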

Using our data, we were able to identify the same vowels in the sample recorded at the recorder and the sample recorded 60' from it. There were some minor differences in the F1 and F2 frequencies, but they were within the tolerance of the frequencies typically assigned to the vowels on the triangle.

References

http://www.speechandhearing.net/faq/faq1.php