Short-Time Fourier Transform

Short-Time Fourier Transform
Yi-Hsuan Yang, Ph.D.
Music & Audio Computing Lab, Research Center for IT Innovation, Academia Sinica
http://www.citi.sinica.edu.tw/pages/yang/
yang@citi.sinica.edu.tw
SNHCC, TIGP, April 2018

Sampling Rate
Def: number of samples per second
Why: analog-to-digital conversion
Examples
  EEG signal: 128 Hz
  Telephone audio: 8 kHz
  Music audio: 44.1 kHz
https://www.brightbraincentre.co.uk/electroencephalogram-eeg-brainwaves/
MATLAB code
    [a,sr] = wavread( )   % sr = sampling rate (newer MATLAB releases use audioread instead)
    length(a)             % length of the signal in number of samples
    length(a)/sr          % length of the signal in seconds

Sampling Rate (Cont'd)
MATLAB code
    a2 = downsample(a,2);         % keep every second sample
    sr2 = sr/2;                   % the sampling rate is halved accordingly
    length(a2)                    % length of the signal in number of samples
    length(a2)/sr2                % length of the signal in seconds
    wavwrite(a2,sr2,'test.wav')
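
The relation between sample count, sampling rate, and duration can be checked with a small sketch. It is written in Python rather than the slides' MATLAB, and the 3-second silent signal at 8 kHz is a hypothetical stand-in for a real recording.

```python
# Hypothetical stand-in for a recording: 3 seconds of silence at 8 kHz.
sr = 8000
a = [0.0] * (3 * sr)

def duration_seconds(signal, sample_rate):
    """Length of the signal in seconds = number of samples / sampling rate."""
    return len(signal) / sample_rate

# Keep every second sample, starting with the first, like downsample(a, 2).
a2 = a[::2]
sr2 = sr / 2

print(len(a), duration_seconds(a, sr))
print(len(a2), duration_seconds(a2, sr2))
```

Halving the number of samples and the sampling rate together leaves the duration unchanged. Note that plain downsampling applies no low-pass filter, so content above the new sr2/2 would alias.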

Sinusoids
MATLAB code
    sr = 200;
    t = 0:1/sr:1;
    f0 = 10;   % frequency
    a = 1;     % amplitude
    y = a*sin(2*pi*f0*t + pi/2);
    stem(t,y)
Why? sin(2*pi*f0*t + pi/2) = 1 when t = 1/f0, 2/f0, 3/f0, 4/f0, ...
The period is 1/f0, so frequency = inverse of the period.
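
As a rough check of "frequency = inverse of the period", the following Python sketch (standard library only, not the slides' MATLAB) samples the same sinusoid and measures the spacing between its peaks. The tolerance of 1e-9 is an arbitrary choice for absorbing rounding error.

```python
import math

sr = 200     # sampling rate (Hz), as in the slide
f0 = 10      # frequency (Hz)
amp = 1.0    # amplitude

# One second of samples of amp*sin(2*pi*f0*t + pi/2).
t = [n / sr for n in range(sr + 1)]
y = [amp * math.sin(2 * math.pi * f0 * tn + math.pi / 2) for tn in t]

# Sample indices where the waveform reaches its maximum (within rounding error).
peaks = [n for n, v in enumerate(y) if abs(v - amp) < 1e-9]
period_samples = peaks[1] - peaks[0]

print(period_samples / sr)   # period in seconds
print(sr / period_samples)   # frequency in Hz
```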

Nyquist-Shannon Sampling Theorem
A signal must be sampled at least twice as fast as the bandwidth of the signal to accurately reconstruct the waveform; otherwise, the high-frequency content will alias at a frequency inside the spectrum of interest.
Sampling freq > 2 * the highest freq in the signal
http://zone.ni.com/reference/en-xx/help/370524t-01/siggenhelp/fund_nyquist_and_shannon_theorems/

Nyquist-Shannon Sampling Theorem
    f0 = 10
    y = sin(2*pi*f0*t)
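
Aliasing can be illustrated with a minimal Python sketch. The sampling rate of 100 Hz and the 90 Hz tone are assumptions chosen for illustration, and cosines are used so that the two sampled sequences coincide exactly.

```python
import math

def sample_cosine(freq_hz, sr, n_samples):
    """Samples of cos(2*pi*freq_hz*t) taken at t = n / sr."""
    return [math.cos(2 * math.pi * freq_hz * n / sr) for n in range(n_samples)]

sr = 100                           # assumed sampling rate (Hz); sr/2 = 50 Hz
low = sample_cosine(10, sr, 50)    # 10 Hz is below sr/2 and is sampled correctly
high = sample_cosine(90, sr, 50)   # 90 Hz is above sr/2 and aliases to sr - 90 = 10 Hz

# Largest sample-by-sample difference between the two sequences.
max_diff = max(abs(a - b) for a, b in zip(low, high))
print(max_diff)
```

The difference is at the level of floating-point rounding: after sampling, the 90 Hz tone cannot be told apart from the 10 Hz tone.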

Nyquist-Shannon Sampling Theorem
Telephone audio: 8 kHz
Over the phone, we cannot hear frequencies higher than 4 kHz
https://www.quora.com/how-do-hrt-sex-reassignment-and-other-such-proceduresaffect-vocal-production-particularly-the-singing-voice
Question: With sr = 128 Hz, we assume that we don't need to care about frequencies higher than ___ Hz in brain waves

Nyquist-Shannon Sampling Theorem
http://altered-states.net/barry/update236/
Question: With sr = 128 Hz, we assume that we don't need to care about frequencies higher than 64 Hz in brain waves

Fourier Transform
To get the spectrum of a signal
MATLAB code
    doc fft
    Y = abs(fft(y));
https://www.mathworks.com/help/matlab/ref/fft.html

Fourier Transform
MATLAB code
    x1 = 0.7*sin(2*pi*50*t);
    x2 = sin(2*pi*120*t);
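
A minimal sketch of what the magnitude spectrum reveals, in Python with the standard library only. It assumes the analyzed signal is the sum x1 + x2, a sampling rate of 1000 Hz, and 500 samples; none of these are stated on the slide. The DFT is evaluated directly instead of with an FFT, which is slow but keeps the example self-contained.

```python
import cmath
import math

sr = 1000   # assumed sampling rate (Hz)
n = 500     # assumed number of samples, so bins are sr/n = 2 Hz apart
t = [k / sr for k in range(n)]
x = [0.7 * math.sin(2 * math.pi * 50 * tk) + math.sin(2 * math.pi * 120 * tk) for tk in t]

def dft_magnitude(signal):
    """Magnitude of the DFT for bins 0..N/2 (direct O(N^2) evaluation)."""
    size = len(signal)
    mags = []
    for b in range(size // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * b * i / size) for i, s in enumerate(signal))
        mags.append(abs(acc))
    return mags

mags = dft_magnitude(x)
# Indices of the two largest bins, converted to frequencies in Hz.
top_two = sorted(sorted(range(len(mags)), key=lambda b: mags[b])[-2:])
print([b * sr / n for b in top_two])
```

The two strongest bins sit at the frequencies of x1 and x2, but the spectrum alone says nothing about when each component is present.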

Fourier Transform
Problem: cannot localize the signal of interest in time

Short-Time Fourier Transform (STFT)
Windowed version of the Fourier Transform
Output: a time-frequency representation
MATLAB code
    doc spectrogram
    spectrogram(y,window,noverlap,nfft)
    spectrogram(y,100,50,100,sr,'yaxis')
https://www.mathworks.com/help/signal/ref/spectrogram.html
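
The idea of "windowed Fourier Transform" can be sketched in Python with the standard library only. This is not MATLAB's implementation: the test signal (100 Hz, then 200 Hz), the helper names, and the direct DFT evaluation are assumptions for illustration. A Hamming window is used on each frame.

```python
import cmath
import math

def stft_magnitude(signal, win_size, hop_size):
    """Return one magnitude spectrum (bins 0..win_size/2) per frame."""
    # Hamming window to taper each frame.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (win_size - 1)) for i in range(win_size)]
    frames = []
    for start in range(0, len(signal) - win_size + 1, hop_size):
        frame = [signal[start + i] * window[i] for i in range(win_size)]
        spectrum = [
            abs(sum(v * cmath.exp(-2j * math.pi * b * i / win_size) for i, v in enumerate(frame)))
            for b in range(win_size // 2 + 1)
        ]
        frames.append(spectrum)
    return frames

sr = 1000
# Hypothetical test signal: 100 Hz for the first half second, 200 Hz afterwards.
y = [math.sin(2 * math.pi * (100 if n < 500 else 200) * n / sr) for n in range(1000)]
spec = stft_magnitude(y, win_size=100, hop_size=50)

def peak_hz(spectrum, win_size=100):
    """Frequency (Hz) of the strongest bin in one frame."""
    return max(range(len(spectrum)), key=lambda b: spectrum[b]) * sr / win_size

print(len(spec), len(spec[0]))               # number of frames, number of frequency bins
print(peak_hz(spec[0]), peak_hz(spec[-1]))   # dominant frequency in the first and last frame
```

Unlike the plain Fourier Transform, each frame has its own spectrum, so the change from 100 Hz to 200 Hz is localized in time.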

Short-Time Fourier Transform (STFT)
window size = 100

Short-Time Fourier Transform (STFT)
Hop size
    hop_size = win_size
    hop_size = 0.5*win_size
    hop_size = 0.1*win_size

Quiz Time
1. When the sampling rate (sr) is 1 kHz, what is the time interval (in seconds) between two neighboring samples?
2. When sr = 1 kHz, if we use a window size of 100 samples for the STFT, what is the actual duration of the window (in seconds)?

Quiz Time
3. When sr = 1 kHz and we use an STFT window size of 100 samples with no overlap between consecutive windows, how many times do we need to move the window to cover a signal with 300 samples?
4. And if there is 50% overlap between windows, how many times do we need to move the window?

Quiz Time
5. Given the following spectrograms, try to draw the corresponding waveforms.
6. Given the following spectrograms, try to draw the corresponding spectra computed by the Fourier Transform.
[Figures: quiz spectrograms and the solution plots for questions 5 and 6]

Quiz Time
MATLAB code for (c)
    sr = 1e3; f = 100;
    t1 = 1/sr:1/sr:1;
    t2 = 1/sr:1/sr:0.5;
    y1 = [sin(2*pi*f*t2) zeros(1,1.5*sr)];   % 100 Hz for 0.5 s, then silence
    y2 = [zeros(1,sr) sin(2*pi*f/2*t1)];     % silence for 1 s, then 50 Hz for 1 s
    y = y1 + y2;
    t = (1:length(y))/sr;                    % time axis for the waveform plot
    figure(1), spectrogram(y,256,250,256,1e3,'yaxis')
    figure(2), plot(t,y)
    figure(3),
    NFFT = 2^nextpow2(length(y));
    Y = fft(y,NFFT)/length(y);
    ff = sr/2*linspace(0,1,NFFT/2+1);
    plot(ff,2*abs(Y(1:NFFT/2+1)))

Understanding STFT
Different window sizes (win_size = 50, 100, 150)
Spectrogram sizes: 26 x 39, 51 x 19, 76 x 12
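
The reported spectrogram sizes (frequency bins x time frames) can be reproduced with a short Python sketch. It assumes a one-second signal of 1000 samples, 50% overlap between windows, and an FFT length equal to the window size; the frame-count formula follows the one documented for MATLAB's spectrogram, and the function name is hypothetical.

```python
def spectrogram_size(n_samples, win_size, noverlap):
    """(frequency bins, time frames) of a one-sided spectrogram with nfft = win_size."""
    n_bins = win_size // 2 + 1
    n_frames = (n_samples - noverlap) // (win_size - noverlap)
    return n_bins, n_frames

n_samples = 1000   # assumed: one second of signal at sr = 1000 Hz
for win_size in (50, 100, 150):
    print(win_size, spectrogram_size(n_samples, win_size, noverlap=win_size // 2))
```

A longer window yields more frequency bins but fewer time frames, which is the trade-off the following slides discuss.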

Understanding STFT
Shorter window: worse frequency resolution
Longer window: worse temporal resolution

Understanding STFT
f_max = sr/2
sampling freq > 2 * the highest freq in the signal (Nyquist-Shannon sampling theorem)

Understanding STFT
freq_resolution = sr/win_size
Longer window: better frequency resolution
freq_resolution = 20, 10, 6.6667 (Hz), respectively

Understanding STFT
temporal_resolution: hop_size
Longer window: worse temporal resolution
temp_resolution = 25, 50, 75 (ms), respectively

Trade-off Between Temporal and Frequency Resolution
sr = 1000; hop_size = win_size/2;

win_size (samples)   freq_resolution (Hz)   temp_resolution (ms)
50                   20                     25
100                  10                     50
150                  6.6667                 75

Shorter window: worse frequency resolution
  win_size = 150 can distinguish two frequency components that differ by 8 Hz, but the others cannot
Longer window: worse temporal resolution
  win_size = 50 can distinguish two neighboring events that differ in time by 40 ms, but the others cannot
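
The numbers in the table follow directly from the two formulas freq_resolution = sr/win_size and temp_resolution = hop_size/sr. A small Python sketch (the function name is hypothetical) recomputes them:

```python
sr = 1000  # sampling rate (Hz), as in the slide

def resolutions(win_size, hop_size, sample_rate):
    """Return (frequency resolution in Hz, temporal resolution in ms)."""
    freq_resolution_hz = sample_rate / win_size
    temp_resolution_ms = 1000 * hop_size / sample_rate
    return freq_resolution_hz, temp_resolution_ms

for win_size in (50, 100, 150):
    freq_res, temp_res = resolutions(win_size, win_size / 2, sr)
    print(f"{win_size:>8} {freq_res:>10.4f} {temp_res:>8.1f}")
```

Only win_size = 150 has a frequency resolution finer than 8 Hz, and only win_size = 50 has a temporal resolution finer than 40 ms, which matches the two remarks above.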

Quiz Time
1. The figure on the top-right is the spectrogram of a signal. What is the sampling rate of this signal?
2. The figure on the bottom-right is a zoom-in of the figure above. We can see that the frequency resolution is 20 Hz. What is the window size (in samples)?
3. The temporal resolution is close to 6.6 ms. What's the hop size (in samples), approximately?

Quiz Time
4. Given an EEG headset that samples signals at 128 Hz, if we want to be able to discriminate frequency components that differ by 0.5 Hz, what is the minimal window size (in samples) we need to use? What is the length of such a window in seconds?
5. Following the previous question, if we further want to discriminate events that differ in time by 0.5 second, what is the maximal hop size (in samples) we can use?

Quiz Time
6. Given a music signal with sr = 44,100 Hz, when we use a window size of 1,024 samples, what would be the frequency resolution?
7. According to the figure on the right, we know that the fundamental frequency (f0) of A1 is 55 Hz, that of A♯1 is 58.27 Hz, etc. Following the previous question, which notes does the first frequency bin in the STFT cover?
https://newt.phys.unsw.edu.au/jw/notes.html

Quiz Time (solution)
6. Frequency resolution with sr = 44,100 Hz and a window size of 1,024 samples. Sol: 43.1 Hz
7. Frequency bins of the STFT (Hz): [0, 43.1) [43.1, 86.1) [86.1, 129.2) [129.2, 172.3) [172.3, 215.3)
https://newt.phys.unsw.edu.au/jw/notes.html

Quiz Time
8. Given a music signal with sr = 44,100 Hz, what if we use a window size of 4,096 samples?
Frequency bins of the STFT (Hz): [0, 10.8) [10.8, 21.5) [21.5, 32.3) [32.3, 43.1) [43.1, 53.8)
https://newt.phys.unsw.edu.au/jw/notes.html
9. Following the previous question, now the STFT can distinguish musical notes after the F♯3 note.
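
One way to reason about questions 6 to 9 is to compare the frequency resolution sr/win_size with the spacing between neighboring semitones. The Python sketch below assumes equal temperament with A4 = 440 Hz (which gives A1 = 55 Hz, as in the slides) and uses "semitone spacing larger than the frequency resolution" as the criterion for telling two notes apart. That criterion and the function names are assumptions, not taken from the slides.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_frequency(midi):
    """Equal-tempered frequency in Hz, with A4 (MIDI note 69) tuned to 440 Hz."""
    return 440.0 * 2 ** ((midi - 69) / 12)

def note_name(midi):
    """Note name with octave number, e.g. MIDI note 69 -> 'A4'."""
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"

def lowest_resolvable_note(sr, win_size):
    """First note whose distance to the next semitone exceeds sr / win_size."""
    freq_resolution = sr / win_size
    for midi in range(128):
        if note_frequency(midi + 1) - note_frequency(midi) > freq_resolution:
            return note_name(midi)
    return None

for win_size in (1024, 4096):
    print(win_size, round(44100 / win_size, 1), lowest_resolvable_note(44100, win_size))
```

Under this criterion, quadrupling the window size moves the lowest distinguishable note down by two octaves, at the cost of a coarser temporal resolution.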

Mel-Spectrogram
The mel scale is a perceptual scale of pitches judged by listeners to be equal in distance from one another
Finer resolution in the low-frequency range
Dimension reduction
[Figure: spectrogram on a linear scale vs. on the mel scale]
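
The "finer resolution in the low-frequency range" can be made concrete with a short Python sketch. It uses one commonly used mel formula, mel = 2595*log10(1 + f/700); the slides do not say which variant they rely on, and the 8000 Hz upper limit and the ten band edges are arbitrary choices for illustration.

```python
import math

def hz_to_mel(f_hz):
    """One common Hz-to-mel conversion (an assumed variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10 ** (mel / 2595.0) - 1.0)

# Ten band edges equally spaced on the mel scale between 0 Hz and 8000 Hz.
top = hz_to_mel(8000.0)
edges_hz = [mel_to_hz(top * i / 9) for i in range(10)]
widths_hz = [hi - lo for lo, hi in zip(edges_hz, edges_hz[1:])]

print([round(w) for w in widths_hz])
```

Bands that are equally wide in mel are narrow in Hz at low frequencies and grow wider toward high frequencies. Summing spectrogram bins into a modest number of such bands is also where the dimension reduction comes from.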

Feature Extraction
Spectrogram -> mel-spectrogram -> MFCC (timbre)
Spectrogram -> CQT -> chroma feature (harmony)
"Feature learning and deep architectures: new directions for music informatics," J Intell Inf Syst (2013)
https://link.springer.com/content/pdf/10.1007%2fs10844-013-0248-5.pdf

Feature Learning by Convolutional Layers
"Deep learning and music adversaries," IEEE Trans. Multimedia (2015)
https://arxiv.org/pdf/1507.04761.pdf