Wind Noise Reduction Using Non-negative Sparse Coding

www.auntiegravity.co.uk
Mikkel N. Schmidt, Jan Larsen, Technical University of Denmark
Fu-Tien Hsiao, IT University of Copenhagen

Wind Noise Reduction
Single channel recording
Unknown speaker
Prior wind recordings available
[Figure: spectrogram, frequency (0 to 8000 Hz) versus time; block labeled Wind Noise Reduction System]

The spectrum of alternative methods
Wiener filter (Wiener, 1949)
Spectral subtraction (Boll 1979; Berouti et al. 1979)
AR codebook-based spectral subtraction (Kuropatwinski & Kleijn 2001)
Minimum statistics (Martin et al. 2001, 2005)
Masking techniques (Wang; Weiss & Ellis 2006)
Factorial models (Roweis 2000, 2003)
MMSE (Radfar & Dansereau, 2007)
Non-negative sparse coding (Schmidt & Olsson 2006)

Noise Reduction
Estimate the speech signal, s(t), given a noisy recording, x(t), based on prior knowledge of the noise, n(t)

Single Channel Source Separation
Hard problem: there is no spatial information, so we cannot use beamforming or independent component analysis

Signal Representation
Exponentiated magnitude spectrogram, with exponent γ
γ = 2: power spectrogram
γ = 1: magnitude spectrogram
γ = 0.67: cube root compression (Stevens's power law, perceived intensity)
Ignore phase information. Reconstruct by re-filtering
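The representation above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name, the Hann window, the frame length of 512 samples, and the hop of 256 samples are hypothetical choices, not values given in the slides.

```python
import numpy as np

def exponentiated_spectrogram(x, gamma=0.67, frame_len=512, hop=256):
    """Return |STFT(x)|**gamma as a non-negative (frequency x time) matrix.

    frame_len, hop, and the Hann window are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Slice the signal into overlapping windowed frames, one per column.
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)],
        axis=1,
    )
    # One-sided FFT of every frame; the phase is discarded by np.abs.
    spectrum = np.fft.rfft(frames, axis=0)
    return np.abs(spectrum) ** gamma
```

With gamma=2 this yields the power spectrogram, with gamma=1 the magnitude spectrogram, and with gamma=0.67 the cube-root-compressed version. Because the phase is dropped, a time-domain signal has to be recovered separately, for example by filtering the noisy recording as the slide suggests.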

Non-negative Sparse Coding
Factorize the signal matrix
[Figure: spectrogram factorized into a dictionary and a sparse code]

Non-negative Sparse Coding
Factorize the signal matrix into a dictionary, D, and a code, H, where D and H are non-negative and H is sparse
Non-negativity: parts-based representation, only additive and not subtractive combinations
Sparseness: only a few dictionary elements active simultaneously. The representation is source-specific and more nearly unique.
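The factorization equation itself did not survive in this transcript. One common way to write non-negative sparse coding is given here as a sketch; the symbols X (signal matrix), λ (sparsity weight), and the column-normalized dictionary \bar{D} are notation assumed for illustration, not taken from the slides.

```latex
X \approx D H, \qquad D \ge 0, \quad H \ge 0

\min_{D,\,H \ge 0} \; \tfrac{1}{2} \, \lVert X - \bar{D} H \rVert_F^2
  \; + \; \lambda \sum_{i,j} H_{ij},
\qquad \bar{D}_{:,k} = \frac{D_{:,k}}{\lVert D_{:,k} \rVert_2}
```

The first term measures reconstruction error and the second penalizes the total activation in H, so a larger λ gives a sparser code. Normalizing the dictionary columns prevents the penalty from being evaded by scaling D up and H down.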

The Dictionary and the Sparse Code
Dictionary, D: source-dependent over-complete basis, learned from data
Sparse code, H: time and amplitude for each dictionary element
Sparseness: only a few dictionary elements active simultaneously

Non-negative Sparse Coding of Noisy Speech
Assume sources are additive
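The additive model can be written out as follows. This is a sketch: the subscripts s (speech) and n (noise) follow s(t) and n(t) from the earlier slide, while the matrix symbols X_s, X_n, D_s, D_n, H_s, and H_n are notation assumed here.

```latex
X = X_s + X_n \approx
\begin{bmatrix} D_s & D_n \end{bmatrix}
\begin{bmatrix} H_s \\ H_n \end{bmatrix}
= D_s H_s + D_n H_n,
\qquad \hat{X}_s = D_s H_s
```

Additivity holds exactly for the time-domain signals but only approximately for exponentiated magnitude spectrograms, which is why it is stated as an assumption. Once the joint factorization is computed, the speech estimate keeps only the part of the code that belongs to the speech dictionary.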

Permutation Ambiguity
Precompute both dictionaries (Schmidt & Olsson 2006)
Devise a grouping rule (Wang & Plumbley 2005)
Precompute wind dictionary and learn speech dictionary from noisy recording
Use multiplicative update rule (Eggert & Körner 2004)
Other rules could be used, e.g. projected gradient (Lin, 2007)
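The third strategy, a fixed wind dictionary plus a speech dictionary learned from the noisy recording, might be sketched as below. The updates follow the multiplicative form usually attributed to Eggert & Körner (2004). The function names, parameter names, iteration count, and random initialization are hypothetical and not taken from the slides.

```python
import numpy as np

def normalize_columns(D, eps=1e-12):
    """Scale every dictionary column to unit Euclidean norm."""
    return D / (np.linalg.norm(D, axis=0, keepdims=True) + eps)

def nnsc(X, n_learned, lam, D_fixed=None, n_iter=200, seed=0, eps=1e-12):
    """Non-negative sparse coding X ~ D @ H with optional fixed columns.

    Columns of D_fixed (e.g. a precomputed wind dictionary) are normalized
    once and then held constant; n_learned further columns are learned.
    lam is the sparsity weight on H. All names here are illustrative.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = X.shape
    if D_fixed is None:
        D_fixed = np.zeros((n_freq, 0))
    n_fixed = D_fixed.shape[1]
    D = np.hstack([
        normalize_columns(D_fixed),
        normalize_columns(rng.random((n_freq, n_learned))),
    ])
    H = rng.random((n_fixed + n_learned, n_frames))
    for _ in range(n_iter):
        # Code update: lam in the denominator shrinks the activations.
        H *= (D.T @ X) / (D.T @ D @ H + lam + eps)
        if n_learned > 0:
            XHt = X @ H.T
            DHHt = D @ (H @ H.T)
            # Dictionary update for the cost with column-normalized D.
            num = XHt + D * np.sum(DHHt * D, axis=0, keepdims=True)
            den = DHHt + D * np.sum(XHt * D, axis=0, keepdims=True) + eps
            D_new = normalize_columns(D * num / den)
            # Only the learned (speech) columns are overwritten.
            D[:, n_fixed:] = D_new[:, n_fixed:]
    return D, H
```

With D, H = nnsc(X, n_learned, lam, D_fixed=D_wind), the speech part of the spectrogram is estimated as D[:, n_fixed:] @ H[n_fixed:, :], where n_fixed is the number of wind columns. Fixing the wind columns is what removes the permutation ambiguity: each dictionary element is known in advance to belong to either wind or speech.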

Importance and sensitivity of parameters
Representation: STFT exponent
Sparseness: precomputed wind noise dictionary, wind noise, speech
Number of dictionary elements: wind noise, speech

Quality Measure
Signal to noise ratio: simple measure, has only an indirect relation to perceived quality
Representation-based metrics: in systems based on time-frequency masking, evaluate the masks
Perceptual models: promising to use PEAQ or PESQ
High-level attributes: for example, word error rate in a speech recognition setup
Listening tests: expensive and time-consuming; cover several aspects (comfort, intelligibility)
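The first measure in the list is simple to compute when a clean reference is available. A minimal sketch, assuming time-aligned clean and estimated signals of equal length; the function name is hypothetical.

```python
import numpy as np

def snr_db(clean, estimate):
    """Signal-to-noise ratio in dB of an estimate against a clean reference."""
    clean = np.asarray(clean, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    signal_power = np.sum(clean ** 2)
    # Everything in the estimate that differs from the reference counts as noise.
    noise_power = np.sum((clean - estimate) ** 2)
    return 10.0 * np.log10(signal_power / noise_power)
```

For example, an estimate that equals the reference scaled by 1.1 has an error one tenth the size of the signal, which gives about 20 dB. The measure treats all deviations alike, which is one reason it relates only indirectly to perceived quality.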

Signal Representation
Exponentiated magnitude spectrogram

Sparseness
Qualitatively: tradeoff between residual noise and speech distortion
[Figure panels: Learn noise dictionary; Separation: speech; Separation: noise]

Number of Noise-Dictionary Elements
[Figure: spectrograms of the noisy, clean, and processed signals; frequency (0 to 4000 Hz) versus time (0 to 5 seconds)]

Number of Speech-Dictionary Elements
[Figure: spectrograms of the noisy, clean, and processed signals; frequency (0 to 4000 Hz) versus time (0 to 3 seconds)]

Comparison
Measures: signal-to-noise ratio, word error rate
Methods: proposed method; no noise reduction; spectral subtraction; Qualcomm-ICSI-OGI, aka adaptive Wiener filtering (Adami et al. 2002)

Conclusions and outlook
Sparse coding of spectrogram representations is a useful tool for reduction of wind noise
Only samples of wind noise are required
Careful evaluation and integration of perceptual measures
Handling nonlinear saturation effects
Optimization of performance (fewer frequency bands, adaptation to new situations)