GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS. Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1)


GYROPHONE: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1). (1) Stanford University (2) National Research and Simulation Center, Rafael Ltd.

MICROPHONE ACCESS REQUIRES PERMISSIONS

GYROSCOPE ACCESS DOES NOT REQUIRE PERMISSIONS

CORIOLIS FORCE An effect whereby a mass moving in a rotating system experiences a force (the Coriolis force) acting perpendicular to the direction of motion and to the axis of rotation. The Coriolis force is a fictitious force, like the centrifugal force. A MEMS gyroscope measures this force to compute the angular velocity.
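For reference, the standard textbook form of this force is given below. The symbols are not from the slides: m is the proof mass, v its velocity in the rotating frame, and Ω the angular velocity of that frame.

```latex
\mathbf{F}_c = -2m \, (\boldsymbol{\Omega} \times \mathbf{v})
```

Because the sensor drives the proof mass at a known velocity, a measurement of the force perpendicular to the drive direction is proportional to the angular velocity.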

MEMS GYROSCOPES Major vendors: STMicroelectronics (Samsung Galaxy), InvenSense (Google Nexus).

STMicroelectronics 3-AXIS GYRO DESIGN Driving frequency: 20 kHz. Used in Samsung Galaxy, Apple iPhones and iPads. Dominates the market (~80%).

InvenSense 3-AXIS GYRO DESIGN Driving frequency: between 25 kHz and 30 kHz. Used in Google Nexus, Galaxy Tabs.

GYROSCOPES ARE SUSCEPTIBLE TO SOUND [Figures: power spectral density of a 70 Hz tone; power spectral density of a 50 Hz tone]

GYROSCOPES ARE (LOUSY, BUT STILL) MICROPHONES Hardware sampling frequency: InvenSense: up to 8000 Hz; STMicroelectronics: 800 Hz. Software sampling frequency: Android: 200 Hz; iOS: 100 Hz.

GYROSCOPES ARE (LOUSY, BUT STILL) MICROPHONES Very low SNR (signal-to-noise ratio). Acoustic sensitivity threshold: ~70 dB, comparable to a loud conversation. Sensitive to sound angle of arrival. Directional microphone (due to 3 axes).

IS THE GYROSCOPE DIRECTIONAL? The gyroscope is an omnidirectional audio sensor. 3 axes give 3 different sets of measurements for each reading. It can sense in all directions.

BROWSERS ALLOW GYROSCOPE ACCESS TOO


PROBLEM: HOW DO WE LOOK INTO HIGHER FREQUENCIES? SPEECH RANGE (fundamental frequency): adult male 85-180 Hz, adult female 165-255 Hz.

ALIASING

WE CAN SENSE HIGH-FREQUENCY SIGNALS DUE TO ALIASING [Figure: the result of recording tones between 120 and 160 Hz on a Nexus 7 device]

FREQUENCIES > SAMPLING FREQUENCY
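The folding effect behind these slides can be illustrated with a short sketch. It assumes an ideal sampler with no anti-aliasing filter and uses the 200 Hz Android software sampling rate quoted earlier. The function name `aliased_frequency` is hypothetical.

```python
def aliased_frequency(f_signal: float, f_sample: float) -> float:
    """Return the apparent frequency (Hz) of a pure tone after sampling.

    Assumes ideal sampling with no anti-aliasing filter (an assumption,
    not a statement about any particular gyroscope).
    """
    # Reduce the tone modulo the sampling rate, then fold it into the
    # baseband [0, f_sample / 2].
    f_mod = f_signal % f_sample
    return min(f_mod, f_sample - f_mod)


if __name__ == "__main__":
    # Tones above the 100 Hz Nyquist limit of a 200 Hz sampler.
    for tone in (120, 140, 160, 250):
        print(tone, "Hz appears at", aliased_frequency(tone, 200.0), "Hz")
```

Under these assumptions a 120 Hz tone shows up at 80 Hz and a 160 Hz tone at 40 Hz, so energy from the speech range still reaches the recorded signal even though it lies above half the sampling rate.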

EXPERIMENTAL SETUP Room. Simple speakers. Smartphone. Subset of the TIDigits speech recognition corpus: 10 speakers × 11 samples × 2 pronunciations = 220 total samples.

SPEECH ANALYSIS USING A SINGLE GYROSCOPE Gender identification. Speaker identification. Isolated word recognition.

Speech recognition engine developed at CMU. Tested for isolated word recognition. 14% success rate (random guess is 9%).

PREPROCESSING All samples are converted to audio files in WAV format. Upsampled to 8 kHz. Silence removal (based on voiced/unvoiced segment classification).
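The slides do not say how the upsampling was performed. As a minimal sketch, plain linear interpolation from the 200 Hz gyroscope rate to 8 kHz could look as follows. The function name `upsample_linear` is hypothetical, and a real pipeline would more likely use a proper resampling filter.

```python
def upsample_linear(samples, src_rate, dst_rate):
    """Upsample a list of floats by linear interpolation.

    Illustrative only: the interpolation method is an assumption,
    not the method used in the presentation.
    """
    if len(samples) < 2:
        return list(samples)
    # Number of output samples covering the same time span.
    n_out = int((len(samples) - 1) * dst_rate / src_rate) + 1
    out = []
    for k in range(n_out):
        pos = k * src_rate / dst_rate        # position in input samples
        i = min(int(pos), len(samples) - 2)  # left neighbour index
        frac = pos - i                       # distance past the left neighbour
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
    return out


if __name__ == "__main__":
    ramp = upsample_linear([0.0, 1.0], 200, 8000)
    print(len(ramp), ramp[20])
```

Upsampling adds no information. It only puts the signal into a format that standard audio tools expect.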

FEATURES MFCC (Mel-Frequency Cepstral Coefficients): statistical features are used (mean and variance). Delta-MFCC. Spectral centroid. RMS energy. STFT (Short-Time Fourier Transform).
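Two of the simpler features listed here, RMS energy and spectral centroid, can be sketched with the standard library alone. The function names are hypothetical. The naive DFT is used only for clarity, and a real pipeline would use an FFT library.

```python
import cmath
import math


def rms_energy(frame):
    """Root-mean-square energy of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of one frame.

    Uses a naive O(n^2) DFT over the non-negative frequency bins.
    """
    n = len(frame)
    weighted = 0.0
    total = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        mag = abs(coeff)
        weighted += (k * sample_rate / n) * mag  # bin frequency times magnitude
        total += mag
    return weighted / total if total > 0 else 0.0


if __name__ == "__main__":
    # A 1 kHz sine sampled at 8 kHz, 64 samples (exactly 8 periods).
    frame = [math.sin(2 * math.pi * 1000 * i / 8000) for i in range(64)]
    print(rms_energy(frame), spectral_centroid(frame, 8000))
```

For a pure tone that falls exactly on a DFT bin, the centroid coincides with the tone frequency and the RMS energy is the amplitude divided by the square root of two.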

CLASSIFIERS SVM (and multi-class SVM). GMM (Gaussian Mixture Model). DTW (Dynamic Time Warping).

DYNAMIC TIME WARPING [Figure: Euclidean distance vs. DTW distance]
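A minimal sketch of the classic DTW recurrence shows why it tolerates differences in timing that a point-by-point Euclidean comparison does not. It compares scalar sequences with an absolute-difference local cost. The slides apply DTW to STFT feature vectors, so this is a simplification, and the function name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two scalar sequences.

    Local cost is |a[i] - b[j]|; allowed steps are match, insertion
    and deletion (the textbook formulation, without a warping window).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances
                                 cost[i][j - 1],      # b advances
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


if __name__ == "__main__":
    # The same shape spoken "slower": DTW sees no difference.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

The two sequences in the usage example have different lengths, so a Euclidean distance is not even defined for them, while DTW aligns the repeated sample at no cost.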

GENDER IDENTIFICATION Binary SVM with spectral features. DTW with STFT features. Window size: 512 samples, which corresponds to 64 ms at an 8 kHz sampling rate.

WE CAN SUCCESSFULLY IDENTIFY GENDER Nexus 4: 84% (DTW). Galaxy S III: 82% (SVM). Random guess probability is 50%.

SPEAKER IDENTIFICATION Multi-class SVM and GMM with spectral features. DTW with STFT features (same as before).

A GOOD CHANCE TO IDENTIFY THE SPEAKER Nexus 4. Mixed female/male: 50% (DTW). Female speakers: 45% (DTW). Male speakers: 65% (DTW). Random guess probability is 20% for one gender and 10% for a mixed set.

ISOLATED WORD RECOGNITION, SPEAKER INDEPENDENT Nexus 4. Mixed female/male: 17% (DTW). Female speakers: 26% (DTW). Male speakers: 23% (DTW). Confusion matrix corresponds to the mixed-set results using DTW. Random guess probability is 9%.

ISOLATED WORD RECOGNITION, SPEAKER DEPENDENT Confusion matrix corresponds to the DTW results. Random guess probability is 9%.

HOW CAN WE LEVERAGE EAVESDROPPING SIMULTANEOUSLY ON TWO DEVICES?

SIMILAR TO TIME-INTERLEAVED ADCs DC component removal. Normalization / use of a reference signal. Background or foreground calibration.

NON-UNIFORM RECONSTRUCTION REQUIRES KNOWING PRECISE TIME-SKEWS Filterbank interpolation based on Eldar and Oppenheim's paper

PRACTICAL COMPROMISE Interleaving samples from multiple devices
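As a rough sketch of this compromise, samples from several devices can be merged by timestamp into one denser stream. It assumes each device delivers `(timestamp_seconds, value)` pairs already sorted by time and that the timestamps share a common clock. Both are assumptions, and `interleave` is a hypothetical name.

```python
import heapq


def interleave(*streams):
    """Merge per-device sample streams into one time-ordered stream.

    Each stream is a list of (timestamp_seconds, value) tuples sorted
    by timestamp. No skew estimation or reconstruction is attempted.
    """
    return list(heapq.merge(*streams, key=lambda sample: sample[0]))


if __name__ == "__main__":
    # Two 200 Hz devices whose sampling instants are offset by 2.5 ms.
    device_a = [(0.0000, 1.0), (0.0050, 3.0)]
    device_b = [(0.0025, 2.0), (0.0075, 4.0)]
    for t, v in interleave(device_a, device_b):
        print(t, v)
```

When the offset happens to be half a sampling period, the merged stream resembles a single device sampling at twice the rate. For any other offset the samples are non-uniformly spaced and this simple merge is only an approximation.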

EVALUATION (Tested for the case of speaker-dependent word recognition.) [Results: single device vs. two devices] Exhibits improvement over using a single device. Using even more devices might yield even better results. Not a proper non-uniform reconstruction.

FURTHER ATTACKS

SOURCE SEPARATION Use the 3 axes of the gyro. Learn the number of sound sources around. Use angle-of-arrival information for source separation.

AMBIENT SOUND RECOGNITION IS THE USER IN A ROOM/OUTDOORS/ON A STREET?

DEFENSES

SOFTWARE DEFENSES Low-pass filter the raw samples. The 0-20 Hz range should be enough for browser-based applications (according to WebKit). Access to a high sampling rate should require a special permission.
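One way to picture the low-pass defense is a first-order recursive filter applied to the raw samples before they reach an application. This is a sketch under assumptions: the 20 Hz cutoff comes from the slide, but the filter design and the name `low_pass` are illustrative and do not describe what any platform actually ships.

```python
import math


def low_pass(samples, sample_rate, cutoff_hz=20.0):
    """First-order RC-style low-pass filter over a list of floats."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing factor in (0, 1)
    out = []
    y = samples[0] if samples else 0.0
    for x in samples:
        y += alpha * (x - y)  # move a fraction of the way toward the input
        out.append(y)
    return out


if __name__ == "__main__":
    # A 100 Hz square-like component at a 200 Hz sampling rate.
    noisy = [1.0 if i % 2 == 0 else -1.0 for i in range(100)]
    print(max(abs(v) for v in low_pass(noisy, 200.0)[-10:]))
```

A single-pole filter only attenuates the higher band gently, so a practical defense would likely need a steeper filter. Filtering in software also cannot remove content that has already aliased below the cutoff at sampling time.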

HARDWARE DEFENSES Hardware filtering of sensor signals (not subject to configuration). Acoustic masking (won't help against vibration of the surface).

CONCLUSION Giving applications direct access to hardware is dangerous, especially given the high sampling rate.

THANK YOU VERY MUCH QUESTIONS? CRYPTO.STANFORD.EDU/GYROPHONE

IT IS POSSIBLE TO SAMPLE THROUGH JAVASCRIPT

FAQ Did you experiment with an anechoic chamber? Yes, and did not find it beneficial at this stage.

FAQ Perhaps the gyro actually measures the vibrations of the surface? Maybe, but tests suggest it's not only that. In any case it is still dangerous.

FAQ Is it possible to use measurements from multiple devices in other ways? Yes. For example as in MIMO: EGC (Equal Gain Combining).
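In its simplest form, equal gain combining gives every branch the same weight. The sketch below averages recordings sample by sample and assumes they are already time-aligned, which in practice is the hard part. The name `equal_gain_combine` is hypothetical.

```python
def equal_gain_combine(*signals):
    """Average several time-aligned signals with equal weights.

    Truncates to the shortest signal. Alignment (co-phasing) of the
    inputs is assumed to have been done beforehand.
    """
    n = min(len(s) for s in signals)
    return [sum(s[i] for s in signals) / len(signals) for i in range(n)]


if __name__ == "__main__":
    print(equal_gain_combine([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

Averaging aligned recordings reinforces the common signal while uncorrelated noise from each device partially cancels.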