HomeLog: A Smart System for Unobtrusive Family Routine Monitoring


Abstract

Research has shown that family routine plays a critical role in establishing good relationships among family members and maintaining their physical and mental health. In particular, regularly eating dinner as a family and limiting screen viewing time significantly reduce the prevalence of obesity. Fine-grained activity logging and analysis can enable a family to track their daily routine and modify their lifestyle for improved wellness. This paper presents HomeLog, a practical system to log family routine using off-the-shelf smartphones and smartwatches. HomeLog automatically detects and logs details of several important family routine activities, including family dining, TV viewing and conversation, in an unobtrusive manner. By providing a detailed family routine history, HomeLog empowers family members to actively engage in making positive changes to improve family wellness. Based on the sensor data collected from real families, we carefully design robust yet lightweight signal features for the classification of various family activities. HomeLog keeps track of the ambient noise characteristics and adapts its learning algorithms in response to the dynamics of the environment. Our extensive experiments involving 8 families with children show that HomeLog can detect family routine activities with over 8% precision and 2% recall across different families and home environments.

1 Introduction

Research has shown that family routine plays a critical role in establishing good relationships among family members and maintaining their physical and mental health [5][13][11][21]. For instance, regularly eating dinner as a family and limiting screen viewing time significantly reduce the prevalence of obesity [45][1][55][18][52][54]. Reducing sedentary behavior (e.g., screen viewing) was found to be as effective as increasing physical activity in preventing obesity [45]. According to the Social Ecological Model [38], the family environment can have a tremendous effect on child obesity. For instance, parents' eating habits and physical activities [41][47][52] strongly predict children's tendency towards obesity. In addition to the implications for family health, fine-grained analysis of family routine enables important studies in sociology and home economics. For instance, research has shown that the amount of shared time (including conversation and eating) between spouses and between parents and children has strong links with family income, the mother's employment status, the ages of children, and geographic location (urban or rural) [6][2][19]. Unfortunately, to date, there has been no unobtrusive and convenient approach to logging family activities for family routine assessment. Some of the available methods for family activity monitoring rely on videotaping, which not only incurs considerable installation/analysis costs, but also raises privacy concerns. Other methods resort to daily self-reports from family members. However, the accuracy of the result is susceptible to human error and the subjectivity of the users.
This paper presents HomeLog, a practical system to log family routine using off-the-shelf smartphones and smartwatches. HomeLog automatically detects and logs details of several important family routine activities in an unobtrusive manner. It uses the built-in accelerometer and microphone of the smartphones and smartwatches to detect activities that are closely related to family wellness, including the start/end time and participants of family dining and TV viewing. By providing a detailed family routine history, HomeLog empowers family members to actively engage in making positive changes to improve family wellness, e.g., preventing child obesity. The design of HomeLog faces several challenges, such as the significant interference from various noises in the home. Moreover, the microphone in smart devices is designed to capture close vocals and usually has low sensitivity. We carefully analyze the acoustic data collected from real families and choose acoustic features that are robust against various noises and the low sensitivity of the built-in microphone. In addition, the runtime features of an activity may substantially deviate from those in the training data. HomeLog adapts its algorithms in response to the dynamics of the environment under the users' supervision. In order to preserve the privacy of the family, HomeLog adopts lightweight yet effective algorithms to process the high-rate acoustic data stream on-the-fly without transmitting any data to a server.

We have evaluated HomeLog with extensive experiments involving 8 families with children (one or two weeks of recording in each family). Our results show the effectiveness of HomeLog in family activity detection (with average 88.8% precision and 9.2% recall) across different families and home environments. Moreover, the long-term, fine-grained family activity history provided by HomeLog makes it possible to analyze routine patterns/anomalies and improve family lifestyles.

2 Related Work

Studies by the American Academy of Pediatrics have shown that a healthy family routine is not only helpful in establishing good relationships among family members, but also critical for the proper development of children's physical and mental health [5][13][11][21]. Family meals create opportunities for communication among family members and greatly benefit the parent-child relationship [4][17]. Unlike the family meal, research has found that TV viewing can be a double-edged sword: it helps children better explore the world, but may also reduce their physical activity, or even lead to risky behaviors [15][14]. Traditionally, keeping diaries has been the primary approach for long-term studies of family activities [26][32]. However, manual recording is often susceptible to the subjectivity of individuals, leading to biased routine analysis. Video recording has also been used for monitoring family activities [12]. However, it often incurs high installation costs and may cause privacy concerns. Recently, several systems have been designed to detect the usage of electrical appliances based on electromagnetic interference and ambient sensors [4][23][28]. However, these systems can only detect family activities that involve substantial appliance usage. Recently, activity monitoring using mobile devices has received significant attention. Several systems have been developed to provide in-home behavior monitoring, such as motion-based fall detection, especially for elderly people [37][7][9]. In [29], a system based on a 3D camera is developed that leverages image processing techniques to identify human pose and infer corresponding activities. In [8], a sound-based system is used for bathroom monitoring, aiming to help caregiving for dementia patients. However, these studies focus on monitoring accidents or risk behaviors for either elderly people or patients. Moreover, they typically require custom hardware and hence present barriers to large-scale adoption. Several recent mobile health systems are designed based on off-the-shelf smartphones. The system presented in [43] detects conversation and physical activities by analyzing data collected from the built-in microphone and accelerometer, and uses them to assess the user's mental and physical wellbeing. A mobile app called iSleep [24], which is designed to analyze sound samples from the phone's built-in microphone, monitors the user's sleep quality along with other sleep-related events, such as snoring and coughing. Sleep Hunter [22] leverages both motion and acoustic data to detect light or deep sleep stages. The system presented in [46] detects the breathing rate during sleep from sound samples. Some recent studies focus on user experiences with mobile health systems, such as privacy concerns [1] and sharing behaviors [42]. Acoustic event recognition algorithms have been widely adopted in smartphone-based activity monitoring systems.
Auditeur [36] is a mobile-cloud service platform that allows a client's smartphone to recognize various sound events such as car honks or dog barking. SoundNet associates environmental sounds with words or concepts in natural languages to infer activities [33]. Crowd++ [56] counts the number of speakers in a conversation using MFCC (Mel-frequency cepstral coefficient) [48] features. The row mean vector of the spectrogram [3] is a simple but effective method for speaker recognition that compares the Euclidean distance between energy distribution features. However, these studies do not focus on voice recognition in the presence of significant noise, such as during a family meal.

3 Requirements and Challenges

HomeLog is designed to be an unobtrusive system that helps users keep track of their family routine. Specifically, it employs the built-in accelerometer and microphone of smartphones and smartwatches to detect two important family routine activities: family meals and TV viewing. For each of these activities, HomeLog logs the start and end time. We believe such fine-grained monitoring of family routine helps users better understand their lifestyle. In particular, these two activities constitute a significant portion of the time family members spend together at home. Specifically, HomeLog is designed to meet the following requirements: 1) Since HomeLog needs to operate in parallel with the family routine, it needs to be unobtrusive to use. It should minimize the burden on the user, and should not interfere with the users' daily activities in any way. 2) HomeLog needs to detect the family activities, their start/end time, and their participants in a robust fashion, i.e., across different users, smartphones, smartwatches and households. 3) Since family routine involves privacy-sensitive activities such as family conversation, the privacy of the family needs to be strictly protected. For example, the system should process the collected sensor samples on the fly and only keep the results, instead of storing or transmitting any raw data, which may contain sensitive information such as conversations. To meet these requirements, four challenges need to be addressed in developing HomeLog. First, in order to monitor the family routine in an unobtrusive manner, HomeLog samples and analyzes both acceleration and acoustic signals to detect various family activities. Family members are not required to carry their phones with them all the time, and children are not expected to wear any devices. However, the distance between the microphone and the sound source (e.g., family members in a conversation or the TV speaker) varies over time (possibly even during the same activity), which leads to highly variable acoustic features. For example, in a typical TV viewing scenario where the user is wearing a smartwatch, the loudness of the sound from the TV keeps changing as the user moves around the room, which makes it difficult to tell whether the TV is turned on simply from the captured sound volume. Moreover, the microphone in smart devices is designed to capture close vocals and usually has low sensitivity. Second, due to the inherently dynamic nature of the home environment, the activity detection is susceptible to various noises, such as sounds caused by pets.

Figure 1. System overview

Therefore, in order to provide fine-grained, accurate monitoring results, we need to design robust event detection and classification algorithms that can handle various noises in practice. Third, since the daily habits, the living environment, and the smart devices owned by users vary significantly among different families, a training process is required before using HomeLog. Moreover, the training period must be short in order to minimize the user's burden. Therefore, the training process only gains a basic understanding of the family's regular routine and captures several snapshots of the family activities. HomeLog needs to adapt to the dynamics of the family routine by adjusting its parameters based on the user's limited feedback. Lastly, HomeLog needs to process the sensor samples on the fly in order to preserve the user's privacy. As the acoustic signal is sampled at a relatively high frequency, the processing algorithm must also be lightweight enough to run in real time, yet effective at recognizing family routine activities.

4 System Design

Fig.1 shows the software architecture of HomeLog. The system can be installed on multiple smartphones (tablets are also supported) and smartwatches owned by the parents in the family, without imposing any constraints on the children. This ensures a high likelihood that at least one monitoring device is always close to the ongoing family activity. The motion of the smartwatches and the sound signal captured by each smart device are recorded. HomeLog detects family meals and TV viewing using features extracted from these data. In pre-processing, HomeLog keeps sampling data from the built-in accelerometer on the smartwatches and the microphone on every smart device. If a device is carried out of the home, or none of the family members is active, the data will not be processed further. The motion signal from the accelerometer is processed to obtain the user's current wrist motion status, which can potentially provide information about the user's activity. The collected acoustic signal is translated into 21 energy channels for each 5 ms frame. Specifically, HomeLog conducts an FFT on each frame and applies a Mel filter [5] to divide the energy spectrum from the FFT into 21 energy channels. Based on the experimental data collected in real families, we have identified a set of unique features for different family activities. Specifically, HomeLog uses the motion of the user's wrist and the clattering sound (mostly produced by tableware) to identify family meals, and special features of the low-energy sound frames to identify TV viewing. We divide time into 3-minute windows and consider all features within each window to perform activity detection. Such a window is referred to as a detection window hereafter. If a person's voice appears within a window a certain number of times, he or she is classified as a participant in the activity. The detection algorithm is built on the framework of a Hidden Markov Model (HMM).
The model is built and updated from three sources: a survey of the family, data recorded in the family for one day with manually labeled activities, and the interaction with the users. Under the definition of our HMM, for one specific family routine activity, each detection window is in one of two possible states, which indicate whether this activity occurs in the detection window. The goal is to find the best sequence of states (Markov chain) that describes the family meals and the TV viewing sessions over a whole day.

4.1 Pre-processing

To reduce the energy consumption of sound recording, we use a 16 kHz sampling rate, i.e., the lowest sampling rate commonly supported by smartphones, for our acoustic signal analysis. For every 5 ms frame of the sound signal sampled at 16 kHz, the Fast Fourier Transform (FFT) provides the energy distribution from 200 Hz to 8 kHz, with 40 effective values every 200 Hz [25][44]. However, the data size after the FFT is relatively large, which often prevents real-time, lightweight analysis in later stages of the signal processing pipeline. The Mel filter [31][39][5] provides a solution to this problem. The basic idea of the Mel filter is to use the Mel scale, which is based on just-noticeable differences of pitch, to build a series of triangular overlapping windows and to transform the data from the frequency domain into energies on channels. It simulates the human hearing process. After discarding the noise, which mainly falls below 80 Hz, we apply the rule of 1/3-octave bands from 80 Hz to 8 kHz [53] to spread 21 channels over the range from 80 Hz to 8 kHz. To the auditory sense, each pair of neighboring channels is equally spaced under this design. Then, we apply the Mel filter to transform the FFT result into 21 channels. The energy value on channel i is represented as e_i. Volume and pitch can be extracted from the energy distribution. Specifically, the volume of the sound signal, represented as V, is given by:

V = \sum_{i=1}^{21} e_i    (1)

The pitch of the sound signal is represented as P. In our design, the pitch corresponds to the channel with the highest energy, and the index of the dominating channel can be used to describe the pitch feature of the acoustic signal:

P = \arg\max_{i \in [1,21]} e_i    (2)

This process is performed every 5 ms on 10 ms segments of the acoustic signal, i.e., a 200 Hz framing rate with 50% overlap. This framing rate is also applied to all collected motion data. In order to reduce the computational load, we discard detection windows that only contain environmental noise (e.g., silence or the sound of a dishwasher) based on the variance of the sound volume V within a detection window. A low variance of V indicates that the window only contains continuous noise, which is usually captured at night when all family members are sleeping, or when no one is at home. Another case is when the device is carried out of the home, which can be detected by location sensors such as the Global Positioning System (GPS) or the Wi-Fi connection to the home hotspot. If a device cannot provide useful information for a detection window, it does not process the data further at that moment. When HomeLog runs on multiple devices, it is possible that only a few of them capture data of interest (phones may be left charging somewhere far away from the activities). This step ensures that power consumption is reduced when family routine activities are not likely to be sensed, while the devices remain active whenever they are able to detect the activities. The remaining step in the pre-processing is to extract useful features related to the family routine activities from the motion data and the sound signal. In the next sections, we introduce these features, explain why we select them, and show how they are used to facilitate detection.
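As a concrete illustration of this pre-processing step, the following minimal Python sketch (not part of the original system) reduces one audio frame to 21 channel energies, the volume V of Eqn (1), and the pitch P of Eqn (2). The band edges, the use of plain band sums in place of triangular Mel windows, and all names are simplifying assumptions.

import numpy as np

FS = 16000                          # 16 kHz sampling rate
FRAME = int(0.010 * FS)             # 10 ms analysis window ...
HOP = int(0.005 * FS)               # ... hopped every 5 ms (50% overlap)

# 22 edges -> 21 channels, roughly evenly spaced on a logarithmic (1/3-octave-like)
# scale from 80 Hz to 8 kHz; the exact edges are an assumption.
BAND_EDGES = np.geomspace(80.0, 8000.0, num=22)

def frame_features(frame):
    """Return (e_1..e_21, V, P) for one audio frame (Eqns (1) and (2))."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2            # energy spectrum of the frame
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / FS)       # frequency of each FFT bin
    e = np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                  for lo, hi in zip(BAND_EDGES[:-1], BAND_EDGES[1:])])
    volume = e.sum()                                       # V = sum of channel energies
    pitch = int(np.argmax(e)) + 1                          # P = dominating channel (1-based)
    return e, volume, pitch

def frames(signal):
    """Slide the 10 ms window over the signal with a 5 ms hop."""
    for start in range(0, len(signal) - FRAME + 1, HOP):
        yield signal[start:start + FRAME]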
4.2 Family Meal Detection

A well-defined family meal has two characteristics. First, the participants should be eating something. Second, parents and children should both be present. According to these characteristics, the family meal can be described with three main features. The first feature is the clattering sound caused by clashes between tableware. The clattering sound is the most distinctive acoustic characteristic of the family dining activity, regardless of other dynamics such as the type of food and the variation of tableware. The second feature is the wrist gesture of the users (parents only), detected by the smartwatches. A typical example is that the user usually sits with the arm on the table during the family meal, and the hand is neither as active as during physical exercise nor as stable as when reading a book. The third feature is the conversation between family members. It is very common for a family meal to contain a large amount of conversation between parents and children.

4.2.1 Clattering Sound

To describe the occurrences and frequency of the clattering sound within a detection window, the system looks for an energy peak on channels 12 to 16 (associated with frequencies ranging from 1 to 4 kHz) in each 5 ms frame. To detect whether a frame contains clattering sound, we adopt a lightweight algorithm. For each frame, we compute e_all, the average energy over all channels, and e_12-16, the average energy across channels 12 to 16. We use the ratio r = e_12-16 / e_all to detect clattering sound. For example, Fig.2 shows an example of clattering sound detection in a typical family meal scenario. Fig.2(a) shows the energy on the 21 channels over time, and Fig.2(b) shows the corresponding e_12-16 and e_all. We can see that one occurrence of clattering sound may result in several consecutive clattering frames, and always leads to a higher e_12-16, even when the clattering sound and human voice overlap (around the 1-second mark). Therefore, comparing e_12-16 and e_all is a simple and effective way of detecting clattering sound in a typical family meal scenario. In our study, we set up a training data set that contains one-week sound recordings from 5 families, and manually labeled 3 frames with clattering sound. For this training set, we apply Gaussian Kernel Density Estimation (Gaussian KDE) [49] to calculate the probability density function (PDF) p(r | Clattering), which represents the distribution of r for clattering sound. Then we apply Bayes' rule to derive P(Clattering | r), the probability that a frame contains clattering sound given r. To describe the occurrence and density of the clattering sound in a detection window, we use E[N_Clattering], the expected number of frames containing clattering sound, derived as the sum of P(Clattering | r) over all frames in the detection window. Fig.3 shows an example of family meal detection based on a real data set collected in a home. We can see that all family meal windows contain large numbers of clattering frames. The clash of other objects, such as keys and coins, can also produce a similar sound. Different from the clattering frames of a dining activity, such false alarms are usually isolated and are not likely to occur in bursts.
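The frame-level clattering detector described above could be sketched as follows, reusing the frame_features() helper from the earlier sketch. The placeholder training ratios and the prior are illustrative assumptions, not the authors' data; only the ratio r, the KDE densities, Bayes' rule, and the E[N_Clattering] sum follow the text.

import numpy as np
from scipy.stats import gaussian_kde

def clatter_ratio(e):
    """r: mean energy over channels 12-16 divided by the mean energy over all channels."""
    e = np.asarray(e, dtype=float)
    return e[11:16].mean() / (e.mean() + 1e-12)

# Placeholder training ratios standing in for the hand-labeled frames (assumed values):
# clattering frames carry a larger share of 1-4 kHz energy.
rng = np.random.default_rng(0)
r_clatter = rng.normal(2.5, 0.5, 300)             # ratios of frames labeled "clattering"
r_other = rng.normal(1.0, 0.3, 3000)              # ratios of the remaining frames

pdf_clatter = gaussian_kde(r_clatter)             # p(r | Clattering)
pdf_other = gaussian_kde(r_other)                 # p(r | no Clattering)
prior = len(r_clatter) / (len(r_clatter) + len(r_other))

def p_clatter_given_r(r):
    """Bayes' rule: P(Clattering | r)."""
    num = pdf_clatter(r) * prior
    den = num + pdf_other(r) * (1.0 - prior)
    return float(num / den)

def expected_clatter_frames(window_energies):
    """E[N_Clattering] for one detection window: the sum of per-frame probabilities."""
    return sum(p_clatter_given_r(clatter_ratio(e)) for e in window_energies)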

Figure 2. An example of clattering sound detection in a typical family meal scenario. (a) shows the energy on 21 channels over time, where clattering sound and human voice are marked with rectangles. (b) shows the comparison between e_12-16 and e_all for the same sound clip.

Figure 3. An example of family meal detection. Each bar represents the expected number of frames containing clattering sound in a detection window.

4.2.2 Wrist Gesture

The features from the motion data can be used to characterize the behavior of the user. In order to reduce the computational load, we sample the acceleration data from the accelerometers every 5 ms. With such a low sampling rate, we cannot focus on microscopic features of the motion data. Here we focus on the overall gesture of the user in a detection window. Since the watch is always worn on the wrist, as shown in Fig.4, the direction of the X-axis is always parallel to the arm. Therefore, the acceleration on the X-axis of a smartwatch is determined by gravity and the overall gesture of the arm. In addition, to learn how active the user's hand is, we also consider the changing rate of acceleration. If the acceleration data from each frame is represented by a vector (acceleration on the X-axis, Y-axis, and Z-axis), the changing rate of acceleration between two such vectors is represented by the angle between them. This angle describes the rotation of the smartwatch. From these observations, we select two features of the motion data for the detection of family meals, namely the acceleration on the X-axis (A_x) and the changing rate of acceleration (R_c). The value of A_x is derived as the average acceleration on the X-axis over each 5 ms, and the value of R_c is derived as the average changing rate of acceleration between each pair of neighboring frames. The behavior while eating is distinct from other common activities at home, such as cooking, reading, video gaming, TV viewing, and physical exercise, and can be observed from the acceleration data.

Figure 4. The direction of the X, Y, and Z-axes of the accelerometer in a smartwatch. The acceleration depends on gravity and the motion of the wrist. When the watch is not moving, the acceleration on one axis reads 9.8 m/s^2 if the direction of that axis points straight toward the ground.
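Below is a minimal sketch of the two motion features, assuming the accelerometer delivers one (x, y, z) vector per 5 ms frame; the function name and array layout are illustrative.

import numpy as np

def wrist_features(acc):
    """Motion features for one detection window.

    acc: array of shape (n_frames, 3), one (x, y, z) acceleration vector per 5 ms frame.
    Returns (A_x, R_c): the mean X-axis acceleration and the mean rotation angle
    (in radians) between neighboring acceleration vectors.
    """
    acc = np.asarray(acc, dtype=float)
    a_x = acc[:, 0].mean()                          # A_x: average acceleration along the arm

    v1, v2 = acc[:-1], acc[1:]                      # pairs of neighboring frames
    cos = np.sum(v1 * v2, axis=1) / (
        np.linalg.norm(v1, axis=1) * np.linalg.norm(v2, axis=1) + 1e-12)
    angles = np.arccos(np.clip(cos, -1.0, 1.0))     # rotation of the watch between frames
    r_c = angles.mean()                             # R_c: average changing rate of acceleration
    return a_x, r_c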
4.2.3 Conversation

The goal of conversation detection is to identify the occurrence of a conversation, as well as the family members who participate in it. A family meal is expected to include a large amount of the family members' voices. The speaker recognition technique presented in [16] shows that the pronunciation of vowels is an identifying characteristic of a person. However, maintaining a complete voice database for each family member is costly for smart devices. The row mean vector of the spectrogram [3] provides an effective and efficient approach to recognizing speakers by measuring the Euclidean distance between energy distributions in the frequency domain, which are already produced by the pre-processing step. For each 5 ms frame, we run the speaker recognition to calculate the probability that the frame contains the voice of at least one family member, represented as P(Voice | {e_i, i ∈ [1,21]}). Within a detection window, the feature we extract for conversation is E[N_Voice], the expected number of frames that contain the family members' voices, derived as the sum of P(Voice | {e_i, i ∈ [1,21]}) over all frames in the detection window.
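One way the row-mean-vector matching and the E[N_Voice] feature could be realized is sketched below, assuming a few template vectors enrolled per family member. The exponential mapping from Euclidean distance to a probability is an illustrative assumption, not the formulation used by the paper.

import numpy as np

def row_mean_vector(channel_energies):
    """Row mean vector of the spectrogram: mean energy per channel over a set of frames."""
    return np.asarray(channel_energies, dtype=float).mean(axis=0)

def p_voice(frame_energy, member_templates, scale=1.0):
    """Probability that one frame contains a family member's voice (illustrative).

    member_templates: enrolled 21-dimensional row mean vectors, one or more per member.
    The exponential mapping from distance to probability is an assumption.
    """
    e = np.asarray(frame_energy, dtype=float)
    e = e / (e.sum() + 1e-12)                                # normalize the energy distribution
    dists = [np.linalg.norm(e - t / (t.sum() + 1e-12)) for t in member_templates]
    return float(np.exp(-min(dists) / scale))                # closer template -> higher probability

def expected_voice_frames(window_energies, member_templates):
    """E[N_Voice]: sum of per-frame voice probabilities over a detection window."""
    return sum(p_voice(e, member_templates) for e in window_energies)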

4.3 TV Viewing Detection

TV viewing is difficult to detect because it often consists of a vast variety of different sounds. Even for a particular TV program, it is often challenging to find underlying acoustic features that uniquely identify the TV viewing activity. Therefore, instead of relying on frame-based acoustic features, HomeLog exploits characteristics within a detection window to detect TV viewing. These characteristics, reflecting the energy distribution and the variance of pitch, are not only efficient to calculate, but also more robust across different TV programs and much less susceptible to dynamics such as the distance between the smart devices and the TV. Specifically, to detect TV viewing, the system applies the following two features for each detection window. The first feature is the volume/pitch distribution, and the second feature is a fusion of the sound signals captured by multiple devices.

4.3.1 Volume Distribution & Pitch Variance

In order to detect TV viewing, two features are effective for each detection window. 1) Percentage of low-energy frames (Percent): the percentage of frames with root mean square (RMS) [2] less than 5% of the mean RMS within a detection window. 2) Variance of pitch of low-energy frames (Var_pitch): the variance of the pitch, defined as P in Eqn.(2) of Section 4.1, for frames with RMS less than 5% of the mean RMS within a detection window. Percent reflects the energy distribution within the window, and works well in separating TV sound from other foreground sounds that often involve human activities, such as family meals and conversation. This is primarily due to the fact that TV sound is usually more continuous (i.e., it contains fewer pauses or quiet frames) than foreground sounds. Therefore, the energy distribution of TV sound is more right-skewed, resulting in fewer low-energy frames and therefore a smaller Percent. Var_pitch also focuses on the low-energy frames within a detection window and describes the stability of the pitch, making it a good supplement for identifying TV sound. Due to its continuous nature, TV sound has a more stable pitch in low-energy frames compared with other foreground sounds. Fig.5 shows an example of identifying the TV viewing activity based on the feature space formed by Percent and Var_pitch. We can see that the dining and TV viewing activities can be separated in this feature space.

Figure 5. A data set including TV viewing and family meal in the feature space formed by the percentage of low-energy frames and the variance of pitch of low-energy frames. Each mark represents a detection window, labeled with ground truth.

4.3.2 Feature Fusion

When there are multiple devices in a household and none of them is moved significantly, HomeLog can leverage them to improve the sensing performance. A novel feature fusion algorithm can significantly improve the accuracy of TV detection by fusing the acoustic data captured by multiple phones in a home. Our idea of multi-device fusion is based on the following observation: the TV is a sound source with a fixed location, whose volume stays within a limited range over a relatively short period of time. This means detection can benefit from the localization of sound sources. We focus on the fusion algorithm for two devices, although it can be extended to more general scenarios. Our design is based on binaural hearing [34], a technique for determining the direction and origin of sounds using two sound receivers. In particular, we present a fusion algorithm based on the interaural level difference (ILD), the difference between the captured sound volumes, which is a basic feature for recognizing sound sources. The process of feature fusion consists of three steps: similarity check, sound source detection in high-energy frames, and sound source detection in low-energy frames. In the first step, the goal of the similarity check is to figure out whether the two devices are at home and near each other, by examining the similarity between the sounds captured by the two devices based on ILD. We define the detection windows that start at the same time instant on two different devices as binaural detection windows. To describe the similarity between binaural detection windows A and B, we define the Average Cosine Similarity per Frame, C(A,B), as:

C(A,B) = \frac{1}{l} \sum_{i=1}^{l} \cos(E(A,i), E(B,i))    (3)

Here, the energy distribution of frame i in detection window X is represented as the vector E(X,i), and l is the number of frames in a detection window. If C(A,B) is above a threshold, the two devices are likely close to each other, and the feature fusion algorithm continues to the next step.
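Before moving to the next step, here is a minimal sketch of the two window-level TV features and the similarity check of Eqn (3), assuming per-frame RMS values, pitches, and channel energies are available from pre-processing; the threshold mentioned in the final comment is an assumption.

import numpy as np

def tv_features(rms, pitch):
    """Window-level TV-viewing features (Percent, Var_pitch) from Section 4.3.1.

    rms:   per-frame RMS values within one detection window
    pitch: per-frame pitch (dominating channel index) within the same window
    """
    rms = np.asarray(rms, dtype=float)
    low = rms < 0.05 * rms.mean()                 # low-energy frames: RMS below 5% of the mean
    percent = float(low.mean())                   # Percent: fraction of low-energy frames
    var_pitch = float(np.var(np.asarray(pitch)[low])) if low.any() else 0.0
    return percent, var_pitch

def avg_cosine_similarity(E_a, E_b):
    """Eqn (3): average cosine similarity per frame between binaural windows A and B."""
    E_a, E_b = np.asarray(E_a, dtype=float), np.asarray(E_b, dtype=float)
    num = np.sum(E_a * E_b, axis=1)
    den = np.linalg.norm(E_a, axis=1) * np.linalg.norm(E_b, axis=1) + 1e-12
    return float(np.mean(num / den))

# Example admission test for the fusion pipeline (the 0.8 threshold is an assumption):
# if avg_cosine_similarity(E_a, E_b) > 0.8: proceed to sound-source detection.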
The next step in feature fusion is to detect the number of sound sources in the binaural detection windows.

Figure 6. An example of TV viewing with conversation. (a) shows the volume captured by two smartphones. (b) shows the volume ratio between corresponding frames.

If the acoustic signals captured by the two devices are highly similar, the number of sound sources can be estimated by ILD. If the acoustic signal in high-energy frames originates from a single sound source, it is more likely caused by the TV. In contrast, if the acoustic signal in low-energy frames originates from multiple sound sources, it is more likely to be caused by human activities other than TV. The method we use to detect sound sources is based on acoustic localization by ILD [3]. Specifically, if the acoustic signal is from a single source and captured by two receivers, it satisfies V_1 / V_2 = d_1^2 / d_2^2 = ∆V, where V_1 and V_2 are the volumes received by the receivers and d_1 and d_2 are the distances between the receivers and the sound source. This equation can be applied to compute the relative distances between the sound source and the devices. In indoor scenarios, ∆V may be affected by various factors (e.g., echoes and obstacles), but its coefficient of variation is limited when d_1 and d_2 are fixed. To detect whether the acoustic signals come from the same source, we define the Coefficient of Variation of the Volume Ratio per Frame, CV(A,B), in binaural detection windows A and B as:

CV(A,B) = \frac{\sigma(\Delta V(A,B))}{\mu(\Delta V(A,B))}, \quad \Delta V(A,B) = \left\{ \frac{V_{A,i}}{V_{B,i}},\ i \in [1,l] \right\}    (4)

Here, the volume of frame i in detection window X is represented by V_{X,i} from Eqn.(1), µ(∆V(A,B)) is the mean of the volume ratios between A and B, and σ(∆V(A,B)) is the standard deviation of the volume ratios. CV(A,B) is thus the ratio of the standard deviation to the mean.
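A minimal sketch of Eqn (4) and the resulting single-source test, assuming aligned per-frame volumes from the two devices; the decision threshold is an illustrative assumption.

import numpy as np

def volume_ratio_cv(vol_a, vol_b):
    """Eqn (4): coefficient of variation of the per-frame volume ratio V_A,i / V_B,i."""
    ratios = np.asarray(vol_a, dtype=float) / (np.asarray(vol_b, dtype=float) + 1e-12)
    return float(ratios.std() / (ratios.mean() + 1e-12))

def likely_single_source(vol_a, vol_b, threshold=0.3):
    """A low CV(A,B) suggests one fixed sound source (e.g., the TV).

    The 0.3 threshold is an illustrative assumption, not a value from the paper.
    """
    return volume_ratio_cv(vol_a, vol_b) < threshold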

The lower CV(A,B) is, the more likely the acoustic signals come from a single source. Fig.6 shows an example of how to detect sound sources by the volume ratio. In the first 20 seconds, phone B is carried by the user from the dining table to the sofa. The TV is turned on at the 30th second. During the 70th-75th seconds and the 140th-150th seconds, the subjects talk to each other. We can see that when the frames contain only TV sound, the volume ratio is relatively stable. In contrast, as conversation involves multiple sound sources, the variance of the volume ratio increases significantly. We now discuss how the above feature fusion algorithm can improve the accuracy of TV detection in several challenging scenarios: 1) quiet TV programs that contain discontinuous, low-volume sound may be recognized as noise; 2) TV programs that contain sounds similar to a family meal or conversation may be misclassified; and 3) home parties or party-like events with a continuous sound profile (e.g., a large amount of conversation) may be misclassified as TV. In a relatively quiet scenario, where the TV sound is discontinuous, the acoustic signal in high-energy frames still comes from a single sound source, so the event will be recognized as TV viewing by feature fusion. If the high-energy frames contain clattering sound and conversation, but they come from a single sound source, they are more likely to come from the TV than from family activities. If the high-energy frames come from multiple sound sources, we also check CV(A,B) over all low-energy frames. These frames usually contain sound from the TV and noise, which are continuous and have low volume. In the scenario of home parties, continuous conversation or music is similar to TV sound. However, these low-energy frames come from multiple sound sources and hence will not be misclassified.

4.4 Model Building

In order to establish the connection between the family routine activities and the data collected by the smartphones and the smartwatches, we build a mathematical model that calculates the probability of the activities' occurrences from the features extracted from the data. The first step of our model building is to identify the relationship between an activity and the information we can retrieve. Since we only focus on typical activities in the family, two significant characteristics can be summarized. First, the occurrences of these activities are influenced by time and date. For example, it is reasonable for a family to watch a specific TV program at a fixed time of day. Second, the durations of these activities are predictable. For example, a family meal is more likely to last for 20 minutes than for 60 or more minutes. Here we focus on building the model for one activity, because family meals and TV viewing can be described with the same model. If we assume that the data collected by the smartphones and the smartwatches reflect the occurrence of the activity, we can combine all related factors to build a Bayesian Network (BN) for one detection window in this scenario, as shown in Fig.7(a). With this BN, the goal of our system can be expressed as: given all facts and observations, find the probability of the state the detection window is in.
Figure 7. Models for family routine activities. (a) The Bayesian Network model relating facts, state, and observations; (b) the Hidden Markov Model of one activity.

A more practical model for this case is an HMM of the family routine, as shown in Fig.7(b). The HMM assigns two possible states to a detection window, "activity occurs" and "activity does not occur", corresponding to the State node in the BN. The observations of the HMM are the data collected by the smartphones and smartwatches, corresponding to the Observation nodes in the BN. We define the HMM as λ, and the goal of our system can be expressed as Eqn.(5):

\arg\max_{X} P(X \mid \lambda, O)    (5)

Here, X is a sequence of states over l detection windows, given by X = {x_1, x_2, ..., x_l}; O is a sequence of observations in the l detection windows, given by O = {o_1, o_2, ..., o_l}. A Markov chain is built from X and O, which starts in the morning when the family members wake up (we have x_1 = 2, which means no activity occurs at the very beginning of the day) and ends when the family members go to sleep at night. In our system, O is obtained by analyzing the data from the smart devices. Once the maximum-likelihood X that explains O is found, the activity's occurrences in these l detection windows are determined. This can be done with the Viterbi algorithm. From the discussion above, we have the simple expression λ = {Φ, Θ} for this specific case. Here, Φ represents the transition probabilities between the two states, given by Φ = {φ_(1,1), φ_(1,2), φ_(2,1), φ_(2,2)}; Θ represents the emission parameters for the observations associated with the two states, given by Θ = {θ_(o,1), θ_(o,2)}, where o is an observation from the smart devices in a detection window. However, according to the BN, the transition probabilities in Φ are not fixed for every detection window, but depend on Fact nodes such as time and date; moreover, the probability of a specific observation in a state involves features from multiple sources (motion data and the sound signal) and follows a continuous distribution due to the complexity of the sensor data. Thus, our HMM is a Hidden Markov Model with Dynamic Transition Probabilities and Multiple Continuous Observations [27]. In the next sections, we introduce how to decide the parameters Φ and Θ for this HMM.
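As an illustration of how Eqn (5) can be decoded, the sketch below runs the Viterbi algorithm for the two-state model, assuming callables that supply the (time-dependent) transition probabilities and the emission densities. It shows the decoding step only and is not the authors' implementation.

import numpy as np

def viterbi(n_windows, transition, emission):
    """Decode the most likely state sequence for one activity over a day (Eqn (5)).

    n_windows:  number of 3-minute detection windows in the day
    transition: transition(t, s_prev, s_next) -> probability, may depend on time/date
    emission:   emission(t, s) -> density of observation o_t under state s
    States: 1 = activity occurs, 2 = no activity; the day starts in state 2.
    """
    states = (1, 2)
    log_prob = {1: -np.inf, 2: 0.0}       # best log-probability of ending in each state
    back = []                             # back-pointers for every window
    for t in range(n_windows):
        new_log_prob, ptr = {}, {}
        for s in states:
            cands = [(log_prob[p] + np.log(transition(t, p, s) + 1e-300), p) for p in states]
            best, prev = max(cands)
            new_log_prob[s] = best + np.log(emission(t, s) + 1e-300)
            ptr[s] = prev
        back.append(ptr)
        log_prob = new_log_prob
    # Backtrack the maximum-likelihood sequence X = {x_1, ..., x_l}.
    path = [max(states, key=lambda s: log_prob[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))[1:]       # drop the implicit initial "no activity" state

In HomeLog, transition(t, ...) would be derived from the occurrence and duration models of Section 4.4.1, and emission(t, s) from the KDE-based densities of Section 4.4.2.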

4.4.1 Transition Probabilities

The transition probabilities, i.e., Φ, contain four entries {φ_(1,1), φ_(1,2), φ_(2,1), φ_(2,2)}. According to the definition of our HMM, when the activity is not occurring, we only need to know the probability of its occurrence in the next detection window. On the other hand, while the activity is currently occurring, we only need to know the probability that it continues into the next detection window. Therefore, two models are enough to describe all the transition probabilities: the probability distribution of an activity's occurrence with respect to time/date, and the probability distribution of its duration. The best way to gain knowledge of these probability distributions is to study the users' descriptions of their own family routines. In our study, we collected the ground truth of the activities in 8 families, and conducted interviews with them about their own perception of their family routine activities after the data collection. They gave us estimates of the starting times and the ranges of durations for the family meals and TV viewing. After comparing their descriptions with the ground truth, we found it is possible to build models of these probability distributions based solely on the content of the interviews. Assuming that a survey for one family shows that they usually have 2 meals at home on a weekday, occurring around 7:00 am and 6:00 pm, realistic models appear as the Gaussian distributions shown in Fig.8(a). Specifically, we use 7:00 am and 6:00 pm as the means of the Gaussian distributions, and choose 20 minutes as the standard deviation in our cases. The probability is also scaled by the number of meals at home, which means that the sum of all probabilities in a day equals 2. We can also apply a Gaussian distribution to the duration of each activity. For example, Fig.8(b) shows a probability distribution of the duration of family meals for a family who described their meals at home as lasting for 15 to 25 minutes. This process establishes the relationship between the transition probabilities of our HMM and the time of day. Different models should be applied for weekdays and weekends, because the family routine might differ. The transition probabilities of our HMM can be read directly from these probability distributions. When applying the Viterbi algorithm, the transition probability from S_2 to S_1 is equal to the probability of the activity's occurrence according to the time/date; the transition probability from S_1 to S_1 is equal to the probability that the activity continues for the next 3 minutes. Because the transition probabilities depend on previous states (the influence of the activity's duration), we need to adjust the structure of our HMM to ensure that the Viterbi algorithm runs properly. The adjusted structure is shown in Fig.9. All transition probabilities in the new structure are independent of earlier states, but the required memory space increases significantly. In practice, we run the Viterbi algorithm over 40 states in this HMM (i.e., 40 detection windows) to ensure that any activity within 2 hours can be handled.

Figure 8. An example of the probability distributions of the family meal's occurrence and duration. (a) The probability distribution of the family meal's occurrence during a weekday, according to "usually at 7:00 am and 6:00 pm"; (b) the probability distribution of the family meal's duration, according to "lasting for 15 to 25 minutes".
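The interview answers could be turned into these two Gaussian models, and then into transition probabilities, roughly as sketched below for 3-minute windows; the survival-ratio reading of the duration curve and the parameter values are assumptions for illustration.

import numpy as np
from scipy.stats import norm

WINDOW_MIN = 3.0   # length of a detection window in minutes

def occurrence_prob(minute_of_day, meal_times=(7 * 60, 18 * 60), std_min=20.0):
    """phi(2,1): probability that the activity starts in the window at this time of day.

    meal_times: interview-reported usual start times (7:00 am and 6:00 pm) in minutes.
    Each reported meal contributes one Gaussian, so the probabilities over a whole
    day sum to (approximately) the number of meals, as described above.
    """
    return sum(norm.pdf(minute_of_day, loc=m, scale=std_min) * WINDOW_MIN
               for m in meal_times)

def continue_prob(elapsed_min, dur_mean=20.0, dur_std=2.5):
    """phi(1,1): probability that a meal lasting elapsed_min continues 3 more minutes.

    The duration Gaussian is fitted to "lasting for 15 to 25 minutes"; reading it as a
    survival ratio is an assumption about how the curve in Fig.8(b) is used.
    """
    survive_now = 1.0 - norm.cdf(elapsed_min, loc=dur_mean, scale=dur_std)
    survive_next = 1.0 - norm.cdf(elapsed_min + WINDOW_MIN, loc=dur_mean, scale=dur_std)
    return survive_next / max(survive_now, 1e-12)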
Figure 9. The adjusted HMM. One activity is divided into multiple states S_1,k, indicating that the activity has already lasted for k detection windows. Here S_1,k can only transition to S_1,k+1 or to S_2, meaning that the activity either continues into the next detection window or stops.

4.4.2 Observations

The observation o within one detection window is represented as a vector of features, i.e., o = <f_1, f_2, f_3, ...>, where f_i represents a feature related to the activity. The features for family meal and TV viewing detection are shown in Table 1 and Table 2. Here, each feature is described as a continuous value. Our goal is to calculate P(o|S) and find the best expression of θ_(o,S).

Table 1. Features for family meal detection
  E[N_Clattering] - the expected number of frames containing clattering sound
  A_x - the average acceleration on the X-axis
  R_c - the changing rate of acceleration
  E[N_Voice] - the expected number of frames containing the family members' voices

Table 2. Features for TV viewing detection
  Percent - the percentage of low-energy frames
  Var_pitch - the variance of pitch of low-energy frames
  CV(A,B) - the coefficient of variation of the volume ratio per frame in binaural detection windows A and B (optional)

After HomeLog is deployed in a new family and the survey of the regular family routine is completed, as discussed in Section 4.4.1, it runs for one day to collect data and labels the data with ground truth according to the results of the survey. A training data set is constructed in this way.

Based on the training data set, we can apply Gaussian KDE to calculate two PDFs, p(o|1) and p(o|2), corresponding to the observations while the activity is occurring or not occurring. To transform these PDFs into the probabilities P(o|1) and P(o|2), we follow the assumption below:

P(o \mid S) = \lim_{\mu(B) \to 0} P(B \mid S) = \lim_{\mu(B) \to 0} \int_B p(o \mid S)\, do = \beta\, p(o \mid S)    (6)

Here, B is a small continuous space whose measure approaches zero, and we have o ∈ B. We can treat the probability of observation o as the probability of observations in its neighborhood B. We then use a constant β to describe the measure of B, which builds a connection between P(o|S) and p(o|S). We choose θ_(o,S) = p(o|S), because β is treated as a constant that does not affect the outcome of the Viterbi algorithm. A special feature is CV(A,B) from the feature fusion in TV viewing, because it is not always available to be observed. However, this still does not affect the outcome of the Viterbi algorithm, because the influence of adjusting the observation vector o in some detection windows is limited to those detection windows and does not propagate to the whole Markov chain.

4.4.3 Updating the HMM

In order to achieve high detection accuracy and maintain an up-to-date model for a gradually changing environment, HomeLog keeps updating the parameters of the HMM through interaction with the user. Specifically, whenever the user views the detected family routine presented by HomeLog, he or she may correct errors in it or confirm that it is accurate. After receiving the user's review, HomeLog treats all recently confirmed results and the corresponding features as a new training data set. The parameters of the HMM are updated accordingly. Through this process, the probability distributions of the activities' occurrences and durations, as shown in Fig.8, become closer to the truth for the family, by re-estimating the means and standard deviations of these distributions. Furthermore, the PDFs of the observations associated with the states are also adjusted. Out-of-date data are discarded from the training set and deleted to save storage. To keep HomeLog an unobtrusive monitoring system, users are only requested to review and correct occasional errors to tune its detection ability. As our evaluation shows, the user does not need to adjust HomeLog many times, because of its high accuracy.

5 Performance Evaluation

In order to evaluate the performance of HomeLog, we recruited 8 families for data collection. Our study has been approved by the Institutional Review Board (IRB) of the authors' institution. This group of families is referred to as the second group hereafter. The experiment lasted one or two weeks for each family. We provided each family with multiple devices. The app pre-installed on the devices continuously records audio and motion unless the device is taken out of the home. Users may manually start or stop the app on any device. Each parent of the family is required to carry one of the smartphones as his or her own phone, and the other smartphone is kept at a relatively fixed location at home (e.g., left charging somewhere). At least one of the parents of the family is required to wear a smartwatch if smartwatches are available. These requirements take into account the habits of different users (i.e., carrying the phone, leaving the phone at a relatively fixed position at home, etc.).
They also ensure that family activities are captured by at least one device. To obtain the ground truth, we first gained knowledge of the subjects' regular family routine from the interviews with them, then listened to the recordings and manually labeled the durations of the activities. The details of the participating families and the collected data are shown in Table 3. Because we offer the subjects the right to delete any recorded clips due to privacy concerns, not all of the family routine activities during the experiments were recorded by the app.

5.1 Micro-scale Routine Analysis

HomeLog is designed to provide fine-grained family routine logging, which allows family members to review their activities. We analyze the family routine in detail using the data collected over one or two weeks. To show the advantage of our HMM for family routine activities, we compare its performance with the classification results of a Support Vector Machine (SVM), which recognizes the activities only from the observations of the smart devices, without awareness of the time/date or the activity's duration. Such analysis also sheds light on the lifestyle of a family and motivates family members to make positive changes. In this section, we use the results from family 4 as an example. In Section 5.2, we analyze the detection accuracy of different activities across all families. The detection results along with the ground truth are shown in Fig.10. Detailed long-term family activity logs like these make it possible to analyze routine patterns/anomalies and suggest possible ways to improve family lifestyles. We can see that the family usually has dinner around 7-8 pm for about an hour, except for day 5, which is a Friday, when they started dinner at around 8 pm for about 20 minutes. Moreover, the family watched more TV and finished TV viewing later than on most other days during the week, possibly due to the fact that it was a Friday. Compared with the ground truth, we can see that HomeLog is accurate in detecting most of the activities. On days 3 and 5, SVM detection produces a few misclassifications for the dining activity, due to interference caused by TV viewing. However, these errors are reduced by the HMM detection. Furthermore, by considering the time/date and the activity's duration, the HMM can eliminate improper detection results that show discontinuous activities or short activities that last only a few minutes.

5.2 Evaluation of Event Detection

In this section, we focus on the performance of HomeLog in single-activity detection. The main objective is to evaluate the detection accuracy of the occurrence and the duration of each activity. The data collected in the experiments are processed using the methods described in Section 4.2, and the training data is built based on the interviews with the subjects and the data collected on the first day of the experiment.

Table 3. Families that participated in the experiment and their daily routine
(Columns: Family; Children, ages in years; Phone; Smartwatch; Data, weeks; Family Meal, number; TV Viewing, number)
1 1 daughter(5) Nexus 4 N/A daughter(4) Nexus 4 N/A daughters(5, 8),2 sons(1, 3) Nexus 4 Sony Smartwatch 3 sons (1, 3, 5) Nexus 3 Sony Smartwatch 3 sons (3, 5) Moto G N/A daughters(1,3),1 son(7) Moto G2 2 Sony Smartwatch 3 daughters(3,11),2 sons(7,13) Moto G2 2 N/A daughters(7,1,18) Moto G2 2 Sony Smartwatch 3 1 2

Figure 10. Detected family routine based on data collected from family 4 during 5 days.

First, for each detection window (3 minutes), we detect family activities including the family meal and TV viewing. In the rest of this section, we discuss three types of detection results: 1) The overall detection accuracy of HomeLog for each detection window. We use precision and recall as the metrics for this evaluation. Specifically, we define the precision of detecting activity A as the number of true-positive windows divided by the total number of windows detected as A. The recall of detecting activity A is defined as the number of true-positive windows divided by the total number of windows that are actually associated with A. We do not take into account the true negatives, because most of the windows containing no activities are detected or discarded. 2) The detection accuracy of the occurrence of each activity. We report the sum of false negatives and false positives for this metric. 3) The detection accuracy of the duration of each activity. We report the average detection error of each activity's start/end time in minutes for this metric. In our design of HomeLog, the detection result is further calibrated by the user's review. The HMM of the family routine is updated in a dynamic environment as described in Section 4.4.3. The performance of the updated HMM is also shown in the next sections.

Figure 11. Overall accuracy of family meal detection in detection windows. The average precision and recall by HomeLog with the HMM are 8.7% and 89.5%, respectively.

Figure 12. Errors in detecting the family meal's occurrence. The number of errors is the sum of false negatives and false positives over each family's meals and detection results.

5.2.1 Family Meal Detection

In this section, we evaluate the performance of family meal detection. Fig.11 shows the detection results for the 8 families. We can see that by applying the HMM instead of SVM detection, HomeLog increases the recall by up to 11.45% (6.82% on average). This is primarily because the HMM


More information

Department of Electrical & Electronic Engineering Imperial College of Science, Technology and Medicine. Project: Real-Time Speech Enhancement

Department of Electrical & Electronic Engineering Imperial College of Science, Technology and Medicine. Project: Real-Time Speech Enhancement Department of Electrical & Electronic Engineering Imperial College of Science, Technology and Medicine Project: Real-Time Speech Enhancement Introduction Telephones are increasingly being used in noisy

More information

Improving Frame Based Automatic Laughter Detection

Improving Frame Based Automatic Laughter Detection Improving Frame Based Automatic Laughter Detection Mary Knox EE225D Class Project knoxm@eecs.berkeley.edu December 13, 2007 Abstract Laughter recognition is an underexplored area of research. My goal for

More information

... A Pseudo-Statistical Approach to Commercial Boundary Detection. Prasanna V Rangarajan Dept of Electrical Engineering Columbia University

... A Pseudo-Statistical Approach to Commercial Boundary Detection. Prasanna V Rangarajan Dept of Electrical Engineering Columbia University A Pseudo-Statistical Approach to Commercial Boundary Detection........ Prasanna V Rangarajan Dept of Electrical Engineering Columbia University pvr2001@columbia.edu 1. Introduction Searching and browsing

More information

Getting Started. Connect green audio output of SpikerBox/SpikerShield using green cable to your headphones input on iphone/ipad.

Getting Started. Connect green audio output of SpikerBox/SpikerShield using green cable to your headphones input on iphone/ipad. Getting Started First thing you should do is to connect your iphone or ipad to SpikerBox with a green smartphone cable. Green cable comes with designators on each end of the cable ( Smartphone and SpikerBox

More information

Singer Traits Identification using Deep Neural Network

Singer Traits Identification using Deep Neural Network Singer Traits Identification using Deep Neural Network Zhengshan Shi Center for Computer Research in Music and Acoustics Stanford University kittyshi@stanford.edu Abstract The author investigates automatic

More information

Speech and Speaker Recognition for the Command of an Industrial Robot

Speech and Speaker Recognition for the Command of an Industrial Robot Speech and Speaker Recognition for the Command of an Industrial Robot CLAUDIA MOISA*, HELGA SILAGHI*, ANDREI SILAGHI** *Dept. of Electric Drives and Automation University of Oradea University Street, nr.

More information

Multi-modal Kernel Method for Activity Detection of Sound Sources

Multi-modal Kernel Method for Activity Detection of Sound Sources 1 Multi-modal Kernel Method for Activity Detection of Sound Sources David Dov, Ronen Talmon, Member, IEEE and Israel Cohen, Fellow, IEEE Abstract We consider the problem of acoustic scene analysis of multiple

More information

Topic 10. Multi-pitch Analysis

Topic 10. Multi-pitch Analysis Topic 10 Multi-pitch Analysis What is pitch? Common elements of music are pitch, rhythm, dynamics, and the sonic qualities of timbre and texture. An auditory perceptual attribute in terms of which sounds

More information

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 AN HMM BASED INVESTIGATION OF DIFFERENCES BETWEEN MUSICAL INSTRUMENTS OF THE SAME TYPE PACS: 43.75.-z Eichner, Matthias; Wolff, Matthias;

More information

Research Article. ISSN (Print) *Corresponding author Shireen Fathima

Research Article. ISSN (Print) *Corresponding author Shireen Fathima Scholars Journal of Engineering and Technology (SJET) Sch. J. Eng. Tech., 2014; 2(4C):613-620 Scholars Academic and Scientific Publisher (An International Publisher for Academic and Scientific Resources)

More information

Processor time 9 Used memory 9. Lost video frames 11 Storage buffer 11 Received rate 11

Processor time 9 Used memory 9. Lost video frames 11 Storage buffer 11 Received rate 11 Processor time 9 Used memory 9 Lost video frames 11 Storage buffer 11 Received rate 11 2 3 After you ve completed the installation and configuration, run AXIS Installation Verifier from the main menu icon

More information

hit), and assume that longer incidental sounds (forest noise, water, wind noise) resemble a Gaussian noise distribution.

hit), and assume that longer incidental sounds (forest noise, water, wind noise) resemble a Gaussian noise distribution. CS 229 FINAL PROJECT A SOUNDHOUND FOR THE SOUNDS OF HOUNDS WEAKLY SUPERVISED MODELING OF ANIMAL SOUNDS ROBERT COLCORD, ETHAN GELLER, MATTHEW HORTON Abstract: We propose a hybrid approach to generating

More information

Smart Traffic Control System Using Image Processing

Smart Traffic Control System Using Image Processing Smart Traffic Control System Using Image Processing Prashant Jadhav 1, Pratiksha Kelkar 2, Kunal Patil 3, Snehal Thorat 4 1234Bachelor of IT, Department of IT, Theem College Of Engineering, Maharashtra,

More information

TOWARDS IMPROVING ONSET DETECTION ACCURACY IN NON- PERCUSSIVE SOUNDS USING MULTIMODAL FUSION

TOWARDS IMPROVING ONSET DETECTION ACCURACY IN NON- PERCUSSIVE SOUNDS USING MULTIMODAL FUSION TOWARDS IMPROVING ONSET DETECTION ACCURACY IN NON- PERCUSSIVE SOUNDS USING MULTIMODAL FUSION Jordan Hochenbaum 1,2 New Zealand School of Music 1 PO Box 2332 Wellington 6140, New Zealand hochenjord@myvuw.ac.nz

More information

PRODUCTION MACHINERY UTILIZATION MONITORING BASED ON ACOUSTIC AND VIBRATION SIGNAL ANALYSIS

PRODUCTION MACHINERY UTILIZATION MONITORING BASED ON ACOUSTIC AND VIBRATION SIGNAL ANALYSIS 8th International DAAAM Baltic Conference "INDUSTRIAL ENGINEERING" 19-21 April 2012, Tallinn, Estonia PRODUCTION MACHINERY UTILIZATION MONITORING BASED ON ACOUSTIC AND VIBRATION SIGNAL ANALYSIS Astapov,

More information

inter.noise 2000 The 29th International Congress and Exhibition on Noise Control Engineering August 2000, Nice, FRANCE

inter.noise 2000 The 29th International Congress and Exhibition on Noise Control Engineering August 2000, Nice, FRANCE Copyright SFA - InterNoise 2000 1 inter.noise 2000 The 29th International Congress and Exhibition on Noise Control Engineering 27-30 August 2000, Nice, FRANCE I-INCE Classification: 7.9 THE FUTURE OF SOUND

More information

Real-time body tracking of a teacher for automatic dimming of overlapping screen areas for a large display device being used for teaching

Real-time body tracking of a teacher for automatic dimming of overlapping screen areas for a large display device being used for teaching CSIT 6910 Independent Project Real-time body tracking of a teacher for automatic dimming of overlapping screen areas for a large display device being used for teaching Student: Supervisor: Prof. David

More information

A Framework for Segmentation of Interview Videos

A Framework for Segmentation of Interview Videos A Framework for Segmentation of Interview Videos Omar Javed, Sohaib Khan, Zeeshan Rasheed, Mubarak Shah Computer Vision Lab School of Electrical Engineering and Computer Science University of Central Florida

More information

Week 14 Query-by-Humming and Music Fingerprinting. Roger B. Dannenberg Professor of Computer Science, Art and Music Carnegie Mellon University

Week 14 Query-by-Humming and Music Fingerprinting. Roger B. Dannenberg Professor of Computer Science, Art and Music Carnegie Mellon University Week 14 Query-by-Humming and Music Fingerprinting Roger B. Dannenberg Professor of Computer Science, Art and Music Overview n Melody-Based Retrieval n Audio-Score Alignment n Music Fingerprinting 2 Metadata-based

More information

THE importance of music content analysis for musical

THE importance of music content analysis for musical IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 1, JANUARY 2007 333 Drum Sound Recognition for Polyphonic Audio Signals by Adaptation and Matching of Spectrogram Templates With

More information

Query By Humming: Finding Songs in a Polyphonic Database

Query By Humming: Finding Songs in a Polyphonic Database Query By Humming: Finding Songs in a Polyphonic Database John Duchi Computer Science Department Stanford University jduchi@stanford.edu Benjamin Phipps Computer Science Department Stanford University bphipps@stanford.edu

More information

IMIDTM. In Motion Identification. White Paper

IMIDTM. In Motion Identification. White Paper IMIDTM In Motion Identification Authorized Customer Use Legal Information No part of this document may be reproduced or transmitted in any form or by any means, electronic and printed, for any purpose,

More information

Automatic Labelling of tabla signals

Automatic Labelling of tabla signals ISMIR 2003 Oct. 27th 30th 2003 Baltimore (USA) Automatic Labelling of tabla signals Olivier K. GILLET, Gaël RICHARD Introduction Exponential growth of available digital information need for Indexing and

More information

Detection and demodulation of non-cooperative burst signal Feng Yue 1, Wu Guangzhi 1, Tao Min 1

Detection and demodulation of non-cooperative burst signal Feng Yue 1, Wu Guangzhi 1, Tao Min 1 International Conference on Applied Science and Engineering Innovation (ASEI 2015) Detection and demodulation of non-cooperative burst signal Feng Yue 1, Wu Guangzhi 1, Tao Min 1 1 China Satellite Maritime

More information

Drum Sound Identification for Polyphonic Music Using Template Adaptation and Matching Methods

Drum Sound Identification for Polyphonic Music Using Template Adaptation and Matching Methods Drum Sound Identification for Polyphonic Music Using Template Adaptation and Matching Methods Kazuyoshi Yoshii, Masataka Goto and Hiroshi G. Okuno Department of Intelligence Science and Technology National

More information

AUTOMATIC ACCOMPANIMENT OF VOCAL MELODIES IN THE CONTEXT OF POPULAR MUSIC

AUTOMATIC ACCOMPANIMENT OF VOCAL MELODIES IN THE CONTEXT OF POPULAR MUSIC AUTOMATIC ACCOMPANIMENT OF VOCAL MELODIES IN THE CONTEXT OF POPULAR MUSIC A Thesis Presented to The Academic Faculty by Xiang Cao In Partial Fulfillment of the Requirements for the Degree Master of Science

More information

AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY

AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY Eugene Mikyung Kim Department of Music Technology, Korea National University of Arts eugene@u.northwestern.edu ABSTRACT

More information

Automatic Commercial Monitoring for TV Broadcasting Using Audio Fingerprinting

Automatic Commercial Monitoring for TV Broadcasting Using Audio Fingerprinting Automatic Commercial Monitoring for TV Broadcasting Using Audio Fingerprinting Dalwon Jang 1, Seungjae Lee 2, Jun Seok Lee 2, Minho Jin 1, Jin S. Seo 2, Sunil Lee 1 and Chang D. Yoo 1 1 Korea Advanced

More information

Telecommunication Development Sector

Telecommunication Development Sector Telecommunication Development Sector Study Groups ITU-D Study Group 1 Rapporteur Group Meetings Geneva, 4 15 April 2016 Document SG1RGQ/218-E 22 March 2016 English only DELAYED CONTRIBUTION Question 8/1:

More information

POST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS

POST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS POST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS Andrew N. Robertson, Mark D. Plumbley Centre for Digital Music

More information

A Parametric Autoregressive Model for the Extraction of Electric Network Frequency Fluctuations in Audio Forensic Authentication

A Parametric Autoregressive Model for the Extraction of Electric Network Frequency Fluctuations in Audio Forensic Authentication Proceedings of the 3 rd International Conference on Control, Dynamic Systems, and Robotics (CDSR 16) Ottawa, Canada May 9 10, 2016 Paper No. 110 DOI: 10.11159/cdsr16.110 A Parametric Autoregressive Model

More information

International Journal of Advance Engineering and Research Development MUSICAL INSTRUMENT IDENTIFICATION AND STATUS FINDING WITH MFCC

International Journal of Advance Engineering and Research Development MUSICAL INSTRUMENT IDENTIFICATION AND STATUS FINDING WITH MFCC Scientific Journal of Impact Factor (SJIF): 5.71 International Journal of Advance Engineering and Research Development Volume 5, Issue 04, April -2018 e-issn (O): 2348-4470 p-issn (P): 2348-6406 MUSICAL

More information

Full Disclosure Monitoring

Full Disclosure Monitoring Full Disclosure Monitoring Power Quality Application Note Full Disclosure monitoring is the ability to measure all aspects of power quality, on every voltage cycle, and record them in appropriate detail

More information

AUTOREGRESSIVE MFCC MODELS FOR GENRE CLASSIFICATION IMPROVED BY HARMONIC-PERCUSSION SEPARATION

AUTOREGRESSIVE MFCC MODELS FOR GENRE CLASSIFICATION IMPROVED BY HARMONIC-PERCUSSION SEPARATION AUTOREGRESSIVE MFCC MODELS FOR GENRE CLASSIFICATION IMPROVED BY HARMONIC-PERCUSSION SEPARATION Halfdan Rump, Shigeki Miyabe, Emiru Tsunoo, Nobukata Ono, Shigeki Sagama The University of Tokyo, Graduate

More information

Reducing False Positives in Video Shot Detection

Reducing False Positives in Video Shot Detection Reducing False Positives in Video Shot Detection Nithya Manickam Computer Science & Engineering Department Indian Institute of Technology, Bombay Powai, India - 400076 mnitya@cse.iitb.ac.in Sharat Chandran

More information

Automatic Laughter Detection

Automatic Laughter Detection Automatic Laughter Detection Mary Knox 1803707 knoxm@eecs.berkeley.edu December 1, 006 Abstract We built a system to automatically detect laughter from acoustic features of audio. To implement the system,

More information

A probabilistic framework for audio-based tonal key and chord recognition

A probabilistic framework for audio-based tonal key and chord recognition A probabilistic framework for audio-based tonal key and chord recognition Benoit Catteau 1, Jean-Pierre Martens 1, and Marc Leman 2 1 ELIS - Electronics & Information Systems, Ghent University, Gent (Belgium)

More information

A New "Duration-Adapted TR" Waveform Capture Method Eliminates Severe Limitations

A New Duration-Adapted TR Waveform Capture Method Eliminates Severe Limitations 31 st Conference of the European Working Group on Acoustic Emission (EWGAE) Th.3.B.4 More Info at Open Access Database www.ndt.net/?id=17567 A New "Duration-Adapted TR" Waveform Capture Method Eliminates

More information

WHAT'S HOT: LINEAR POPULARITY PREDICTION FROM TV AND SOCIAL USAGE DATA Jan Neumann, Xiaodong Yu, and Mohamad Ali Torkamani Comcast Labs

WHAT'S HOT: LINEAR POPULARITY PREDICTION FROM TV AND SOCIAL USAGE DATA Jan Neumann, Xiaodong Yu, and Mohamad Ali Torkamani Comcast Labs WHAT'S HOT: LINEAR POPULARITY PREDICTION FROM TV AND SOCIAL USAGE DATA Jan Neumann, Xiaodong Yu, and Mohamad Ali Torkamani Comcast Labs Abstract Large numbers of TV channels are available to TV consumers

More information

Design Project: Designing a Viterbi Decoder (PART I)

Design Project: Designing a Viterbi Decoder (PART I) Digital Integrated Circuits A Design Perspective 2/e Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolić Chapters 6 and 11 Design Project: Designing a Viterbi Decoder (PART I) 1. Designing a Viterbi

More information

Music Mood. Sheng Xu, Albert Peyton, Ryan Bhular

Music Mood. Sheng Xu, Albert Peyton, Ryan Bhular Music Mood Sheng Xu, Albert Peyton, Ryan Bhular What is Music Mood A psychological & musical topic Human emotions conveyed in music can be comprehended from two aspects: Lyrics Music Factors that affect

More information

Automatic Rhythmic Notation from Single Voice Audio Sources

Automatic Rhythmic Notation from Single Voice Audio Sources Automatic Rhythmic Notation from Single Voice Audio Sources Jack O Reilly, Shashwat Udit Introduction In this project we used machine learning technique to make estimations of rhythmic notation of a sung

More information

Automatic Extraction of Popular Music Ringtones Based on Music Structure Analysis

Automatic Extraction of Popular Music Ringtones Based on Music Structure Analysis Automatic Extraction of Popular Music Ringtones Based on Music Structure Analysis Fengyan Wu fengyanyy@163.com Shutao Sun stsun@cuc.edu.cn Weiyao Xue Wyxue_std@163.com Abstract Automatic extraction of

More information

Features for Audio and Music Classification

Features for Audio and Music Classification Features for Audio and Music Classification Martin F. McKinney and Jeroen Breebaart Auditory and Multisensory Perception, Digital Signal Processing Group Philips Research Laboratories Eindhoven, The Netherlands

More information

Measurement of overtone frequencies of a toy piano and perception of its pitch

Measurement of overtone frequencies of a toy piano and perception of its pitch Measurement of overtone frequencies of a toy piano and perception of its pitch PACS: 43.75.Mn ABSTRACT Akira Nishimura Department of Media and Cultural Studies, Tokyo University of Information Sciences,

More information

Musical Hit Detection

Musical Hit Detection Musical Hit Detection CS 229 Project Milestone Report Eleanor Crane Sarah Houts Kiran Murthy December 12, 2008 1 Problem Statement Musical visualizers are programs that process audio input in order to

More information

Comparison Parameters and Speaker Similarity Coincidence Criteria:

Comparison Parameters and Speaker Similarity Coincidence Criteria: Comparison Parameters and Speaker Similarity Coincidence Criteria: The Easy Voice system uses two interrelating parameters of comparison (first and second error types). False Rejection, FR is a probability

More information

Music Information Retrieval with Temporal Features and Timbre

Music Information Retrieval with Temporal Features and Timbre Music Information Retrieval with Temporal Features and Timbre Angelina A. Tzacheva and Keith J. Bell University of South Carolina Upstate, Department of Informatics 800 University Way, Spartanburg, SC

More information

Implementation of A Low Cost Motion Detection System Based On Embedded Linux

Implementation of A Low Cost Motion Detection System Based On Embedded Linux Implementation of A Low Cost Motion Detection System Based On Embedded Linux Hareen Muchala S. Pothalaiah Dr. B. Brahmareddy Ph.d. M.Tech (ECE) Assistant Professor Head of the Dept.Ece. Embedded systems

More information

homework solutions for: Homework #4: Signal-to-Noise Ratio Estimation submitted to: Dr. Joseph Picone ECE 8993 Fundamentals of Speech Recognition

homework solutions for: Homework #4: Signal-to-Noise Ratio Estimation submitted to: Dr. Joseph Picone ECE 8993 Fundamentals of Speech Recognition INSTITUTE FOR SIGNAL AND INFORMATION PROCESSING homework solutions for: Homework #4: Signal-to-Noise Ratio Estimation submitted to: Dr. Joseph Picone ECE 8993 Fundamentals of Speech Recognition May 3,

More information

Detection of Panoramic Takes in Soccer Videos Using Phase Correlation and Boosting

Detection of Panoramic Takes in Soccer Videos Using Phase Correlation and Boosting Detection of Panoramic Takes in Soccer Videos Using Phase Correlation and Boosting Luiz G. L. B. M. de Vasconcelos Research & Development Department Globo TV Network Email: luiz.vasconcelos@tvglobo.com.br

More information

An Introduction to the Spectral Dynamics Rotating Machinery Analysis (RMA) package For PUMA and COUGAR

An Introduction to the Spectral Dynamics Rotating Machinery Analysis (RMA) package For PUMA and COUGAR An Introduction to the Spectral Dynamics Rotating Machinery Analysis (RMA) package For PUMA and COUGAR Introduction: The RMA package is a PC-based system which operates with PUMA and COUGAR hardware to

More information

Using the BHM binaural head microphone

Using the BHM binaural head microphone 11/17 Using the binaural head microphone Introduction 1 Recording with a binaural head microphone 2 Equalization of a recording 2 Individual equalization curves 5 Using the equalization curves 5 Post-processing

More information

Chord Classification of an Audio Signal using Artificial Neural Network

Chord Classification of an Audio Signal using Artificial Neural Network Chord Classification of an Audio Signal using Artificial Neural Network Ronesh Shrestha Student, Department of Electrical and Electronic Engineering, Kathmandu University, Dhulikhel, Nepal ---------------------------------------------------------------------***---------------------------------------------------------------------

More information

Machine Learning Term Project Write-up Creating Models of Performers of Chopin Mazurkas

Machine Learning Term Project Write-up Creating Models of Performers of Chopin Mazurkas Machine Learning Term Project Write-up Creating Models of Performers of Chopin Mazurkas Marcello Herreshoff In collaboration with Craig Sapp (craig@ccrma.stanford.edu) 1 Motivation We want to generative

More information

EXPLORING THE USE OF ENF FOR MULTIMEDIA SYNCHRONIZATION

EXPLORING THE USE OF ENF FOR MULTIMEDIA SYNCHRONIZATION EXPLORING THE USE OF ENF FOR MULTIMEDIA SYNCHRONIZATION Hui Su, Adi Hajj-Ahmad, Min Wu, and Douglas W. Oard {hsu, adiha, minwu, oard}@umd.edu University of Maryland, College Park ABSTRACT The electric

More information

Modeling memory for melodies

Modeling memory for melodies Modeling memory for melodies Daniel Müllensiefen 1 and Christian Hennig 2 1 Musikwissenschaftliches Institut, Universität Hamburg, 20354 Hamburg, Germany 2 Department of Statistical Science, University

More information

Efficient Computer-Aided Pitch Track and Note Estimation for Scientific Applications. Matthias Mauch Chris Cannam György Fazekas

Efficient Computer-Aided Pitch Track and Note Estimation for Scientific Applications. Matthias Mauch Chris Cannam György Fazekas Efficient Computer-Aided Pitch Track and Note Estimation for Scientific Applications Matthias Mauch Chris Cannam György Fazekas! 1 Matthias Mauch, Chris Cannam, George Fazekas Problem Intonation in Unaccompanied

More information

Music Source Separation

Music Source Separation Music Source Separation Hao-Wei Tseng Electrical and Engineering System University of Michigan Ann Arbor, Michigan Email: blakesen@umich.edu Abstract In popular music, a cover version or cover song, or

More information

Week 14 Music Understanding and Classification

Week 14 Music Understanding and Classification Week 14 Music Understanding and Classification Roger B. Dannenberg Professor of Computer Science, Music & Art Overview n Music Style Classification n What s a classifier? n Naïve Bayesian Classifiers n

More information

Application Of Missing Feature Theory To The Recognition Of Musical Instruments In Polyphonic Audio

Application Of Missing Feature Theory To The Recognition Of Musical Instruments In Polyphonic Audio Application Of Missing Feature Theory To The Recognition Of Musical Instruments In Polyphonic Audio Jana Eggink and Guy J. Brown Department of Computer Science, University of Sheffield Regent Court, 11

More information

A Parametric Autoregressive Model for the Extraction of Electric Network Frequency Fluctuations in Audio Forensic Authentication

A Parametric Autoregressive Model for the Extraction of Electric Network Frequency Fluctuations in Audio Forensic Authentication Journal of Energy and Power Engineering 10 (2016) 504-512 doi: 10.17265/1934-8975/2016.08.007 D DAVID PUBLISHING A Parametric Autoregressive Model for the Extraction of Electric Network Frequency Fluctuations

More information

NewsComm: A Hand-Held Device for Interactive Access to Structured Audio

NewsComm: A Hand-Held Device for Interactive Access to Structured Audio NewsComm: A Hand-Held Device for Interactive Access to Structured Audio Deb Kumar Roy B.A.Sc. Computer Engineering, University of Waterloo, 1992 Submitted to the Program in Media Arts and Sciences, School

More information

Reconstruction of Ca 2+ dynamics from low frame rate Ca 2+ imaging data CS229 final project. Submitted by: Limor Bursztyn

Reconstruction of Ca 2+ dynamics from low frame rate Ca 2+ imaging data CS229 final project. Submitted by: Limor Bursztyn Reconstruction of Ca 2+ dynamics from low frame rate Ca 2+ imaging data CS229 final project. Submitted by: Limor Bursztyn Introduction Active neurons communicate by action potential firing (spikes), accompanied

More information

VISUAL CONTENT BASED SEGMENTATION OF TALK & GAME SHOWS. O. Javed, S. Khan, Z. Rasheed, M.Shah. {ojaved, khan, zrasheed,

VISUAL CONTENT BASED SEGMENTATION OF TALK & GAME SHOWS. O. Javed, S. Khan, Z. Rasheed, M.Shah. {ojaved, khan, zrasheed, VISUAL CONTENT BASED SEGMENTATION OF TALK & GAME SHOWS O. Javed, S. Khan, Z. Rasheed, M.Shah {ojaved, khan, zrasheed, shah}@cs.ucf.edu Computer Vision Lab School of Electrical Engineering and Computer

More information

GNURadio Support for Real-time Video Streaming over a DSA Network

GNURadio Support for Real-time Video Streaming over a DSA Network GNURadio Support for Real-time Video Streaming over a DSA Network Debashri Roy Authors: Dr. Mainak Chatterjee, Dr. Tathagata Mukherjee, Dr. Eduardo Pasiliao Affiliation: University of Central Florida,

More information

Broken Wires Diagnosis Method Numerical Simulation Based on Smart Cable Structure

Broken Wires Diagnosis Method Numerical Simulation Based on Smart Cable Structure PHOTONIC SENSORS / Vol. 4, No. 4, 2014: 366 372 Broken Wires Diagnosis Method Numerical Simulation Based on Smart Cable Structure Sheng LI 1*, Min ZHOU 2, and Yan YANG 3 1 National Engineering Laboratory

More information

SINGING PITCH EXTRACTION BY VOICE VIBRATO/TREMOLO ESTIMATION AND INSTRUMENT PARTIAL DELETION

SINGING PITCH EXTRACTION BY VOICE VIBRATO/TREMOLO ESTIMATION AND INSTRUMENT PARTIAL DELETION th International Society for Music Information Retrieval Conference (ISMIR ) SINGING PITCH EXTRACTION BY VOICE VIBRATO/TREMOLO ESTIMATION AND INSTRUMENT PARTIAL DELETION Chao-Ling Hsu Jyh-Shing Roger Jang

More information

What is Statistics? 13.1 What is Statistics? Statistics

What is Statistics? 13.1 What is Statistics? Statistics 13.1 What is Statistics? What is Statistics? The collection of all outcomes, responses, measurements, or counts that are of interest. A portion or subset of the population. Statistics Is the science of

More information

An ecological approach to multimodal subjective music similarity perception

An ecological approach to multimodal subjective music similarity perception An ecological approach to multimodal subjective music similarity perception Stephan Baumann German Research Center for AI, Germany www.dfki.uni-kl.de/~baumann John Halloran Interact Lab, Department of

More information

Music Genre Classification

Music Genre Classification Music Genre Classification chunya25 Fall 2017 1 Introduction A genre is defined as a category of artistic composition, characterized by similarities in form, style, or subject matter. [1] Some researchers

More information

6.UAP Project. FunPlayer: A Real-Time Speed-Adjusting Music Accompaniment System. Daryl Neubieser. May 12, 2016

6.UAP Project. FunPlayer: A Real-Time Speed-Adjusting Music Accompaniment System. Daryl Neubieser. May 12, 2016 6.UAP Project FunPlayer: A Real-Time Speed-Adjusting Music Accompaniment System Daryl Neubieser May 12, 2016 Abstract: This paper describes my implementation of a variable-speed accompaniment system that

More information

Audio-Based Video Editing with Two-Channel Microphone

Audio-Based Video Editing with Two-Channel Microphone Audio-Based Video Editing with Two-Channel Microphone Tetsuya Takiguchi Organization of Advanced Science and Technology Kobe University, Japan takigu@kobe-u.ac.jp Yasuo Ariki Organization of Advanced Science

More information

Automatic Projector Tilt Compensation System

Automatic Projector Tilt Compensation System Automatic Projector Tilt Compensation System Ganesh Ajjanagadde James Thomas Shantanu Jain October 30, 2014 1 Introduction Due to the advances in semiconductor technology, today s display projectors can

More information

RainBar: Robust Application-driven Visual Communication using Color Barcodes

RainBar: Robust Application-driven Visual Communication using Color Barcodes 2015 IEEE 35th International Conference on Distributed Computing Systems RainBar: Robust Application-driven Visual Communication using Color Barcodes Qian Wang, Man Zhou, Kui Ren, Tao Lei, Jikun Li and

More information

MUSICAL INSTRUMENT IDENTIFICATION BASED ON HARMONIC TEMPORAL TIMBRE FEATURES

MUSICAL INSTRUMENT IDENTIFICATION BASED ON HARMONIC TEMPORAL TIMBRE FEATURES MUSICAL INSTRUMENT IDENTIFICATION BASED ON HARMONIC TEMPORAL TIMBRE FEATURES Jun Wu, Yu Kitano, Stanislaw Andrzej Raczynski, Shigeki Miyabe, Takuya Nishimoto, Nobutaka Ono and Shigeki Sagayama The Graduate

More information

A Bayesian Network for Real-Time Musical Accompaniment

A Bayesian Network for Real-Time Musical Accompaniment A Bayesian Network for Real-Time Musical Accompaniment Christopher Raphael Department of Mathematics and Statistics, University of Massachusetts at Amherst, Amherst, MA 01003-4515, raphael~math.umass.edu

More information