Towards Auto-Documentary: Tracking the evolution of news in time

Abstract

News videos constitute an important source of information for tracking and documenting important events. In these videos, news stories are often accompanied by short video clips that tend to be repeated during the course of the event. Automatic detection of such repetitions is essential for creating auto-documentaries. In this paper, we propose methods for detecting and tracking the evolution of news over time. Duplicate video sequences are detected by matching consecutive key-frames of the news video. The duplicate sequences that correspond to commercials placed in between the news are then detected and removed. The remaining duplicate video sequences are assumed to correspond to threads of news. As an alternative approach, we propose a method for automatic detection of logo images used by the channels to mark news stories. Finally, we use the news transcripts to create topic clusters and compare these clusters with the duplicate sequences detected with the proposed methods. Experiments are carried out on the TREC-VID data set, consisting of hours of news videos from two different channels, and the results are reported.

I. INTRODUCTION

News videos constitute an important source of information for tracking and documenting important events. These videos record the evolution of a news story in time and contain valuable information for creating documentaries. Automated tracking of the evolution of a news story over the course of an event can lead to a rough summarization of the event for auto-documentation. Although it is common to use the text for tracking related stories [], the visual content is often ignored. In news videos, news stories are often accompanied by short video sequences that tend to be used over and over during the course of the event.
A particular video sequence can be used again with some modifications and/or additions, either as a reminder of the current story or due to a lack of video material for the current story. Also, there is a tendency to repeat the important news of the day at some other time inside the same news program. For example, CNN advertises important news of the day under the caption "Top Stories" at the onset of a news bulletin. The tendency of news channels to re-use the same video sequences can be exploited to detect and track important news stories by detecting the duplicate video sequences. In this paper, we propose methods for tracking the evolution of news over time from actual news videos. The next section describes the data set and features used in our study. We present the method for detecting duplicate video sequences in Section III. In Section IV, we describe a method for the detection and removal of commercials from news videos. Section V describes the proposed approach for automatic detection of repeating news stories. Logo images used by the channels to mark news stories provide an alternative approach for tracking news stories, as explained in Section VI. Section VII presents results on how the topic clusters created from news transcripts can be compared with the results obtained from the detection of duplicate video sequences. Finally, we conclude in Section VIII and discuss future lines of research.

II. DATA SET AND INPUT REPRESENTATION

In this study, the experiments are carried out on the data set provided by the content-based video retrieval track (TREC-VID) of the Text Retrieval Conference TREC []. The data set consists of hours of broadcast news videos ( thirty-minute programs) from ABC World News Tonight and CNN Headline News recorded by the Linguistic Data Consortium from late January through June 99.
The common shot segmentations, defined by TREC-VID, are used as the basic units. One key-frame is extracted from each shot. In total, there are and shots from ABC and CNN videos respectively. On average, videos for a single day contain shots in ABC and shots in CNN. Each key-frame is described by a set of features. The average and standard deviation of HSV values obtained from a grid ( features) are used as the color features. The mean values of twelve oriented energy filters (aligned uniformly with degree separation) extracted from a grid ( features) represent the texture information. Canny's edge detector is used to extract edge features from a grid. Schneiderman's face detector algorithm [] is used to detect frontal faces. The size and position of the largest face are used as the face features ( features). All the features are normalized to have zero mean and unit variance.

III. DETECTING DUPLICATE SEQUENCES

We define the video sequences that have similar consecutive key-frames as duplicate sequences. Due to shot segmentation, the same piece of video can have a different number of shots, and the key-frames selected from each shot may differ slightly. Also, due to the montaging process, there may be slight modifications when a piece of video is re-used. Therefore, the same video or story may look like two different sequences.

Fig. (a), (b): Due to shot segmentation, the same piece of a video may have a different number of shots, and the key-frames selected from each shot may differ. For example, in (a) the nd and th key-frames of the top sequence are missing in the bottom sequence. In (b), the lengths of the sequences are the same, but there are missing key-frames in both of the sequences. Also, the key-frames can be very similar but not the same, as seen with the first and the second matching pairs in (a).

With our definition, duplicate sequences are the sequences that share identical or very similar consecutive key-frames, where some missing key-frames are allowed. In Figure, two example pairs of duplicate sequences are shown. The number of shots can be different due to missing shots in one of the sequences as in (a), or, although the lengths of the sequences are the same, the shots may be different as in (b). Furthermore, the key-frames may be very similar but not the same. In [], visual features extracted from I-frames are used to detect repeating news videos. However, due to the large amount of data, using I-frames is not feasible, and this system works only for detecting identical video segments. We propose a heuristic pattern matching method for detecting the duplicate sequences. The proposed method first detects candidate repeating key-frames (the key-frames that have matching pairs) and then constructs the longest sequence that has consecutive similar key-frames, where missing elements are allowed. In the following sections, we first explain the method to find candidate repeating key-frames by searching for identical or very similar frames using the feature similarities. Then, we describe the method to find the duplicate sequences.

A. Finding candidate repeating key-frames

Candidate repeating key-frames are defined as the key-frames that have identical or very similar matching key-frames. In [], similar news photographs are identified using an iconic matching method which is adapted from [].
However, in our case, there may be bigger differences between similar key-frames that may cause problems for the iconic matching method (e.g. text overlays, or large modifications due to the montaging process). As defined, a candidate key-frame should have a few duplicates or very similar images, and the rest should be very different. To detect this property, for each image in the data set, we find the most similar N images using the feature similarities.

Fig. (a)-(e): Top: key-frame images; middle: distances to most similar images; bottom: derivatives. Red lines show the medians of the derivatives. Since there is a big gap between the most similar image and the others, (a) and (b) are candidate repeating frames. (a) has only one duplicate, whereas (b) has similar key-frames. The key-frame shown in (c) repeats itself, but it is very frequent. Therefore, it is not chosen as a candidate. The key-frame in (d) is a regular news story and does not have duplicates.

There are news videos in each of ABC and CNN data sets. We assume that the same video sequence is not shown in all videos and choose N as. This threshold value eliminates the scenes common to a TV channel that are shown in almost all the news programs, which are analogous to stop-words in text (e.g. the Headline News logo in CNN, sport logos, weather news, etc.). In Figure, for some selected key-frame images the distances to the most similar images are shown in sorted order. If an image repeats itself k times, then there should be a jump in the similarity values after k images. In the figure, the jump indices show that the images in (a) and (c) have single similar images, and the key-frame in (b) has similar images. The image shown in (d) is a common scene for weather news and therefore repeats in almost all news programs. Since we consider most similar images, the jump is not seen for this image. The image in (e) is from a regular news story.
Therefore, it does not have duplicates and the jump is not obvious. For these examples, the images in (a)-(c) should be candidate repeating frames since they have an obvious jump. In order to capture this property, we take the derivatives of the similarity values, as shown at the bottom part of Figure. Then, we find the median of these values. An image is assigned as a candidate repeating key-frame if the ratio between the largest derivative value and the median value is larger than a threshold (for the experiments the threshold is chosen as ). This process chooses the images in Figures (a)-(c) as candidate repeating key-frames and eliminates the rest.

B. Finding Duplicate Sequences

Due to the errors in shot segmentation, similar sequences cannot be found directly by matching the consecutive candidate key-frames. This is because, in between two matching candidate key-frames, there may be other key-frames that do not have any matching images. If we skip the non-candidate key-frames and continue with the rest, then there is a chance to find a sequence which will cover the missing ones.
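The two steps of this section, the jump test of Section III-A and the skip-tolerant matching outlined above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: key-frames are assumed to be integer indices in broadcast order, `similar` is a hypothetical dictionary from a key-frame index to the set of indices judged similar to it, and `ratio_threshold` and `max_gap` are hypothetical parameters whose values the text does not preserve.

```python
from statistics import median


def is_candidate(sorted_distances, ratio_threshold):
    """Jump test on the ascending distances to the N most similar key-frames."""
    # Discrete derivative: differences between consecutive sorted distances.
    gaps = [b - a for a, b in zip(sorted_distances, sorted_distances[1:])]
    if not gaps:
        return False
    med = median(gaps)
    if med <= 0:
        # Degenerate case (handling is an assumption): any positive gap is a jump.
        return max(gaps) > 0
    # Candidate if the largest derivative dominates the median derivative.
    return max(gaps) / med > ratio_threshold


def grow_duplicate_pair(c1, c2, similar, candidates, max_gap):
    """Extend the matching pair (c1, c2) into two aligned key-frame sequences."""
    seq1, seq2 = [c1], [c2]
    last1, last2 = c1, c2
    while True:
        found = None
        # Search the close neighborhood of the last match for another match.
        for i in range(last1 + 1, last1 + max_gap + 1):
            if i not in candidates:
                continue  # skip non-candidate key-frames
            for j in range(last2 + 1, last2 + max_gap + 1):
                if j in similar.get(i, ()):
                    found = (i, j)
                    break
            if found:
                break
        if found is None:
            return seq1, seq2
        i, j = found
        # Insert the new match together with the key-frames skipped on the way.
        seq1.extend(range(last1 + 1, i + 1))
        seq2.extend(range(last2 + 1, j + 1))
        last1, last2 = i, j
```

For instance, with `similar = {0: {100}, 2: {101}, 3: {103}}`, starting from the pair (0, 100) with `max_gap=2` yields the sequences 0..3 and 100..103, so the unmatched key-frames 1 and 102 are covered by the detected sequences. The sketch does not apply the minimum repetition period discussed later in the section.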

Fig.: Algorithm for detecting duplicate sequences.

Definitions:
  C: set of candidate key-frames
  K: set of all non-candidate key-frames
  similar(c): list of similar key-frames for c
  k_c: set of key-frames following a candidate key-frame c, length(k_c)
  Let S_1 = c_1 k_{c_1} c_2 k_{c_2} ... c_n, and S_2 = c'_1 k_{c'_1} c'_2 k_{c'_2} ... c'_n.
  S_1 and S_2 are duplicate sequences if, for all i, c'_i ∈ similar(c_i).

Algorithm:
  for all c_1 ∈ C
    for all c'_1 ∈ similar(c_1)
      S_1 = {c_1}
      S_2 = {c'_1}
      for all c_{i+1} where c_{i+1} ∈ neighbor(c_i)
        if c'_{i+1} ∈ similar(c_{i+1})
          insert(S_1, k(c_i), c_{i+1})
          insert(S_2, k(c'_i), c'_{i+1})
        else break

To detect matching sequences, the candidate key-frames are chosen as the starting points. For each candidate key-frame, the list of similar key-frames is searched. The matching candidate key-frames are taken as the first elements of a possible matching sequence pair. Then, the consecutive key-frames of these candidates are examined to find the longest matching sequence. The sequence is expanded only if there are other matching key-frames in the close neighborhood. If such other matching pairs are found among the consecutive frames, they are inserted as the new elements of the matching sequences. The key-frames that reside in the interval between two inserted elements are also inserted into the sequences. This process repeats until no matching pairs are found. It is performed for each candidate key-frame in the data set. The algorithm is given in Figure.

A shorter sequence may be shown consecutively due to a lack of video, or the anchor and reporter may follow each other, which results in a consecutively repeating sequence. These sequences are not considered duplicate sequences, since they are inside the same news story or the same commercial. To eliminate these sequences, a threshold value is set, and only the sequences that repeat with a period longer than this threshold are chosen as duplicate sequences.

IV. DETECTING AND REMOVING COMMERCIALS

In news videos, commercials are often intermixed with news stories.
For efficient retrieval and browsing of the news stories, detection and removal of commercials are essential. Although some studies [], [] used black frames to detect commercials, such approaches fail for the videos of TV channels that do not use black frames to flag commercial breaks. Color properties of commercials can also be used for detection; however, this approach is liable to generate too many false positives, a property that is undesirable for our task due to the danger of removing important news stories.

TABLE I: Commercial detection results for movies ( key-frames) on CNN data. The true number of commercials is 9. Columns: num-detected, tp, fp, fn, tn. Rows: candidates, sequences, pruned.

The commercials tend to be repeated several times during news programs, and can be detected as duplicate sequences by the method proposed in the previous section. We propose to detect commercials based on the following observations. First, the commercials have a tendency to have longer duplicate sequences than the news stories. This is due to the fact that rapid scene changes during commercials cause frequent shot breaks. Second, commercials are usually presented in groups. Therefore, the neighborhood information can be used to correct or prune the detection results. We first attempt to distinguish commercials from other repeating sequences based on their sequence lengths. Sequences with lengths greater than a threshold value T (chosen as in our experiments) are predicted to belong to commercials. Assuming that commercials are not repeated during a single news program, we look for repetitions that are at least key-frames (corresponding to approximately half the duration of a news program) apart. A key-frame that is surrounded by key-frames flagged as commercial is likely to belong to a commercial. Also, if a key-frame is the only one to be flagged as a commercial, the flag is likely to be wrong. We use a smoothing process, which assigns to each key-frame the dominant value over a window.
Then this is used to prune the results. Tables I and II present the results obtained from movies from CNN and ABC respectively. The first rows show the results when only the candidate key-frames are selected as commercials. Then, these candidate frames are grouped to obtain the duplicate sequences, including the extra frames in between two candidate frames. The second rows correspond to the results corrected by finding the duplicate sequences which are longer than key-frames. Finally, the results are further pruned by considering the neighborhood information, as shown in the third rows. In CNN, there are key-frames in all movies, and 9 of them are correct commercials. With the proposed system of them are predicted correctly. In ABC, there are key-frames in all movies, and of them are correct commercials. With the proposed system of them are predicted correctly. We observed that in ABC, among the missing ones, of them belong to a single commercial segment. Among all the key-frames in the whole data, 9 key-frames in ABC and 9 key-frames in CNN are detected as commercials and removed from the data.
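The length test and the neighborhood-based pruning described in this section can be sketched as below. This is only an illustration under assumptions: key-frames are integer indices, `min_length` plays the role of the threshold T, and `half_window` is a hypothetical parameter for the smoothing window, whose actual size is not preserved in the text. The additional requirement that repetitions lie a minimum number of key-frames apart is omitted for brevity.

```python
def flag_long_sequences(num_keyframes, sequences, min_length):
    """Flag every key-frame that belongs to a duplicate sequence longer than min_length."""
    flags = [False] * num_keyframes
    for seq in sequences:
        if len(seq) > min_length:
            for idx in seq:
                flags[idx] = True
    return flags


def smooth_flags(flags, half_window):
    """Assign to each key-frame the dominant flag value over a centered window."""
    smoothed = []
    for i in range(len(flags)):
        lo = max(0, i - half_window)
        hi = min(len(flags), i + half_window + 1)
        window = flags[lo:hi]
        # Strict majority of commercial flags inside the window.
        smoothed.append(sum(window) * 2 > len(window))
    return smoothed
```

With a window of three key-frames (`half_window=1`), an isolated flag is removed, while an unflagged key-frame surrounded by flagged ones is filled in, which matches the two observations about neighborhoods made above.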

TABLE II: Commercial detection results for movies ( key-frames) on ABC data. The true number of commercials is. Columns: num-detected, tp, fp, fn, tn. Rows: candidates, sequences, pruned.

Fig.: The same story is used in two news stories with a period of seven days.
  CNN headline news //99: white house says time is running out for iraq to avoid military strike. administration officials are reacting coolly to baghdad's latest offer to open presidential Palaces to international weapons inspectors
  CNN headline news //99: iraq is again offering to allow a limited number of u.n. weapons inspectors into eight presidential sites. The plan is giving inspectors two months to search the areas. The united states is demanding full access by u.n. weapons inspectors to all sites.

Fig.: A news story from CNN headline news on //99. First, the story is presented in full length, then at the end of the news program a summary is repeated with the title "Top stories".
  CNN headline news //99: Russian president boris yeltsin has nominated a new prime minister He announced today he wants acting prime minister sergei kiriyenko to take over the post permanently. The russian parliament's lower house now has one week to vote on the nomination. Yeltsin is threatening to disband the duma if it doesn't approve the -year-old kiriyenko Yeltsin dismissed his entire cabinet monday without warning
  CNN headline news //99: And russian president boris yeltsin nominated acting prime minister sergei kiriyenko to take over the post permanently. Yeltsin is threatening to disband parliament if lawmakers don't approve his choice

V. THREADS: REPEATING NEWS STORIES

The evolution of news stories in time can be tracked by finding the threads - the repeating news videos. It is observed that, especially in CNN news, part of the video sequence for the important events is commonly used as a self-advertisement or as a reminder, with text overlays such as "ahead", "later", or "top stories".
An example of this type of re-use of video material is shown in Figure. Detection of such duplicate sequences is important, since the shorter sequences are given as the summary of the whole event, which can be very helpful for automatic summarization of the news programs. More interestingly, the same video sequence can be used over time to show the related news stories that continue over a period, as in the example shown in Figure. Tracking those sequences may provide more efficient retrieval of the important news videos, since the related stories can be extracted all at once. After the removal of commercials, the remaining key-frames are processed to obtain the duplicate video sequences that correspond to threads - repeating news stories. Video sequences that repeat themselves after key-frames and that have length greater than or equal to one are detected as repeating stories. With the proposed method, 9 sequences in CNN and sequences in ABC are detected as duplicate sequences. The length of the detected duplicate sequences varies from - as shown in Figure. It is observed that CNN has a tendency to use longer sequences later in some other stories. The large number of single frames in ABC shows that either it is common to re-use only a small part of the previous video material, or the order of the sequences is changed. The period of re-using the same video material also varies. Figure shows the periods to repeat the same sequence for both CNN and ABC. The shorter periods usually correspond to advertising a story during the same day's news. The example shown in Figure is of this type. It is an example from CNN where a summary of the current day's important event is given as "Top stories". With the proposed method, the sequence with four key-frames is detected as a repeating sequence. The longer periods correspond to stories that repeat after a few days. Therefore, those are the interesting ones.
Usually, the same sequence is used in a following story to remind the audience of past events. Figure shows an example of this type. It is seen that the same sequence is used with a period of one week to represent the following stories. In this example, the detected duplicate sequence consists of two key-frames.

VI. DETECTING DUPLICATE LOGOS

Another helpful property of news programs for finding the related stories is the re-use of the same logo - the small graphic or picture that appears on the screen along with the anchor person. There is a tendency to use the same logo for related stories, or to show the evolution of a story in time. We are especially interested in finding similar logos which appear on different dates, which may be used as a hint for connecting the coverage of an ongoing news event. Figure shows an example logo which is used for different news stories

about tornados in different days. Our goal is to detect identical or very similar logos to find these dependencies between news stories. We make use of the iconic matching method [], [] for finding the matching logo pairs. Before finding the identical or very similar logos by iconic matching, we first select the anchor-logo frames from the news reports. Anchor-logo frames are the frames that have both the anchor person and a logo side by side. For the experiments, we use only the CNN news, where the logo appears at the right. After the detection of anchor-logo frames, the regions that correspond to logos are cropped, and matching logos are paired among all of the logo regions.

TABLE III: Detecting anchor-logo frames: selected images with logos, and random images without logos are used for training. Images with logos, and 9 images without logos are used as the held-out test data. The numbers show the correct matches for three different methods. Columns: logo-training, nonlogo-training, logo-test, nonlogo-test. Rows: -NN, k-NN (k=), Mahalanobis.

Fig. (a), (b): Lengths of the duplicate sequences (a) for CNN and (b) for ABC. Pattern length for the repeated stories varies from to. Axes: pattern length, num stories.

Fig. (a), (b): In broadcast news, there is a tendency to use the same video footage for the stories that follow each other. Usually, the repeated news stories are the important ones. For the repeating stories found by the proposed method, the periods (the time after which the same story is re-used) are shown (a) for CNN and (b) for ABC. The average number of shots in a half-hour CNN news video is around. This means that the movies that have periods longer than shots are shown in different days. Therefore we can consider them as important events. Similarly, the sequences with a period longer than are the important ones for ABC. Axes: period, story.

Fig.: The same/similar logo is used in different days to present different/related stories about tornados.
  //99: The death toll in central florida is climbing. Authorities now say at least 9 people are dead after several tornadoes touched down overnight. Florida governor lawton chiles is leaving washington today to tour the area.
  //99: Dozens of tornadoes have left their mark from michigan to massachusetts. A band of powerful thunderstorms ripped through new england yesterday.

Fig. 9: Some of the images which are classified as anchor-logo frames using the nearest neighbor method are shown. The first two rows show the correct detection results. The last row shows the false positives.

A. Detecting Anchor-Logo Frames

In order to detect the anchor-logo frames, frames with a logo are labeled manually as positive examples, and frames without logos are chosen randomly as negative examples. Three methods are tried to find the images with logos using this training set:

1) Build two clusters for negative and positive examples respectively, and assign the test images to the closest cluster center using the Mahalanobis distance.
2) Assign the test images to the label of the nearest training example.
3) Assign the test image to the dominant label of the k nearest neighbors, where k=

As Table III shows, the best score is obtained by the nearest neighbor method. Since we want as many images with logos as possible, false positives are better than false negatives. Figure 9 shows some of the images detected as anchor-logo frames. The first two rows show the correctly labeled frames and the last row shows some of the incorrect results. As can be seen, the errors are due to the similarity of the images to some logo images. Overall, images are detected as anchor-logo frames where of them are correct.

B. Finding Duplicate Logos using Iconic Matching

After obtaining a set of anchor-logo frames, logos are cut from the predefined upper-right corner of these frames. The

logos are re-sampled to the size of -by- to facilitate the following steps as given in []. From each of the logos, we compute sets of the -dimensional Haar coefficients of the pixel values, one for each of the RGB channels. The RGB values are in the interval [,]. We keep coefficients which are located at the upper-left corner of the transform domain and construct a representative feature vector of a logo. The coefficients we keep are the overall averages and the low-frequency coefficients of the three channels. Finding matching logos is a similarity search based on the feature vectors of the logos. We consider two logos to be matched if more than coefficients in their feature vectors have differences smaller than some thresholds ( for the first three overall averages, and for the rest of the coefficients). Among images detected as anchor-logo frames, for of them a matching pair is found. The number of distinct logos is. Figure shows the number of times that the same logo is used for each of these distinct logo clusters. The period of time over which the same logo is used is different for different stories. As Figure shows, the timeline for a news story may span a long period, as in the Clinton Investigation story, or the story may stay important only for a few days, as in the GM strike and Medals stories.

Fig.: Frequency of logos (number of times that the same logo is used). Most of the logos are repeated only once. There are only three logos that are repeated over times. Axes: logo pairs, frequencies.

Fig.: The same logo is used over time to present stories that are similar, or that follow each other. For some selected logos, the time period over which the logo is used is shown. Some events occur in a short period, such as GM strike or Medals, and some of the events have longer periods, such as Clinton investigation.

Fig.: The bipartite graph. The graph shown is $G = (V_S \cup V_W, E)$, where shot-nodes $V_S = \{s_1, s_2\}$ and word-nodes $V_W = \{\text{medal}, \text{Japan}, \text{US}\}$.
The shot $s_1$ is associated with the transcript words medal and Japan, while $s_2$ is associated with the transcript words Japan and US.

VII. AUTOMATIC TOPIC ASSIGNMENT

What is the story of the repeating pattern we found? Could we find repeating segments of semantically similar stories? In this section, we try to answer these questions by relating the shots (key-frames) and the topics (words) of continuing stories in the news programs. A continuing story may use some particular words repeatedly in the transcript every time the story is reported in the news, while the key-frames of a story in different shots may differ. For example, the shots of the Winter Olympic Games may be different in different reports, but certain words such as medal, gold and olympic may appear in all these shots. On the other hand, as the story evolves, the word usage might gradually change, but the same key-frames may appear again to remind the audience about the development of the story. For example, the picture of President Clinton with Monica Lewinsky may appear again and again, even as the transcripts of the shots change to focus on the new findings from the investigation.

A. Co-clustering

We model the problem of finding evolving stories as a clustering problem, where the shots (key-frames) and words are grouped into clusters based on the shot-word co-occurrences. Given a video clip of N shots (key-frames) and a vocabulary of M words containing all the words used in the transcript, we partition the transcript with respect to the shots using off-the-shelf techniques [9]. After partitioning the transcript, each shot is associated with a set of words. For example, a shot of the Winter Olympic Games may be associated with the words medal, gold and so on. We build a bipartite graph $G = (V, E)$, where the nodes $V = V_S \cup V_W$, the shot-nodes $V_S = \{s_1, \ldots, s_N\}$ are the nodes of shots, and the word-nodes $V_W = \{w_1, \ldots, w_M\}$ are the nodes of words in the vocabulary.
An edge $(s_i, w_j)$ is included in the edge set $E$ if the word $w_j$ appears in the transcript of the shot $s_i$. For example, suppose a video clip has two shots: the first shot is about the 99 Nagano Winter Olympic Games with words medal and Japan, and the second is about the economy with words Japan and US. The vocabulary is then {medal, Japan, US}. The corresponding graph $G$ is shown in Figure. Given the desired number of groups K (equivalently, the number of stories) that we want to discover from the bipartite graph $G$, we apply the spectral graph partitioning technique [] to

partition $G$ into K subgraphs, where the number of edges bridging from one subgraph to another is minimized, and each subgraph is constrained to have similar size (i.e., a similar number of nodes in each subgraph). Each subgraph is considered to be a story (or multiple similar stories), consisting of the shots (shot-nodes) and the words (word-nodes) that belong to the subgraph. Shots and words belonging to the i-th subgraph are labeled as i.

Fig.: An example thread pair, with the cluster labels of the shots of the first and second thread of the pair. Total number of subgraphs is set at K = 9. Story of the first thread: The federal reserve is now leaning to raise interest rate. According to the Wall Street Journal, the fed has abandoned its neutral stance, and is concerned about the continuing strength of the nation's economy, and the failure of the Asian economy crisis to help slow things down. However, the journal said any hike rate is not expected to come until after the Fed's next meeting on May 9th. But that is not much comfort to the stock and bond markets today. Story of the second thread: Meanwhile, all eyes are on the federal reserve, which is holding its policy meeting today in Washington. Most economists believe that no change in interest rates is likely today, though a rate hike is possible later in this year.

We are interested in automatically identifying the content of the thread pairs and the logo clusters we detected in the previous sections. The idea is to use the result obtained from the co-clustering. For a thread pair $T = \{(s_1, \ldots, s_m), (t_1, \ldots, t_n)\}$, where the $s_i$'s and $t_j$'s are the shots of the two thread members in $T$, we first look up the cluster labels of these shots and obtain a cluster label pair $C(T) = \{(c_1, \ldots, c_m), (d_1, \ldots, d_n)\}$. Note that there are duplicates among the $c_i$'s and $d_j$'s, since two shots can have the same cluster label. Let the most frequent label shared by the threads in the pair be $c^*$.
We describe the content of the thread pair by the words of the cluster with label $c^*$. Similarly, for a logo cluster $L = (s_1, \ldots, s_m)$, where the $s_i$'s are shots, we look up the cluster labels of the $s_i$'s and obtain a cluster label sequence $C(L) = (c_1, \ldots, c_m)$. Let the most frequent label in $C(L)$ be $c^*$. We describe the content of the logo cluster by the words of the cluster with label $c^*$. Figure shows an example thread pair. The story is about the Federal Reserve's decision on interest rates. The words automatically chosen to describe this thread pair (cluster ) are income economy company price consumer bond reserve investment motor bank bathroom chrysler credit insurance cost communication steel airline telephone microsoft strength, which reflect the story content quite well. Figure shows an example logo cluster. The story is about the Lewinsky scandal. The words automatically chosen to describe this logo cluster (cluster ) reflect the story content very well, including the names of the main people involved such as monica, lewinsky, paula and starr. The other clusters also have related words about the scandal.

Fig.: Related stories with the same Clinton Investigation logo. The number of subgraphs is set at K =. The most common cluster is cluster, which includes the following words: brian monica lewinsky lawyer whitewater counsel jury investigation paula starr relationship reporter ginsburg deposition vernon affair oprah winfrey cattle source intern white deputy lindsey immunity aide adviser subject testimony subpoena courthouse privilege conversation mcdougal showdown turkey. Some words from other clusters: cluster - president clinton investigator scandal assault, cluster - bill official campaign jones lawsuit, cluster - court supreme document evidence.

B. Measuring Coherence

We design a metric, which we call coherence, to measure the goodness of the labeling of the thread pairs and the logo clusters using the co-clustering result.
Intuitively, the coherence measures the degree of homogeneity of the cluster labels assigned to a thread pair or a logo cluster.

Definition (Logo cluster coherence): Let $L = (s_1, \ldots, s_m)$ be a logo cluster of $m$ shots. The cluster labels assigned to the shots in $L$ are $C(L) = (c_1, \ldots, c_m)$. Let $c^*$ be the most frequent label value in $C(L)$. The logo cluster coherence $H_{logo}$ is defined as

$$H_{logo} = \frac{1}{m} \sum_{i=1}^{m} I(c_i == c^*),$$

where the function $I(p) = 1$ when the predicate $p$ is true, and $I(p) = 0$ otherwise. Note that the range of $H_{logo}$ is $[\frac{1}{m}, 1]$.

Definition (Thread pair coherence): Let $T = \{(s_1, \ldots, s_m), (t_1, \ldots, t_n)\}$ be a thread pair consisting of two threads of shots. The cluster labels assigned to the shots in $T$ are $C(T) = \{(c_1, \ldots, c_m), (d_1, \ldots, d_n)\}$. Let $e^*$ be the most frequent label value shared among the labels $c_i$'s and $d_j$'s. The thread pair coherence $H_{pair}$ is defined as

$$H_{pair} = \frac{\sum_{i=1}^{m} I(c_i == e^*) + \sum_{j=1}^{n} I(d_j == e^*)}{n + m},$$

where the function $I(p) = 1$ when the predicate $p$ is true, and $I(p) = 0$ otherwise. Note that the range of $H_{pair}$ is $[0, 1]$.

Table IV reports the average of the coherence values of all logo clusters we collected from the CNN set. The base value shown in Table IV is the average, over all logo clusters, of the lower bound $1/m_i$, where $m_i$ is the size of the i-th logo cluster. The base value indicates the worst degree of coherence the data set could attain.

TABLE IV. $H_{logo}$: logo cluster coherences. The base coherence measure, which indicates the worst possible coherence, is .9. "Random (avg)" and "random (std)" correspond to values of randomly generated groups. Rows: $H_{logo}$, random (avg), random (std); columns: five values of K, the last being K = 9.

The proposed labeling gives at least half (on average) of the shots in a logo cluster the same label. This shows a good degree of coherence between the visual features (on which a logo cluster is formed) and the story content, and our method captures this automatically. As expected, having K = subgraphs (story topics) gives the highest coherence, since it has the least diversity of labels. However, the coherence value remains stable as K increases, which is desirable because it is hard to select a correct K in practice. We also compare the results with groups that consist of randomly selected shots. The results show that the coherence of the logo groups is significantly better than that of the random groups.

Table V reports the average thread pair coherence values of all thread pairs we collected from the CNN set. In the table, we also show the single thread coherence (denoted as $H_{thread}$), which is the coherence value of a single thread in a thread pair and is defined analogously to the logo cluster coherence: each thread, which is essentially a list of shots, is viewed as a logo cluster, and the corresponding logo cluster coherence is computed as its single thread coherence. The single thread coherence is above %, which indicates a high degree of coherence among the shots in a thread (which are usually consecutive shots in the video clip). The proposed labeling method assigns the same label to shots in the two parts of a thread pair only about one-tenth of the time. This shows that the transcript words differ a great deal as an event evolves. It may also be due to our coclustering algorithm, which provides a hard clustering of the words.
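The coherence values discussed here follow directly from the definitions of the logo cluster coherence and the thread pair coherence. The sketch below is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, a pair with no shared label is given coherence 0, and the base value is taken to be the mean of the per-cluster lower bound 1/m_i.

```python
from collections import Counter


def logo_coherence(labels):
    """H_logo: fraction of shots carrying the most frequent label.

    Also used as the single thread coherence H_thread of one thread.
    """
    if not labels:
        raise ValueError("a cluster needs at least one shot")
    top_count = Counter(labels).most_common(1)[0][1]
    return top_count / len(labels)


def pair_coherence(labels_s, labels_t):
    """H_pair: fraction of shots (both threads) with the top shared label."""
    if not labels_s or not labels_t:
        raise ValueError("both threads need at least one shot")
    shared = set(labels_s) & set(labels_t)
    if not shared:
        return 0.0  # assumption: no shared label means zero coherence
    counts = Counter(list(labels_s) + list(labels_t))
    best = max(counts[lab] for lab in shared)
    return best / (len(labels_s) + len(labels_t))


def base_coherence(cluster_sizes):
    """Assumed base value: mean of the lower bound 1/m_i over clusters."""
    return sum(1.0 / m for m in cluster_sizes) / len(cluster_sizes)


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    print(logo_coherence([1, 1, 2, 3]))
    print(pair_coherence([1, 1, 2], [1, 3]))
    print(base_coherence([2, 4]))
```

Ties between equally frequent labels do not change either value, since tied labels contribute the same count to the numerator.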
We are currently extending our work to soft clustering algorithms to try to improve this labeling performance. Also, as shown in Figure , although the cluster numbers are different, the clusters associated with the repeating stories have very similar words. Our strict coherence measure is unable to catch these similarities. When such similarities between words are captured, it is easier to observe the overlaps in the topics of the repeating stories.

TABLE V. Thread pair coherences and single thread coherences. Rows: $H_{pair}$, $H_{thread}$; columns: five values of K, the last being K = 9.

VIII. DISCUSSIONS AND FUTURE WORK

The tendency to re-use the same video material allowed us to detect and track important news stories by detecting visual patterns (duplicate video sequences and logos) and semantic patterns (topics). The duplicate video sequences are detected with a heuristic pattern matching algorithm, and identical logos are detected using the iconic matching method. The proposed method for finding the related word clusters for logos and thread pairs shows that the detected pairs are semantically coherent.

With the proposed approach, threads are found by searching for duplicate sequences that have consecutive similar patterns. This approach captures sequences that are re-used in a shorter form, but it is unable to capture modifications due to the montage process, such as changes in the position of the video parts. Detecting duplicate bags of key-frames, instead of duplicate sequences, could solve such problems.

Commercials are distinguished from the repeating news stories by the sequence length and neighborhood information. Including audio and transcripts will help to differentiate them better, since in commercials this other material is also duplicated, whereas in news stories it is not.

The evolution of news stories in time is important for creating documentaries automatically. With the proposed methods, it is possible to track the stories with similar visual or semantic content inside a single TV channel.
The same news story may also be presented on different channels in many different forms, with different visual and rhetorical styles. This may represent the perspectives of different TV channels, or even the perspectives of different regions or countries. Capturing the use of similar material may provide valuable information for detecting differences in perspectives.

REFERENCES

[1] Topic detection and tracking (TDT) Benchmark by NIST,
[2] TRECVID Guidelines,
[3] H. Schneiderman, T. Kanade, Object Detection Using the Statistics of Parts, International Journal of Computer Vision,.
[4] F. Yamagishi, S. Satoh, T. Hamada, M. Sakauchi, Identical Video Segment Detection for Large-Scale Broadcast Video Archives, International Workshop on Content-Based Multimedia Indexing (CBMI ), pp. -, Rennes, France, Sept. -,.
[5] J. Edwards, R. White, D. Forsyth, Words and Pictures in the News, HLT-NAACL Workshop on Learning Word Meaning from Non-Linguistic Data, Edmonton, Canada, May.
[6] C. E. Jacobs, A. Finkelstein, D. H. Salesin, Fast Multiresolution Image Querying, Proc. SIGGRAPH-9, pp. -, 99.
[7] A. Hauptmann, M. Witbrock, Story Segmentation and Detection of Commercials in Broadcast News Video, Advances in Digital Libraries Conference (ADL 9), Santa Barbara, CA, April -, 99.
[8] S. Marlow, D. A. Sadlier, K. McGeough, N. O'Connor, N. Murphy, Audio and Video Processing for Automatic TV Advertisement Detection, Proceedings of ISSC,.
[9] H. Wactlar, M. Christel, Y. Gong and A. Hauptmann, Lessons Learned from the Creation and Deployment of a Terabyte Digital Video Library, IEEE Computer, vol., no., pp. -, February 999.
[10] I. S. Dhillon, Co-Clustering Documents and Words Using Bipartite Spectral Graph Partitioning, Proceedings of the Seventh ACM SIGKDD Conference, August.


More information

h t t p : / / w w w. v i d e o e s s e n t i a l s. c o m E - M a i l : j o e k a n a t t. n e t DVE D-Theater Q & A

h t t p : / / w w w. v i d e o e s s e n t i a l s. c o m E - M a i l : j o e k a n a t t. n e t DVE D-Theater Q & A J O E K A N E P R O D U C T I O N S W e b : h t t p : / / w w w. v i d e o e s s e n t i a l s. c o m E - M a i l : j o e k a n e @ a t t. n e t DVE D-Theater Q & A 15 June 2003 Will the D-Theater tapes

More information

Color Quantization of Compressed Video Sequences. Wan-Fung Cheung, and Yuk-Hee Chan, Member, IEEE 1 CSVT

Color Quantization of Compressed Video Sequences. Wan-Fung Cheung, and Yuk-Hee Chan, Member, IEEE 1 CSVT CSVT -02-05-09 1 Color Quantization of Compressed Video Sequences Wan-Fung Cheung, and Yuk-Hee Chan, Member, IEEE 1 Abstract This paper presents a novel color quantization algorithm for compressed video

More information

Precision testing methods of Event Timer A032-ET

Precision testing methods of Event Timer A032-ET Precision testing methods of Event Timer A032-ET Event Timer A032-ET provides extreme precision. Therefore exact determination of its characteristics in commonly accepted way is impossible or, at least,

More information

Analysis of MPEG-2 Video Streams

Analysis of MPEG-2 Video Streams Analysis of MPEG-2 Video Streams Damir Isović and Gerhard Fohler Department of Computer Engineering Mälardalen University, Sweden damir.isovic, gerhard.fohler @mdh.se Abstract MPEG-2 is widely used as

More information

Audio Compression Technology for Voice Transmission

Audio Compression Technology for Voice Transmission Audio Compression Technology for Voice Transmission 1 SUBRATA SAHA, 2 VIKRAM REDDY 1 Department of Electrical and Computer Engineering 2 Department of Computer Science University of Manitoba Winnipeg,

More information

EDDY CURRENT IMAGE PROCESSING FOR CRACK SIZE CHARACTERIZATION

EDDY CURRENT IMAGE PROCESSING FOR CRACK SIZE CHARACTERIZATION EDDY CURRENT MAGE PROCESSNG FOR CRACK SZE CHARACTERZATON R.O. McCary General Electric Co., Corporate Research and Development P. 0. Box 8 Schenectady, N. Y. 12309 NTRODUCTON Estimation of crack length

More information

arxiv: v1 [cs.ir] 16 Jan 2019

arxiv: v1 [cs.ir] 16 Jan 2019 It s Only Words And Words Are All I Have Manash Pratim Barman 1, Kavish Dahekar 2, Abhinav Anshuman 3, and Amit Awekar 4 1 Indian Institute of Information Technology, Guwahati 2 SAP Labs, Bengaluru 3 Dell

More information

OBJECT-BASED IMAGE COMPRESSION WITH SIMULTANEOUS SPATIAL AND SNR SCALABILITY SUPPORT FOR MULTICASTING OVER HETEROGENEOUS NETWORKS

OBJECT-BASED IMAGE COMPRESSION WITH SIMULTANEOUS SPATIAL AND SNR SCALABILITY SUPPORT FOR MULTICASTING OVER HETEROGENEOUS NETWORKS OBJECT-BASED IMAGE COMPRESSION WITH SIMULTANEOUS SPATIAL AND SNR SCALABILITY SUPPORT FOR MULTICASTING OVER HETEROGENEOUS NETWORKS Habibollah Danyali and Alfred Mertins School of Electrical, Computer and

More information

Investigation of Digital Signal Processing of High-speed DACs Signals for Settling Time Testing

Investigation of Digital Signal Processing of High-speed DACs Signals for Settling Time Testing Universal Journal of Electrical and Electronic Engineering 4(2): 67-72, 2016 DOI: 10.13189/ujeee.2016.040204 http://www.hrpub.org Investigation of Digital Signal Processing of High-speed DACs Signals for

More information

Enhancing Music Maps

Enhancing Music Maps Enhancing Music Maps Jakob Frank Vienna University of Technology, Vienna, Austria http://www.ifs.tuwien.ac.at/mir frank@ifs.tuwien.ac.at Abstract. Private as well as commercial music collections keep growing

More information

MusCat: A Music Browser Featuring Abstract Pictures and Zooming User Interface

MusCat: A Music Browser Featuring Abstract Pictures and Zooming User Interface MusCat: A Music Browser Featuring Abstract Pictures and Zooming User Interface 1st Author 1st author's affiliation 1st line of address 2nd line of address Telephone number, incl. country code 1st author's

More information

PAPER Wireless Multi-view Video Streaming with Subcarrier Allocation

PAPER Wireless Multi-view Video Streaming with Subcarrier Allocation IEICE TRANS. COMMUN., VOL.Exx??, NO.xx XXXX 200x 1 AER Wireless Multi-view Video Streaming with Subcarrier Allocation Takuya FUJIHASHI a), Shiho KODERA b), Nonmembers, Shunsuke SARUWATARI c), and Takashi

More information

Auto classification and simulation of mask defects using SEM and CAD images

Auto classification and simulation of mask defects using SEM and CAD images Auto classification and simulation of mask defects using SEM and CAD images Tung Yaw Kang, Hsin Chang Lee Taiwan Semiconductor Manufacturing Company, Ltd. 25, Li Hsin Road, Hsinchu Science Park, Hsinchu

More information

Transcription of the Singing Melody in Polyphonic Music

Transcription of the Singing Melody in Polyphonic Music Transcription of the Singing Melody in Polyphonic Music Matti Ryynänen and Anssi Klapuri Institute of Signal Processing, Tampere University Of Technology P.O.Box 553, FI-33101 Tampere, Finland {matti.ryynanen,

More information

Essence of Image and Video

Essence of Image and Video 1 Essence of Image and Video Wei-Ta Chu 2010/9/23 2 Essence of Image Wei-Ta Chu 2010/9/23 Chapters 2 and 6 of Digital Image Procesing by R.C. Gonzalez and R.E. Woods, Prentice Hall, 2 nd edition, 2001

More information

Temporal Error Concealment Algorithm Using Adaptive Multi- Side Boundary Matching Principle

Temporal Error Concealment Algorithm Using Adaptive Multi- Side Boundary Matching Principle 184 IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.12, December 2008 Temporal Error Concealment Algorithm Using Adaptive Multi- Side Boundary Matching Principle Seung-Soo

More information

A Pattern Recognition Approach for Melody Track Selection in MIDI Files

A Pattern Recognition Approach for Melody Track Selection in MIDI Files A Pattern Recognition Approach for Melody Track Selection in MIDI Files David Rizo, Pedro J. Ponce de León, Carlos Pérez-Sancho, Antonio Pertusa, José M. Iñesta Departamento de Lenguajes y Sistemas Informáticos

More information

Evaluating Melodic Encodings for Use in Cover Song Identification

Evaluating Melodic Encodings for Use in Cover Song Identification Evaluating Melodic Encodings for Use in Cover Song Identification David D. Wickland wickland@uoguelph.ca David A. Calvert dcalvert@uoguelph.ca James Harley jharley@uoguelph.ca ABSTRACT Cover song identification

More information

DETEXI Basic Configuration

DETEXI Basic Configuration DETEXI Network Video Management System 5.5 EXPAND YOUR CONCEPTS OF SECURITY DETEXI Basic Configuration SETUP A FUNCTIONING DETEXI NVR / CLIENT It is important to know how to properly setup the DETEXI software

More information

Automatic Piano Music Transcription

Automatic Piano Music Transcription Automatic Piano Music Transcription Jianyu Fan Qiuhan Wang Xin Li Jianyu.Fan.Gr@dartmouth.edu Qiuhan.Wang.Gr@dartmouth.edu Xi.Li.Gr@dartmouth.edu 1. Introduction Writing down the score while listening

More information

RECOMMENDATION ITU-R BT (Questions ITU-R 25/11, ITU-R 60/11 and ITU-R 61/11)

RECOMMENDATION ITU-R BT (Questions ITU-R 25/11, ITU-R 60/11 and ITU-R 61/11) Rec. ITU-R BT.61-4 1 SECTION 11B: DIGITAL TELEVISION RECOMMENDATION ITU-R BT.61-4 Rec. ITU-R BT.61-4 ENCODING PARAMETERS OF DIGITAL TELEVISION FOR STUDIOS (Questions ITU-R 25/11, ITU-R 6/11 and ITU-R 61/11)

More information

The National Traffic Signal Report Card: Highlights

The National Traffic Signal Report Card: Highlights The National Traffic Signal Report Card: Highlights THE FIRST-EVER NATIONAL TRAFFIC SIGNAL REPORT CARD IS THE RESULT OF A PARTNERSHIP BETWEEN SEVERAL NTOC ASSOCIATIONS LED BY ITE, THE AMERICAN ASSOCIATION

More information

Jazz Melody Generation and Recognition

Jazz Melody Generation and Recognition Jazz Melody Generation and Recognition Joseph Victor December 14, 2012 Introduction In this project, we attempt to use machine learning methods to study jazz solos. The reason we study jazz in particular

More information

DAT335 Music Perception and Cognition Cogswell Polytechnical College Spring Week 6 Class Notes

DAT335 Music Perception and Cognition Cogswell Polytechnical College Spring Week 6 Class Notes DAT335 Music Perception and Cognition Cogswell Polytechnical College Spring 2009 Week 6 Class Notes Pitch Perception Introduction Pitch may be described as that attribute of auditory sensation in terms

More information

Television Stream Structuring with Program Guides

Television Stream Structuring with Program Guides Television Stream Structuring with Program Guides Jean-Philippe Poli 1,2 1 LSIS (UMR CNRS 6168) Université Paul Cezanne 13397 Marseille Cedex, France jppoli@ina.fr Jean Carrive 2 2 Institut National de

More information

EE373B Project Report Can we predict general public s response by studying published sales data? A Statistical and adaptive approach

EE373B Project Report Can we predict general public s response by studying published sales data? A Statistical and adaptive approach EE373B Project Report Can we predict general public s response by studying published sales data? A Statistical and adaptive approach Song Hui Chon Stanford University Everyone has different musical taste,

More information