Towards Auto-Documentary: Tracking the evolution of news in time

Abstract

News videos constitute an important source of information for tracking and documenting important events. In these videos, news stories are often accompanied by short video clips that tend to be repeated during the course of the event. Automatic detection of such repetitions is essential for creating auto-documentaries. In this paper, we propose methods for detecting and tracking the evolution of news over time. Duplicate video sequences are detected by matching consecutive key-frames of the news video. The duplicate sequences that correspond to commercials placed in between the news are then detected and removed. The remaining duplicate video sequences are assumed to correspond to threads of news. As an alternative approach, we propose a method for automatic detection of the logo images used by the channels to mark news stories. Finally, we use the news transcripts to create topic clusters and compare these clusters with the duplicate sequences detected by the proposed methods. Experiments are carried out on the TREC-VID data set, consisting of hours of news videos from two different channels, and the results are reported.

I. INTRODUCTION

News videos constitute an important source of information for tracking and documenting important events. These videos record the evolution of a news story in time and contain valuable information for creating documentaries. Automated tracking of the evolution of a news story over the course of an event can lead to a rough summarization of the event for auto-documentation. Although it is common to use the text for tracking related stories [], the visual content is often ignored. In news videos, news stories are often accompanied by short video sequences that tend to be used over and over during the course of the event.
A particular video sequence can be used again with some modifications and/or additions, either as a reminder of the current story or due to a lack of video material for the current story. Also, there is a tendency to repeat the important news of the day at some other time inside the same news program. For example, CNN advertises important news of the day under the caption "Top Stories" at the onset of a news bulletin. Because news channels tend to re-use the same video sequences, detecting duplicate video sequences makes it possible to detect and track important news stories automatically.

In this paper, we propose methods for tracking the evolution of news over time from actual news videos. The next section describes the data set and features used in our study. We present the method for detecting duplicate video sequences in Section III. In Section IV, we describe a method for the detection and removal of commercials from news videos. Section V describes the proposed approach for automatic detection of repeating news stories. Logo images used by the channels to mark news stories offer an alternative approach for tracking news stories, as explained in Section VI. Section VII presents results on how the topic clusters created from news transcripts compare with the results obtained from the detection of duplicate video sequences. Finally, we conclude in Section VIII and discuss future lines of research.

II. DATA SET AND INPUT REPRESENTATION

In this study, the experiments are carried out on the data set provided by the content-based video retrieval track (TREC-VID) of the Text Retrieval Conference (TREC) []. The data set consists of hours of broadcast news videos ( thirty-minute programs) from ABC World News Tonight and CNN Headline News, recorded by the Linguistic Data Consortium from late January through June 1998.
The common shot segmentations, defined by TREC-VID, are used as the basic units. One key-frame is extracted from each shot. In total, there are and shots from ABC and CNN videos respectively. On average, videos for a single day contain shots in ABC and shots in CNN.

Each key-frame is described by a set of features. The average and standard deviation of HSV values obtained from a grid ( features) are used as the color features. The mean values of twelve oriented energy filters (aligned uniformly with degree separation) extracted from a grid ( features) represent the texture information. Canny's edge detector is used to extract edge features from a grid. Schneiderman's face detector algorithm [] is used to detect frontal faces. The size and position of the largest face are used as the face features ( features). All the features are normalized to have zero mean and unit variance.

III. DETECTING DUPLICATE SEQUENCES

We define the video sequences that have similar consecutive key-frames as duplicate sequences. Due to shot segmentation, the same piece of video can have a different number of shots, and the key-frames selected from each shot may differ slightly. Also, due to the montaging process, there may be slight modifications when a piece of video is re-used. Therefore, the same video or story may look like two different sequences.


Fig. (a) (b) Due to shot segmentation, the same piece of video may have a different number of shots, and the key-frames selected from each shot may differ. For example, in (a) the nd and th key-frames of the top sequence are missing in the bottom sequence. In (b), the lengths of the sequences are the same, but there are missing key-frames in both sequences. Also, the key-frames can be very similar but not the same, as seen with the first and the second matching pairs in (a).

With our definition, duplicate sequences are sequences that share identical or very similar consecutive key-frames, where some missing key-frames are allowed. In Figure, two example pairs of duplicate sequences are shown. The number of shots can differ due to missing shots in one of the sequences as in (a), or, although the lengths of the sequences are the same, the shots may be different as in (b). Furthermore, the key-frames may be very similar but not the same.

In [], visual features extracted from I-frames are used to detect repeating news videos. However, due to the large amount of data, using I-frames is not feasible, and this system works only for detecting identical video segments. We propose a heuristic pattern matching method for detecting the duplicate sequences. The proposed method first detects candidate repeating key-frames (the key-frames that have matching pairs) and then constructs the longest sequence of consecutive similar key-frames, where missing elements are allowed. In the following sections, we first explain the method for finding candidate repeating key-frames by searching for identical or very similar frames using the feature similarities. Then, we describe the method for finding the duplicate sequences.

A. Finding candidate repeating key-frames

Candidate repeating key-frames are defined as the key-frames that have identical or very similar matching key-frames. In [], similar news photographs are identified using an iconic matching method adapted from [].
However, in our case, there may be bigger differences between similar key-frames, which may cause problems for the iconic matching method (e.g. text overlays, or large modifications due to the montaging process). As defined, a candidate key-frame should have a few duplicates or very similar images, and the rest should be very different.

Fig. (a) (b) (c) (d) (e) Top: key-frame images; middle: distances to the most similar images; bottom: derivatives. Red lines show the medians of the derivatives. Since there is a big gap between the most similar image and the others, (a) and (b) are candidate repeating frames. (a) has only one duplicate, whereas (b) has similar key-frames. The key-frame shown in (c) repeats itself, but it is very frequent. Therefore, it is not chosen as a candidate. The key-frame in (d) is a regular news story and does not have duplicates.

To detect this property, for each image in the data set, we find the most similar N images using the feature similarities. There are news videos in each of the ABC and CNN data sets. We assume that the same video sequence is not shown in all videos and choose N as. This threshold value eliminates the scenes common to a TV channel that are shown in almost all the news programs (e.g. the Headline News logo in CNN, sports logos, weather news), which are analogous to stop-words in text. In Figure, the distances to the most similar images are shown in sorted order for some selected key-frame images. If an image repeats itself k times, then there should be a jump in the similarity values after k images. In the figure, the jump indices show that the images in (a) and (c) have single similar images, and the key-frame in (b) has similar images. The image shown in (d) is a common scene for weather news and therefore repeats in almost all news programs. Since we consider most similar images, the jump is not seen for this image. The image in (e) is from a regular news story.
Therefore, it doesn't have duplicates and the jump is not obvious. For these examples, the images in (a)-(c) should be candidate repeating frames, since they have an obvious jump. To capture this property, we take the derivatives of the similarity values, as shown in the bottom part of Figure. Then, we find the median of these values. An image is marked as a candidate repeating key-frame if the ratio between the largest value and the median value is larger than a threshold (for the experiments the threshold is chosen as ). This process chooses the images in Figures (a)-(c) as candidate repeating key-frames and eliminates the rest.

B. Finding Duplicate Sequences

Due to errors in shot segmentation, similar sequences cannot be found directly by matching the consecutive candidate key-frames. This is because, in between two matching candidate key-frames, there may be other key-frames that do not have any matching images. If we skip the non-candidate key-frames and continue with the rest, then there is a chance of finding a sequence that covers the missing ones.
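The candidate test of Section III-A (sort the distances to the N most similar key-frames, take their differences, and compare the largest difference with the median difference) can be sketched as below. This is a minimal illustration, not the authors' implementation: the function name `candidate_test`, the `ratio_threshold` parameter, and the handling of a zero median are assumptions, and the values of N and of the threshold used in the experiments are not reproduced here.

```python
from statistics import median


def candidate_test(distances, ratio_threshold):
    """Return (is_candidate, k) for one key-frame.

    distances: distances from the key-frame to its N most similar key-frames.
    ratio_threshold: hypothetical parameter; the paper's value is not given here.
    k: number of near-duplicates found before the largest jump.
    """
    d = sorted(distances)
    # Discrete derivative of the sorted distance curve.
    jumps = [b - a for a, b in zip(d, d[1:])]
    if not jumps:
        return False, 0
    largest = max(jumps)
    med = median(jumps)
    if med <= 0:
        # Assumption: with a zero median, any positive jump counts as obvious.
        is_candidate = largest > 0
    else:
        is_candidate = largest / med > ratio_threshold
    # The jump after the k-th closest image suggests k near-duplicates.
    k = jumps.index(largest) + 1
    return is_candidate, k


if __name__ == "__main__":
    # Two very close matches, then a large gap to the remaining images.
    print(candidate_test([0.1, 0.12, 5.0, 5.1, 5.2, 5.3], ratio_threshold=10.0))
    # Evenly spaced distances: no obvious jump.
    print(candidate_test([1, 2, 3, 4, 5, 6], ratio_threshold=10.0))
```

Using the median rather than the mean of the differences keeps the single large jump from inflating the reference value it is compared against.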

Definitions:
  C: set of candidate key-frames
  K: set of all non-candidate key-frames
  similar(c): list of similar key-frames for c
  k_c: set of key-frames following a candidate key-frame c, length(k_c)
  Let S = c_1 k_{c_1} c_2 k_{c_2} c_3, and S' = c'_1 k_{c'_1} c'_2 k_{c'_2} c'_3
  S and S' are duplicate sequences if, for all i, c'_i is in similar(c_i)
Algorithm:
  for all c_1 in C
    for all c'_1 in similar(c_1)
      S = {c_1}
      S' = {c'_1}
      for all c_{i+1} where c_{i+1} is in neighbor(c_i)
        if c'_{i+1} is in similar(c_{i+1})
          insert(S, k(c_i), c_{i+1})
          insert(S', k(c'_i), c'_{i+1})
        else break
Fig. Algorithm for detecting duplicate sequences.

To detect matching sequences, the candidate key-frames are chosen as the starting points. For each candidate key-frame, the list of similar key-frames is searched. The matching candidate key-frames are taken as the first elements of a possible matching sequence pair. Then, the key-frames that follow these candidates are examined to find the longest matching sequence. The sequence is expanded only if there are other matching key-frames in the close neighborhood. If such matching pairs are found among the consecutive frames, they are inserted as new elements of the matching sequences. The key-frames that reside in the interval between two inserted elements are also inserted into the sequences. This process repeats until no matching pairs are found, and it is performed for each candidate key-frame in the data set. The algorithm is given in Figure.

A shorter sequence may be shown several times in a row due to a lack of video material, or shots of the anchor and the reporter may alternate, either of which results in a consecutively repeating sequence. These sequences are not considered duplicate sequences, since they are inside the same news story or the same commercial. To eliminate them, a threshold value is set, and only the sequences that repeat with a period longer than this threshold are chosen as duplicate sequences.

IV. DETECTING AND REMOVING COMMERCIALS

In news videos, commercials are often intermixed with news stories.
For efficient retrieval and browsing of the news stories, detection and removal of commercials are essential. Although some studies [], [] used black frames to detect commercials, such approaches fail for the videos of TV channels that do not use black frames to flag commercial breaks. Color properties of commercials can also be used for detection; however, this approach is liable to generate too many false positives, a property that is undesirable for our task due to the danger of removing important news stories.

TABLE I
COMMERCIAL DETECTION RESULTS FOR MOVIES ( KEY-FRAMES) ON CNN DATA. THE TRUE NUMBER OF COMMERCIALS IS 9.
Columns: num-detected, tp, fp, fn, tn. Rows: candidates, sequences, pruned.

Commercials tend to be repeated several times during news programs, and can be detected as duplicate sequences by the method proposed in the previous section. We propose to detect commercials based on the following observations. First, commercials tend to have longer duplicate sequences than news stories. This is because rapid scene changes during commercials cause frequent shot breaks. Second, commercials are usually presented in groups. Therefore, the neighborhood information can be used to correct or prune the detection results.

We first attempt to distinguish commercials from other repeating sequences based on their sequence lengths. Sequences with lengths greater than a threshold value T (chosen as in our experiments) are predicted to belong to commercials. Assuming that commercials are not repeated during a single news program, we look for repetitions that are at least key-frames (corresponding to approximately half the duration of a news program) apart. A key-frame that is surrounded by key-frames flagged as commercial is likely to belong to a commercial. Conversely, if a key-frame is the only one in its neighborhood flagged as a commercial, the flag is likely to be wrong. We use a smoothing process, which assigns to each key-frame the dominant value over a window.
The smoothed labels are then used to prune the results. Tables I and II present the results obtained from movies from CNN and ABC respectively. The first rows show the results when only the candidate key-frames are selected as commercials. These candidate frames are then grouped to obtain the duplicate sequences, including the extra frames in between two candidate frames. The second rows correspond to the results corrected by keeping the duplicate sequences whose length is longer than key-frames. Finally, the results are further pruned by considering the neighborhood information, as shown in the third rows. In CNN, there are key-frames in all movies, and 9 of them are correct commercials. With the proposed system of them are predicted correctly. In ABC, there are key-frames in all movies, and of them are correct commercials. With the proposed system of them are predicted correctly. We observed that in ABC, among the missing ones, of them belong to a single commercial segment. Among all the key-frames in the whole data, 9 key-frames in ABC and 9 key-frames in CNN are detected as commercials and removed from the data.
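The neighborhood-based pruning described in this section (assigning each key-frame the dominant commercial/non-commercial label over a window) can be sketched as a sliding majority vote. This is only an illustration under stated assumptions: the function name `smooth_flags`, the centered window, the shrinking of the window at the sequence boundaries, and the strict-majority rule for ties are all choices made here, and the window size used in the experiments is not reproduced.

```python
def smooth_flags(flags, window):
    """Majority-vote smoothing of per-key-frame commercial flags.

    flags: list of booleans, True where a key-frame was flagged as commercial.
    window: hypothetical odd window size, centered on each key-frame.
    """
    half = window // 2
    smoothed = []
    for i in range(len(flags)):
        # Clip the window at the start and end of the key-frame sequence.
        lo = max(0, i - half)
        hi = min(len(flags), i + half + 1)
        neighborhood = flags[lo:hi]
        # Assumption: a key-frame stays flagged only on a strict majority.
        smoothed.append(2 * sum(neighborhood) > len(neighborhood))
    return smoothed


if __name__ == "__main__":
    raw = [False, False, True, False, False, True, True, False, True, True]
    # The isolated flag at index 2 is dropped; the gap at index 7 is filled.
    print(smooth_flags(raw, window=3))
```

A larger window removes more isolated detections but also risks erasing short commercial groups, so the window size trades false positives against false negatives.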


TABLE II
COMMERCIAL DETECTION RESULTS FOR MOVIES ( KEY-FRAMES) ON ABC DATA. THE TRUE NUMBER OF COMMERCIALS IS.
Columns: num-detected, tp, fp, fn, tn. Rows: candidates, sequences, pruned.

CNN headline news //99: white house says time is running out for iraq to avoid military strike. administration officials are reacting coolly to baghdad's latest offer to open presidential palaces to international weapons inspectors
CNN headline news //99: iraq is again offering to allow a limited number of u.n. weapons inspectors into eight presidential sites. The plan is giving inspectors two months to search the areas. The united states is demanding full access by u.n. weapons inspectors to all sites.
Fig. The same story is used in two news stories with a period of seven days.

CNN headline news //99: Russian president boris yeltsin has nominated a new prime minister. He announced today he wants acting prime minister sergei kiriyenko to take over the post permanently. The russian parliament's lower house now has one week to vote on the nomination. Yeltsin is threatening to disband the duma if it doesn't approve the -year-old kiriyenko. Yeltsin dismissed his entire cabinet monday without warning
CNN headline news //99: And russian president boris yeltsin nominated acting prime minister sergei kiriyenko to take over the post permanently. Yeltsin is threatening to disband parliament if lawmakers don't approve his choice
Fig. A news story from CNN headline news on //99. First, the story is presented in full length; then, at the end of the news program, a summary is repeated with the title "Top stories".

V. THREADS: REPEATING NEWS STORIES

The evolution of news stories in time can be tracked by finding the threads, that is, the repeating news videos. It is observed that, especially in CNN news, part of the video sequence for an important event is commonly used as a self-advertisement or as a reminder, with text overlays such as "ahead", "later", or "top stories".
An example of this type of re-use of video material is shown in Figure. Detection of such duplicate sequences is important, since the shorter sequences serve as a summary of the whole event, which can be very helpful for automatic summarization of the news programs. More interestingly, the same video sequence can be used over time to show related news stories that continue over a period, as in the example shown in Figure. Tracking those sequences may provide more efficient retrieval of important news videos, since the related stories can be extracted all at once.

After the removal of commercials, the remaining key-frames are processed to obtain the duplicate video sequences that correspond to threads, that is, repeating news stories. Video sequences that repeat themselves after key-frames and that have length greater than or equal to one are detected as repeating stories. With the proposed method, 9 sequences in CNN and sequences in ABC are detected as duplicate sequences.

The length of the detected duplicate sequences varies from - as shown in Figure. It is observed that CNN has a tendency to re-use longer sequences later in other stories. The large number of single frames in ABC shows that either it is common to re-use only a small part of the previous video material, or the order of the sequences is changed.

The period of re-using the same video material also varies. Figure shows the periods after which the same sequence is repeated, for both CNN and ABC. The shorter periods usually correspond to advertising a story during the same day's news. The example shown in Figure is of this type. It is an example from CNN where a summary of the current day's important event is given as "Top stories". With the proposed method, the sequence with four key-frames is detected as a repeating sequence. The longer periods correspond to stories that repeat after a few days. Therefore, those are the interesting ones.
Usually, the same sequence is used in a following story to recall past events. Figure shows an example of this type. The same sequence is used again after a period of one week to represent the following stories. In this example, the detected duplicate sequence consists of two key-frames.

VI. DETECTING DUPLICATE LOGOS

Another helpful property of news programs for finding related stories is the re-use of the same logo, that is, the small graphic or picture that appears on the screen along with the anchor person. There is a tendency to use the same logo for related stories, or to show the evolution of a story in time. We are especially interested in finding similar logos that appear on different dates, which may be used as a hint for connecting the coverage of an ongoing news event. Figure shows an example logo which is used for different news stories


about tornadoes on different days. Our goal is to detect identical or very similar logos in order to find these dependencies between news stories. We make use of the iconic matching method [], [] for finding the matching logo pairs. Before finding the identical or very similar logos by iconic matching, we first select the anchor-logo frames from the news reports. Anchor-logo frames are the frames that have both the anchor person and a logo side by side. For the experiments, we use only the CNN news, where the logo appears at the right. After the detection of anchor-logo frames, the regions that correspond to logos are cropped, and among all of the logo regions the matching logos are paired.

TABLE III
DETECTING ANCHOR-LOGO FRAMES: SELECTED IMAGES WITH LOGOS, AND RANDOM IMAGES WITHOUT LOGOS ARE USED FOR TRAINING. IMAGES WITH LOGOS, AND 9 IMAGES WITHOUT LOGOS ARE USED AS THE HELD-OUT TEST DATA. THE NUMBERS SHOW THE CORRECT MATCHES FOR THREE DIFFERENT METHODS.
Columns: logo-training, nonlogo-training, logo-test, nonlogo-test. Rows: 1-NN, k-NN (k=), mahalanobis.

Fig. Lengths of the duplicate sequences (a) for CNN and (b) for ABC (axes: pattern length, num stories). Pattern length for the repeated stories varies from to.

Fig. In broadcast news, there is a tendency to use the same video footage for stories that follow each other. Usually, the repeated news items are the important ones. For the repeating stories found by the proposed method, the periods (the time after which the same story is re-used) are shown (a) for CNN and (b) for ABC (axes: story, period). The average number of shots in a half-hour CNN news video is around. This means that the movies that have periods longer than shots are shown on different days. Therefore we can consider them important events. Similarly, the sequences with a period longer than are the important ones for ABC.

//99: The death toll in central florida is climbing. Authorities now say at least 9 people are dead after several tornadoes touched down overnight. Florida governor lawton chiles is leaving washington today to tour the area.
//99: Dozens of tornadoes have left their mark from michigan to massachusetts. A band of powerful thunderstorms ripped through new england yesterday.
Fig. The same/similar logo is used on different days to present different/related stories about tornadoes.

Fig. 9. Some of the images which are classified as anchor-logo frames using the nearest neighbor method are shown. The first two rows show the correct detection results. The last row shows the false positives.

A. Detecting Anchor-Logo Frames

In order to detect the anchor-logo frames, frames with a logo are labeled manually as positive examples, and frames without logos are chosen randomly as negative examples. Three methods are tried for finding the images with logos using this training set:
1) Build two clusters, for negative and positive examples respectively, and assign each test image to the closest cluster center using the Mahalanobis distance.
2) Assign each test image the label of the nearest training example.
3) Assign each test image the dominant label of its k nearest neighbors, where k=.

As Table III shows, the best score is obtained by the nearest neighbor method. Since we want as many images with logos as possible, false positives are preferable to false negatives. Figure 9 shows some of the images detected as anchor-logo frames. The first two rows show the correctly labeled frames and the last row shows some of the incorrect results. As can be seen, the errors are due to the similarity of the images to some logo images. Overall, images are detected as anchor-logo frames, where of them are correct.

B. Finding Duplicate Logos using Iconic Matching

After obtaining a set of anchor-logo frames, logos are cut off from the predefined upper-right corner of these frames. The

logos are re-sampled to the size of -by- to facilitate the following steps as given in []. From each of the logos, we compute sets of the -Dimensional Haar coefficients of the pixel values, one for each of the RGB channels. The RGB values are in the interval [,]. We keep coefficients which are located at the upper-left corner of the transform domain and construct a representative feature vector of a logo. The coefficients we keep are the overall averages and the low-frequency coefficients of the three channels.

Finding matching logos is a similarity search based on the feature vectors of the logos. We consider two logos to be matched if more than coefficients in their feature vectors have differences smaller than some thresholds ( for the first three overall averages, and for the rest of the coefficients). Among images detected as anchor-logo frames, for of them a matching pair is found. The number of distinct logos is. Figure shows the number of times that the same logo is used for each of these distinct logo clusters.

The period of time over which the same logo is used differs from story to story. As Figure shows, the timeline for a news story may span a long period, as in the Clinton Investigation story, or the story may stay important only for a few days, as in the GM strike and Medals stories.

Fig. Frequency of logos (number of times that the same logo is used; axes: logo pairs, frequencies). Most of the logos are repeated only once. There are only three logos that are repeated over times.

Fig. The same logo is used over time to present stories that are similar, or that follow each other. For some selected logos, the time period over which the logo is used is shown. Some events occur in a short period, such as GM strike or Medals, and some of the events have longer periods, such as Clinton investigation.

Fig. The bipartite graph. The graph shown is $G = (V_S \cup V_W, E)$, where shot-nodes $V_S = \{s_1, s_2\}$ and word-nodes $V_W = \{\text{medal}, \text{Japan}, \text{US}\}$. The shot $s_1$ is associated with the transcript words medal and Japan, while $s_2$ is associated with the transcript words Japan and US.

VII. AUTOMATIC TOPIC ASSIGNMENT

What is the story of the repeating pattern we found? Could we find repeating segments of semantically similar stories? In this section, we try to answer these questions by relating the shots (key-frames) and the topics (words) of continuing stories in the news programs.

A continuing story may use some particular words repeatedly in the transcript every time the story is reported in the news, while the key-frames of the story in different shots may differ. For example, the shots of the Winter Olympic Games may be different in different reports, but certain words such as medal, gold and olympic may appear in all these shots. On the other hand, as the story evolves, the word usage might gradually change, while the same key-frames may appear again to remind the audience of the development of the story. For example, the picture of President Clinton with Monica Lewinsky may appear again and again, even when the transcripts of the shots have changed to focus on the new findings from the investigation.

A. Co-clustering

We model the problem of finding evolving stories as a clustering problem, where the shots (key-frames) and words are grouped into clusters based on the shot-word co-occurrences. Given a video clip of N shots (key-frames) and a vocabulary of M words containing all the words used in the transcript, we partition the transcript with respect to the shots using off-the-shelf techniques [9]. After partitioning the transcript, each shot is associated with a set of words. For example, a shot of the Winter Olympic Games may be associated with the words medal, gold and so on.

We build a bipartite graph $G = (V, E)$, where the nodes are $V = V_S \cup V_W$, the shot-nodes $V_S = \{s_1, \ldots, s_N\}$ are the nodes of shots, and the word-nodes $V_W = \{w_1, \ldots, w_M\}$ are the nodes of words in the vocabulary.
An edge $(s_i, w_j)$ is included in the edge set $E$ if the word $w_j$ appears in the transcript of the shot $s_i$. For example, suppose a video clip has two shots: the first shot is about the 1998 Nagano Winter Olympic Games, with words medal and Japan, and the second is about the economy, with words Japan and US. The vocabulary is {medal, Japan, US}. The corresponding graph $G$ is shown in Figure. Given the desired number of groups K (equivalently, the number of stories) that we want to discover from the bipartite graph $G$, we apply the spectral graph partitioning technique [] to


partition $G$ into K subgraphs, where the number of edges bridging from one subgraph to another is minimized, and each subgraph is constrained to have similar size (i.e., a similar number of nodes in each subgraph). Each subgraph is considered to be a story (or multiple similar stories), consisting of the shots (shot-nodes) and the words (word-nodes) that belong to the subgraph. Shots and words that belong to the i-th subgraph are labeled as i.

Fig. (panels: first thread of the pair, second thread of the pair, each with its cluster labels) Total number of subgraphs is set at K = 9. Story of the first thread: The federal reserve is now leaning to raise interest rate. According to the Wall Street Journal, the fed has abandoned its neutral stance, and is concerned about the continuing strength of the nation's economy, and the failure of the Asian economy crisis to help slow things down. However, the journal said any rate hike is not expected to come until after the Fed's next meeting on May 9th. But that is not much comfort to the stock and bond markets today. Story of the second thread: Meanwhile, all eyes are on the federal reserve, which is holding its policy meeting today in Washington. Most economists believe that no change in interest rates is likely today, though a rate hike is possible later in this year.

We are interested in automatically identifying the content of the thread pairs and the logo clusters we detected in the previous sections. The idea is to use the result obtained from the co-clustering. Consider a thread pair $T = \{(s_1, \ldots, s_m), (t_1, \ldots, t_n)\}$, where the $s_i$ and $t_j$ are the shots of the two thread members in $T$. We first look up the cluster labels of these shots and obtain a cluster label pair $C(T) = \{(c_1, \ldots, c_m), (d_1, \ldots, d_n)\}$. Note that there may be duplicates among the $c_i$ and $d_j$, since two shots can have the same cluster label. Let the most frequent label shared by the threads in the pair be $c$.
We describe the content of the thread pair by the words of the cluster with label $c^*$. Similarly, for a logo cluster $L = (s_1, \ldots, s_m)$, where the $s_i$'s are shots, we look up the cluster labels of the $s_i$'s to obtain a cluster label sequence $C(L) = (c_1, \ldots, c_m)$. Let the most frequent label in $C(L)$ be $c^*$. We describe the content of the logo cluster by the words of the cluster with label $c^*$.

Figure shows an example thread pair. The story is about the Federal Reserve's decision on interest rates. The words automatically chosen to describe this thread pair (cluster ) are income economy company price consumer bond reserve investment motor bank bathroom chrysler credit insurance cost communication steel airline telephone microsoft strength, which reflect the story content quite well.

Figure shows an example logo cluster. The story is about the Lewinsky scandal. The words automatically chosen to describe this logo cluster (cluster ) reflect the story content very well and include the names of the main people involved, such as monica, lewinsky, paula and starr. The other clusters also have related words about the scandal.

Fig.. Related stories with the same Clinton Investigation logo. The number of subgraphs is set at K =. The most common cluster is cluster, which includes the following words: brian monica lewinsky lawyer whitewater counsel jury investigation paula starr relationship reporter ginsburg deposition vernon affair oprah winfrey cattle source intern white deputy lindsey immunity aide adviser subject testimony subpoena courthouse privilege conversation mcdougal showdown turkey. Some words from other clusters: cluster - president clinton investigator scandal assault, cluster - bill official campaign jones lawsuit, cluster - court supreme document evidence.

B. Measuring Coherence

We design a metric, which we call coherence, to measure the goodness of the labeling of the thread pairs and the logo clusters using the coclustering result.
Intuitively, the coherence measures the degree of homogeneity of the cluster labels assigned to a thread pair or a logo cluster.

Definition 1: (Logo cluster coherence) Let $L = (s_1, \ldots, s_m)$ be a logo cluster of $m$ shots (the $s_i$'s). The cluster labels assigned to the shots in $L$ are $C(L) = (c_1, \ldots, c_m)$. Let $c^*$ be the most frequent label value in $C(L)$. The logo cluster coherence $H_{logo}$ is defined as

\[ H_{logo} = \frac{\sum_{i=1}^{m} I(c_i == c^*)}{m}, \]

where the function $I(p) = 1$ when the predicate $p$ is true, and $I(p) = 0$ otherwise. Note that the range of $H_{logo}$ is $[\frac{1}{m}, 1]$: the lower bound is reached when all $m$ labels are distinct, and the upper bound when all shots carry the same label.

Definition 2: (Thread pair coherence) Let $T = \{(s_1, \ldots, s_m), (t_1, \ldots, t_n)\}$ be a thread pair consisting of two threads of shots (the $s_i$'s and $t_j$'s). The cluster labels assigned to the shots in $T$ are $C(T) = \{(c_1, \ldots, c_m), (d_1, \ldots, d_n)\}$. Let $e^*$ be the most frequent label value shared among the labels $c_i$'s and $d_j$'s. The thread pair coherence $H_{pair}$ is defined as

\[ H_{pair} = \frac{\sum_{i=1}^{m} I(c_i == e^*) + \sum_{i=1}^{n} I(d_i == e^*)}{n + m}, \]

where the function $I(p) = 1$ when the predicate $p$ is true, and $I(p) = 0$ otherwise. Note that the range of $H_{pair}$ is $[0, 1]$.

Table IV reports the average of the coherence values of all logo clusters we collected from the CNN set. The base value shown in Table IV is the average of $1/m_i$ over the logo clusters, where $m_i$ is the size of the i-th logo cluster. The base value indicates the worst degree of coherence the data set could get.

TABLE IV. $H_{logo}$: logo cluster coherences. The base coherence measure, which indicates the worst possible coherence, is .9. Random (avg) and random (std) correspond to the values of randomly generated groups. Columns: five settings of K, ending with K=9; rows: $H_{logo}$, random (avg), random (std).

The proposed labeling gives, on average, at least half of the shots in a logo cluster the same label. This shows a good degree of coherence between the visual features (on which a logo cluster is formed) and the story content, and our method captures this automatically. As expected, having K = subgraphs (story topics) gives the highest coherence, since it has the least diversity of labels. However, the coherence value remains stable as K increases, which is desirable because it is hard to select a correct K in practice. We compare the results with groups that consist of randomly selected shots. The results show that the coherence of the logo groups is significantly better than that of the random groups.

Table V reports the average thread pair coherence values of all thread pairs we collected from the CNN set. In the table, we also show the single thread coherence (denoted as $H_{thread}$), which is the coherence value of one thread in a thread pair and is defined in the same way as the logo cluster coherence: each thread, which is essentially a list of shots, is viewed as a logo cluster, and the corresponding logo cluster coherence is computed as its single thread coherence. The single thread coherence is above %, which indicates a great degree of coherence among the shots in a thread (which are usually consecutive shots in the video clip). The proposed labeling method assigns the same label to shots in the two parts of a thread pair only about one-tenth of the time. This shows that a great deal of difference exists in the transcript words as an event evolves. This may also be due to our coclustering algorithm, which provides a hard clustering among the words.
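The two coherence measures and the comparison with random groups can be illustrated with a short Python sketch. It is a minimal sketch under assumptions that are not in the paper: cluster labels are plain integers, the shared label $e^*$ is taken to be the label occurring in both threads with the highest combined count (giving a coherence of 0 when no label is shared), and random groups are drawn uniformly without replacement from a pool of shot labels. All function names and the toy labels are hypothetical.

```python
import random
from collections import Counter


def logo_coherence(labels):
    """H_logo: fraction of shots that carry the most frequent label."""
    if not labels:
        raise ValueError("a logo cluster needs at least one shot")
    return Counter(labels).most_common(1)[0][1] / len(labels)


def pair_coherence(labels_s, labels_t):
    """H_pair: fraction of shots in both threads that carry the label e*.

    Assumption: e* is the label found in both threads with the highest
    combined count; the coherence is 0.0 when no label is shared.
    """
    counts = Counter(labels_s) + Counter(labels_t)
    shared = set(labels_s) & set(labels_t)
    if not shared:
        return 0.0
    return max(counts[lab] for lab in shared) / (len(labels_s) + len(labels_t))


def random_group_coherence(all_labels, size, trials=1000, seed=0):
    """Mean H_logo over groups of `size` labels sampled at random.

    Illustrative baseline only; the sampling scheme is an assumption.
    """
    rng = random.Random(seed)
    values = [logo_coherence(rng.sample(all_labels, size)) for _ in range(trials)]
    return sum(values) / trials


print(logo_coherence([3, 3, 5, 3]))        # 0.75
print(pair_coherence([3, 3, 5], [3, 7]))   # 0.6
print(pair_coherence([1, 1], [2, 2]))      # 0.0
```

In the second call, label 3 is the only label present in both threads and covers three of the five shots, hence 0.6. The single thread coherence corresponds to calling `logo_coherence` on the labels of one thread.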
We are currently extending our work to soft clustering algorithms to try to improve this labeling performance. Also, as shown in Figure, although the cluster numbers are different, the clusters associated with the repeating stories have very similar words. Our strict coherence measure is unable to catch these similarities. When such similarities between words are captured, it is easier to observe the overlaps in the topics of the repeating stories.

TABLE V. Thread pair coherences and single thread coherences. Columns: five settings of K, ending with K=9; rows: $H_{pair}$, $H_{thread}$.

VIII. DISCUSSIONS AND FUTURE WORK

The tendency to re-use the same video material allowed us to detect and track important news stories by detecting visual patterns (duplicate video sequences and logos) and semantic patterns (topics). The duplicate video sequences are detected with a heuristic pattern matching algorithm, and identical logos are detected using the iconic matching method. The proposed method for finding the related word clusters for logo clusters and thread pairs shows that the detected pairs are semantically coherent.

With the proposed approach, threads are found by searching for duplicate sequences that have consecutive similar patterns. This approach captures sequences that are used again in a shorter form, but it is unable to capture modifications introduced by the montage process, such as changing the position of the video parts. Detecting duplicate bags of key-frames, instead of duplicate sequences, could solve such problems.

Commercials are distinguished from the repeating news stories by the sequence length and neighborhood information. Including audio and transcripts will help to differentiate them better, since in commercials this other material is also duplicated, whereas in news stories it is not.

The evolution of news stories in time is important for creating documentaries automatically. With the proposed methods, it is possible to track the stories with similar visual or semantic content inside a single TV channel.
The same news story may also be presented on different channels in many different forms, with different visual and rhetorical styles. This may represent the perspectives of different TV channels, or even the perspectives of different regions or countries. Capturing the use of similar material may provide valuable information for detecting differences in perspectives.

REFERENCES

[1] Topic detection and tracking (TDT) Benchmark by NIST,
[2] TRECVID Guidelines,
[3] H. Schneiderman, T. Kanade, Object Detection Using the Statistics of Parts, International Journal of Computer Vision,.
[4] F. Yamagishi, S. Satoh, T. Hamada, M. Sakauchi, Identical Video Segment Detection for Large-Scale Broadcast Video Archives, International Workshop on Content-Based Multimedia Indexing (CBMI ), pp. -, Rennes, France, Sept. -,.
[5] J. Edwards, R. White, D. Forsyth, Words and Pictures in the News, HLT-NAACL Workshop on Learning Word Meaning from Non-Linguistic Data, Edmonton, Canada, May.
[6] C. E. Jacobs, A. Finkelstein, D. H. Salesin, Fast Multiresolution Image Querying, Proc. SIGGRAPH-9, pp. -, 99.
[7] A. Hauptmann, M. Witbrock, Story Segmentation and Detection of Commercials in Broadcast News Video, Advances in Digital Libraries Conference (ADL 9), Santa Barbara, CA, April -, 99.
[8] S. Marlow, D. A. Sadlier, K. McGeough, N. O'Connor, N. Murphy, Audio and Video Processing for Automatic TV Advertisement Detection, Proceedings of ISSC,.
[9] H. Wactlar, M. Christel, Y. Gong, A. Hauptmann, Lessons Learned from the Creation and Deployment of a Terabyte Digital Video Library, IEEE Computer, vol., no., pp. -, February 999.
[10] I. S. Dhillon, Co-Clustering Documents and Words Using Bipartite Spectral Graph Partitioning, Proceedings of the Seventh ACM SIGKDD Conference, August.

