Extracting Alfred Hitchcock's Know-How by Applying Data Mining Technique

Kimiaki Shirahama, Yuya Matsuo and Kuniaki Uehara
Graduate School of Science and Technology, Kobe University, Nada, Kobe, 657-8501, Japan
{kimi, yuya, uehara}@ai.cs.scitec.kobe-u.ac.jp

Abstract. Video editing is the work of producing a final video sequence of a certain duration by finding and selecting appropriate shots from the video material and connecting them. The quality of the result therefore depends on the editor's skills, and different editors leave different characteristic patterns. To produce good videos, this process is generally conducted according to cinematic rules. Furthermore, movies contain several kinds of rhythms associated with their contents; we define such rhythms based on characters' appearances. In this paper, we propose methods for extracting cinematic rules and interesting rhythm patterns by introducing data mining techniques. Data mining is the technique of discovering useful patterns of interest from a vast quantity of data. Finally, we can edit new video by applying the extracted rules in our video editing support system; the resulting video should be of similar quality to the video from which the rules were extracted.

1 Introduction

Video editing is the work of producing a final video sequence of a certain duration by finding and selecting appropriate shots from the original video material and connecting them. To produce good videos, the selection process is generally conducted according to specific rules called cinematic rules. A set of cinematic rules defines how cut connections are made; for example, two cuts whose shot sizes differ greatly, such as a tight shot (TS) and a loose shot (LS), should not be connected to each other. Most professional movie editors have their own cinematic rules. It is beneficial to have a way to mine their movies and videos for important information or patterns that may be contained within [6]. We call this task video data mining. The most common way to perform video data mining is to index many different aspects of metadata, from the raw (signal) level to the semantic level; data mining techniques can then be applied to the metadata directly.

On the other hand, movies contain several kinds of rhythms associated with their contents. It is easily conceivable that the rhythm of a romantic scene is very slow and that of a suspense scene is very fast. We consider such rhythms as rates of characters' appearances, and we also extract interesting rhythm patterns by using data mining techniques.

These patterns represent the experience of a professional movie editor and embody heuristics about shot duration and transitions. In the following sections, we discuss designs of video editing systems that use different levels of video data mining techniques.

2 Video Editing System

We present the video editing system developed in [3]. In this section, we explain the system in the context of automatically editing video material according to given cinematic rules. First of all, the system chooses shots from the video material according to the cinematic rules stored in the production memory. Each cut of video material is stored in a MySQL database, indexed by the metadata shown in Table 1. The engine of the video editing system is a forward-chaining production system implemented in Prolog; the interface between the production system and the MySQL database is implemented in Java. An overview of the system is illustrated in Fig. 1. The working memory is a collection of items, and the production memory is a collection of production rules. The reasoning control unit determines the conflict set, resolves the conflict and fires the appropriate rule. By repeatedly applying cinematic rules, the system can automatically find and select appropriate raw video shots.

Fig. 1. Video editing system.

Fig. 2 shows a snapshot of the editing process. The shot with ID 19 cannot be a master shot, so the value of Master is 0. The value of Shot is TS. This cut is a zoom cut and the shot size changes into a medium shot at the end of the cut, so Shot, Shot End and Camerawork become TS (Tight Shot), MS (Medium Shot) and zoom, respectively. The shot with ID 20 in Fig. 2 shows a restaurant exterior in Scene 1. This shot can be regarded as a master shot, so the value of Master is 1. The head shot size (Shot) of the cut is LS and the cut is a fixed shot, so the shot size at the end of the cut (Shot End) is the same as the Shot value, LS.
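As a rough illustration of how such rule-driven shot selection might work, the following Python sketch applies one cinematic rule (do not connect shot sizes that differ greatly) to cut records modeled on Table 1. The field names, the rule encoding and the selection loop are illustrative assumptions; the actual system is a forward-chaining production system in Prolog backed by MySQL, not this code.

# Minimal sketch of rule-driven shot selection (assumed field names from Table 1;
# not the authors' Prolog/MySQL implementation).
cuts = [
    {"Id": 19, "Scene": "Scene1", "Master": 0, "Shot": "TS", "ShotEnd": "MS",
     "Camerawork": "zoom", "Use": 0},
    {"Id": 20, "Scene": "Scene1", "Master": 1, "Shot": "LS", "ShotEnd": "LS",
     "Camerawork": "fix", "Use": 0},
]

# Shot sizes ordered from loose to tight.
SIZES = ["LS", "MS", "TS"]

def compatible(prev_end_size, next_head_size):
    """Example cinematic rule: allow a connection only if the shot sizes
    are at most one step apart (e.g. no TS directly after LS)."""
    return abs(SIZES.index(prev_end_size) - SIZES.index(next_head_size)) <= 1

def select_next(prev_cut, candidates):
    """Fire the rule on the first unused cut whose head size is compatible
    with the end size of the previously chosen cut."""
    for cut in candidates:
        if cut is prev_cut or cut["Use"]:
            continue
        if compatible(prev_cut["ShotEnd"], cut["Shot"]):
            cut["Use"] = 1   # mark the cut as used, like the Use field in Table 1
            return cut
    return None

master = cuts[1]                  # start from the master shot (Id 20, LS)
print(select_next(master, cuts))  # cut 19 (TS) is rejected after LS, so this prints None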

Fig. 2. Snapshot of the editing process.

The first column of Table 1 gives the name of each metadata field, the second column explains it, and the last column shows example values.

Table 1. Table of metadata.

Metadata       Explanation                                                    Example
Id             Identification number of the cut                               19, 20
Scene          A stage name where the same action or location is presented    Scene1
Master         Whether the cut can be a master shot or not                    0, 1
Shot           Head shot size of the cut                                      TS, LS
Shot End       Last shot size of the cut                                      MS, -
Camerawork     Camera movement                                                fix, zoom, pan
Start Frame    The head frame number of the cut                               1000, 1501
End Frame      The last frame number of the cut                               1500, 2000
Camera Start   Frame number where camera work has started                     1250, -
Camera End     Frame number where camera work has ended                       1350, -
Use            Whether the cut has already been used or not                   0, 0
Object         The objects displayed on the screen                            character, object
Direction      The direction of the character's gaze                          front, right
Motion         The motion type of the character displayed on the screen       gaze, motion
Sound          The sound type of the scene                                    speech, cheep

3 Mining Cinematic Rules from Multi-Streams of Data

In movies, an event consists of a sequence of actions. Video editing is the work of making the video more meaningful and effective, so there must be some relations among the actions within an event. For example, in an event where a person is looking at some objects from a moderate distance (MS), a close-up of the objects (TS) must be selected as the subsequent cut. We regard such a relation as an association (i.e. MS → TS) between these two cuts; this is a so-called cinematic rule. In order to extract cinematic rules, we propose a mining method that focuses on the correlation between nearby cuts (i.e. sequential multiple cuts) and on the correlation of the metadata of each cut. As illustrated in Fig. 3, we formulate the video stream indexed by the raw level metadata Shot, Camerawork, Duration (= (End Frame − Start Frame) / 30), Object,

Direction, Motion and Sound from Table 1. We propose a mining method that can extract frequent cinematic rules from the multi-stream of data in Fig. 3.

Fig. 3. Video stream indexed by the metadata.

Our method accepts a set of multi-stream time-series data as input. Each element of a pattern, x_n = (v_1, v_2, ..., v_7), represents the metadata that can be identified by our mining method. In order to express a pattern over multiple cuts, it is expressed as a chronological combination of such elements; that is, we denote a pattern over multiple cuts as p_n = {x_1, ..., x_n} (n > 1).

In order to search for patterns, we set up a searching window w as shown in Fig. 4. The size of this window is denoted by w_s. The searching process starts at position 0 (i.e. Cut No. 0). w_s is initialized to 0 and is increased by 1 while the searching process is under way; increasing w_s is equivalent to extending the window size. Whenever the window is extended, it must be confirmed whether the current pattern p_i = {x_i, y_i} satisfies the two parameter conditions that determine the importance of the pattern, given in equations (1) and (2):

    ap(p_i) = cnt(x_i y_i)                       (1)
    prob(p_i) = cnt(x_i y_i) / cnt(x_i)          (2)

Here cnt(p_i) represents the number of times the current pattern p_i is present in the video stream, and ap(p_i) is equal to cnt(p_i). ap(p_i) assesses the frequency of the pattern; we call ap(p_i) the appearance. prob(p_i) is the conditional probability of y_i appearing when x_i appears; it assesses the confidence of the pattern, and we call prob(p_i) the probability.

When checking the two threshold conditions, elements that do not satisfy both of them are disregarded and consequently denoted as "-". If patterns that satisfy the threshold conditions remain, we extend the window size and continue the searching process. We repeat this process until all elements in x_i (i ≥ 1) are replaced by "-". If the algorithm cannot extend the window any further, w_s is re-initialized to 0, the starting position of the search is shifted to the right, and the same process is started again.

From Fig. 4, the pattern extracted at w_s = 3 is p_x = {(LS, fix, t < 6, A, R, G), (MS, -, t < 4, B, L, -), (TS, fix, -, A, R, M)}. Such a pattern is often found in scenes where the first cut is LS: an MS must be inserted as the next cut to avoid a rapid transition. This is one of the cinematic rules of zoom-in, illustrated in Fig. 5.
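The following Python sketch illustrates the window-extension search under stated assumptions: the threshold values, the stream encoding and the treatment of the "-" wildcard are simplifications for illustration, not the authors' implementation.

# Simplified sketch of the window-extension pattern search (illustrative only).
MIN_AP = 2      # appearance threshold, assumed value
MIN_PROB = 0.5  # probability threshold, assumed value

def count(stream, pattern):
    """Count occurrences of `pattern` in `stream`; '-' in a pattern element matches anything."""
    def matches(cut, element):
        return all(e == "-" or e == c for c, e in zip(cut, element))
    return sum(
        all(matches(stream[i + j], pattern[j]) for j in range(len(pattern)))
        for i in range(len(stream) - len(pattern) + 1)
    )

def grow_pattern(stream, start):
    """Extend the window from `start` one cut at a time while both thresholds hold."""
    pattern = [stream[start]]
    while start + len(pattern) < len(stream):
        candidate = pattern + [stream[start + len(pattern)]]
        ap = count(stream, candidate)                   # appearance, eq. (1)
        prob = ap / max(count(stream, pattern), 1)      # probability, eq. (2)
        if ap < MIN_AP or prob < MIN_PROB:
            break
        pattern = candidate
    return pattern

# Each cut: (Shot, Camerawork, Duration class, Object, Direction, Motion)
stream = [
    ("LS", "fix", "t<6", "A", "R", "G"),
    ("MS", "-",   "t<4", "B", "L", "-"),
    ("TS", "fix", "-",   "A", "R", "M"),
    ("LS", "fix", "t<6", "A", "R", "G"),
    ("MS", "-",   "t<4", "B", "L", "-"),
    ("TS", "fix", "-",   "A", "R", "M"),
]
print(grow_pattern(stream, 0))  # recovers the LS -> MS -> TS zoom-in pattern

One simplification to note: the paper replaces individual metadata fields that fail the thresholds with "-" and keeps extending, whereas this sketch simply stops extending once a candidate fails.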

Fig. 4. The process of extending patterns.

Fig. 5. The cinematic rule of zoom-in over sequential multiple cuts.

As stated above, we can extract the cinematic rules that frequently occur in multi-streams of data. The searching method of the related work [5] lacks flexibility in several respects. For example, while the method in [5] searches only for patterns that match the multi-stream of data exactly, our method searches with the two threshold conditions, so it extracts frequent patterns and can complete the search quickly. Furthermore, our method searches within a window that can extend its size, so, depending on the searching position, we can effectively look for frequently occurring patterns.

We have implemented our method to extract cinematic rules. The videos used for this purpose are the movies Madadayo directed by Akira Kurosawa, Star Wars Episode I directed by George Lucas, and PSYCHO directed by Alfred Hitchcock. First, Table 2 shows the results of cinematic rule extraction from Madadayo. From Table 2, we can see that Kurosawa's editing does not connect to TS (no LS → TS or MS → TS), so the extracted cinematic rules represent continuously connected patterns between LS and MS (LS → LS, LS → MS). Furthermore, the duration of each cut is very long compared with the durations written in video grammar: in this movie, durations of LS and MS are more than 10 and 6 seconds, respectively.

Table 2. Results of cinematic rule extraction from Madadayo.

   cinematic rule                                                      ap(p_i)  prob(p_i)
1  {(LS, fix, t>10), (LS, fix, t>10), (LS, -, -), (LS, -, -)}          126      0.803
2  {(LS, fix, t>10), (MS, fix, t>6), (LS, fix, t>10), (MS, -, -)}       52      0.531
3  {(MS, fix, t>6), (LS, fix, t>10), (MS, -, t>6), (LS, fix, -)}        53      0.914

Next, Table 3 shows the results obtained by applying our method to Star Wars Episode I. From Table 3, we can see that this movie is well connected

between the same shot sizes (MS → MS, TS → TS), and is edited in a quick rhythm compared with the previous movie, Madadayo. In this movie, durations of LS, MS and TS are about 4, 3 and 2.5 seconds, respectively. These cinematic rules characterize the speedy development of this movie. Furthermore, since rapid transitions of shot size such as LS → TS and TS → LS make it hard for the viewer to understand what is going on, such transitions are not allowed, especially in such a speedy development; in fact, they are not extracted in Table 3.

Table 3. Results of cinematic rule extraction from Star Wars Episode I.

   cinematic rule                             ap(p_i)  prob(p_i)
1  {(MS, -), (MS, 2<t<4), (MS, -)}             33       0.569
2  {(MS, -), (MS, -), (MS, t<2)}               32       0.314
3  {(TS, 2<t<4), (MS, -), (TS, -)}             33       0.44
4  {(TS, -), (TS, -), (TS, 2<t<4)}             41       0.707

Finally, Table 4 shows the results obtained by applying our method to PSYCHO. We can say that this movie is edited normally, following video grammar, considering the following factors. Firstly, from Table 4, we can see that the movie is edited with smooth shot-size transitions such as LS → MS and MS → MS rather than rapid ones such as TS → LS. Secondly, durations of LS, MS and TS are about 6, 4 and 3 seconds, respectively.

Table 4. Results of cinematic rule extraction from PSYCHO.

   cinematic rule                                                                                    ap(p_i)  prob(p_i)
1  {(LS, fix, t<6), (MS, fix, t<4), (LS, fix, t<6)}                                                   34       0.642
2  {(LS, fix, t<6), (MS, fix, t>4), (LS, fix, -)}                                                     26       0.897
3  {(MS, fix, t<6), (MS, fix, t<4), (MS, fix, -)}                                                     30       0.75
4  {(TS, fix, t<3, speech), (TS, fix, -, speech), (TS, fix, t>3, speech)}                             30       0.566
5  {(TS, fix, t>3, speech), (TS, fix, -, speech), (TS, fix, -, speech)}                               31       0.912
6  {(TS, fix, t<6, Marion, F, Gaze), (LS, fix, t<6), (TS, fix, t<6, Marion, F, Gaze)}                 22       0.85
7  {(TS, fix, -, R, Gaze, speech), (TS, fix, -, L, Gaze, speech), (TS, fix, t>3, R, Gaze, speech)}    21       0.955

4 Topic Mining Based on Burst Streams of Character Appearance

Usually in movies there are editing rhythms associated with the contents. Two examples of such rhythms in Alfred Hitchcock's movie PSYCHO are shown in the two tables of Fig. 6. Each table represents several cuts in a talking scene. In SCENE 1, the woman (Marion) and the man (Sam) are talking to each other while hugging in a hotel room. In SCENE 2, the policeman finds the woman (Marion), who stole a lot of money, sleeping in a car. She is surprised at being found and tries to escape from him, but the policeman stops her, and the interrogation begins. Clearly, SCENE 1 is a romantic scene, while SCENE 2 is a tense scene.

The numeric values in the bottom row of each table represent the lengths of the intervals of Marion's appearance between consecutive cuts. The rate of Marion's appearance in SCENE 2 is clearly higher than that in SCENE 1: in SCENE 1 the intervals of Marion's appearance are 20.3, 65.3, 45.1 seconds, etc., whereas in SCENE 2 they are 3.7, 10.0, 1.7 seconds, etc. Thus, depending on the type of the scene, rates of a character's appearance differ. We call the rate of a character's appearance a rhythm, and for a character we define a burst as a rate of his/her appearance. An approach for modeling bursts in e-mail streams has been introduced by [2]: a stream consisting of messages relevant to a single topic of interest is divided into sub-topics by the rates of message arrivals (i.e. bursts). The method in Section 4.1 for modeling a character's burst stream is based on a similar approach. We define a topic as an interval where a character appears at the same rate, that is, where the bursts maintain constant intensity. A rhythm of topics is defined as the transition of that character's topics in a scene. In Section 4.2, we classify talking scenes into groups of similar rhythms and investigate how a talking scene is characterized by the rhythms of topics specific to its characters. As a result, we can use the pattern of rhythms for the creation of new video material. We refer to this approach for extracting useful patterns as topic mining.

4.1 Character's Burst Stream

We index the following metadata for every cut, as shown in each table of Fig. 6, where the information about a cut is represented in a column. StartTime and EndTime (the third entry) are raw level metadata representing the starting point and the ending point of a cut, respectively. SkipTime (the fourth entry) is semantic level metadata that roughly represents the span of time between the events presented in consecutive cuts. In this paper, SkipTime is classified into four levels: continuous, a little distant, distant and very distant. In Fig. 6, all SkipTime values are continuous, because the conversation of Marion and Sam (or the policeman) does not have any gaps between consecutive cuts. The characters' names (the fifth entry) represent the characters appearing in the cut.

There are two kinds of time categories in a movie. One is the time devoted to the actual movie as it is presented, time1. The other, time2, is the time semantically spanned by the events presented in the movie. The current StartTime and EndTime represent the starting point and the ending point of each cut in time1 (the third entry in Fig. 6), whereas the characters' actions unfold in time2, so we must model a character's burst stream in time2. As a rough translation of time1 into time2 for a cut, the sum of the SkipTime values from the beginning of the movie up to the previous cut is added to the StartTime and EndTime of the current cut (the sixth entry in Fig. 6). For example, in SCENE 2 of Fig. 6, when Marion is caught by the policeman (cut 92), 741 seconds (12 minutes 21 seconds) have passed since PSYCHO started as indicated in time1, but in the overall story 16361 seconds have passed, as reflected in time2.
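A minimal sketch of this time1 → time2 translation is given below. The numeric seconds assigned to the four SkipTime levels, the field names, and the convention that a cut's SkipTime denotes the gap following it are all placeholders assumed for illustration; the paper does not specify them.

# Sketch of the time1 -> time2 translation; the seconds per SkipTime level
# and the field names are assumptions, not values from the paper.
SKIP_SECONDS = {"continuous": 0, "a little distant": 60,
                "distant": 600, "very distant": 3600}

def to_time2(cuts):
    """Add the accumulated SkipTime of all previous cuts to each cut's times.
    A cut's SkipTime is treated here as the gap following that cut (assumption)."""
    offset = 0.0
    converted = []
    for cut in cuts:
        converted.append({"StartTime2": cut["StartTime"] + offset,
                          "EndTime2": cut["EndTime"] + offset,
                          "Characters": cut["Characters"]})
        offset += SKIP_SECONDS[cut["SkipTime"]]
    return converted

cuts = [
    {"StartTime": 735.0, "EndTime": 741.0, "SkipTime": "very distant",
     "Characters": ["Marion"]},
    {"StartTime": 741.0, "EndTime": 748.0, "SkipTime": "continuous",
     "Characters": ["Marion", "Policeman"]},
]
print(to_time2(cuts))  # the second cut is shifted into story time (time2)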

Fig. 6. Examples of rhythms in a romantic and a tense scene.

Based on [2], for the cuts where Marion appears we can obtain her burst stream in time2. The two segments of Marion's burst stream corresponding to SCENE 1 and SCENE 2 in Fig. 6 are shown in Fig. 7. A dot represents a value of Marion's burst intensity associated with an interval of her burst. The association of Marion's burst intensity values with the intervals of her appearance is also shown in Fig. 7. In contrast to the scattered intervals of her appearance in Fig. 6, her burst intensity graph is plotted smoothly. The reason is that our method prevents frequent changes of burst intensity triggered by insignificant small changes in the intervals of her appearance: it assigns a burst intensity to a cut in which Marion appears only if she appears at a rate sufficient to be associated with that burst intensity.

Fig. 7. Two segments of Marion's burst stream (i.e. two rhythms of topics).

Since Marion's topic is defined as an interval where the burst intensity value keeps constant, in SCENE 1 the intervals [cut 5, cut 8], [cut 10, cut 11], [cut 12, cut 21] and [cut 23, -]

represent four of Marion's topics. Note that her actions also tend to change at the cut where her topic changes. For example, Marion and Sam talk to each other while hugging and then step away for a while, between cuts 5 and 8. Then they come close to each other and hug again between cuts 9 and 11. Being out of the scene, Marion does not appear in cut 9; she appears again for the first time in cut 10, so her topic changes at cut 10. Therefore, regarding her appearance, SCENE 1 can be divided into four topics, each topic corresponding to one semantically relevant action. This method is applicable to any other scene where Marion appears.

The two segments in Fig. 7 correspond to two rhythms of topics for Marion. In SCENE 1, Marion's burst stream gradually grows from intensity 3, corresponding to a long interval of her appearance, 49.1 seconds, to intensity 7, associated with 12.8 seconds. Her actions turn from hugging Sam into talking to him, resulting in a higher rate of appearance. On the other hand, at the beginning of SCENE 2, Marion's burst stream suddenly grows from intensity 0 to intensity 9, associated with a relatively short interval of her appearance, 6.5 seconds, and her burst stream remains at high intensity throughout SCENE 2. In this scene, the quick switching of cuts between Marion and the policeman characterizes the tense conversation.

One attribute of a character's burst stream is therefore its growth; the above investigation suggests that a character's actions in a scene affect the way his/her burst stream grows. In the case of a burst stream for a different character (e.g. Sam), the intervals of his appearance associated with the burst intensities differ from those of Marion, so we must consider the intervals of a character's appearance in close relation to that character's burst intensity. This second attribute of a character's burst stream is its intervals. Both attributes are represented as time sequences: for a character's burst stream, the attribute growth is the sequence of burst intensities, and the attribute intervals is the sequence of intervals of his/her appearance associated with the burst intensities. The two segments of Marion's burst stream in SCENE 1 and 2 (i.e. two rhythms of topics for Marion) are also shown in Fig. 7.

4.2 Clustering Rhythms of Topics

We can extract various rhythms of topics for the characters that appear in the talking scenes of PSYCHO. For each pair of rhythms, we compute similarities on each attribute by using Dynamic Time Warping (DTW) [1]. DTW computes the similarity between a pair of time series by aligning them to achieve a reasonable fit, where each time axis of the time series is stretched (or compressed); a smaller similarity value means that the pair of time series is more similar. DTW is sufficient for computing similarities on intervals, but it cannot be applied directly to growth, because a pair of rhythms may grow from different starting points of burst intensity and we need to know how similar their subsequent growth is. For a pair of rhythms, we therefore apply DTW repeatedly while incrementing the intensity values of one of the rhythms; when the minimum difference is obtained, we have found the maximum similarity between the two rhythms.

Similarities in growth and intervals for three rhythms are shown in Fig. 8. Two of them correspond to the ones in Fig. 7, which are extracted for Marion in the romantic

and the tense scene. The remaining rhythm, for Sam, is extracted from the tense scene where he is talking to Norman, who killed Marion, while carefully choosing his words to get some information about her. As shown in Fig. 8, the two rhythms extracted from the tense scenes are similar to each other on both growth and intervals, whereas the rhythm extracted from the romantic scene is not similar to either of them.

Fig. 8. (Dis)similarities for three rhythms of topics.

We employ Similarity-Based Agglomerative Clustering (SBAC) [4] to compute clusters consisting of similar rhythms. SBAC provides a unified framework for clustering objects having nominal and numeric attribute values, without making any assumptions about the underlying distributions of the attribute values. We extend SBAC to handle rhythms of topics having two time-series attributes (i.e. growth and intervals), based on the similarities computed by DTW. Here we describe only the extended part of SBAC (see [4] for details). For a pair of rhythms, SBAC computes the similarity contribution of each attribute, taking into account the degree of similarity computed by DTW and the uniqueness of the pair of time series. The uniqueness value is defined as the number of time series that pose a greater similarity to both time series than the original time series themselves; that is, the uniqueness can be considered as the density of time series encompassed by the pair of time series. For a pair of rhythms, the dissimilarity score on each attribute is measured based on the similarity contribution of the pair of time series. The subsequent procedure follows [4].

For the three rhythms above, the dissimilarity values on growth and intervals are also shown in Fig. 8. On each attribute, a dissimilarity score is a probabilistic value, and a pair whose value is closer to 1 is considered more dissimilar. Fig. 8 shows that the pair of Marion's two rhythms in the romantic and the tense scene is one of the most dissimilar pairs regarding growth, with a dissimilarity score of 0.96. The overall dissimilarity scores for these rhythms, computed by aggregating the dissimilarity scores on growth and intervals, are also shown in Fig. 8. Based on the overall dissimilarity scores for all rhythms extracted for characters in the talking scenes of PSYCHO, a nested sequence of partitions of the rhythms is obtained in the form of a dendrogram. As a result of investigating the rhythms of topics, some interesting patterns that characterize a talking scene are discovered. Fig. 9 shows the outlines of those patterns of rhythms.
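To illustrate the intensity-offset alignment used for the growth attribute, the sketch below computes a standard DTW distance and searches over a small range of integer intensity shifts. The shift range and the per-element distance are assumptions, and this is only the DTW step, not the SBAC extension itself.

# Illustrative DTW with an intensity-offset search for the growth attribute.
def dtw(a, b):
    """Classic dynamic-programming DTW distance between two numeric sequences."""
    INF = float("inf")
    d = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]

def growth_similarity(g1, g2, max_shift=10):
    """Apply DTW while shifting one growth curve by integer intensities;
    the smallest distance over all shifts is taken as the (dis)similarity."""
    return min(dtw([v + s for v in g1], g2) for s in range(-max_shift, max_shift + 1))

# Growth (burst-intensity) sequences roughly shaped like the two rhythms in Fig. 7.
romantic = [3, 4, 5, 6, 7]   # gradual growth
tense    = [0, 9, 9, 9, 9]   # sudden growth to high intensity
print(growth_similarity(romantic, tense))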

Fig. 9. Extracted patterns obtained by applying our topic mining method to PSYCHO.

First, we compare the leftmost pattern with the rest of the patterns. The main difference between them is that a rhythm in the leftmost pattern does not contain any topics associated with short intervals of a character's appearance, whereas a rhythm in the remaining patterns certainly contains some topics associated with short intervals. It turns out that a character whose rhythm follows the leftmost pattern does not interact actively with his/her talking partners; for example, such a character just listens to the conversation of other characters who actively interact with each other. In contrast, a character in the remaining patterns interacts actively with his/her partners in the topics associated with short intervals. This annotation can be applied to a character's topic where one semantically relevant action is presented, as described in Section 4.1. Furthermore, for characters that are classified as actively interacting with each other, their topics are associated with short intervals. In this way, we can follow the transition of the characters' interactions during a talking scene.

In the topics associated with short intervals of a character's appearance, the conversation is heated; in such a case the tension grows, and we consider this the climax of a talking scene. The right-hand three patterns in Fig. 9 are three types of rhythm patterns leading to the climax of a talking scene. The left pattern of the three has a rhythm represented by a burst stream that hardly grows and is associated with short intervals; in a scene where a character appears in such a rhythm, the climax comes at the very beginning. The middle pattern has a rhythm represented by a burst stream that suddenly grows from low to high burst intensity; in a scene where a character appears in such a suddenly intensified rhythm, the climax is reached quickly. We consider these two types of rhythms characteristic of tense talking scenes. The right pattern has a rhythm represented by a burst stream that gradually grows to high burst intensity; a scene where all characters appear in such gradually intensifying rhythms reaches its climax without having any strong impact on the audience, and we consider such a scene calm. In PSYCHO, there is only one calm talking scene (i.e. SCENE 1 in Fig. 8). Accordingly, in order to raise the spectator's tension, in most talking scenes of PSYCHO some characters must have rhythms similar to the two middle patterns in Fig. 9.

5 Conclusion and Future Work

In this paper, we proposed methods of pattern extraction from movies using data mining techniques. Some problems and future work still remain, as follows.

In Section 3, we proposed an effective searching method. The multi-stream of data illustrated in Fig. 4 is very large, so an efficient searching method is needed. The problem with our method is that it first enumerates the counts of all possible patterns, which requires much time. The Knuth-Morris-Pratt (KMP) and Boyer-Moore (BM) algorithms are efficient string-matching algorithms; BM scans the characters of the pattern from right to left, beginning with the rightmost one, and in case of a mismatch it uses two functions to shift the window to the right. With our current method, searching for a pattern of length m in a string of length n takes O(mn) time, whereas KMP and BM can search in O(n) and O(n/m) time, respectively. By applying these methods to our video streams, we could complete the search more quickly.

In Section 4, we proposed a topic mining method, but at present it is not robust enough in terms of video segmentation. In order to extract a rhythm of topics for a character, we manually divide a movie into scenes; that is, the quality of the segmentation depends on the person who manually segments the movie. Furthermore, we described that, by using a character's burst stream, a movie can be divided into topics from that character's perspective, each topic corresponding to one semantically relevant action. However, as is often the case, several characters appear in a scene and interact with each other. Therefore, analyzing the burst streams of all characters may lead to an effective method for segmenting a movie into scenes.

References

1. D. J. Berndt and J. Clifford: Finding Patterns in Time Series: A Dynamic Programming Approach. In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy (eds.): Advances in Knowledge Discovery and Data Mining, AAAI Press, pages 229-248, 1996.
2. J. Kleinberg: Bursty and Hierarchical Structure in Streams. In Proc. of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 91-101, 2002.
3. M. Kumano, Y. Ariki, M. Amano, K. Uehara, K. Shunto and K. Tsukada: Video Editing Support System Based on Video Grammar and Content Analysis. In Proc. of the 16th International Conference on Pattern Recognition (ICPR), pages 346-354, 2002.
4. C. Li and G. Biswas: Unsupervised Learning with Mixed Numeric and Nominal Data. IEEE Transactions on Knowledge and Data Engineering, Vol. 14, No. 4, pages 673-690, 2002.
5. T. Oates and P. R. Cohen: Searching for Structure in Multiple Streams of Data. In Proc. of the 13th International Conference on Machine Learning, pages 346-354, 1996.
6. D. Wijesekera and D. Barbara: Mining Cinematic Knowledge: Work in Progress. In Proc. of the International Workshop on Multimedia Data Mining, pages 98-103, 2000.