Automatic Replay Generation for Soccer Video Broadcasting


Jinjun Wang 2,1, Changsheng Xu 1, Engsiong Chng 2, Kongwah Wan 1, Qi Tian 1
1 Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore
{stuwj2, xucs, kongwah, tian}@i2r.a-star.edu.sg
2 CeMNet, SCE, Nanyang Technological University, Singapore
jjwang@pmail.ntu.edu.sg, aseschng@ntu.edu.sg

ABSTRACT
While most current approaches to sports video analysis are based on broadcast video, in this paper we present a novel approach for highlight detection and automatic replay generation for soccer videos taken by the main camera. This research is important because current soccer highlight detection and replay generation for a live game is a labor-intensive process. A robust multi-level, multi-modal event detection framework is proposed to detect events and event boundaries from the video taken by the main camera. The framework explores the available analysis cues, using a mid-level representation to bridge the gap between low-level features and high-level events. The event detection results and mid-level representation are then used to generate replays, which are automatically inserted into the video. Experimental results are promising and found to be comparable with those generated by broadcast professionals.

Categories and Subject Descriptors
I.5.5 [Pattern Recognition]: Implementation - Interactive systems; H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing - Abstracting methods, Indexing methods

General Terms
Algorithms, Design, Experimentation

Keywords
Event detection, Sports video analysis, Broadcast, Replay

1. INTRODUCTION
The growing appetite for sporting excellence and patriotic passion at both the international and domestic club levels has created new cultures and businesses in the sports domain. Sports video is widely distributed over various networks, and its mass appeal to large global audiences has attracted increasing research attention in recent years [1]. We constrain the following discussion to the domain of soccer, as soccer video analysis remains a challenging task due to the loose structure of soccer games. Many studies have been conducted on broadcast or unedited soccer game video, and promising results have been reported [2, 3, 4, 5]. In [4], soccer event detection using unedited game video is attempted. In [5], both raw video image information and post-production information are utilized, and events like Shooting, Yellow/Red Card and Penalty are detected. Most of these works focus on semantic annotation, indexing, summarization and retrieval for sports video. They do not address video editing and production tasks such as automatic replay generation and broadcast video generation. Generating soccer highlights from a live game is a time-critical and labor-intensive process. Typically, multiple cameras are installed around the sporting arena and the broadcast director decides which video feed to go on-air.
Of these cameras, a main camera perched high above pitch level provides a panoramic view of the game and is often used as the main broadcast view. At sporadic moments in the game that he deems appropriate, the director launches replays of the prior game action. These replays are manually selected by reviewing the log of a particular camera view and choosing appropriate start and end times to play back at a slower-than-real-time rate. Replay segments are short (15-20 seconds) and must be launched quickly (within 10 seconds of the event). In a live broadcast, inserting a replay can be a risky decision call for the director, since it trades off the live action on the field. It is not uncommon for these replay segments to be prematurely cut off in order to return to the live action in the main camera view. There is clearly an opportunity to automate the production of such sports highlight moments. Robust detection of highlights is currently achievable via detection of excited commentary and/or visual analysis of ball and goalmouth action. While these technologies may not replace the entire studio crew, it is foreseeable that they can substantially cut down the crew size in the broadcast studio and streamline the highlight generation process. In automatic replays, the primary concern is arguably their qualitative assessment, and we seek to address this by robust replay boundary determination via extensive use of broadcast rules and domain knowledge. Our approach is to detect highlights using only the unedited main camera video feed. The boundary end-points of replays extracted from this feed can then be used as time-stamp markers to extract the corresponding video feed from the other cameras. Collated this way, it is possible to apply further qualitative assessment as to which video feed to select as the final replay. Even if the option for automatic replay generation is not invoked, collating and synchronizing the various video feeds using an appropriate interface will greatly simplify the final replay selection. It is also interesting to note that, with automatic replays, the determination of replay-worthy highlights in the game is no longer the exclusive purview of the broadcast director. With advances in digital TV and set-top boxes with hard-drive storage, automatic replay based on personalized parameters may be performed at the client end. With automatic replays, the number of replays would almost certainly increase. While not all of these would go on-air via traditional TV channels, they can be streamed via alternative media channels, e.g. wireless video. Hence, the proliferation of replay segments may potentially open a secondary market of game viewership.

This paper presents an automatic system to generate replays for soccer broadcasting. The research is challenging for the following reasons. Firstly, it is more difficult to detect events from the unedited main camera video: this type of video contains neither the post-production information, nor the multiple camera views, nor the commentary information that are available in broadcast video, so fewer cues can be used for event detection. Secondly, soccer event detection is difficult because soccer events do not possess a strong temporal structure, i.e. the same semantic event can happen in different situations with different durations. Thirdly, soccer video is noisy: the low-level visual and audio features extracted are often affected by factors such as audience noise, weather, luminance, etc. Lastly, upon detecting an interesting segment for replay, we also need to locate a suitable time slot for the replay that minimizes the interruption of the main camera view. The main contributions of the paper include:
1. Efficient mid-level representations suitable for analyzing unedited soccer video taken by the main camera are introduced. The accuracy of the play position keyword and the audio keyword is improved compared with related work;
2. The proposed event detection approach is able to identify not only soccer events but also event boundaries, using unedited soccer video as well as broadcast video;
3. An automatic replay generation scheme is presented, and the generated replays are found to be comparable with those produced by human broadcasters.

2. FRAMEWORK
Figure 1: Framework of the automatic replay generation system

Automatic highlight identification is a difficult process, as there is no clear relationship between low-level feature patterns and high-level events. To bridge the large gap between features and events, we proposed a mid-level representation in our previous work [3]. Here we adopt this idea and propose a three-level framework. Previous research [6, 7, 8, 9] has shown that intermodal collaboration can improve the robustness of a system, e.g. visual and text streams [6], audio and motion [7], captions [8, 9], etc. Similarly, we apply information from different domains in our setup, making it a multi-level, multi-modal system.
Fig. 1 illustrates our proposed framework. Specifically, the low-level modules extract features from the audio stream, video stream and motion vector field. Here we assume that the audio information is available from the video taken by the main camera. These raw feature streams are first analyzed by the mid-level system to generate keyword sequences. The high-level system then combines these mid-level keywords to detect events and their boundaries. Lastly, in our automatic replay generation application, an application level uses the event detection results and mid-level representations to generate replays and insert them into the output video automatically. In the following sections, the details of the mid-level representation and high-level event detection are presented. As the low-level implementation is straightforward, it will not be discussed.

3. MID-LEVEL REPRESENTATION
The mid-level system creates five synchronized keyword sequences from low-level visual, motion and audio features. The keywords and their associated analysis are listed in Table 1:

Table 1: Analysis description

  ID   Description                    Analysis
  F1   Active play position keyword   Visual
  F2   Ball trajectory                Visual
  F3   Goalmouth location             Visual
  F4   Motion activity                Motion
  F5   Audio keyword                  Audio

3.1 Visual analysis (F1, F2, F3)
The visual analysis creates three keywords: F1, F2 and F3. The creation of each keyword is discussed in the following subsections.

3.1.1 Position keyword (F1)
The F1 keyword reflects the location of the play in the soccer field. In our implementation, the field is divided into 15 areas (Fig. 2a). Symmetrical regions of the field are given the same labels, resulting in six keyword labels (Fig. 2b). In comparison with [4], which uses 12 coarser field regions, our field division is finer with greater precision.

(a) 15 areas  (b) 6 labels
Figure 2: Soccer field model

Video from the main camera is used to identify the play region in the field. The raw video shows only a cropped portion of the field as the main camera pans and zooms. Previous work [5] implemented field-line detection to identify the penalty area position. In our model, as we need to identify play regions spanning the entire field, the following three features are used:
1. Field-line locations, represented in polar coordinates (ρ_i, θ_i), i = 1, 2, ..., N, where ρ_i and θ_i are the i-th radial and angular coordinates respectively and N is the total number of lines;
2. Goalmouth location, represented by its central point (x_g, y_g), where x_g and y_g are the X and Y coordinates;
3. Central circle location, represented by its central point (x_e, y_e), where x_e and y_e are the X and Y coordinates.

To detect the active play region, we propose a Competition Network (CN) using the three shape features described above. The CN consists of 15 dependent classifier nodes, each node representing one area of the field as illustrated in Fig. 2a. The 15 nodes compete with each other, and the accumulated winning node is identified as the chosen region of play. The CN operates in the following manner: at time t, every detected field-line (ρ_it, θ_it), together with the goalmouth (x_gt, y_gt) and central circle (x_et, y_et), forms the feature vector v_i(t), where i = 1, ..., N and N is the number of lines detected at time t. Specifically,

  v_i(t) = [ρ_it, θ_it, x_gt, y_gt, x_et, y_et]^T,  i = 1, ..., N    (1)

The response of each node is

  r_j(t) = Σ_{i=1}^{N} w_j · v_i(t),  j = 1, ..., 15    (2)

where

  w_j = [w_j1, w_j2, ..., w_j6],  j = 1, ..., 15    (3)

is the weight vector associated with the j-th node, one for each of the 15 regions. The set of winning nodes at time t is

  {j*(t)} = arg max_{j=1,...,15} r_j(t)    (4)

Then the accumulated response is computed by

  R_j(t) = R_j(t-1) + r_j(t)·α − Dist(j, j*(t))·β    (5)

where R_j(t) is the accumulated response of node j, α is a positive scaling constant, β is an attenuation constant, and Dist(j, j*(t)) is the Euclidean distance from node j to the nearest instantaneous winning node in the list {j*(t)}. A large Dist(j, j*(t)) results in stronger attenuation in Eq. 5. To compute the final output of the CN at time t, the maximal accumulated response is found at node j#(t), where

  j#(t) = arg max_{j=1,...,15} R_j(t)    (6)

If R_{j#}(t) is bigger than a predefined threshold, the value of the position keyword F1 at time instant t is set to j#(t); otherwise it remains unchanged.
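To make the competition concrete, the following Python sketch implements Eqs. (2)-(6) for one time step. The weight matrix W, the constants alpha and beta, the node coordinates used by Dist, and the decision threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np

def cn_step(V, W, R_prev, node_xy, alpha=1.0, beta=0.5, threshold=10.0, prev_label=None):
    """One Competition Network update (Eqs. 2-6); a minimal sketch.

    V       : (N, 6) feature vectors v_i(t), one per detected field-line
    W       : (15, 6) weight vectors w_j, one per field area (assumed pre-trained)
    R_prev  : (15,) accumulated responses R_j(t-1)
    node_xy : (15, 2) assumed 2-D node coordinates used by Dist(j, j*(t))
    """
    r = W @ V.sum(axis=0)                    # Eq. 2: r_j(t) = sum_i w_j . v_i(t)
    winners = np.flatnonzero(r == r.max())   # Eq. 4: instantaneous winning node(s)
    # Distance from every node to its nearest instantaneous winner
    dist = np.min(np.linalg.norm(node_xy[:, None, :] - node_xy[winners][None, :, :],
                                 axis=2), axis=1)
    R = R_prev + alpha * r - beta * dist     # Eq. 5: accumulate, attenuate far nodes
    j_hash = int(np.argmax(R))               # Eq. 6: node with maximal accumulation
    label = j_hash if R[j_hash] > threshold else prev_label  # keep old label otherwise
    return R, label
```

Calling cn_step once per frame with the running R and the previous label reproduces the "accumulated winning node" behavior described above.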
3.1.2 Ball trajectory (F2)
The detected and tracked position of the ball is a strong and direct cue for recognizing some events. For example, the relative position between the ball and the goalmouth can indicate events such as scoring and shooting. In this paper, the ball trajectories are obtained by the trajectory-based ball-detection-and-tracking algorithm presented in our previous work [2]. Unlike object-based algorithms, this algorithm does not evaluate whether a single object is the ball. Instead, it uses a Kalman filter to evaluate whether a candidate trajectory is a ball trajectory. We denote the ball trajectory by ID F2 (Table 1); F2 is a two-dimensional vector stream recording the two coordinates of the ball in each frame.

3.1.3 Goalmouth location (F3)
Besides being used in the position keyword model, the goalmouth location is itself an important mid-level representation. A goalmouth can be formed from the two detected goalposts and is expressed by its four vertexes. We denote the goalmouth location by ID F3 (Table 1).

3.2 Motion analysis (F4)
Motion information has been widely studied for video analysis, e.g. Motion Texture [10] and the MPEG-7 intensity of motion activity descriptor [7]. In soccer games, the main camera always follows the movement of the ball, so the camera motion provides an important cue to represent the general activity. In our framework, we calculate the camera motion using the motion vector field that is readily available in compressed video. A texture filter is applied to remove inaccurate motion vectors. Then the algorithm in [11] is used to compute the pan factor p_p, tilt factor p_t and zoom factor p_z of the camera. In addition, the average motion magnitude p_m is computed. Thus a motion activity vector [p_z, p_p, p_t, p_m]^T is formed as a measure of the motion activity. Since the motion information is only available and extracted from P frames, the motion activity vector for I and B frames is set to the last computed P frame value. We denote this motion activity vector stream by ID F4 (Table 1).

3.3 Audio analysis (F5)
The purpose of the audio analysis is to label each audio frame (20 ms in our experiment) with a predefined class. For our purpose, we have three classes: Whistle, Acclaim and Noise, and the audio keyword found is denoted F5. The classifier used is the Support Vector Machine (SVM) with the Gaussian (RBF) kernel function. As the SVM is a two-class classifier, it is used in a one-against-all configuration for our three-class problem. The input audio feature to the SVM is found by exhaustive search over the following candidate features: Mel Frequency Cepstral Coefficients (MFCC), Linear Prediction Coefficients (LPC), LPC Cepstral coefficients (LPCC), Short Time Energy (STE), Spectral Power (SP), and Zero Crossing Rate (ZCR). The best parameters found are a combination of an LPCC subset and an MFCC subset.

3.4 Post-processing
The first function of post-processing is to eliminate sudden errors in the created keywords. The mid-level keywords are coarse semantic representations, so a keyword value should not change too fast. Any sudden change in a keyword sequence can be considered an error and is eliminated using majority voting within a sliding window of length w_l and step size w_s frames. For the different keywords, the sliding window parameters are set experientially: position keyword F1: w_l = 25 and w_s = 10; ball trajectory keyword F2: no post-processing is applied, as it has already been smoothed by the Kalman filter; goalmouth position keyword F3: w_l = 12 and w_s = 8; motion activity keyword F4: no post-processing is applied, as it is obtained objectively from the compressed video; audio keyword F5: w_l = 5 and w_s = 1. The second function of post-processing is to synchronize keywords from different domains. Audio labels are created on a smaller sliding window (20 ms in our system) than the visual frame rate (25 fps, i.e. each video frame lasts 40 ms). Since the audio sequence rate is twice that of the video sequence, it is easy to synchronize them. After post-processing, the mid-level outputs are used by the next level for event detection.
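As an illustration of this majority-voting step, the sketch below smooths a discrete keyword sequence with a window of length w_l and step w_s. Overwriting each window with its majority label is our simplification, since the paper does not spell out this detail.

```python
from collections import Counter

def majority_vote(keywords, w_l, w_s):
    """Majority-vote smoothing of a keyword sequence (Section 3.4); a sketch.

    keywords : list of discrete keyword labels, one per frame
    w_l, w_s : sliding-window length and step, in frames
    """
    smoothed = list(keywords)
    for start in range(0, max(1, len(keywords) - w_l + 1), w_s):
        window = keywords[start:start + w_l]
        label, _ = Counter(window).most_common(1)[0]   # majority label in this window
        for i in range(start, start + len(window)):
            smoothed[i] = label
    return smoothed

# Example: a one-frame glitch in the position keyword is voted away
print(majority_vote([2, 2, 2, 5, 2, 2], w_l=5, w_s=1))  # -> [2, 2, 2, 2, 2, 2]
```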

4. EVENT DETECTION
This section discusses three problems associated with event detection from the video taken by the main camera for automatic replay generation:
1. The lack of general criteria for defining which events should be selected for replay. For a human broadcaster, suitable selection comes with experience;
2. The requirement to achieve acceptable event detection accuracy from the video taken by the main camera. As mentioned in Section 1, this is a difficult problem because fewer cues are available than for event detection from broadcast video;
3. The difficulty of detecting the time boundaries of interesting events. For the purpose of generating a replay, not only must the event be detected, we also need to extract its time boundary. However, the event boundary is an even more subjective concept.
Our proposed solutions to these three problems are discussed in the following three subsections, respectively.
4.1 Selection of replay event
To find general criteria for the selection of events for replay, a quantitative study of 143 replays in five FIFA WC 2002 games was conducted. It shows that all of the replayed events belong to the three types listed in Table 2.

Table 2: Events for replay

              Total   Attack   Foul   Other
  Number      143     -        -      -
  Percentage  100%    49%      47%    4%

The three event types are: Attack events, consisting of scoring or just-missed shots on goal; Foul events, consisting of referee decisions (referee whistle); and Other events, consisting of injury events and miscellaneous. If none of the above events is detected, the output of the classifier defaults to no-event. Our automatic replay generation system generates replays for these three event types.

4.2 Event moment detection
We detect events based on the created keyword sequences. Event detection from broadcast video has been widely studied [1]. In broadcast video, the transition between the types of shot/view is closely related to the semantic state of the game, hence the Hidden Markov Model (HMM) based classifier, which is good at discovering temporal patterns, is applicable [12]. In our previous work [13] we also used the HMM for event detection from mid-level representations created from broadcast soccer video. However, when applying an HMM to the keyword sequences created in the above section, we noticed that there is less temporal pattern in these sequences, which makes the HMM method unsuitable. Instead, we find certain distinct feature patterns that appear only during the occurrence of an event. We call such moments with a distinguishing feature pattern event moments, e.g. the moment of hearing the whistle in a Foul, or the moment of very close distance between goalmouth and ball in an Attack. By detecting these moments it is possible to detect the occurrence of an event. Fig. 3 illustrates the structure of part of a game from the perspective of events. As can be seen from Fig. 3a, the timeline of the game consists of event/no-event segments. In addition, within the event boundary there is a smaller boundary of the event moment as described above. The event in this example is an Attack event. We observed that the event moment of an Attack consists of (1) a very small ball-goalmouth distance (Fig. 3b); (2) the position keyword having value 2 (Fig. 3c), which is designated for the penalty area (Fig. 2b); and (3) the audio keyword being Acclaim (Fig. 3d). The choice of which mid-level representations to use for detecting event moments is derived from heuristic and statistical methods. In the above example, the reason for choosing the ball-goalmouth distance and the position keyword is the intrinsic nature of soccer scoring [14], and the reason for choosing the audio keyword is the close relationship between a possible scoring event and the response of the spectators.

Figure 3: Event moment of attack. (a) event segment (b) ball-goalmouth distance (pixel) (c) position (labels in Fig. 2b) (d) audio (label)

As illustrated in Fig. 1, the chosen keyword streams are synchronized and integrated into a multi-dimensional keyword vector stream from which the event moment is detected. To avoid employing heuristics, a statistical classifier is employed to learn the decision boundary, e.g. how small the ball-goalmouth distance is in an Attack event, or how slow the motion is during a Foul event. To classify the three event types, three classifiers are trained to detect the event moments of the associated events. To make the classifiers robust, each classifier uses a different set of mid-level keywords as input. Specifically, the inputs are: Attack classifier: position keyword (F1), ball trajectory (F2), goalmouth location (F3) and audio keyword (F5); Foul classifier: position keyword (F1), motion activity keyword (F4) and audio keyword (F5); Other classifier: position keyword (F1) and motion activity keyword (F4). The output of each classifier is Attack/no-event, Foul/no-event and Other/no-event, respectively. The classifier used is the SVM with the Gaussian radial basis function (RBF) kernel. To train the SVM classifiers, event and no-event segments are first manually identified, and the mid-level representations are then created. To generate the training data, the specific event moments within the events are manually tagged and used as positive examples; sequences from the rest of the clips are used as negative training samples. In the detection process, the entire keyword sequence from the test video is fed to the SVM classifier, and segments with the same statistical pattern as an event moment are identified. By applying post-processing similar to that in Section 3.4, small fluctuations in the SVM classification results are eliminated to avoid duplicate detection of the event moment of the same event.
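A minimal sketch of one such detector is given below, for the Attack classifier's inputs (F1, a ball-goalmouth distance derived from F2 and F3, and F5). The use of scikit-learn's SVC and the exact per-frame feature layout are our assumptions for illustration, not the paper's implementation.

```python
import numpy as np
from sklearn.svm import SVC

def attack_features(f1, ball_xy, goal_xy, f5):
    """Per-frame feature vectors for the Attack event-moment classifier.

    f1, f5   : arrays of position and audio keyword labels, one per frame
    ball_xy  : (T, 2) ball coordinates from F2
    goal_xy  : (T, 2) goalmouth center derived from F3
    """
    dist = np.linalg.norm(ball_xy - goal_xy, axis=1)  # ball-goalmouth distance
    return np.column_stack([f1, dist, f5])

# Binary SVM with RBF kernel: frames inside manually tagged event moments are
# positive (1), frames from the remaining clips are negative (0).
clf = SVC(kernel="rbf", gamma="scale")
# X_train = attack_features(...); y_train = 0/1 event-moment labels
# clf.fit(X_train, y_train)
# y_pred = clf.predict(attack_features(...))
# y_pred would then be smoothed by majority voting as in Section 3.4.
```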
4.3 Event boundary decision
If an event moment is found, a search algorithm is applied backward and forward from the event moment instance to identify the duration of the event. The entire video segment within this duration is used as the replay of the event. Many factors affect the human perceptual understanding of the duration of an event. One factor is time, i.e. events usually possess only a certain temporal duration. Another factor is the position where the event happens: mostly an event happens at a certain position, so scenes from a previous location may not be of much interest to the audience. This assumption holds unless the position changes quickly during the event (e.g. a goal scored by a long shot from midfield). These observations motivate us to detect event boundaries using the position keyword (F1) and time duration.

The backward search to identify the event starting boundary checks whether the location keyword F1 has changed in the interval from t_s − D_1 back to t_s − D_2, where t_s is the event moment starting time and D_1 < D_2 are the minimal and maximal offset thresholds, respectively. Specifically, the following pseudo code illustrates our approach:
1. Let time t = t_s − D_1.
2. If F1(t) ≠ F1(t_s − D_1), then set the event starting time t_es to t and go to step 5.
3. If t < t_s − D_2, then set t_es to t and go to step 5.
4. Let t = t − 1, and loop back to step 2.
5. Stop.
The forward search detects the event ending time t_ee. The algorithm is similar to the backward search; the differences are only in the thresholds and that the search proceeds forward in time. We have noted that different types of events require different thresholds, which can be found by empirical evaluation.
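The pseudo code above translates directly into the following sketch. The forward variant with its own offsets D1f/D2f is our naming, since the paper only states that it mirrors the backward search; frame-indexed keyword arrays with sufficient length are assumed.

```python
def backward_search(F1, t_s, D1, D2):
    """Find the event starting time t_es (Section 4.3 pseudo code); a sketch.

    F1     : frame-indexed position keyword sequence (assumes t_s - D2 >= 0)
    t_s    : event moment starting frame
    D1, D2 : minimal and maximal backward offsets, D1 < D2
    """
    t = t_s - D1
    while True:
        if F1[t] != F1[t_s - D1]:   # position keyword changed: event starts here
            return t
        if t < t_s - D2:            # maximal offset reached without a change
            return t
        t -= 1                      # step one frame backward

def forward_search(F1, t_e, D1f, D2f):
    """Symmetric forward search for the event ending time t_ee.
    D1f, D2f are the (event-type dependent, empirically found) forward offsets."""
    t = t_e + D1f
    while F1[t] == F1[t_e + D1f] and t <= t_e + D2f:
        t += 1
    return t
```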

5. REPLAY GENERATION
Based on the events and event boundaries detected from the video taken by the main camera, we can automatically generate replays for these events and decide whether and where to insert them. Since this has been a very subjective decision for human broadcasters, we need to set general criteria for this production step. Another quantitative study was done on the same video database mentioned in Section 4.1, and the result is given in Table 3.

Table 3: Possible replay insertion place

          Total   Instant replay   Delayed replay
                                   MM    FI    IE

  MM: missed by main camera; FI: followed by another interesting segment; IE: very important event

It is found that all replays belong to two classes: instant replay and delayed replay. Most replays are instant replays, inserted almost immediately after the event if the subsequent segments are uninteresting. The other class, delayed replay, occurs for several reasons: a) the event was missed by the main camera (MM); b) the event to be replayed is followed by an interesting segment (FI), so the broadcaster has to delay the replay; and c) the event is important and worth replaying many times (IE).

The input to the replay generation system is the event detection result, which segments the game into a sequential event/no-event structure, as illustrated in Fig. 4 row 1. If an event segment is identified, the system examines whether an instant replay can be inserted in the following no-event segment, and reacts accordingly. This is shown in Fig. 4 rows 2 and 3, where instant replays are inserted for both event 1 and event 2. In addition, the system examines whether the same event meets the delayed replay condition. If so, the system buffers the event and inserts the replay in a suitable subsequent time slot. This is shown in Fig. 4 rows 2 and 3, where a delayed replay is inserted at a later time slot for event 1. Fig. 4 row 4 shows the generated video after replay insertion.

Figure 4: Replay structure

In our current application, we have not examined the use of sub-camera capture for the replay scenes. The current work restricts the replay to the main camera capture; an enhancement to use sub-camera capture is ongoing.

5.1 Instant replay generation
The replay starting time t_rs and ending time t_re are computed as:

  t_rs = t_ee + D_3    (7)
  t_re = t_rs + (t_ee − t_es) · ν    (8)

where t_es and t_ee are the starting and ending times of the event as defined in Section 4.3. D_3 represents the time duration between the end of an event and the start of the instant replay; we arbitrarily set D_3 to 1 second, and this is adjustable. ν is a factor defining how slowly the replay is displayed compared with real time. The system then examines whether the time slot from t_rs to t_re in the subsequent no-event segment meets one of the following conditions: no/low motion; or high motion but with the position not at area 2 in Fig. 2b (the penalty area). If so, an instant replay is inserted.

5.2 Delayed replay generation
As mentioned at the start of this section, delayed replays should be inserted for MM, FI or IE events. MM events cannot be processed by our system, as they cannot be detected from the video taken by the main camera. Our replay generation system buffers the FI and IE events and finds suitable time slots to insert delayed replays. To identify whether an event is an IE event, an importance measure I is assigned to the event based on the duration of its event moment, as generally the longer the event moment, the more important the event:

  I = t_te − t_ts    (9)

Events with I > T_4 are deemed important. In our system, T_4 is set to 80 frames empirically, so that only 5% of detected events become important ones; this ratio is consistent with the identification of important events in broadcast video. The duration of the delayed replay is the same as that of the instant replay. The system searches subsequent no-event segments for a time slot of length t_re − t_rs that meets the following condition: no motion. If such a time slot is found, a delayed replay is inserted. This search continues until a suitable time slot is found for an FI event, or two delayed replays have been inserted for an IE event, or a more important IE event occurs.
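The sketch below computes the instant replay slot from Eqs. (7)-(8) and applies the importance test of Eq. (9). The slow-motion factor ν = 2 and the 25 fps frame rate used for second-to-frame conversion are illustrative assumptions.

```python
FPS = 25  # main camera frame rate, as in Section 3.4

def instant_replay_slot(t_es, t_ee, D3=1.0 * FPS, nu=2.0):
    """Eqs. (7)-(8): replay start/end, in frames, for an event [t_es, t_ee]."""
    t_rs = t_ee + D3                  # Eq. 7: wait D3 after the event ends
    t_re = t_rs + (t_ee - t_es) * nu  # Eq. 8: event duration stretched by nu
    return t_rs, t_re

def is_important(t_ts, t_te, T4=80):
    """Eq. (9): an event is an IE event if its event moment exceeds T4 frames."""
    return (t_te - t_ts) > T4

# Example: a 5-second event whose event moment lasted 100 frames
t_rs, t_re = instant_replay_slot(t_es=1000, t_ee=1125)
print(t_rs, t_re, is_important(t_ts=1030, t_te=1130))  # 1150.0 1400.0 True
```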
6. EXPERIMENTAL RESULTS
6.1 Accuracy of mid-level representation
A soccer event usually lasts many frames, so the detection process examines a collection of frames for each event. In our experiments, we noted that sporadic classification errors occur in the mid-level representations. However, these scattered errors are time-averaged and hence do not affect the overall classification performance of the high-level system.

6.1.1 Position keyword
Fig. 5 demonstrates the detection of two typical areas defined in Fig. 2b.

(a) Position 2  (b) Position 3
Figure 5: Position keywords creation

To evaluate the performance of the position keyword creation, a total of 10 minutes of video (from two FIFA WC 2002 games, Senegal vs Turkey and Germany vs Brazil), consisting of main camera video segments only, was manually labeled. The keyword generation results for this database are compared against the labels, and the accuracy of the position keyword is listed in Table 4. Note that the detection accuracy for field area 4 is low compared with the other labels. This is easily explained: field area 4 (Fig. 2b) has fewer cues than the other areas, e.g. it contains no field-lines, goalmouth or central circle. This lack of distinct information results in poorer accuracy.

Table 4: Accuracy of position keyword (the positions are the 6 labels given in Fig. 2b)

  Position   Accuracy   Position   Accuracy
  1          -          4          -
  2          -          5          -
  3          -          6          -

6.1.2 Ball trajectory
The ball trajectory test is conducted on 15 sequences (176 seconds) taken from the FIFA WC 2002 Final. These sequences are representative in that they include short to long sequences, ball-less sequences, and sequences with assorted close-up and full-view frames. Table 5 shows the performance. More detailed ball trajectory tracking results can be found in our previous work [2].

Table 5: Accuracy of ball trajectory

  Detected and tracked   False positive   Accuracy
  4283 frames            25 frames        98.8%

6.1.3 Audio keyword
To evaluate the accuracy of the audio keyword generation module, three audio classes are defined: Acclaim, Whistle and Noise. 30 minutes of soccer audio data are segmented into 20 ms frames, and each frame is classified into one of the three classes. In this experiment, a 50%/50% training/testing split is used. The performance of the audio features selected by exhaustive search is compared with our previous work [15], where feature selection was done using domain knowledge.

Table 6: Accuracy of audio keywords

                         Acclaim   Whistle   Noise
  Previous method [15]   91.2%     90.8%     89.2%
  Current method         93.8%     94.4%     96.3%

6.2 Event detection
To examine the performance of our system, we selected 50 minutes of unedited main camera video (from the Singapore League) and 4.5 hours of FIFA WC 2002 broadcast video for the experiment. There are two reasons for also choosing broadcast soccer video: firstly, it is very difficult to collect main camera video, as most TV stations do not keep such tapes; secondly, the broadcast video can be used as ground truth to evaluate our application-level results, as described in the later sections. The non-main-camera shots in the broadcast video are identified and filtered out by our visual analysis blocks, so that only main camera segments are processed. The event detection results for these two types of video are listed in Tables 7 and 8, respectively.

Table 7: Accuracy from main camera video

  Event    No.   Recall   Precision   BDA
  Attack   3     60%      100%        72.2%
  Foul     -     -        70.0%       71.4%
  Other    -     -        50.0%       60.0%

Table 8: Accuracy from broadcast video

  Event    No.   Recall   Precision   BDA
  Attack   -     -        78.3%       69.4%
  Foul     -     -        72.8%       80.9%
  Other    12    80%      66.7%       65.0%

BDA: boundary decision accuracy. The BDA in Tables 7 and 8 is computed by

  BDA = |τ_db ∩ τ_mb| / max(|τ_db|, |τ_mb|)    (10)

where τ_db and τ_mb are the automatically detected event boundary and the manually labeled event boundary, respectively.
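Reading Eq. (10) as temporal intersection over the longer of the two boundary spans (the operator between τ_db and τ_mb was lost in extraction, so this reading is our assumption), the measure can be computed as:

```python
def bda(detected, manual):
    """Boundary decision accuracy (Eq. 10), assuming intersection-over-max.

    detected, manual : (start, end) frame intervals tau_db and tau_mb
    """
    (ds, de), (ms, me) = detected, manual
    overlap = max(0, min(de, me) - max(ds, ms))  # |tau_db intersect tau_mb|
    return overlap / max(de - ds, me - ms)       # normalized by the longer span

print(bda((100, 200), (110, 230)))  # 0.75: 90 overlapping frames / 120
```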
It is observed that the boundary decision accuracy for the Other event is lower than for the other two events. This is because the Other event mainly comprises injuries and sudden events. The cameraman usually keeps moving the camera to follow the ball until the game is stopped, e.g. until the ball is kicked over the touch-line so that the injured player can be treated; only then is the camera focused on the injured players. This results in either the exact event moment being missed by the main camera or an unpredictable duration of camera movement. These factors affect the event moment detection and hence the boundary decision accuracy.

6.3 Automatic replay generation
As we have both the automatically generated video and the broadcast video from the broadcast TV program, we can use the latter as ground truth to evaluate the quality of the generated replays. Table 9 compares the replays generated automatically by our system with the broadcast video's replays.

Table 9: Replay generation

             Automatic generation   Broadcast replay
  total      37                     15
  same       13                     13
  missed                            2
  recall     86.7%
  precision  35.1%

The term "same" in Table 9 means that a replay is inserted at that event in both the automatically generated video and the broadcast video. It is observed from Table 9 that our system generates significantly more replays than the human broadcaster's selection. This can be understood in at least three ways:

1) Lack of a general broadcast syntax: As noted in the previous section, the selection of replays is a subjective choice. We observed, for example, that for one just-missed shot event the human broadcaster's choice was a long close-up view of the disappointed player, returning to the main camera view when the game resumed, with no slow-motion replay; yet in a similar event with the same game features, the choice was to launch a replay. In another example, a replay was shown when an offside (a Foul event) was detected; for such foul events, replays using side-camera logged video are often used to confirm the correctness of the assistant referee's decision, while in other cases no replay was given. Hence it is clear that an automated system will generate more replays whenever its predefined conditions are met.

2) The capability of automation: Generating live soccer highlights is currently a time-critical and labor-intensive process. The strict time limit for generating a replay means that a good replay segment can be missed. With the assistance of an automatic system, more replays are generated.

3) The limits of event detection accuracy: The third possible reason for the excess replays is failure of event detection. Incorrect event detection ultimately leads to missed or incorrect replays. Currently, automatic systems are not able to detect events, and especially event boundaries, as accurately as humans. However, this problem can be minimized with human intervention at much lower cost than broadcasting solely by humans: the director only needs to confirm the necessity of each generated replay, instead of monitoring the whole match and manually finding suitable replay time slots.

7. CONCLUSIONS AND FUTURE WORK
This paper presents a novel framework to detect events from soccer videos taken by a single main camera and to automatically generate soccer replays for broadcasting. This is clearly important for reducing manual processing, thereby reducing the size of the crew in the broadcast studio and streamlining the highlight generation process. The accuracy of the event detection and of the soccer replays generated by our framework can be complemented and refined by a small amount of human intervention. We have built a demo system for the framework. Although the system currently performs off-line, it can be upgraded to an on-line processing system: the required mid-level representations can be generated on the fly, and these analyses are not computationally expensive, e.g. we do not need to track players to analyze their behaviors. The subsequent high-level and application-level processing can be done in one pass, though some processing delay would be introduced, e.g. to search for a suitable replay time slot. We have begun investigating the next stage of the proposed system.
The future framework includes several parts: firstly, to improve the event detection accuracy by employing more cues, through additional mid-level representations and/or high-level semantic event modeling and detection; secondly, to examine new techniques to detect events missed by the main camera, to enable a full-scale replay generation system; thirdly, to extend the system's functionality into a fully automatic broadcast control or broadcast video generation system, by incorporating automatic selection of sub-camera capture for replays, multi-camera switching control, interactive caption overlay, and similar tasks; fourthly, to extend the system to other sports domains by investigating the respective domain knowledge and introducing new domain analysis while keeping the generic structure of the framework presented in this paper unchanged.

8. REFERENCES
[1] N. Adami, R. Leonardi, and P. Migliorati, "An overview of multi-modal techniques for the characterization of sport programmes," SPIE-VCIP '03, 2003.
[2] X. Yu et al., "Trajectory-based ball detection and tracking with applications to semantic analysis of broadcast soccer video," ACM MM '03, 2003.
[3] L. Duan et al., "A mid-level representation framework for semantic sports video analysis," ACM MM '03, 2003.
[4] J. Assfalg et al., "Semantic annotation of soccer videos: automatic highlights identification," CVIU, vol. 92, 2003.
[5] A. Ekin, A. M. Tekalp, and R. Mehrotra, "Automatic soccer video analysis and summarization," IEEE Trans. on Image Processing, vol. 12, no. 7, 2003.
[6] N. Babaguchi and N. Nitta, "Intermodal collaboration: A strategy for semantic content analysis for broadcasted sports video," ICIP '03, vol. 1, 2003.
[7] Z. Xiong, R. Radhakrishnan, and A. Divakaran, "Generation of sports highlights using motion activity in combination with a common audio feature extraction framework," ICIP '03, vol. 1, pp. 5-8, 2003.
[8] N. Nitta and N. Babaguchi, "Automatic story segmentation of closed-caption text for semantic content analysis of broadcasted sports video," Inter. Workshop on MM Info. Sys. '02, 2002.
[9] D. Zhang and S.-F. Chang, "Event detection in baseball video using superimposed caption recognition," ACM MM '02, 2002.
[10] Y. Ma and H. Zhang, "Motion pattern based video classification using support vector machines," ISCAS '02, Theme: Circuits and Systems for Ubiquitous Computing, 2002.
[11] Y. Tan et al., "Rapid estimation of camera motion from compressed video with application to video annotation," IEEE Trans. on Circuits and Systems for Video Technology, vol. 10, no. 1, 2000.
[12] L. Xie et al., "Structure analysis of soccer video with domain knowledge and hidden markov models," Pattern Recognition Letters, vol. 24, 2003.
[13] J. Wang, C. Xu, E. Chng, and Q. Tian, "Sports highlight detection from keyword sequences using HMM," ICME '04, 2004.
[14] International Football Association Board, "Laws of the Game," Fédération Internationale de Football Association, Zurich, Switzerland.
[15] M. Xu et al., "Creating audio keywords for event detection in soccer video," ICME '03, vol. 2, 2003.


More information

Acoustic Scene Classification

Acoustic Scene Classification Acoustic Scene Classification Marc-Christoph Gerasch Seminar Topics in Computer Music - Acoustic Scene Classification 6/24/2015 1 Outline Acoustic Scene Classification - definition History and state of

More information

Name Identification of People in News Video by Face Matching

Name Identification of People in News Video by Face Matching Name Identification of People in by Face Matching Ichiro IDE ide@is.nagoya-u.ac.jp, ide@nii.ac.jp Takashi OGASAWARA toga@murase.m.is.nagoya-u.ac.jp Graduate School of Information Science, Nagoya University;

More information

On-Supporting Energy Balanced K-Barrier Coverage In Wireless Sensor Networks

On-Supporting Energy Balanced K-Barrier Coverage In Wireless Sensor Networks On-Supporting Energy Balanced K-Barrier Coverage In Wireless Sensor Networks Chih-Yung Chang cychang@mail.tku.edu.t w Li-Ling Hung Aletheia University llhung@mail.au.edu.tw Yu-Chieh Chen ycchen@wireless.cs.tk

More information

The Measurement Tools and What They Do

The Measurement Tools and What They Do 2 The Measurement Tools The Measurement Tools and What They Do JITTERWIZARD The JitterWizard is a unique capability of the JitterPro package that performs the requisite scope setup chores while simplifying

More information

Temporal data mining for root-cause analysis of machine faults in automotive assembly lines

Temporal data mining for root-cause analysis of machine faults in automotive assembly lines 1 Temporal data mining for root-cause analysis of machine faults in automotive assembly lines Srivatsan Laxman, Basel Shadid, P. S. Sastry and K. P. Unnikrishnan Abstract arxiv:0904.4608v2 [cs.lg] 30 Apr

More information

Color Quantization of Compressed Video Sequences. Wan-Fung Cheung, and Yuk-Hee Chan, Member, IEEE 1 CSVT

Color Quantization of Compressed Video Sequences. Wan-Fung Cheung, and Yuk-Hee Chan, Member, IEEE 1 CSVT CSVT -02-05-09 1 Color Quantization of Compressed Video Sequences Wan-Fung Cheung, and Yuk-Hee Chan, Member, IEEE 1 Abstract This paper presents a novel color quantization algorithm for compressed video

More information

CHAPTER 8 CONCLUSION AND FUTURE SCOPE

CHAPTER 8 CONCLUSION AND FUTURE SCOPE 124 CHAPTER 8 CONCLUSION AND FUTURE SCOPE Data hiding is becoming one of the most rapidly advancing techniques the field of research especially with increase in technological advancements in internet and

More information

Subjective Similarity of Music: Data Collection for Individuality Analysis

Subjective Similarity of Music: Data Collection for Individuality Analysis Subjective Similarity of Music: Data Collection for Individuality Analysis Shota Kawabuchi and Chiyomi Miyajima and Norihide Kitaoka and Kazuya Takeda Nagoya University, Nagoya, Japan E-mail: shota.kawabuchi@g.sp.m.is.nagoya-u.ac.jp

More information

AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY

AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY Eugene Mikyung Kim Department of Music Technology, Korea National University of Arts eugene@u.northwestern.edu ABSTRACT

More information

Research Topic. Error Concealment Techniques in H.264/AVC for Wireless Video Transmission in Mobile Networks

Research Topic. Error Concealment Techniques in H.264/AVC for Wireless Video Transmission in Mobile Networks Research Topic Error Concealment Techniques in H.264/AVC for Wireless Video Transmission in Mobile Networks July 22 nd 2008 Vineeth Shetty Kolkeri EE Graduate,UTA 1 Outline 2. Introduction 3. Error control

More information

International Journal of Advance Engineering and Research Development MUSICAL INSTRUMENT IDENTIFICATION AND STATUS FINDING WITH MFCC

International Journal of Advance Engineering and Research Development MUSICAL INSTRUMENT IDENTIFICATION AND STATUS FINDING WITH MFCC Scientific Journal of Impact Factor (SJIF): 5.71 International Journal of Advance Engineering and Research Development Volume 5, Issue 04, April -2018 e-issn (O): 2348-4470 p-issn (P): 2348-6406 MUSICAL

More information

Region Adaptive Unsharp Masking based DCT Interpolation for Efficient Video Intra Frame Up-sampling

Region Adaptive Unsharp Masking based DCT Interpolation for Efficient Video Intra Frame Up-sampling International Conference on Electronic Design and Signal Processing (ICEDSP) 0 Region Adaptive Unsharp Masking based DCT Interpolation for Efficient Video Intra Frame Up-sampling Aditya Acharya Dept. of

More information

PulseCounter Neutron & Gamma Spectrometry Software Manual

PulseCounter Neutron & Gamma Spectrometry Software Manual PulseCounter Neutron & Gamma Spectrometry Software Manual MAXIMUS ENERGY CORPORATION Written by Dr. Max I. Fomitchev-Zamilov Web: maximus.energy TABLE OF CONTENTS 0. GENERAL INFORMATION 1. DEFAULT SCREEN

More information

Automatic Laughter Detection

Automatic Laughter Detection Automatic Laughter Detection Mary Knox Final Project (EECS 94) knoxm@eecs.berkeley.edu December 1, 006 1 Introduction Laughter is a powerful cue in communication. It communicates to listeners the emotional

More information

Dual frame motion compensation for a rate switching network

Dual frame motion compensation for a rate switching network Dual frame motion compensation for a rate switching network Vijay Chellappa, Pamela C. Cosman and Geoffrey M. Voelker Dept. of Electrical and Computer Engineering, Dept. of Computer Science and Engineering

More information

Motion Video Compression

Motion Video Compression 7 Motion Video Compression 7.1 Motion video Motion video contains massive amounts of redundant information. This is because each image has redundant information and also because there are very few changes

More information

Intra-frame JPEG-2000 vs. Inter-frame Compression Comparison: The benefits and trade-offs for very high quality, high resolution sequences

Intra-frame JPEG-2000 vs. Inter-frame Compression Comparison: The benefits and trade-offs for very high quality, high resolution sequences Intra-frame JPEG-2000 vs. Inter-frame Compression Comparison: The benefits and trade-offs for very high quality, high resolution sequences Michael Smith and John Villasenor For the past several decades,

More information

TRAFFIC SURVEILLANCE VIDEO MANAGEMENT SYSTEM

TRAFFIC SURVEILLANCE VIDEO MANAGEMENT SYSTEM TRAFFIC SURVEILLANCE VIDEO MANAGEMENT SYSTEM K.Ganesan*, Kavitha.C, Kriti Tandon, Lakshmipriya.R TIFAC-Centre of Relevance and Excellence in Automotive Infotronics*, School of Information Technology and

More information

Hidden Markov Model based dance recognition

Hidden Markov Model based dance recognition Hidden Markov Model based dance recognition Dragutin Hrenek, Nenad Mikša, Robert Perica, Pavle Prentašić and Boris Trubić University of Zagreb, Faculty of Electrical Engineering and Computing Unska 3,

More information

Repeating Pattern Discovery and Structure Analysis from Acoustic Music Data

Repeating Pattern Discovery and Structure Analysis from Acoustic Music Data Repeating Pattern Discovery and Structure Analysis from Acoustic Music Data Lie Lu, Muyuan Wang 2, Hong-Jiang Zhang Microsoft Research Asia Beijing, P.R. China, 8 {llu, hjzhang}@microsoft.com 2 Department

More information

White Paper. Video-over-IP: Network Performance Analysis

White Paper. Video-over-IP: Network Performance Analysis White Paper Video-over-IP: Network Performance Analysis Video-over-IP Overview Video-over-IP delivers television content, over a managed IP network, to end user customers for personal, education, and business

More information

Detection and demodulation of non-cooperative burst signal Feng Yue 1, Wu Guangzhi 1, Tao Min 1

Detection and demodulation of non-cooperative burst signal Feng Yue 1, Wu Guangzhi 1, Tao Min 1 International Conference on Applied Science and Engineering Innovation (ASEI 2015) Detection and demodulation of non-cooperative burst signal Feng Yue 1, Wu Guangzhi 1, Tao Min 1 1 China Satellite Maritime

More information

Digital Video Telemetry System

Digital Video Telemetry System Digital Video Telemetry System Item Type text; Proceedings Authors Thom, Gary A.; Snyder, Edwin Publisher International Foundation for Telemetering Journal International Telemetering Conference Proceedings

More information

Story Tracking in Video News Broadcasts

Story Tracking in Video News Broadcasts Story Tracking in Video News Broadcasts Jedrzej Zdzislaw Miadowicz M.S., Poznan University of Technology, 1999 Submitted to the Department of Electrical Engineering and Computer Science and the Faculty

More information

GRADIENT-BASED MUSICAL FEATURE EXTRACTION BASED ON SCALE-INVARIANT FEATURE TRANSFORM

GRADIENT-BASED MUSICAL FEATURE EXTRACTION BASED ON SCALE-INVARIANT FEATURE TRANSFORM 19th European Signal Processing Conference (EUSIPCO 2011) Barcelona, Spain, August 29 - September 2, 2011 GRADIENT-BASED MUSICAL FEATURE EXTRACTION BASED ON SCALE-INVARIANT FEATURE TRANSFORM Tomoko Matsui

More information

WHAT MAKES FOR A HIT POP SONG? WHAT MAKES FOR A POP SONG?

WHAT MAKES FOR A HIT POP SONG? WHAT MAKES FOR A POP SONG? WHAT MAKES FOR A HIT POP SONG? WHAT MAKES FOR A POP SONG? NICHOLAS BORG AND GEORGE HOKKANEN Abstract. The possibility of a hit song prediction algorithm is both academically interesting and industry motivated.

More information

AUTOREGRESSIVE MFCC MODELS FOR GENRE CLASSIFICATION IMPROVED BY HARMONIC-PERCUSSION SEPARATION

AUTOREGRESSIVE MFCC MODELS FOR GENRE CLASSIFICATION IMPROVED BY HARMONIC-PERCUSSION SEPARATION AUTOREGRESSIVE MFCC MODELS FOR GENRE CLASSIFICATION IMPROVED BY HARMONIC-PERCUSSION SEPARATION Halfdan Rump, Shigeki Miyabe, Emiru Tsunoo, Nobukata Ono, Shigeki Sagama The University of Tokyo, Graduate

More information

An Overview of Video Coding Algorithms

An Overview of Video Coding Algorithms An Overview of Video Coding Algorithms Prof. Ja-Ling Wu Department of Computer Science and Information Engineering National Taiwan University Video coding can be viewed as image compression with a temporal

More information

Visual Communication at Limited Colour Display Capability

Visual Communication at Limited Colour Display Capability Visual Communication at Limited Colour Display Capability Yan Lu, Wen Gao and Feng Wu Abstract: A novel scheme for visual communication by means of mobile devices with limited colour display capability

More information

Bit Rate Control for Video Transmission Over Wireless Networks

Bit Rate Control for Video Transmission Over Wireless Networks Indian Journal of Science and Technology, Vol 9(S), DOI: 0.75/ijst/06/v9iS/05, December 06 ISSN (Print) : 097-686 ISSN (Online) : 097-5 Bit Rate Control for Video Transmission Over Wireless Networks K.

More information

White Paper : Achieving synthetic slow-motion in UHDTV. InSync Technology Ltd, UK

White Paper : Achieving synthetic slow-motion in UHDTV. InSync Technology Ltd, UK White Paper : Achieving synthetic slow-motion in UHDTV InSync Technology Ltd, UK ABSTRACT High speed cameras used for slow motion playback are ubiquitous in sports productions, but their high cost, and

More information

Dual Frame Video Encoding with Feedback

Dual Frame Video Encoding with Feedback Video Encoding with Feedback Athanasios Leontaris and Pamela C. Cosman Department of Electrical and Computer Engineering University of California, San Diego, La Jolla, CA 92093-0407 Email: pcosman,aleontar

More information

Transcription of the Singing Melody in Polyphonic Music

Transcription of the Singing Melody in Polyphonic Music Transcription of the Singing Melody in Polyphonic Music Matti Ryynänen and Anssi Klapuri Institute of Signal Processing, Tampere University Of Technology P.O.Box 553, FI-33101 Tampere, Finland {matti.ryynanen,

More information

MUSICAL INSTRUMENT RECOGNITION WITH WAVELET ENVELOPES

MUSICAL INSTRUMENT RECOGNITION WITH WAVELET ENVELOPES MUSICAL INSTRUMENT RECOGNITION WITH WAVELET ENVELOPES PACS: 43.60.Lq Hacihabiboglu, Huseyin 1,2 ; Canagarajah C. Nishan 2 1 Sonic Arts Research Centre (SARC) School of Computer Science Queen s University

More information

Principles of Video Segmentation Scenarios

Principles of Video Segmentation Scenarios Principles of Video Segmentation Scenarios M. R. KHAMMAR 1, YUNUSA ALI SAI D 1, M. H. MARHABAN 1, F. ZOLFAGHARI 2, 1 Electrical and Electronic Department, Faculty of Engineering University Putra Malaysia,

More information

Research Article. ISSN (Print) *Corresponding author Shireen Fathima

Research Article. ISSN (Print) *Corresponding author Shireen Fathima Scholars Journal of Engineering and Technology (SJET) Sch. J. Eng. Tech., 2014; 2(4C):613-620 Scholars Academic and Scientific Publisher (An International Publisher for Academic and Scientific Resources)

More information

Bridging the Gap Between CBR and VBR for H264 Standard

Bridging the Gap Between CBR and VBR for H264 Standard Bridging the Gap Between CBR and VBR for H264 Standard Othon Kamariotis Abstract This paper provides a flexible way of controlling Variable-Bit-Rate (VBR) of compressed digital video, applicable to the

More information

SINGING PITCH EXTRACTION BY VOICE VIBRATO/TREMOLO ESTIMATION AND INSTRUMENT PARTIAL DELETION

SINGING PITCH EXTRACTION BY VOICE VIBRATO/TREMOLO ESTIMATION AND INSTRUMENT PARTIAL DELETION th International Society for Music Information Retrieval Conference (ISMIR ) SINGING PITCH EXTRACTION BY VOICE VIBRATO/TREMOLO ESTIMATION AND INSTRUMENT PARTIAL DELETION Chao-Ling Hsu Jyh-Shing Roger Jang

More information

Multi-modal Kernel Method for Activity Detection of Sound Sources

Multi-modal Kernel Method for Activity Detection of Sound Sources 1 Multi-modal Kernel Method for Activity Detection of Sound Sources David Dov, Ronen Talmon, Member, IEEE and Israel Cohen, Fellow, IEEE Abstract We consider the problem of acoustic scene analysis of multiple

More information

WHAT'S HOT: LINEAR POPULARITY PREDICTION FROM TV AND SOCIAL USAGE DATA Jan Neumann, Xiaodong Yu, and Mohamad Ali Torkamani Comcast Labs

WHAT'S HOT: LINEAR POPULARITY PREDICTION FROM TV AND SOCIAL USAGE DATA Jan Neumann, Xiaodong Yu, and Mohamad Ali Torkamani Comcast Labs WHAT'S HOT: LINEAR POPULARITY PREDICTION FROM TV AND SOCIAL USAGE DATA Jan Neumann, Xiaodong Yu, and Mohamad Ali Torkamani Comcast Labs Abstract Large numbers of TV channels are available to TV consumers

More information

Department of Electrical & Electronic Engineering Imperial College of Science, Technology and Medicine. Project: Real-Time Speech Enhancement

Department of Electrical & Electronic Engineering Imperial College of Science, Technology and Medicine. Project: Real-Time Speech Enhancement Department of Electrical & Electronic Engineering Imperial College of Science, Technology and Medicine Project: Real-Time Speech Enhancement Introduction Telephones are increasingly being used in noisy

More information

Understanding PQR, DMOS, and PSNR Measurements

Understanding PQR, DMOS, and PSNR Measurements Understanding PQR, DMOS, and PSNR Measurements Introduction Compression systems and other video processing devices impact picture quality in various ways. Consumers quality expectations continue to rise

More information

A QUERY BY EXAMPLE MUSIC RETRIEVAL ALGORITHM

A QUERY BY EXAMPLE MUSIC RETRIEVAL ALGORITHM A QUER B EAMPLE MUSIC RETRIEVAL ALGORITHM H. HARB AND L. CHEN Maths-Info department, Ecole Centrale de Lyon. 36, av. Guy de Collongue, 69134, Ecully, France, EUROPE E-mail: {hadi.harb, liming.chen}@ec-lyon.fr

More information

SCALABLE video coding (SVC) is currently being developed

SCALABLE video coding (SVC) is currently being developed IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 16, NO. 7, JULY 2006 889 Fast Mode Decision Algorithm for Inter-Frame Coding in Fully Scalable Video Coding He Li, Z. G. Li, Senior

More information

Musical Hit Detection

Musical Hit Detection Musical Hit Detection CS 229 Project Milestone Report Eleanor Crane Sarah Houts Kiran Murthy December 12, 2008 1 Problem Statement Musical visualizers are programs that process audio input in order to

More information