CARDIFF UNIVERSITY EXAMINATION PAPER

Academic Year: 2013/2014
Examination Period: Autumn
Examination Paper Number: CM3106 Solutions
Examination Paper Title: Multimedia
Duration: 2 hours

Do not turn this page over until instructed to do so by the Senior Invigilator.

Structure of Examination Paper:
There are 11 pages. There are 4 questions in total. There are no appendices. The maximum mark for the examination paper is 75 and the mark obtainable for a question or part of a question is shown in brackets alongside the question.

Students to be provided with:
The following items of stationery are to be provided: ONE answer book.

Instructions to Students:
Answer 3 questions. The use of calculators is permitted in this examination. The use of translation dictionaries between English or Welsh and a foreign language bearing an appropriate departmental stamp is permitted in this examination.

PLEASE TURN OVER
Q1.

(a) State Nyquist's Sampling Theorem.

In order to effectively sample a waveform, the sampling frequency must be at least twice the highest frequency present in the signal.

2 Marks Bookwork

(b) What general considerations affect the selection of the sampling rate in multimedia data?

Basically it is a compromise between size and quality. [1]
Sampling frequency affects the quality of the data: a higher frequency gives better sampling and hence a better representation of the underlying signal (given a fixed frequency range of the signal). [1]
Sampling frequency affects the size of the digitized data: a higher frequency means more samples and therefore more data. [1]

3 Marks Bookwork

(c) For each of the following media types: audio, graphics, images and video, briefly discuss how sampling affects the quality of the data, the cause of sampling artefacts, and the form in which they manifest themselves in each data modality.

Sketch of possible answers (other variants accepted):

Audio:
Quality: Lack of clarity in high frequencies, telephonic voices at low sampling frequencies. [1]
Sampling artefact: Digital noise present in the signal; loss of high frequencies or poor representation of high frequencies gives audio aliasing (should be filtered out before sampling). Robotic voice. [1]
Cause: Low-pass filtering leads to lack of clarity in high frequencies; quantisation/bit depth (8 vs 16-bit) leads to digital noise. [2]

Graphics:
Quality: Sampling is not really an issue with vector graphics. [1]
Sampling artefact: Rendering may lead to an aliasing effect in lines etc. (as in images). [1]
Cause: Sampling resolution of the pixel rendering is not as high as that required for the line. [1]

Images:
Quality: Image size decreases, so less detail or sampling artefacts. [1]
Sampling artefact: Aliasing effect in blocky images. Checkerboard/Moire fringe effect. [2]
Cause: As above for Graphics. [1]

Video:
Quality: Video frame size decreases, so less detail or sampling artefacts; motion blur or loss of motion detail; motion illusions. [2]
Sampling artefact: Aliasing effect in frame images, jittery motion tracking etc. Temporal aliasing: wagon wheel effect (apparent motion in the reverse direction). Raster scan aliasing: twinkling or strobing effects on sharp horizontal lines. Interlacing aliasing. [3]
Cause: Temporal aliasing: sampling motion at less than the Nyquist frequency gives a strobing type of aliasing effect. Interlacing aliasing: interlacing effectively halves the sampling frequency. Raster-scan/frame subsampling: as for Images/Graphics above. [4]

Part (c) sub-total 20 Marks
Extended bookwork reasoning: gathers together different parts of the module. Causes of artefacts were briefly addressed in tutorial/lab work.

Question Total: 25 Marks
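As an illustration of the aliasing discussed in Q1, the following minimal sketch shows that a sinusoid sampled below twice its frequency produces exactly the same samples as a lower-frequency sinusoid. The function names and the 7 Hz / 10 Hz figures are illustrative assumptions, not part of the model answer.

```python
import math

def alias_frequency(f_signal, f_sample):
    """Apparent frequency (Hz) of a sinusoid of f_signal Hz sampled at f_sample Hz."""
    # Fold the signal frequency into the baseband [0, f_sample / 2].
    f = f_signal % f_sample
    return min(f, f_sample - f)

def sample_cosine(freq, f_sample, n_samples):
    """Return n_samples of cos(2*pi*freq*t) taken at t = n / f_sample."""
    return [math.cos(2 * math.pi * freq * n / f_sample) for n in range(n_samples)]

# A 7 Hz tone needs a sampling rate of at least 14 Hz; here it is sampled at 10 Hz.
undersampled = sample_cosine(7, 10, 20)
# The samples are indistinguishable from those of a 3 Hz tone.
aliased = sample_cosine(alias_frequency(7, 10), 10, 20)
```

The same folding explains the wagon wheel effect: under these assumptions a wheel turning at 23 revolutions per second filmed at 24 frames per second has an apparent rate of `alias_frequency(23, 24)`, i.e. 1 revolution per second (the function returns the magnitude only, not the reversed direction).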
Q2.

(a) What is the difference between reverb and echo?

Echo implies a distinct, delayed version of a sound. [1]
Reverb: each delayed sound wave arrives within such a short period of time that we do not perceive each reflection as a copy of the original sound. [1]

TOTAL 2 Marks Bookwork

(b) Describe two filter based approaches to simulating the reverb effect in digital audio, explaining how one approach builds on the other and how filters are used to achieve the desired effect.

Schroeder's Reverberator:
Early digital reverberation algorithms tried to mimic a room's reverberation by primarily using two types of infinite impulse response (IIR) filters:
Comb filter, usually in parallel banks. [1]
Allpass filter, usually applied sequentially after the comb filter banks. [1]
A delay (set via the feedback loops of the allpass filter) aims to make the output gradually decay.

Moorer's Reverberator:
Moorer's reverberator builds on Schroeder's:
Parallel comb filters with different delay lengths are used to simulate the modes of a room and sound reflecting between parallel walls. [1]
Allpass filters are used to increase the reflection density (diffusion). [1]
Lowpass filters are inserted in the feedback loops to alter the reverberation time as a function of frequency (shorter reverberation time at higher frequencies is caused by air absorption and the reflectivity characteristics of walls). [1]
These implement a dc-attenuation and a frequency dependent attenuation, which differ in each comb filter because their coefficients depend on the delay line length. [1]

6 Marks Bookwork
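The filter arrangement described in Q2(b) can be sketched as below: a bank of feedback comb filters in parallel, followed by allpass filters in series. This is a minimal sketch only. The difference equations are the textbook feedback-comb and Schroeder allpass forms, and the delay lengths and gains are arbitrary illustrative values rather than figures from the module; Moorer's lowpass filters in the feedback loops are not included.

```python
def comb_filter(x, delay, gain):
    """Feedback comb filter: y[n] = x[n] + gain * y[n - delay]."""
    y = []
    for n, sample in enumerate(x):
        feedback = y[n - delay] if n >= delay else 0.0
        y.append(sample + gain * feedback)
    return y

def allpass_filter(x, delay, gain):
    """Schroeder allpass: y[n] = -gain*x[n] + x[n - delay] + gain*y[n - delay]."""
    y = []
    for n, sample in enumerate(x):
        x_delayed = x[n - delay] if n >= delay else 0.0
        y_delayed = y[n - delay] if n >= delay else 0.0
        y.append(-gain * sample + x_delayed + gain * y_delayed)
    return y

def schroeder_reverb(x, comb_delays=(29, 37, 41, 43), comb_gain=0.7,
                     allpass_delays=(5, 17), allpass_gain=0.5):
    """Parallel comb bank followed by series allpass filters (illustrative values)."""
    # Run every comb filter on the same input and average their outputs.
    combs = [comb_filter(x, d, comb_gain) for d in comb_delays]
    mixed = [sum(samples) / len(combs) for samples in zip(*combs)]
    # Feed the mix through each allpass filter in turn to thicken the echoes.
    for d in allpass_delays:
        mixed = allpass_filter(mixed, d, allpass_gain)
    return mixed

# Impulse response of the whole structure: a decaying, increasingly dense tail.
impulse = [1.0] + [0.0] * 199
tail = schroeder_reverb(impulse)
```

Because each comb gain is below 1, every recirculation is attenuated, which is what makes the output gradually decay.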
(c) Describe briefly how Convolution Reverb is implemented. What is the fundamental theorem that underpins this approach?

Fundamental theorem: the convolution theorem, which states that:
If f(x) and g(x) are two functions with Fourier transforms F(u) and G(u), then the Fourier transform of the convolution f(x) * g(x) is simply the product of the Fourier transforms of the two functions, F(u)G(u). [2]

Convolution Reverb:
Record a room impulse: an instant sound (gun shot, drum beat) in the given room (at a given location). [1]
Compute the Fourier Transforms of the room impulse and the audio sample (or windowed segment). [1]
Multiply the two Fourier Transforms and inverse Fourier Transform the product to get the new reverberated audio. [1]

5 Marks Bookwork

(d) A new audio application requires that reverb be simulated as would be heard at a precise location within an acoustic space. This location must be allowed to vary and will be user-defined. Describe how this may efficiently be implemented via Convolution Reverb. Clearly state what challenges this approach presents for standard convolution reverb approaches and outline how your solution addresses such problems.

The challenge is that ideally one needs to record an impulse at every point in the room: theoretically an infinite number of impulses! [1]
Record an adequate number of impulse responses to capture the room acoustics. [1]
This is room dependent: it depends on room shape, acoustics etc. [1]
Too many impulses will significantly increase processing time. [1]
Compute convolution reverb responses at each location, or at least at the ones close to the user-defined position. [1]
Interpolate the nearby responses to get an approximate reverb at the new location. [1]

6 Marks Unseen Problem
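The three convolution reverb steps in Q2(c) can be checked numerically with the minimal sketch below. It assumes a deliberately naive O(n^2) DFT written with the standard library so that the example is self-contained (a real implementation would use an FFT), and the `dry` and `room` sample values are made up for illustration.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def convolve_direct(signal, impulse_response):
    """Time-domain convolution, used here only as a reference result."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def convolution_reverb(signal, impulse_response):
    """Transform both inputs, multiply the spectra, inverse transform."""
    n = len(signal) + len(impulse_response) - 1
    # Zero-pad to the full output length so the product gives linear,
    # not wrapped-around (circular), convolution.
    s = list(signal) + [0.0] * (n - len(signal))
    h = list(impulse_response) + [0.0] * (n - len(impulse_response))
    product = [a * b for a, b in zip(dft(s), dft(h))]
    return [value.real for value in idft(product)]

dry = [1.0, 0.0, 0.5, 0.0, -0.25]   # made-up audio samples
room = [1.0, 0.6, 0.3, 0.1]          # made-up room impulse response
wet = convolution_reverb(dry, room)
```

Up to floating-point rounding, `wet` agrees with `convolve_direct(dry, room)`, which is the convolution theorem in action.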
(e) An audio engineer has been presented with a rare audio recording at an established concert hall. The concert hall is still in existence but the performer is not. The audio engineer has been tasked to create a new piece of music incorporating this recording with a new studio recorded backing. The problem is that the reverberation of the concert hall recording does not match the reverb of the new recording. How may the audio engineer achieve a better match to the reverb audio characteristics of the two recordings?

Solution Outline:
Record impulse responses at the concert hall and the studio. [1]
Deconvolve the rare audio recording with the concert hall impulse response to factor out the reverb in the original recording: divide rather than multiply in the Fourier based convolution reverb. [2]
Apply the new studio reverb to the deconvolved audio via standard convolution reverb with the studio impulse response. [1]

4 Marks Unseen Problem

Question Total: 25 Marks
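The "divide rather than multiply" idea in Q2(e) can be sketched as follows. This is a hedged, idealised example: it assumes noise-free signals and an impulse response whose spectrum has no zeros, so a plain division is safe; real recordings would need some regularisation, which is not shown. All function names and sample values are illustrative, and the naive DFT stands in for an FFT.

```python
import cmath

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def convolve(signal, ir):
    """Apply reverb: multiply the zero-padded spectra and invert."""
    n = len(signal) + len(ir) - 1
    S = dft(list(signal) + [0.0] * (n - len(signal)))
    H = dft(list(ir) + [0.0] * (n - len(ir)))
    return [v.real for v in idft([a * b for a, b in zip(S, H)])]

def deconvolve(wet, ir):
    """Remove reverb: divide the wet spectrum by the impulse response spectrum."""
    n = len(wet)
    W = dft(wet)
    H = dft(list(ir) + [0.0] * (n - len(ir)))
    dry = idft([w / h for w, h in zip(W, H)])  # assumes no h is (near) zero
    # The original dry signal is shorter than the wet one by len(ir) - 1 samples.
    return [v.real for v in dry[: n - len(ir) + 1]]

def rematch_reverb(recording, hall_ir, studio_ir):
    """Strip the hall reverb, then apply the studio reverb."""
    return convolve(deconvolve(recording, hall_ir), studio_ir)

hall_ir = [1.0, 0.5, 0.25]             # made-up concert hall impulse response
studio_ir = [1.0, 0.2]                 # made-up studio impulse response
performance = [1.0, 0.0, 0.5, -0.25]   # the (unknown in practice) dry performance
recording = convolve(performance, hall_ir)
matched = rematch_reverb(recording, hall_ir, studio_ir)
```

In this idealised setting `matched` equals the dry performance convolved with the studio response, up to rounding error.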
Q3.

(a) Explain briefly what motion compensation is used for in MPEG video compression. [2]

Since consecutive video frames are often similar except for changes induced by objects moving within the frames, motion compensation allows a large part of the variation between frames to be encoded cheaply.

Bookwork.

(b) Assume a 2 × 2 macroblock is used for motion compensation. For the following macroblock

    # # # #
    # 5 7 #
    # 4 5 #
    # # # #

the corresponding intensities in the reference frame are given as follows:

    1 4 6 7
    2 5 3 7
    1 2 4 8
    5 2 4 4

Calculate the motion vector, with complete search within a ±1 pixel search window. List the steps to obtain the result. Having computed the motion vector, determine the macroblock to be coded after motion compensation.

Brute force search:
For dx=-1, dy=-1, SAD=9
For dx=-1, dy=0, SAD=5
For dx=-1, dy=1, SAD=4
For dx=0, dy=-1, SAD=11
For dx=0, dy=0, SAD=7
For dx=0, dy=1, SAD=5
For dx=1, dy=-1, SAD=13
For dx=1, dy=0, SAD=9
For dx=1, dy=1, SAD=3
[4]

Therefore the best displacement is (1, 1) with SAD=3. After motion compensation, the difference between the target macroblock and the best match in the reference will be used, i.e.

    ( 5 7 )   ( 4 8 )   ( 1 -1 )
    ( 4 5 ) - ( 4 4 ) = ( 0  1 )
[3]

7 Marks Unseen Problem
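The exhaustive search in Q3(b) can be reproduced with the short sketch below. One assumption is worth stating: the SAD values listed in the solution are obtained when the first component of the displacement is read as the row offset and the second as the column offset; that reading is inferred from the numbers, not stated in the paper. The function names are illustrative.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def block_at(frame, row, col, size):
    """Extract the size x size block whose top-left corner is (row, col)."""
    return [r[col:col + size] for r in frame[row:row + size]]

def full_search(target, reference, row, col, search=1):
    """Try every displacement within +/- search pixels and keep the lowest SAD."""
    size = len(target)
    results = {}
    for d_row in range(-search, search + 1):
        for d_col in range(-search, search + 1):
            r, c = row + d_row, col + d_col
            # Skip candidate blocks that would fall outside the reference frame.
            if 0 <= r <= len(reference) - size and 0 <= c <= len(reference[0]) - size:
                results[(d_row, d_col)] = sad(target, block_at(reference, r, c, size))
    best = min(results, key=results.get)
    return best, results

reference = [[1, 4, 6, 7],
             [2, 5, 3, 7],
             [1, 2, 4, 8],
             [5, 2, 4, 4]]
target = [[5, 7],
          [4, 5]]

# The target macroblock sits at row 1, column 1 of its own frame.
best, sads = full_search(target, reference, row=1, col=1)
match = block_at(reference, 1 + best[0], 1 + best[1], 2)
residual = [[t - m for t, m in zip(t_row, m_row)]
            for t_row, m_row in zip(target, match)]
```

Here `best` is the motion vector and `residual` is the difference block that would be passed on for coding.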
(c) What is the key difference between I-Frames, P-Frames and B-Frames?

I-Frame: Basic reference frame for each group of pictures; essentially a JPEG compressed image. [1]
P-Frame: Coded forward difference frame w.r.t. the last I or P frame. [1]
B-Frame: Coded bidirectional (forward and backward) difference frame w.r.t. the previous and following I or P frames. [1]

3 Marks Bookwork.

(d) Explain briefly why JPEG compression is not always suitable for compression of images that contain sharp edges or abrupt changes of intensity (such as black text on a white background).

Low-pass filtering leads to blurring of edges: the high-frequency components will not be small, as JPEG assumes. [2]
Ringing artefacts occur due to the Gibbs phenomenon: Fourier sums overshoot at a jump discontinuity, and this overshoot does not die out as the frequency increases. [2]

4 Marks Bookwork.

(e) Consider the following block of frequency domain values from a video frame arising during MPEG compression:

    196 207   1 129
      1   7 129 199
     11  73  73 194
     75  78 139 135

Apply successively to this block: (1) MPEG quantisation using a constant quantisation value of 64. (2) Zig-zag scanning. (3) Run length encoding. [9]

Quantisation in this case is simply dividing by the quantisation constant and rounding down:

    3 3 0 2
    0 0 2 3
    0 1 1 3
    1 1 2 2

Zig-zag scanning: 3 3 0 0 0 0 2 2 1 1 1 1 3 3 2 2.

RLE: (3,2) (0,4) (2,2) (1,4) (3,2) (2,2).

Unseen problem. Applying known algorithms. 3 marks for each step.
Question Total: 25 Marks
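The three steps of Q3(e) can be checked with the minimal sketch below. It follows the solution's conventions: quantisation is floor division by the constant, and run length encoding emits (value, run length) pairs. The function names are illustrative, and real MPEG entropy coding differs in detail.

```python
def quantise(block, q):
    """Divide every coefficient by q and round down (values are non-negative here)."""
    return [[value // q for value in row] for row in block]

def zigzag(block):
    """Read a square block along its anti-diagonals in zig-zag order."""
    n = len(block)
    out = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals are walked from bottom-left to top-right,
        # odd diagonals from top-right to bottom-left.
        if s % 2 == 0:
            rows = reversed(rows)
        out.extend(block[r][s - r] for r in rows)
    return out

def run_length_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(run) for run in runs]

block = [[196, 207, 1, 129],
         [1, 7, 129, 199],
         [11, 73, 73, 194],
         [75, 78, 139, 135]]

quantised = quantise(block, 64)
scanned = zigzag(quantised)
encoded = run_length_encode(scanned)
```

Under these assumptions `quantised`, `scanned` and `encoded` match the three results given in the solution to Q3(e).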
Q4.

(a) Explain why lossy data compression is sometimes preferred over lossless. [2]

In some scenarios (multimedia data) a lossy method can produce a much smaller compressed file than any lossless method, while the loss of information may remain imperceptible to a human.

Bookwork.

(b) Consider the following DNA fragment:

...gtacccgacacttccgtccccttc...

Assume that the frequencies of symbols in the rest of the sequence are the same as in this fragment. Estimate the probabilities of each symbol {A, G, T, C} and hence derive the Huffman code for each. Estimate the average number of bits per symbol required to encode the sequence using Huffman code under these circumstances.

P(A) = 0.125, P(G) = 0.125, P(T) = 0.25, P(C) = 0.5. [3]
The Huffman codes are: H(A) = 000, H(G) = 001, H(T) = 01, H(C) = 1. [2]
Average number of bits per symbol is: (12 × 1 + 6 × 2 + 3 × 3 + 3 × 3)/24 = 1.75.

8 Marks Unseen problem. Applying known algorithm.

(c) What advantage does arithmetic coding offer over Huffman coding for data compression? [3]

Huffman coding assumes an integer number (k) of bits for each symbol, hence k is never less than 1. Arithmetic coding can represent a fractional number of bits per symbol and can thus achieve better compression ratios.

Bookwork.

(d) Given the following string as input: ABRACADABRA, with the initial dictionary below, encode the sequence with the LZW algorithm, showing the intermediate steps. [12]

    Index  Entry
    1      A
    2      B
    3      C
    4      D
    5      R
[2]
    w     k     output   index   symbol
    ------------------------------------
    NIL   A
    A     B     1        6       AB
    B     R     2        7       BR
    R     A     5        8       RA
    A     C     1        9       AC
    C     A     3        10      CA
    A     D     1        11      AD
    D     A     4        12      DA
    A     B
    AB    R     6        13      ABR
    R     A
    RA    EOF   8

The output is therefore: 1 2 5 1 3 1 4 6 8.

Unseen problem applying algorithms covered in lectures. 3 marks for keeping w, 3 marks for appropriate allocation of index, 3 marks for symbol table and 3 marks for output.

Question Total: 25 Marks

END OF EXAMINATION
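The Q4(b) and Q4(d) answers can be checked mechanically with the sketch below. It is an illustration under stated assumptions, not the lecture implementation: the Huffman part computes only code lengths (the exact bit patterns depend on how branches are labelled), and the LZW part follows the w/k trace from the solution. All function names are illustrative.

```python
import heapq
from collections import Counter

def huffman_code_lengths(frequencies):
    """Return {symbol: code length in bits} for a Huffman code."""
    # Heap items are (weight, tie_breaker, {symbol: depth}); the unique integer
    # tie-breaker keeps Python from ever comparing the dictionaries.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(sorted(frequencies.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

def lzw_encode(text, dictionary):
    """LZW encoding with a caller-supplied initial dictionary {entry: index}."""
    table = dict(dictionary)
    next_index = max(table.values()) + 1
    w = ""
    output = []
    for k in text:
        if w + k in table:
            w += k                       # keep extending the current match
        else:
            output.append(table[w])      # emit the longest known prefix
            table[w + k] = next_index    # learn the new, longer string
            next_index += 1
            w = k
    if w:
        output.append(table[w])          # flush the final match at EOF
    return output, table

# Q4(b): symbol counts from the DNA fragment.
counts = Counter("gtacccgacacttccgtccccttc".upper())
lengths = huffman_code_lengths(counts)
average_bits = sum(counts[s] * lengths[s] for s in counts) / sum(counts.values())

# Q4(d): ABRACADABRA with the initial dictionary from the question.
codes, table = lzw_encode("ABRACADABRA", {"A": 1, "B": 2, "C": 3, "D": 4, "R": 5})
```

With these inputs the code lengths are 3, 3, 2 and 1 bits for A, G, T and C, giving the 1.75 bits per symbol average, and `codes` reproduces the LZW output sequence from the trace.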