Video
In filmmaking and video production, footage is the raw, unedited material as originally captured by a movie camera or recorded by a video camera; it must be edited to create a motion picture, video clip, television show, or similar finished work. Video signals are separated into several channels for recording and transmission. There are different methods of color channel separation, depending on the video format and its historical origins.
For example, broadcast video devices were originally designed for black-and-white video, and color was added later. This is still evident in today's video formats, which break image information into separate black-and-white and color information. Video and image processing on computers, on the other hand, developed later and is more flexible, so a three-color RGB model was adopted instead of a luma-chroma model.
Video signal formats
NTSC
An NTSC television channel occupies a total bandwidth of 6 MHz. The actual video signal is transmitted between 500 kHz and 5.45 MHz above the lower bound of the channel. The video carrier is 1.25 MHz above the lower bound of the channel. The color subcarrier is 3.579545 MHz above the video carrier. The main audio carrier is 4.5 MHz above the video carrier.
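As a quick sanity check, the carrier positions quoted above can be verified to fit within the 6 MHz channel (a minimal sketch; the variable names are ours, the figures are from the text):

```python
# Positions within a 6 MHz NTSC channel, relative to its lower bound (MHz).
CHANNEL_BANDWIDTH = 6.0
video_carrier = 1.25                         # 1.25 MHz above the lower bound
color_subcarrier = video_carrier + 3.579545  # 3.579545 MHz above the video carrier
audio_carrier = video_carrier + 4.5          # 4.5 MHz above the video carrier

# The color subcarrier sits at ~4.83 MHz and the audio carrier at 5.75 MHz,
# both inside the 6 MHz channel.
assert audio_carrier < CHANNEL_BANDWIDTH
assert abs(color_subcarrier - 4.829545) < 1e-9
```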
PAL
Phase Alternating Line (PAL) is a colour encoding system for analogue television, used in broadcast television systems in most countries. PAL adds a subcarrier carrying the chrominance information to the luminance video signal to form a composite video baseband signal. The frequency of this subcarrier is 4.43361875 MHz. The name "Phase Alternating Line" describes the way the phase of part of the colour information on the video signal is reversed with each line, which automatically corrects phase errors in the transmission of the signal by cancelling them out, at the expense of vertical frame colour resolution.
PAL
The 4.43361875 MHz frequency of the colour carrier is the result of 283.75 colour clock cycles per line plus a 25 Hz offset to avoid interference.
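This figure can be reproduced directly from the formula. The 15625 Hz line frequency (625 lines × 25 frames/s) is an assumption on our part, since it is not stated above:

```python
# PAL colour subcarrier: 283.75 colour clock cycles per line plus a 25 Hz offset.
line_frequency = 625 * 25                     # 15625 Hz (assumed 625-line, 25 fps system)
subcarrier = 283.75 * line_frequency + 25
print(subcarrier)                             # 4433618.75 Hz = 4.43361875 MHz
```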
SECAM (Sequential Color with Memory)
SECAM differs from the other color systems in the way the R-Y and B-Y signals are carried. First, SECAM uses frequency modulation to encode chrominance information on the subcarrier. Second, instead of transmitting the red and blue information together, it sends only one of them at a time and uses the information about the other color from the preceding line. It uses an analog delay line, a memory device, to store one line of color information; this justifies the "Sequential, With Memory" name.
SECAM
Because SECAM transmits only one color at a time, it is free of the color artifacts present in NTSC and PAL that result from the combined transmission of both signals. However, because only one color-difference signal is sent per line, the vertical color resolution is halved relative to NTSC. Because the FM modulation of SECAM's color subcarrier is insensitive to phase (or amplitude) errors, phase errors do not cause loss of color saturation in SECAM. It uses the YUV color model. This encoding is suitable for applications that transmit only one signal at a time.
SECAM SECAM transmissions are more robust over longer distances than NTSC or PAL.
Comparison of the three systems:
- Lines: NTSC 525; PAL 625; SECAM 625
- Frame rate: NTSC 30 fps; PAL 25 fps; SECAM 25 fps
- Resolution: NTSC 720x480, 704x480, 352x480, 352x240; PAL 720x576, 704x576, 352x576, 352x288; SECAM 720x576
- Details: NTSC is also called "composite video" because all the video information (synchronization, luminance, and color) is combined into a single analog signal; it has some color distortions. PAL, by reversing the relative phase of the color signal components on alternate scanning lines, avoids the color distortion that appears in NTSC. In SECAM the color information is transmitted sequentially (R-Y followed by B-Y, etc.) for each line and conveyed by a frequency-modulated subcarrier that avoids the distortion arising during NTSC transmission.
Video transmission standards: EDTV, CCIR, CIF, SIF, HDTV
Common concepts
Interlacing: invented as a way to reduce flicker in CRT video displays without increasing the number of complete frames per second, which would have sacrificed image detail to remain within the limitations of a narrow bandwidth. Each refresh period updates only every other scan line (one field).
Progressive scan: each refresh period updates all scan lines of a frame in sequence. When displaying a natively progressive broadcast or recorded signal, the result is optimum spatial resolution of both the stationary and moving parts of the image.
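The relationship between fields and frames can be sketched in a few lines. This toy example splits a progressive frame into its two interlaced fields and then "weaves" them back together (the 6-line frame is invented for illustration):

```python
# Interlacing sketch: a progressive frame is split into two fields,
# one holding the even scan lines and one holding the odd lines.
frame = [f"line {i}" for i in range(6)]   # a toy 6-line frame
top_field = frame[0::2]                   # lines 0, 2, 4
bottom_field = frame[1::2]                # lines 1, 3, 5

# "Weave" deinterlacing reassembles the progressive frame:
woven = [None] * len(frame)
woven[0::2] = top_field
woven[1::2] = bottom_field
assert woven == frame                     # lossless for static content
```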
EDTV
Enhanced-definition television, or extended-definition television (EDTV), is an American Consumer Electronics Association (CEA) marketing shorthand for certain digital television formats and devices. Specifically, the term covers formats that deliver a picture superior to that of SDTV but not as detailed as HDTV: devices capable of displaying 480-line or 576-line signals in progressive scan. EDTV signals require more bandwidth than the corresponding interlaced signals (due to frame doubling), though still less than HDTV.
EDTV
EDTV broadcasts use less digital bandwidth than HDTV, so TV stations can broadcast several EDTV channels at once. EDTV signals are broadcast with non-square pixels. Progressive displays (such as plasma displays and LCDs) can show EDTV signals without having to deinterlace them first, which can reduce motion artifacts. However, to achieve this, most progressive displays require the broadcast to be frame-doubled (i.e., 25 to 50 and 30 to 60) to avoid the same motion flicker issues that interlacing fixes.
HDTV
High-definition television (HDTV) is a digital television broadcasting system with significantly higher resolution than the traditional formats (NTSC, SECAM, PAL). The broadcast transmits widescreen pictures with more detail and quality than standard analog television or other digital television formats. Any scan line count greater than 480 is generally considered "high definition", and even 480 lines transmitted as progressive scan are sometimes counted as high definition. At the top of the heap is the 1080-line HDTV standard, which several broadcasters have elected to support.
CCIR
The CCIR is the Consultative Committee for International Radio; one of the most important standards it has produced is CCIR-601, for component digital video. The table shows some of the digital video specifications, all with an aspect ratio of 4:3. The CCIR-601 standard uses an interlaced scan, so each field has only half the vertical resolution of the full frame.
CIF
CIF (Common Intermediate Format), specified by the CCITT (International Telegraph and Telephone Consultative Committee), is a format used to standardize the horizontal and vertical resolution in pixels of YCbCr sequences in video signals, and is commonly used in video teleconferencing systems. The idea of CIF is to specify a format for lower bit rates. QCIF stands for Quarter CIF: to get one fourth of the area, as "quarter" implies, the height and width of the frame are halved.
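The quarter-area relationship is easy to verify. The 352×288 CIF luma resolution is a commonly quoted figure that we assume here, since the text above does not give the numbers:

```python
# CIF is 352x288 luma pixels (assumed; not stated in the text above).
# QCIF halves both the width and the height, giving one quarter of the area.
cif_w, cif_h = 352, 288
qcif_w, qcif_h = cif_w // 2, cif_h // 2

print(qcif_w, qcif_h)                           # 176 144
assert qcif_w * qcif_h * 4 == cif_w * cif_h     # exactly a quarter of the area
```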
Digitization of video
The basic process used to digitize images to create video sequences is the sampling of image elements (pixels) for intensity and color. For color video, each element contains an intensity (brightness) component and color components (red, green, and blue: RGB). These components are periodically sampled and converted into a digital format. Analog video digitization involves analyzing each scan line of video, separating the color and intensity levels, and digitizing each component.
For digital video capture from optical sensors (such as video recorders with CCD sensors), each pixel element is converted into color components (red, green, and blue), each with an intensity level (brightness). For color images, each line of the image is divided (filtered) into its color components; each position on the filtered image is scanned or sampled and converted to a level, and each sampled level is converted into a digital value. Converting video signals at 30 frames per second into digital streams in this way results in large amounts of data.
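A quick back-of-the-envelope calculation shows why the data volume is large. The frame size here is a hypothetical example, not one taken from the text:

```python
# Raw data rate of uncompressed RGB video sampled at 8 bits per component.
width, height = 640, 480     # hypothetical frame size
fps = 30
bytes_per_pixel = 3          # one byte each for red, green, and blue

rate = width * height * bytes_per_pixel * fps
print(rate / 1e6, "MB/s")    # about 27.6 MB/s before any compression
```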
Video file formats: MOV, RealVideo, H.261, H.263, Cinepak, Nero Digital
MOV
MOV is a multimedia container file format used in Apple's QuickTime program; Apple introduced it with QuickTime in 1991, and it later served as the basis for the MPEG-4 container format. MOV files use Apple's proprietary compression algorithms. The format specifies a multimedia container file that contains one or more tracks, each of which stores a particular type of data: audio, video, effects, or text (e.g. for subtitles). MOV and MP4 files are similar and can both be played by QuickTime. However, MP4 is recognized as an international standard and is more widely supported than MOV.
RealVideo
RealVideo is a suite of proprietary video compression formats developed by RealNetworks. It is supported on many platforms, including Windows, Mac, Linux, Solaris, and several mobile phones. RealVideo codecs are identified by four-character codes: RV10 and RV20 are H.263-based codecs, while RV30 and RV40 are RealNetworks' proprietary codecs.
RealVideo can be played from a RealMedia file or streamed over the network using the Real Time Streaming Protocol (RTSP). However, RealNetworks uses RTSP only to set up and manage the connection. The actual video data is sent with their own proprietary Real Data Transport (RDT) protocol.
H.261 H.261 is an ITU-T video compression standard. It is the first member of the H.26x family of video coding standards in the domain of the ITU-T Video Coding Experts Group (VCEG), and was the first video codec that was useful in practical terms.
H.261 was originally designed for transmission over ISDN lines, on which data rates are multiples of 64 kbit/s. The coding algorithm was designed to operate at video bit rates between 40 kbit/s and 2 Mbit/s, and it is widely used for video conferencing in the 128 kbit/s to 384 kbit/s range. It is a block-based discrete cosine transform method. The H.261 standard actually specifies only how to decode the video. Encoder designers were left free to design their own encoding algorithms, as long as their output was constrained properly so it could be decoded by any decoder made according to the standard.
Encoders are also left free to perform any pre-processing they want to their input video, and decoders are allowed to perform any post-processing they want to their decoded video prior to display. One effective post-processing technique that became a key element of the best H.261-based systems is called deblocking filtering. This reduces the appearance of block-shaped artifacts caused by the block-based motion compensation and spatial transform parts of the design.
1. A preprocessor converts the video at the output of a camera to a new format.
2. The coding parameters of the compressed video signal are multiplexed and then combined with the audio, data, and end-to-end signaling for transmission.
3. The transmission buffer controls the bit rate, either by changing the quantizer step size at the encoder or, in more severe cases, by requesting a reduction in frame rate, carried out at the preprocessor.
Nero Digital
Nero Digital is a brand name applied to a suite of MPEG-4-compatible video and audio compression codecs developed by Nero AG of Germany and Ateme of France. The audio codecs are integrated into the Nero Digital Audio+ audio encoding tool for Microsoft Windows, and the audio and video codecs are integrated into Nero's Recode DVD ripping software. The video streams generated by Nero Digital can be played back on some stand-alone hardware players and software media players such as the company's own Nero Showtime.
Cinepak
Cinepak is a lossy video codec developed by Peter Barrett at SuperMac Technologies, released in 1991 with the Video Spigot and then in 1992 as part of Apple Computer's QuickTime video suite. One of the first video compression tools to achieve full-motion video on CD-ROM, it was designed to encode 320x240 video at 1x CD-ROM transfer rates (150 kbyte/s). The original name of the codec was CompactVideo, which is why its FourCC identifier is CVID. The codec was ported to the Microsoft Windows platform in 1993.
Cinepak is based on vector quantization, a significantly different algorithm from the DCT used by most current codecs. This permitted implementation on relatively slow CPUs (video encoded in Cinepak will usually play fine even on a 25 MHz Motorola 68030). Cinepak files tend to be about 70% larger than similar-quality MPEG-4 Part 2 files.
Codebooks V1 and V4:
- Entries are 2x2 pixel blocks; one entry holds 4 luma values, or 4 luma and 2 chroma values
- Values are quantized to the range 0...255
- V1 entries code 4x4 pixel blocks (upscaled); V4 entries code 2x2 pixel blocks
For processing, Cinepak divides a video into key (intra-coded) images and inter-coded images. For key images the codebooks are transmitted from scratch; for inter-coded images, codebook entries are selectively updated. Each image is further divided into a number of horizontal bands, and the codebooks can be updated on a per-band basis. Each band is divided into 4x4 pixel blocks, and each block can be coded from either the V1 or the V4 codebook.
When coding from the V1 codebook, one codebook index per 4x4 block is written to the bit stream, and the corresponding 2x2 codebook entry is upscaled to 4x4 pixels. When coding from the V4 codebook, four codebook indices per 4x4 block are written to the bit stream, one for each 2x2 sub-block. As an alternative to coding from the V1 or V4 codebook, a 4x4 block in an inter-coded image can be skipped; a skipped block is copied unchanged from the previous frame in a conditional-replenishment fashion. The data rate can be controlled by adjusting the rate of key frames and by adjusting the permitted error in each block.
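The core vector-quantization idea can be sketched in a few lines. This is a minimal illustration in the spirit of Cinepak's V4 coding, not an implementation of the actual codec: the codebook is hand-picked rather than trained, and the blocks are made up:

```python
# Vector-quantization sketch: each 2x2 pixel block is replaced by the index
# of its nearest codebook entry (smallest squared-error distance).
def nearest(block, codebook):
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(block, codebook[i])))

codebook = [(0, 0, 0, 0), (128, 128, 128, 128), (255, 255, 255, 255)]
blocks = [(10, 5, 0, 12), (250, 255, 240, 251), (120, 130, 135, 125)]

indices = [nearest(b, codebook) for b in blocks]
print(indices)                              # [0, 2, 1]
decoded = [codebook[i] for i in indices]    # the decoder just looks entries up
```

The decoder side is trivially cheap, which is exactly why Cinepak could play back on slow CPUs: decoding is a table lookup rather than an inverse transform.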
Android video formats
Android supports the following video formats/codecs and container formats:
- H.263 — 3GPP (.3gp), MPEG-4 (.mp4)
- H.264 AVC — 3GPP (.3gp), MPEG-4 (.mp4), MPEG-TS (.ts, AAC audio only, not seekable, Android 3.0+); Baseline Profile (BP)
- MPEG-4 SP — 3GPP (.3gp)
- VP8 — WebM (.webm), Matroska (.mkv, Android 4.0+); streamable only in Android 4.0 and above
The 3GP and 3G2 file formats are both structurally based on the ISO base media file format defined in ISO/IEC 14496-12 (MPEG-4 Part 12). They are container formats similar to MPEG-4 Part 14 (MP4), which is also based on MPEG-4 Part 12. The 3GP and 3G2 file formats were designed to decrease storage and bandwidth requirements to accommodate mobile phones. They are similar standards with some differences: the 3GPP file format was designed for GSM-based phones and may have the filename extension .3gp, while the 3GPP2 file format was designed for CDMA-based phones and may have the filename extension .3g2. Some cell phones use the .mp4 extension for 3GP video.
The Matroska Multimedia Container (.mkv) is an open standard free container format, a file format that can hold an unlimited number of video, audio, picture, or subtitle tracks in one file. It is intended to serve as a universal format for storing common multimedia content, like movies or TV shows.
Video editing
DVD Formats DVD (also known as "Digital Versatile Disc" or "Digital Video Disc") is a popular optical disc storage media format mainly used for video and data storage. Most DVDs are of the same dimensions as compact discs (CDs) but store more than 6 times the data. DVD-ROM has data which can only be read and not written, DVD-R can be written once and then functions as a DVD-ROM, and DVD-RAM or DVD-RW holds data that can be re-written multiple times. DVD-Video and DVD-Audio discs respectively refer to properly formatted & structured video and audio content. Other types of DVD discs, including those with video content, may be referred to as DVD-Data discs.
DVD Technology
DVD uses a 650 nm wavelength laser diode, as opposed to 780 nm for CD. This permits a smaller spot on the media surface: 1.32 μm for DVD versus 2.11 μm for CD. The writing speed of the first DVD drives and media was 1x, that is, 1350 kB/s (1318 KiB/s). More recent models at 18x or 20x have 18 or 20 times that speed. Note that for CD drives, 1x means 153.6 kB/s (150 KiB/s), about 9 times slower.
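The speed multiples quoted above are simple products, which makes them easy to check (figures taken from the text; the 20x example drive is ours):

```python
# 1x base transfer rates quoted in the text.
DVD_1X = 1350.0    # kB/s
CD_1X = 153.6      # kB/s

print(20 * DVD_1X)           # a 20x DVD drive reads at 27000 kB/s
print(DVD_1X / CD_1X)        # DVD 1x is ~8.8x a CD 1x, i.e. roughly 9 times faster
```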
DVD recordable and rewritable HP initially developed recordable DVD media from the need to store data for back-up and transport. DVD recordables are now also used for consumer audio and video recording. Three formats were developed: DVD-R/RW (minus/dash), DVD+R/RW (plus), DVD-RAM.
Dual layer recording Dual Layer recording allows DVD-R and DVD+R discs to store significantly more data, up to 8.5 Gigabytes per side, per disc, compared with 4.7 Gigabytes for single layer discs. The drive with Dual Layer capability accesses the second layer by shining the laser through the first semi-transparent layer. The layer change mechanism in some DVD players can show a noticeable pause, as long as two seconds by some accounts.
DVD-Video
DVD-Video is a standard for storing video content on DVD media. DVD-Video discs use MPEG-2 video at a 4:3 or 16:9 aspect ratio, stored at a resolution of 720x480 (NTSC) or 720x576 (PAL) at 24, 30, or 60 fps. Audio is commonly stored in the Dolby Digital (AC-3) or Digital Theater Systems (DTS) format, ranging from 16-bit/48 kHz to 24-bit/96 kHz, with monaural to 7.1-channel "surround sound" presentation, and/or MPEG-1 Layer 2. DVD-Video also supports features such as menus, selectable subtitles, multiple camera angles, and multiple audio tracks.
DVD-Audio
DVD-Audio is a format for delivering high-fidelity audio content on a DVD. It offers many channel configuration options (from mono to 7.1 surround sound) at various sampling frequencies and bit depths (up to 24-bit/192 kHz). Compared with the CD format, the much higher capacity of DVD enables the inclusion of considerably more music (in total running time and number of songs) and/or far higher audio quality (higher sampling rates, greater bit depth, and/or additional channels for spatial sound reproduction).
MPEG
MPEG video compression is used in many current and emerging products. It is at the heart of digital television set-top boxes, DSS, HDTV decoders, DVD players, video conferencing, Internet video, and other applications. These applications benefit from video compression in that they require less storage space for archived video information, less bandwidth for transmitting the video information from one point to another, or both.
The Moving Picture Experts Group developed the specifications under ISO and the IEC, the International Electrotechnical Commission. "MPEG video" actually consists of two finalized standards, MPEG-1 and MPEG-2, with a third standard, MPEG-4, in the process of being finalized at the time this material was written. The MPEG-1 and MPEG-2 standards are similar in basic concepts: both are based on motion-compensated, block-based transform coding techniques. MPEG-4, in contrast, uses software image construct descriptors, targeting bit rates in the very low range, below 64 kbit/s.
MPEG-1
Finalized in 1991, MPEG-1 is often referred to as source input format (SIF) video. It was originally optimized for video resolutions of 352x240 pixels at 30 frames/sec (NTSC-based) or 352x288 pixels at 25 frames/sec (PAL-based), although MPEG-1 resolution may go as high as 4095x4095 at 60 frames/sec. The bit rate is optimized for applications around 1.5 Mbit/s, but the format can be used at higher rates if required. MPEG-1 is defined for progressive frames only and has no direct provision for interlaced video applications, such as broadcast television.
MPEG-2
MPEG-2 addressed issues directly related to digital television broadcasting, such as the efficient coding of field-interlaced video and scalability. The target bit rate was raised to between 4 and 9 Mbit/s for very high quality video. MPEG-2 consists of profiles and levels, which bound bit-stream scalability, color-space resolution, image resolution, and the maximum bit rate per profile. Example: Main Profile, Main Level (MP@ML) with 720x480 resolution video at 30 frames/sec, at bit rates up to 15 Mbit/s, for NTSC.
MPEG Video Layers
MPEG video is broken up into a hierarchy of layers to help with error handling, random search and editing, and synchronization, for example with an audio bitstream.
- Video sequence layer: a self-contained bitstream, e.g. a coded movie or advertisement
- Group of pictures: composed of one or more intra frames (I) together with non-intra pictures (P and B)
- Picture layer
- Slice layer: each slice is a contiguous sequence of raster-ordered macroblocks
Each slice consists of macroblocks: 16x16 arrays of luminance pixels (picture data elements) with two 8x8 arrays of associated chrominance pixels. The macroblocks can be further divided into distinct 8x8 blocks for further processing such as transform coding. Each of these layers has its own unique 32-bit start code, defined in the syntax as 23 zero bits followed by a one, followed by 8 bits for the actual start code value. These start codes may have as many zero bits as desired preceding them.
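The start-code layout described above packs neatly into bytes. A small sketch (the 0xB3 sequence-header value is the well-known MPEG-1/2 code; it is not given in the text):

```python
# MPEG start code: 23 zero bits, a one bit, then an 8-bit code value.
# Packed into whole bytes this is always 00 00 01 xx.
def start_code(value: int) -> bytes:
    assert 0 <= value <= 0xFF
    return bytes([0x00, 0x00, 0x01, value])

SEQUENCE_HEADER = 0xB3                      # MPEG-1/2 sequence header code value
print(start_code(SEQUENCE_HEADER).hex())    # 000001b3
```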
An MPEG "film" is a sequence of three kinds of frames: I-frames (intra-coded), P-frames (inter-coded), and B-frames (inter-coded).
Video Filter
MPEG uses the YCbCr color space to represent the data values instead of RGB, where Y is the luminance signal, Cb is the blue color-difference signal, and Cr is the red color-difference signal. A macroblock can be represented in several different ways in the YCbCr color space: 4:4:4, 4:2:2, and 4:2:0 video. 4:2:0 contains one quarter of the chrominance information of 4:4:4. Although MPEG-2 has provisions to handle the higher chrominance formats for professional applications, most consumer-level products use the normal 4:2:0 mode.
The 4:2:0 representation allows an immediate data reduction from 12 blocks/macroblock to 6 blocks/macroblock, or 2:1 compared to full bandwidth representations such as 4:4:4 or RGB. To generate this format without generating color aliases or artifacts requires that the chrominance signals be filtered.
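The block counts behind the 12-to-6 reduction are simple to tally per 16×16 macroblock (each block being 8×8 samples):

```python
# Blocks per 16x16 macroblock: four 8x8 luma blocks always cover the
# luma area; the chroma block count depends on the subsampling mode.
luma_blocks = 4
blocks_444 = luma_blocks + 4 + 4    # full-resolution Cb and Cr: 12 blocks
blocks_420 = luma_blocks + 1 + 1    # Cb and Cr subsampled 2x2:   6 blocks

print(blocks_444, blocks_420)       # 12 6
assert blocks_444 / blocks_420 == 2.0    # the 2:1 reduction quoted above
```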
DCT
8x8 blocks of values are coded by means of the discrete cosine transform. Consider the following 8x8 block of brightness values:

120 108  90  75  69  73  82  89
127 115  97  81  75  79  88  95
134 122 105  89  83  87  96 103
137 125 107  92  86  90  99 106
131 119 101  86  80  83  93 100
117 105  87  72  65  69  78  85
100  88  70  55  49  53  62  69
 89  77  59  44  38  42  51  58

The normal way is to determine the brightness of each of the 64 pixels and to scale them to some limits, say from 0 to 255, whereby "0" means "black" and "255" means "white".
But you can define all 64 values by only 5 integers if you apply the formula called the discrete cosine transform (DCT). The resulting coefficient block is:

700  90 100   0   0   0   0   0
 90   0   0   0   0   0   0   0
-89   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
The decoder can reconstruct the pixel values by applying the inverse discrete cosine transform (IDCT) to the coefficient block.
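The DCT/IDCT pair can be demonstrated with the orthonormal matrix form of the transform. This is a generic sketch of the 2-D DCT-II and its inverse, not the exact arithmetic any particular codec mandates (real codecs also quantize the coefficients in between):

```python
import numpy as np

# Orthonormal DCT-II matrix: entry [k, n] = c_k * cos(pi * (2n+1) * k / (2N)).
def dct_matrix(n=8):
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] *= 1 / np.sqrt(2)        # DC row scaling
    return C * np.sqrt(2 / n)

def dct2(block):                      # forward 2-D DCT: C X C^T
    C = dct_matrix(block.shape[0])
    return C @ block @ C.T

def idct2(coeffs):                    # inverse 2-D DCT: C^T Y C
    C = dct_matrix(coeffs.shape[0])
    return C.T @ coeffs @ C

block = np.random.default_rng(0).integers(0, 256, (8, 8)).astype(float)
restored = idct2(dct2(block))
assert np.allclose(restored, block)   # without quantization, the roundtrip is lossless
```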
Quantization: This operation is used to force as many of the DCT coefficients to zero, or near zero, as possible within the boundaries of the prescribed bit-rate and video quality parameters. Run Length VLC: Considerable savings can be had by representing the fairly large number of zero coefficients in a more effective manner, and that is the purpose of run-length amplitude coding of the quantized coefficients. But before that process is performed, more efficiency can be gained by reordering the DCT coefficients.
Scanning of the example coefficients in a zigzag pattern results in a sequence of numbers as follows: 8, 4, 4, 2, 2, 2, 1, 1, 1, 1, (12 zeroes), 1, (41 zeroes). This sequence is then represented as a run-length (representing the number of consecutive zeroes) and an amplitude (coefficient value following a run of zeroes). These values are then looked up in a fixed table of variable length codes, where the most probable occurrence is given a relatively short code, and the least probable occurrence is given a relatively long code.
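The run-length pairing step above can be sketched directly on the quoted sequence (the variable-length code table lookup is omitted):

```python
# Run-length coding of the zigzag-scanned coefficients quoted above:
# each nonzero value is emitted as (number of zeroes before it, value).
seq = [8, 4, 4, 2, 2, 2, 1, 1, 1, 1] + [0] * 12 + [1] + [0] * 41
assert len(seq) == 64                # one full 8x8 block

pairs, run = [], 0
for v in seq:
    if v == 0:
        run += 1
    else:
        pairs.append((run, v))
        run = 0
# The trailing run of 41 zeroes is signalled by an end-of-block code instead.
print(pairs)
# [(0, 8), (0, 4), (0, 4), (0, 2), (0, 2), (0, 2), (0, 1), (0, 1), (0, 1), (0, 1), (12, 1)]
```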
Video Buffer and Rate Control
Rate control allows the output of the encoder buffer to provide a constant bit rate while buffer underflow and overflow are prevented, without severe quality penalties such as the repeating or dropping of entire video frames.
Inter-frame construction
Imagine an I-frame showing a triangle on a white background, and a following P-frame showing the same triangle at another position. Prediction means supplying a motion vector that declares how to move the triangle in the I-frame to obtain the triangle in the P-frame. This motion vector is part of the MPEG stream and is divided into a horizontal and a vertical part. These parts can be positive (motion to the right or downwards) or negative (motion to the left or upwards).
Now suppose the red rectangle is shifted and rotated by 5° to the right. A simple displacement of the red rectangle will then cause a prediction error, so the MPEG stream contains a matrix compensating for this prediction error. Thus, the reconstruction of inter-coded frames proceeds in two steps:
1. Apply the motion vector to the referred frame;
2. Add the prediction error compensation to the result.
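The two reconstruction steps can be shown on a toy one-dimensional "frame" (the signal values and motion vector are invented for illustration):

```python
import numpy as np

# 1) apply the motion vector to the reference, 2) add the prediction error.
reference = np.array([0, 0, 9, 9, 0, 0, 0, 0])   # "object" at position 2
target    = np.array([0, 0, 0, 0, 9, 8, 0, 0])   # moved right, slightly changed

motion_vector = 2                                 # shift right by 2 samples
predicted = np.roll(reference, motion_vector)     # step 1: motion compensation
residual = target - predicted                     # what the encoder transmits
reconstructed = predicted + residual              # step 2: add the error term
assert np.array_equal(reconstructed, target)      # exact reconstruction
```

Note that the residual is nonzero only where the prediction missed, which is what makes inter coding cheap when motion compensation is accurate.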
The input bitstream buffer consists of memory that operates in the inverse fashion of the buffer in the encoder. For fixed bit-rate applications, the constant rate bitstream is buffered in the memory and read out at a variable rate depending on the coding efficiency of the macroblocks and frames to be decoded.
The VLD is the most computationally expensive portion of the decoder because it must operate on a bit-wise basis, with table look-ups performed at speeds up to the input bit rate. The inverse quantizer block multiplies the decoded coefficients by the corresponding values of the quantization matrix and the quantization scale factor. The resulting coefficients are clipped to the range −2048 to +2047, and then an IDCT mismatch control is applied to prevent long-term error propagation within the sequence.
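The inverse-quantization step can be sketched as follows; the 2×2 coefficient values, quantization matrix, and scale factor are made up for illustration, while the clipping range is the one stated above:

```python
import numpy as np

# Inverse quantization: multiply decoded coefficients by the quantization
# matrix and the scale factor, then clip to [-2048, +2047].
decoded = np.array([[44, -3], [1, 0]])
quant_matrix = np.array([[16, 17], [17, 18]])
quant_scale = 4

coeffs = decoded * quant_matrix * quant_scale    # 44*16*4 = 2816, etc.
coeffs = np.clip(coeffs, -2048, 2047)            # top-left clips from 2816 to 2047
print(coeffs)
```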
MPEG-4
1. MPEG-4 uses media objects to represent aural, visual, or audiovisual content. These media objects can be combined to form compound media objects.
2. MPEG-4 multiplexes and synchronizes the media objects before transmission to provide QoS, and it allows interaction with the constructed scene at the receiver's machine.
3. MPEG-4 organizes the media objects in a hierarchical fashion, where the lowest level has primitive media objects such as still images, video objects, and audio objects.
4. MPEG-4 has a number of primitive media objects that can represent 2- or 3-dimensional media objects.
5. MPEG-4 also defines a coded representation of objects for text, graphics, synthetic sound, and talking synthetic heads.
6. MPEG-4 provides a standardized way to describe a scene: media objects can be placed anywhere in the coordinate system, and transformations can be used to change the geometrical or acoustical appearance of a media object.
The visual part of the MPEG-4 standard describes methods for compression of images and video, compression of textures for texture mapping of 2-D and 3-D meshes, compression of implicit 2-D meshes, and compression of time-varying geometry streams that animate meshes. It also provides algorithms for random access to all types of visual objects, as well as algorithms for spatial, temporal, and quality scalability and content-based scalability of textures, images, and video. Algorithms for error robustness and resilience in error-prone environments are also part of the standard. For synthetic objects, MPEG-4 has parametric descriptions of the human face and body, and parametric descriptions for animation streams of the face and body.
1. MPEG-4 also describes static and dynamic mesh coding with texture mapping, and texture coding for view-dependent applications.
2. MPEG-4 supports coding of video objects with spatial and temporal scalability.
3. Scalability allows decoding part of a stream to construct images with reduced decoder complexity (reduced quality), reduced spatial resolution, reduced temporal resolution, or equal temporal and spatial resolution but reduced quality. Scalability is desired when video is sent over heterogeneous networks, or when the receiver cannot display the full resolution (e.g. limited power).
Robustness in error-prone environments is an important issue for mobile communications. MPEG-4 has three groups of tools for this:
1. Resynchronization tools enable the resynchronization of the bit-stream and the decoder when an error has been detected.
2. After resynchronization, data recovery tools are used to recover the lost data. These tools are techniques that encode the data in an error-resilient way.
3. Error concealment tools are used to conceal the lost data.
Efficient resynchronization is key to good data recovery and error concealment.
[Diagram: MPEG-4 encoding pipeline — scene descriptors, object descriptors, and the video object planes (VOP1, VOP2, VOP3) of a video scene, together with audio encoding, multiplexed (MUX) to storage]