Data Storage and Manipulation

Data Storage
Bits and Their Storage: Gates and Flip-Flops, Other Storage Techniques, Hexadecimal Notation
Main Memory: Memory Organization, Measuring Memory Capacity
Mass Storage: Magnetic Disks, Compact Disks, Magnetic Tape, File Storage and Retrieval
Representing Information as Bit Patterns: Text, Numeric Values, Images, Sound
The Binary System: Addition, Fractions

Storing Integers: Two's Complement Notation, Excess Notation
Storing Fractions: Floating-Point Notation, Truncation Errors
Data Compression: Generic Data Compression Techniques, Compressing Images
Communication Errors: Parity Bits, Error-Correcting Codes

The AND operation
0 AND 0 = 0
0 AND 1 = 0
1 AND 0 = 0
1 AND 1 = 1

The OR operation
0 OR 0 = 0
0 OR 1 = 1
1 OR 0 = 1
1 OR 1 = 1

The XOR operation
0 XOR 0 = 0
0 XOR 1 = 1
1 XOR 0 = 1
1 XOR 1 = 0

The NOT operation
NOT 0 = 1
NOT 1 = 0
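The four operations above correspond to Python's bitwise operators when applied to the integers 0 and 1. The following is a minimal sketch; the function names are illustrative and do not come from the slides.

```python
# Boolean operations on single bits (0 or 1).
def AND(a: int, b: int) -> int:
    return a & b

def OR(a: int, b: int) -> int:
    return a | b

def XOR(a: int, b: int) -> int:
    return a ^ b

def NOT(a: int) -> int:
    # 1 - a flips a single bit; Python's ~ operator would give a negative number.
    return 1 - a

# Print the full truth table for the two-input operations.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b), "XOR:", XOR(a, b))
```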

Gates and Flip-Flops
Gate: a device that produces the output of a Boolean operation when given the operation's input values.
Boolean operation: an operation that manipulates true/false values.
Flip-flop: a circuit that produces an output value of 0 or 1, which remains constant until a temporary pulse from another circuit causes it to shift to the other value.


Figure 1.4: Setting the output of a flip-flop to 1

The hexadecimal coding system
Bit pattern  Hex     Bit pattern  Hex
0000         0       1000         8
0001         1       1001         9
0010         2       1010         A
0011         3       1011         B
0100         4       1100         C
0101         5       1101         D
0110         6       1110         E
0111         7       1111         F
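As a rough illustration of the coding system, the sketch below converts a bit string to hexadecimal four bits at a time, using the conventional digits A through F for the patterns 1010 through 1111. The function name `bits_to_hex` is an assumption made for this example.

```python
HEX_DIGITS = "0123456789ABCDEF"

def bits_to_hex(bits: str) -> str:
    """Convert a string of 0s and 1s (spaces allowed) to hexadecimal."""
    bits = bits.replace(" ", "")
    if len(bits) % 4 != 0:
        raise ValueError("bit string length must be a multiple of 4")
    # Each group of four bits selects one hexadecimal digit.
    return "".join(
        HEX_DIGITS[int(bits[i:i + 4], 2)] for i in range(0, len(bits), 4)
    )

print(bits_to_hex("1010 0100 1100 1000"))  # A4C8
```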

Figure 1.7: The organization of a byte-size memory cell

Figure 1.8: Memory cells arranged by address

Magnetic Disks
Seek time: the time to move the read/write heads from one track to the desired track.
Rotation delay (latency time): once the head has reached the correct track, the time spent waiting for the desired sector to rotate under the read/write head.
Transfer time: the time to transfer a block of bits, typically a sector.
Access time = Seek time + Latency time + Transfer time + Controller overhead
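The access-time formula can be sketched as a small calculation. All parameter values below are hypothetical and chosen only to make the arithmetic easy to follow; the sketch also assumes that the average latency is half of one full rotation.

```python
def average_access_time_ms(seek_ms: float, rpm: float, sector_bytes: int,
                           transfer_bytes_per_s: float,
                           controller_ms: float) -> float:
    """Access time = seek + latency + transfer + controller overhead (in ms)."""
    rotation_ms = 60_000 / rpm   # time for one full rotation
    latency_ms = rotation_ms / 2  # assumption: on average, half a rotation
    transfer_ms = sector_bytes / transfer_bytes_per_s * 1000
    return seek_ms + latency_ms + transfer_ms + controller_ms

# Hypothetical drive: 4 ms seek, 6000 rpm, 500-byte sector,
# 1,000,000 bytes/s transfer rate, 0.5 ms controller overhead.
print(average_access_time_ms(4.0, 6000, 500, 1_000_000, 0.5))
```

With these illustrative numbers the rotational latency (5 ms) and the seek time (4 ms) dominate, while transferring a single sector contributes only 0.5 ms.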

Figure 1.9: A disk storage system

Compact Disks
Compact disks are 12 centimeters in diameter and consist of reflective material covered with a clear protective coating.
Information is recorded on them by creating variations in their reflective surfaces.
This information can then be retrieved by means of a laser beam that monitors irregularities on the reflective surface.

CD-ROM: Compact Disk Read-Only Memory
Information on a CD is stored on a single track that spirals around the CD like a groove in an old-fashioned record. The track spirals from the inside out.
This track is divided into units called sectors. All sectors contain the same amount of data and each has its own identifying markings.
Each sector holds 2 KB of data.

Information is stored at a uniform linear density over the entire spiraled track.
To obtain a uniform rate of data transfer, CD players are designed to vary the CD's rotation speed depending on the location of the laser beam.
CD-ROM formats have capacities slightly over 600 MB.
DVDs (Digital Versatile Disks) provide storage capacities on the order of 10 GB.
CD-WORM: write once, read many

Figure 1.10: CD storage format

Figure 1.11: A magnetic tape storage mechanism

File Storage and Retrieval
Logical record sizes rarely match the physical record size.
Several logical records may reside within a single physical record, or a logical record may be split between two or more physical records.
A certain amount of unscrambling is therefore often associated with retrieving data from mass storage systems.

A common solution is to set aside an area of main memory that is large enough to hold several physical records and to use this memory space as a regrouping area (a buffer).
Updating data stored in mass storage involves transferring the data to main memory, updating the data, and then transferring the updated data back to mass storage.

Representing Information as Bit Patterns
Unicode: uses a unique pattern of 16 bits to represent each symbol.
ISO (International Organization for Standardization): a standard using patterns of 32 bits to represent symbols.
ASCII (American Standard Code for Information Interchange): 7-bit patterns, usually stored in 8 bits with a leading 0.

The message "Hello." in ASCII
01001000  H
01100101  e
01101100  l
01101100  l
01101111  o
00101110  .
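The bit patterns for this message can be reproduced with a few lines of Python. This is only a sketch; the helper name `ascii_bits` is made up for the example.

```python
def ascii_bits(message: str) -> list[str]:
    """Return the 8-bit pattern for each character of an ASCII message."""
    # encode("ascii") raises UnicodeEncodeError for non-ASCII characters.
    return [format(byte, "08b") for byte in message.encode("ascii")]

for char, bits in zip("Hello.", ascii_bits("Hello.")):
    print(bits, char)
```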

Figure 1.14: The base ten and binary systems

Figure 1.15: Decoding the binary representation 100101

An algorithm for finding the binary representation of a positive integer
Step 1: Divide the value by two and record the remainder.
Step 2: As long as the quotient obtained is not zero, continue to divide the newest quotient by two and record the remainder.
Step 3: Now that a quotient of zero has been obtained, the binary representation of the original value consists of the remainders listed from right to left in the order they were recorded.
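The three steps above translate almost directly into code. The sketch below is one possible rendering; the function name `to_binary` is an assumption for this example.

```python
def to_binary(value: int) -> str:
    """Binary representation of a positive integer by repeated division by two."""
    if value <= 0:
        raise ValueError("value must be a positive integer")
    remainders = []
    quotient = value
    while True:
        # Steps 1 and 2: divide by two and record the remainder.
        quotient, remainder = divmod(quotient, 2)
        remainders.append(str(remainder))
        if quotient == 0:
            break
    # Step 3: the remainders, listed right to left, form the result.
    return "".join(reversed(remainders))

print(to_binary(13))  # 1101
```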

Figure 1.17: Applying the algorithm in Figure 1.16 to obtain the binary representation of thirteen

Figure 1.18: The sound wave represented by the sequence 0, 1.5, 2.0, 1.5, 2.0, 3.0, 4.0, 3.0, 0

Audio CD
Sampling at 44.1 kHz, 16 bits (2 bytes) per sample, two channels (stereo)
2 bytes * 2 * 44.1 k = 176.4 k bytes / second
Beethoven's 9th Symphony: 74 min. and 42 sec.
176.4 k * (60 * 74 + 42) ~ 790 million bytes, or roughly 750 MB when 1 MB = 2^20 bytes
MP3 (MPEG-1 Audio Layer 3): uses data compression to reduce the required storage space to between 1/12 and 1/10 of the original.
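The storage estimate can be checked with a short calculation. The constant names are illustrative; the figures are the ones given on the slide.

```python
SAMPLE_RATE_HZ = 44_100   # samples per second per channel
BYTES_PER_SAMPLE = 2      # 16 bits
CHANNELS = 2              # stereo

bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
duration_seconds = 74 * 60 + 42
total_bytes = bytes_per_second * duration_seconds

print(bytes_per_second)             # 176400
print(total_bytes)                  # 790624800
print(round(total_bytes / 2**20))   # 754 (in units of 2^20 bytes)
```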

Figure 1.19: The binary addition facts

Figure 1.20: Decoding the binary representation 101.101

Complement
67 - 55 = 67 - (100 - 45) = 67 + 45 - 100 = 12

Figure 1.21: Two's complement notation systems

Figure 1.22: Coding the value -6 in two's complement notation using four bits

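Figure 1.23: Addition problems converted to two's complement notation
Overflow: with four-bit patterns, 6 + 7 yields -3 and (-6) + (-8) yields +2, because the true sums (13 and -14) fall outside the representable range of -8 to 7.
The machine can therefore produce wrong answers; overflow must be detected and handled with a special procedure.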
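The overflow examples can be reproduced with a small four-bit simulation. This is a sketch under the assumption of four-bit patterns; the helper names `encode`, `decode`, and `add` are invented for the example.

```python
BITS = 4

def encode(value: int, bits: int = BITS) -> str:
    """Two's complement bit pattern of value, using the given number of bits."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits} bits")
    return format(value & ((1 << bits) - 1), f"0{bits}b")

def decode(pattern: str) -> int:
    """Integer represented by a two's complement bit pattern."""
    raw = int(pattern, 2)
    # A leading 1 marks a negative value.
    return raw - (1 << len(pattern)) if pattern[0] == "1" else raw

def add(a: int, b: int, bits: int = BITS) -> tuple[int, bool]:
    """Add as the hardware would: keep only the low bits, report overflow."""
    mask = (1 << bits) - 1
    raw = (int(encode(a, bits), 2) + int(encode(b, bits), 2)) & mask
    result = decode(format(raw, f"0{bits}b"))
    return result, result != a + b

print(encode(-6))    # 1010
print(add(6, 7))     # (-3, True)
print(add(-6, -8))   # (2, True)
print(add(3, -5))    # (-2, False)
```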

Figure 1.24: An excess eight conversion table

Figure 1.25: An excess notation system using bit patterns of length three

Figure 1.26: Floating-point notation components

Figure 1.27: Coding the value 2 5/8
Converting 0.3 (decimal) to binary: the expansion 0.0100110011... never terminates, so storing it in a fixed number of bits causes a truncation (round-off) error.
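The truncation error can be made concrete by expanding a fraction bit by bit. The sketch below uses exact rational arithmetic so that Python's own floating-point rounding does not interfere; it shows plain binary fractions, not a complete floating-point format, and the function names are assumptions.

```python
from fractions import Fraction

def fraction_bits(value: Fraction, n_bits: int) -> str:
    """First n_bits binary digits after the radix point of 0 <= value < 1."""
    bits = []
    for _ in range(n_bits):
        value *= 2
        if value >= 1:
            bits.append("1")
            value -= 1
        else:
            bits.append("0")
    return "".join(bits)

def bits_value(bits: str) -> Fraction:
    """Exact value of the binary fraction 0.bits."""
    return sum(Fraction(int(b), 2 ** (i + 1)) for i, b in enumerate(bits))

print(fraction_bits(Fraction(5, 8), 4))    # 1010  (5/8 is exact: 0.101)
stored = fraction_bits(Fraction(3, 10), 8)
print(stored)                              # 01001100
print(Fraction(3, 10) - bits_value(stored))  # 1/320, the truncation error
```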

Representing Images
Bit map: an image is considered to be a collection of dots, each of which is called a pixel.
Used by facsimile machines, video cameras, and scanners. (GIF and JPEG compress such images into more manageable sizes.)
A bit map cannot easily be rescaled to an arbitrary size.
Vector: an image is represented as a collection of lines and curves.
Used for scalable fonts. Vector techniques cannot provide photographic-quality images.

Grayscale of 3 bits

Data Compression: Run-length encoding
Replaces long sequences of the same value with a code indicating the value that is repeated and the number of times it occurs in the sequence.
e.g. 253 ones, then 118 zeros, then 87 ones = 458 bits
Each run can be stored as 1 byte for the count plus 1 bit for the value, so the 3 runs need 3 bytes + 3 bits = 27 bits.
Compression rate ~ 27/458 ~ 6%
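The run-length example can be sketched as follows, assuming each run is stored as one value bit plus an 8-bit count (so a run may not exceed 255). The function names are illustrative.

```python
from itertools import groupby

def run_length_encode(bits: str) -> list[tuple[str, int]]:
    """Collapse a bit string into (value, run length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(bits)]

def encoded_size_bits(runs: list[tuple[str, int]], count_bits: int = 8) -> int:
    """Size of the encoding: one value bit plus count_bits per run."""
    for _, length in runs:
        if length >= 2 ** count_bits:
            raise ValueError("run too long for the chosen count size")
    return len(runs) * (1 + count_bits)

message = "1" * 253 + "0" * 118 + "1" * 87
runs = run_length_encode(message)
print(runs)                      # [('1', 253), ('0', 118), ('1', 87)]
print(encoded_size_bits(runs))   # 27
print(round(encoded_size_bits(runs) / len(message) * 100, 1))  # 5.9 (percent)
```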

Data Compression: Relative encoding
Records the differences between consecutive data blocks rather than the entire blocks.
Useful when each block differs only slightly from the preceding one, e.g. consecutive frames of a motion picture.
Ex: if only 4% of a frame differs from the previous frame, the compression rate is ~ 4%.
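A toy version of relative encoding is sketched below, treating a frame as a flat list of pixel values and recording only the positions that changed. The frame contents and function names are hypothetical.

```python
def diff_frame(previous: list[int], current: list[int]) -> list[tuple[int, int]]:
    """Record only the (position, new value) pairs that differ between frames."""
    return [
        (i, new) for i, (old, new) in enumerate(zip(previous, current))
        if old != new
    ]

def apply_diff(previous: list[int], diff: list[tuple[int, int]]) -> list[int]:
    """Rebuild the current frame from the previous frame and the differences."""
    frame = list(previous)
    for position, value in diff:
        frame[position] = value
    return frame

frame1 = [0] * 100
frame2 = list(frame1)
for position in (10, 11, 50, 99):   # 4 of 100 pixels change
    frame2[position] = 255

diff = diff_frame(frame1, frame2)
print(len(diff), "of", len(frame2), "values stored")
assert apply_diff(frame1, diff) == frame2
```

Note that this sketch also stores the position of each change, so the real saving is somewhat smaller than the fraction of changed pixels alone suggests.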

Data Compression: Frequency-dependent encoding
The length of the bit pattern used to represent a data item is inversely related to the frequency of the item's use.
Huffman codes, e.g., use less space for (e, t, a) and more space for (z, q, x).
Ex: an article with 1000 characters (including symbols) ~ 8 bits * 1000 = 8,000 bits
Suppose (e, t, a) are coded with 3 bits and the rest with 9 bits, and the article has 700 characters from (e, t, a) and 300 other characters.
~ 3 bits * 700 + 9 bits * 300 = 4,800 bits
=> Compression rate ~ 4,800/8,000 = 60%
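To illustrate how frequent symbols end up with shorter codes, the sketch below computes Huffman code lengths with a priority queue. The symbol frequencies are made up for the example, and `huffman_code_lengths` is a hypothetical helper, not something defined in the slides.

```python
import heapq
from itertools import count

def huffman_code_lengths(freqs: dict[str, int]) -> dict[str, int]:
    """Code length (in bits) that Huffman's algorithm assigns to each symbol."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol inside
        # them moves one level deeper, so its code grows by one bit.
        f1, _, depths1 = heapq.heappop(heap)
        f2, _, depths2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**depths1, **depths2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

# Hypothetical counts per 100 characters.
freqs = {"e": 40, "t": 30, "a": 20, "z": 5, "q": 5}
lengths = huffman_code_lengths(freqs)
total_bits = sum(freqs[s] * lengths[s] for s in freqs)
print(lengths)
print(total_bits, "bits versus", 3 * sum(freqs.values()), "bits at a fixed 3 bits per symbol")
```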

Lempel-Ziv encoding
Adaptive dictionary encoding: the dictionary is allowed to change during the encoding process.
LZ77: the encoding starts by quoting the initial part of the message directly, but at some point shifts to representing later segments by triples, each consisting of two integers followed by a symbol from the message.

Figure 1.28: Decompressing xyxxyzy (5, 4, x)

LZ77 encoding
xyxxyzy (5, 4, x) (0, 0, w) (8, 6, y)
xyxxyzyxxyzx (0, 0, w) (8, 6, y)
xyxxyzyxxyzxw (8, 6, y)
xyxxyzy xxyzx w zyxxyzy
xyxxyzyxxyzxwzyxxyzy
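The decompression steps shown above can be sketched in code. Each triple is read as (how far to count back, how many symbols to copy, symbol to append); the function name `lz77_decode` is an assumption for this example.

```python
def lz77_decode(prefix: str, triples: list[tuple[int, int, str]]) -> str:
    """Expand a literal prefix followed by LZ77 (back, length, symbol) triples."""
    out = list(prefix)
    for back, length, symbol in triples:
        if length > 0 and not 0 < back <= len(out):
            raise ValueError("triple points outside the decoded text")
        start = len(out) - back
        # Copy one symbol at a time so a copy may overlap text it just produced.
        for i in range(length):
            out.append(out[start + i])
        out.append(symbol)
    return "".join(out)

print(lz77_decode("xyxxyzy", [(5, 4, "x")]))
# xyxxyzyxxyzx
print(lz77_decode("xyxxyzy", [(5, 4, "x"), (0, 0, "w"), (8, 6, "y")]))
# xyxxyzyxxyzxwzyxxyzy
```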

Compressing Images
GIF (Graphic Interchange Format): reduces a three-byte-per-pixel image to one byte per pixel by limiting the number of colors that can be assigned to a pixel to 256 (good for cartoon images).
Each of the 256 potential pixel values is associated with a red-green-blue combination by means of a table known as the palette.
Comparison: a reference table for Chinese characters -> another method of data compression

JPEG (Joint Photographic Experts Group)
Encompasses several methods of image representation, each with its own goals.
Lossless mode: no information is lost in the process of encoding the picture.
Space is saved by storing the differences between consecutive pixels rather than the pixel intensities themselves (relative encoding).
These differences are then coded using a variable-length code to further conserve storage space.

JPEG (lossy mode)
Each pixel is represented by three components: a brightness component and two color components.
The human eye is more sensitive to changes in brightness than to changes in color.
Each brightness component is encoded, but the color components are averaged over four-pixel blocks and only the block averages are recorded.
A 4-pixel block is therefore represented by only 6 values (4 brightness values and 2 color values) instead of 12.
The compression rate is in the range of 1/20.
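The color-averaging step alone can be sketched as below. This is only an illustration of reducing 12 values to 6 for one four-pixel block, not an implementation of JPEG; the pixel values and the function name are hypothetical.

```python
def compress_block(pixels: list[tuple[int, int, int]]) -> list[float]:
    """Keep every brightness value but only the block average of each color.

    pixels holds four (brightness, color1, color2) tuples.
    """
    brightness = [float(p[0]) for p in pixels]
    average_color1 = sum(p[1] for p in pixels) / len(pixels)
    average_color2 = sum(p[2] for p in pixels) / len(pixels)
    return brightness + [average_color1, average_color2]

block = [(100, 10, 20), (110, 12, 22), (90, 14, 24), (95, 16, 20)]
compressed = compress_block(block)
print(compressed)        # [100.0, 110.0, 90.0, 95.0, 13.0, 21.5]
print(len(compressed))   # 6 values instead of 12
```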

Additional space is saved by recording data that indicate how the various brightness and color components change rather than their actual values. The degree to which nearby pixel values differ can be recorded using fewer bits than would be required if the actual values were recorded. (discrete cosine transform)

~200 k bytes

~60 k bytes

MPEG (Moving Picture Experts Group)
Starts a picture sequence with an image similar to JPEG's baseline and then represents the rest of the sequence using relative encoding techniques.
The relative encoding is applied both within one picture and across two or more consecutive pictures. (Usually only a small fraction of the image varies, while the rest stays the same.)
e.g. internet movies (Because the compression rate varies and bandwidth is limited, you may see a blurred image during fast or drastically changing motion.)

Communication Errors: Parity bits
Long bit patterns are often accompanied by a collection of parity bits making up a checkbyte.

Figure 1.29: The ASCII codes for the letters A and F adjusted for odd parity

An error in any one of the check bits will cause exactly one parity check violation, while an error in any one of the message bits will cause violations of a distinct pair of parity checks.

The parity can be either even or odd.
If the parity of the received data differs from the agreed parity, a communication error has occurred. -> Repeat the transmission to get correct data.
Note: a parity check does not guarantee correct communication, e.g. it misses an even number of erroneous bits. (The probability of that is usually much lower than that of a single erroneous bit.)
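The parity check can be sketched for odd parity, assuming the parity bit is placed at the high-order end of a 7-bit ASCII code. The function names are illustrative.

```python
def add_odd_parity(bits7: str) -> str:
    """Prefix a parity bit so that the total number of 1s is odd."""
    parity = "0" if bits7.count("1") % 2 == 1 else "1"
    return parity + bits7

def check_odd_parity(bits: str) -> bool:
    """True if the pattern contains an odd number of 1s."""
    return bits.count("1") % 2 == 1

def flip(bits: str, index: int) -> str:
    """Simulate a transmission error in one bit position."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]

a = add_odd_parity(format(ord("A"), "07b"))
f = add_odd_parity(format(ord("F"), "07b"))
print(a, f)                                  # 11000001 01000110
print(check_odd_parity(flip(a, 3)))          # False: a single error is detected
print(check_odd_parity(flip(flip(a, 3), 5)))  # True: a double error goes unnoticed
```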

Communication Errors: Error-Correcting Codes
The Hamming distance between two patterns is the number of bits in which the two differ.
X = 1000 1011
Y = 0100 1001
Z = 1000 0111
Hamming distance of (X, Y) = 3
Hamming distance of (Y, Z) = 5
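The distances quoted for X, Y, and Z can be verified with a short function. This is a minimal sketch; the name `hamming_distance` is chosen for the example.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length patterns differ."""
    a, b = a.replace(" ", ""), b.replace(" ", "")
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return sum(x != y for x, y in zip(a, b))

X, Y, Z = "1000 1011", "0100 1001", "1000 0111"
print(hamming_distance(X, Y))  # 3
print(hamming_distance(Y, Z))  # 5
```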

Figure 1.30: An error-correcting code

Figure 1.31: Decoding the pattern 010100 using the code in Figure 1.30
The closest character is D.

Single-error-correcting code
M1 M2 M3   C1 C2 C3
0  0  0    0  0  0
0  0  1    1  1  0
0  1  0    0  1  1
0  1  1    1  0  1
1  0  0    1  0  1
1  0  1    0  1  1
1  1  0    1  1  0
1  1  1    0  0  0
⊕: exclusive OR (XOR)
C1 = M1 ⊕ M3
C2 = M2 ⊕ M3
C3 = M1 ⊕ M2
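The check-bit equations C1 = M1 XOR M3, C2 = M2 XOR M3, and C3 = M1 XOR M2 can be turned into a small encoder and single-error corrector. The sketch below locates the faulty bit from the set of violated parity checks; the function names and the lookup-table approach are assumptions for this example.

```python
def encode(m1: int, m2: int, m3: int) -> list[int]:
    """Return [M1, M2, M3, C1, C2, C3] for a 3-bit message."""
    return [m1, m2, m3, m1 ^ m3, m2 ^ m3, m1 ^ m2]

# Which parity checks (C1, C2, C3) fail -> index of the erroneous bit.
# A message bit takes part in two checks; a check bit in exactly one.
VIOLATIONS_TO_INDEX = {
    (1, 0, 1): 0,  # M1 appears in C1 and C3
    (0, 1, 1): 1,  # M2 appears in C2 and C3
    (1, 1, 0): 2,  # M3 appears in C1 and C2
    (1, 0, 0): 3,  # C1 itself
    (0, 1, 0): 4,  # C2 itself
    (0, 0, 1): 5,  # C3 itself
}

def correct(word: list[int]) -> list[int]:
    """Repair at most one flipped bit in a received 6-bit word."""
    m1, m2, m3, c1, c2, c3 = word
    violations = (c1 ^ m1 ^ m3, c2 ^ m2 ^ m3, c3 ^ m1 ^ m2)
    fixed = list(word)
    if violations == (0, 0, 0):
        return fixed  # no error detected
    if violations not in VIOLATIONS_TO_INDEX:
        raise ValueError("more than one bit appears to be in error")
    fixed[VIOLATIONS_TO_INDEX[violations]] ^= 1
    return fixed

sent = encode(0, 1, 1)
received = list(sent)
received[1] ^= 1          # corrupt M2 in transit
print(sent, received, correct(received))
```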

An error-correcting code (ECC) with a minimum Hamming distance of three lets us detect up to two errors per pattern or correct one error, while an ECC with a minimum Hamming distance of five lets us detect up to four errors or correct up to two.