Overview of Computer Science
CSC 101, Summer 2011
Analog, Binary and Digital Concepts; Digitization
Lecture 4, July 11, 2011

Announcements
Writing Assignment #1 is due today. Hand it to me after class if you haven't already.
Make sure you have the electronic copy with you for lab tomorrow.
Lab #1 is tomorrow (8 am). Be sure to read the prelab tonight.

Objectives
Analog vs. digital information
Binary encoding of information: bits and bytes
Digitization
Processing Data
For a device to process data, what three steps are required?
1. Input some data
2. Process the data (perform some planned operations on the data)
3. Output the results
A computer is any device that processes data, and not necessarily only digital data.

Analog Information
Analog information is what we experience directly: sights, sounds, textures, smells, tastes, etc.
Analog information is continuous and infinitely variable.
Example: monitoring the outside temperature through the day using an analog thermometer.
[Figure: graph of the outside temperature through the day]

An Analog Computer
A very simple analog computer is a mechanical thermostat.
Inputs: the measured temperature and the desired temperature (setpoint).
It executes a simple program: if temp > setpoint, turn the AC on.
Output: the action of turning the AC on or off.
Both temp and setpoint are analog values:
The temperature causes a spring to stretch or shrink.
The setpoint is set by turning a dial.
Both of these are continuous, infinitely variable values.
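For comparison, the thermostat's one-rule program can be written as a digital program. This is a minimal Python sketch; the function name `ac_should_be_on` and the example readings are hypothetical, and storing the temperature as a `float` already makes it a finely discrete value rather than a truly continuous one like the spring and dial.

```python
def ac_should_be_on(temp: float, setpoint: float) -> bool:
    """The thermostat's whole 'program': cool when it is warmer than the setpoint."""
    return temp > setpoint

# Hypothetical readings; a mechanical thermostat would use a spring's stretch
# and a dial's position instead of numbers stored in memory.
setpoint = 72.0
for temp in (68.0, 72.0, 75.5):
    action = "on" if ac_should_be_on(temp, setpoint) else "off"
    print(f"temp {temp} F, setpoint {setpoint} F -> AC {action}")
```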
Digital Information
Digital information is discrete: definite, distinct, precise, enumerable (countable), and finite.
Example: measuring temperature with a digital thermometer
[Figure: digital thermometer display reading 56.5 F]

Time       Temperature (F)
12:00 AM   56.5
12:30 AM   54.9
1:00 AM    54.0
1:30 AM    53.5
2:00 AM    53.3
2:30 AM    53.1
3:00 AM    53.0

Analog vs. Digital Information
Advantages of digital information:
Efficient storage and transfer
Unlimited, exact replication
Can be compressed
Easily manipulated (editing, combining, etc.)
We don't use many analog computers today. Digital computers give us all the advantages of being able to process digital information.

Bits and Bytes
Computers contain lots of on/off switches. A relay, vacuum tube, or transistor acts like a switch: it is either on or off.
Let's say a switch that is on represents the digit 1 and a switch that is off represents 0.
Digital computers represent all data using only 1s and 0s; each of the billions of transistors in a computer is either on or off.
A single digit (1 or 0) is called a bit (binary digit).
A bit is the smallest possible amount of information, like an atom of data.
One bit provides only a minimum amount of data: 1 or 0; yes or no; on or off; up or down; stop or go; any two-state value.
Anything beyond a simple two-state value requires more than one bit.
Bits and Bytes
A single light bulb is one bit of information: on or off; yes or no.
The light gives the answer (yes or no), but you need to know the question: "One if by land, two if by sea."
A whole bunch of light bulbs, arranged in a proper pattern, can give lots of information (such as a scoreboard), even though each light is only on or off.

Bits and Bytes
A bit is the smallest possible amount of information: yes/no, on/off, 0/1, etc.
One bit doesn't give us much information, but many bits together can give much more:
An image (maybe on a scoreboard)
Words
Sounds
Numbers other than 0 or 1
How can we represent numbers using bits?
Bits and Bytes
One bit can represent only 2 things: on or off, yes or no, 0 or 1.
Two bits can represent 4 things; there are 4 different patterns: 00, 01, 10, 11.
Eight bits can represent 256 things; there are 256 different patterns possible with eight bits.
A group of 8 consecutive bits is called a byte.

1-bit binary   Decimal
0 (off)        0
1 (on)         1

2-bit binary   Decimal
00             0
01             1
10             2
11             3

8-bit binary   Decimal
00000000       0
00000001       1
00000010       2
00000011       3
00000100       4
00000101       5
...            ...
11111110       254
11111111       255

Bits and Bytes
Bytes are usually grouped for convenience:
1 typed character is (usually) 1 byte.
1 KB (kilobyte) is about 1,000 bytes (actually 1,024 = 2^10).
A single typed manuscript page is about 1,500 characters, or about 1.5 KB.
1 MB (megabyte) is about 1,000 KB, or a million bytes.
1 GB (gigabyte) is about 1,000 MB, or a billion bytes.
The WFU T60 ThinkPad has 1 GB of RAM and a 100-GB hard disk.
100 GB is tens of millions of typed pages (about 67,000,000 at 1,500 characters per page).
1 TB (terabyte) is about 1,000 GB, or a trillion bytes.
1 TB of data printed on typed pages would make a stack of paper about 50 miles high.
The print collection of the Library of Congress is about 10 TB.
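The counting above (n bits give 2^n patterns, each read as a binary number) can be checked with a short Python sketch. The helper names `bit_patterns`, `to_decimal`, and `bits_needed` are hypothetical illustrations, not part of any library; `bits_needed` answers the reverse question of how many bits it takes to give every one of `count` things its own pattern.

```python
from itertools import product

def bit_patterns(n: int) -> list[str]:
    """Every distinct pattern of n bits, listed in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n)]

def to_decimal(pattern: str) -> int:
    """Read a bit pattern as an unsigned binary number."""
    value = 0
    for bit in pattern:
        value = value * 2 + int(bit)  # shift earlier bits one place left, add the new bit
    return value

def bits_needed(count: int) -> int:
    """Fewest bits whose patterns can label `count` different things (count >= 2)."""
    return (count - 1).bit_length()

for n in (1, 2, 8):
    print(f"{n} bit(s): {len(bit_patterns(n))} patterns")

for pattern in bit_patterns(2):
    print(pattern, to_decimal(pattern))

print(bits_needed(256))  # -> 8 (exactly one byte)
print(bits_needed(100))  # -> 7 (about 100 character codes)
```

Python's built-in `int("11111111", 2)` performs the same conversion as `to_decimal` and returns 255.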
Bits and Bytes
1 PB (petabyte) is about 1,000 TB (1,000,000,000,000,000 bytes).
As typed pages, that is a stack of paper more than 6 times the diameter of the Earth... 1/5th the distance to the Moon!
All material ever printed on paper is estimated to be about 200 petabytes.
Google processes many petabytes of data each day (http://portal.acm.org/citation.cfm?doid=1327452.1327492).
1 EB (exabyte) is about 1,000 PB (1,000,000,000,000,000,000 bytes).
All the words ever spoken by any human, ever, would be about 5 EB of text.
Next come the zettabyte, yottabyte, etc.
Check out How Much Data is That: http://www.jamesshuggins.com/h/tek1/how_big.htm

Origin of the Term Byte
The term byte was coined by Werner Buchholz, a researcher at IBM, in 1956 during the early design phase for the IBM Stretch computer (the company's first supercomputer). It was a modification of the word bite that was intended to avoid accidentally misspelling it as bit. The movement toward an eight-bit byte began in late 1956. A major reason that eight was considered the optimal number was that seven bits can define 128 characters (as against only 64 characters for six bits), which is sufficient for the approximately 100 unique codes needed for the upper and lower case letters of the English alphabet as well as punctuation marks and special characters, and the eighth bit could be used as a parity check (i.e., to confirm the accuracy of the other bits). This size was later adopted by IBM's highly popular System/360 series of mainframe systems [1964] and this was a key factor in its eventually becoming the industry-wide standard.
From http://www.linfo.org/byte.html

Half of an eight-bit byte (four bits) is sometimes called (playfully) a nibble (sometimes spelled nybble) or, more formally, a hex digit. The nibble is often called a semioctet in a networking or telecommunication context and also by some standards organisations.
The eight-bit byte is often called an octet in formal contexts such as industry standards, as well as in networking and telecommunication. This is also the word used for the eight-bit quantity in many non-English languages, where the pun on bite does not translate.
From http://www.wordiq.com/definition/byte

Etymology of Unit Prefixes
1. Kilo: 10^3, from Greek khilioi = 1,000
2. Mega: 10^6, from Greek megas = great, e.g., Alexandros ho Megas (Alexander the Great)
3. Giga: 10^9, from Latin gigas = giant
4. Tera: 10^12, from Greek teras = monster
5. Peta: 10^15, from Greek pente = five, because it's the fifth prefix (penta without the n)
6. Exa: 10^18, from Greek hex = six, because it's the sixth prefix (hexa without the h)
7. Zetta: 10^21, the last letter of the Latin alphabet (similar to the Greek letter zeta)
8. Yotta: 10^24, the penultimate letter of the Latin alphabet (similar to the Greek iota)

The first prefix is number-derived; the second, third, and fourth are based on mythology. The fifth and sixth are just that: fifth and sixth. With the seventh, another fork has been taken: starting with the seventh, the prefixes are named with letters of the Latin alphabet, starting from the end. Going further backwards through the Latin alphabet, the pattern has been extended in informal proposals, listed below. The General Conference of Weights and Measures (Conférence Générale des Poids et Mesures, CGPM) has not adopted these names; in 2022 it instead adopted ronna (10^27) and quetta (10^30).
9. Xona: 10^27
10. Weka: 10^30
11. Vunda: 10^33
12. Uda: 10^36
13. Treda: 10^39
14. Sorta: 10^42
15. Rinta: 10^45
16. Quexa: 10^48
17. Pepta: 10^51
18. Ocha: 10^54
19. Nena: 10^57
20. Minga: 10^60
21. Luna: 10^63
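The storage examples use the decimal meaning of each prefix (powers of 1,000) while noting that 1 KB is "actually" 1,024 bytes. A short Python sketch, with hypothetical helper names, compares each official prefix from kilo to yotta with the nearby power of two, and redoes the page arithmetic under the slides' assumption of 1,500 one-byte characters per page.

```python
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

def prefix_values():
    """(prefix, decimal value 1000**k, nearby binary value 1024**k) for kilo..yotta."""
    return [(name, 10 ** (3 * k), 2 ** (10 * k))
            for k, name in enumerate(PREFIXES, start=1)]

def typed_pages(num_bytes: int, chars_per_page: int = 1500) -> float:
    """Rough page count, assuming 1 byte per character and 1,500 characters per page."""
    return num_bytes / chars_per_page

for name, decimal, binary in prefix_values():
    excess = (binary - decimal) / decimal * 100   # how much bigger the binary value is
    print(f"{name:>5}: {decimal:.0e} vs {binary:.3e} bytes ({excess:.1f}% larger)")

print(f"100 GB is about {typed_pages(100 * 10**9):,.0f} typed pages")
```

The gap between the decimal and binary readings grows with each prefix, which is why "about" matters more for terabytes than for kilobytes.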
Digital Information
Digital computers process digital information.
Digital information is discrete; however, natural forms of information are analog and continuous.
The process of converting information to a digital form is called digitization.
Both discrete and analog information may be digitized:
Information that is already discrete (numbers, text characters, etc.) is easily represented in a digital form.
Analog information must be converted in some way.

Digitizing Analog Information
Text and numbers are discrete information; digitizing them is simply a matter of conversion from one discrete form to another.
Analog information is continuous (non-discrete) and must be transformed into a discrete form for digitizing.
Analog information is digitized in two steps:
1. Sampling: discrete samples are chosen to represent the continuous data.
2. Quantizing: each sample is assigned a particular number.

Digitizing Analog Information
An example using an image:
1. Sampling: choose discrete pixels, or picture elements.
2. Quantizing: assign a number to each pixel.
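The two steps, sampling then quantizing, can be sketched in Python. Everything here is an illustrative assumption: the function names `sample` and `quantize`, a sine wave standing in for an analog signal such as a pure tone, evenly spaced sample times, and evenly spaced quantization levels with rounding to the nearest level.

```python
import math

def sample(signal, start, stop, count):
    """Step 1, sampling: read the signal at `count` evenly spaced times (count >= 2)."""
    step = (stop - start) / (count - 1)
    return [signal(start + i * step) for i in range(count)]

def quantize(samples, lo, hi, levels):
    """Step 2, quantizing: replace each sample by the nearest of `levels` integer
    codes 0..levels-1 spread evenly over [lo, hi]."""
    codes = []
    for s in samples:
        s = min(max(s, lo), hi)                       # clip values outside the range
        codes.append(round((s - lo) / (hi - lo) * (levels - 1)))
    return codes

# Hypothetical analog signal: one cycle of a smooth wave with values in [-1, 1].
def wave(t):
    return math.sin(2 * math.pi * t)

samples = sample(wave, 0.0, 1.0, 9)
codes = quantize(samples, -1.0, 1.0, 16)   # 16 levels -> 4 bits per sample
print([round(s, 3) for s in samples])
print(codes)
```

Only the list of integer codes needs to be stored; the continuous wave between and within samples is lost.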
Digitizing Analog Information
Sample: break up the data into pixels.
Average the contents of each pixel.
Quantize: assign a number to represent the gray level of each pixel (e.g., from 0 to 15, where 0 = black and 15 = white).
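A minimal Python sketch of the sample, average, and quantize steps for a grayscale image, assuming the "analog" scene is approximated by a finer grid of brightness values from 0.0 (black) to 1.0 (white). The scene data and the helper names `pixelate`, `gray_levels`, and `raw_size_bytes` are hypothetical. The last helper estimates uncompressed size, assuming each pixel stores just enough bits for its gray level and ignoring file headers and compression.

```python
def pixelate(image, block):
    """Sampling: cut the image into block x block squares and average each square."""
    rows, cols = len(image), len(image[0])
    pixels = []
    for r in range(0, rows, block):
        row = []
        for c in range(0, cols, block):
            cells = [image[i][j]
                     for i in range(r, min(r + block, rows))
                     for j in range(c, min(c + block, cols))]
            row.append(sum(cells) / len(cells))
        pixels.append(row)
    return pixels

def gray_levels(pixels, levels=16):
    """Quantizing: map brightness 0.0 (black)..1.0 (white) to 0..levels-1."""
    return [[round(p * (levels - 1)) for p in row] for row in pixels]

def raw_size_bytes(pixels, levels=16):
    """Uncompressed size if every pixel stores just enough bits for `levels` grays."""
    bits_per_pixel = (levels - 1).bit_length()   # 16 levels -> 4 bits
    return len(pixels) * len(pixels[0]) * bits_per_pixel / 8

# Hypothetical fine-grained 4x4 "scene"; brightness 0.0 = black, 1.0 = white.
scene = [
    [0.00, 0.05, 0.90, 1.00],
    [0.05, 0.10, 0.95, 0.90],
    [0.40, 0.45, 0.20, 0.25],
    [0.50, 0.45, 0.25, 0.30],
]
pixels = pixelate(scene, 2)        # 2x2 pixels, each averaging 4 scene cells
print(gray_levels(pixels))         # -> [[1, 14], [7, 4]]
print(raw_size_bytes(pixels), "bytes")
```

A larger `block` gives fewer, coarser pixels and a smaller file; fewer `levels` likewise trades gray-level detail for size.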
Digitizing Analog Information
The quality of the digitized image depends on:
The number/size of pixels
The number of different levels used in quantization
The size of the data file depends on the same factors, so there is a tradeoff between image quality and file size.

Digitizing Analog Data
Another example: temperature data
Step 1: sampling. How many samples do we need?
Is once a day sufficient?
[Figure: temperature curve with a single sample, 73.2]
How about twice a day?
[Figure: temperature curve with two samples, 66.3 and 72.5]
Digitizing Analog Data
How about every two hours?
This is a more accurate representation, but still not complete.
[Figure: temperature curve sampled every two hours]
Adding more samples increases the fidelity (accuracy) of the representation, but the result is still not exactly identical to the analog data.
We still have the tradeoff between data quality and file size.
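To see the fidelity side of this tradeoff numerically, here is a Python sketch under stated assumptions: the `temperature` function is a made-up smooth daily curve, not the slides' data; readings are evenly spaced (plus the next midnight's reading to close the day); and the curve is rebuilt by joining readings with straight lines, which is one simple choice among many. The helper name `worst_error` is hypothetical.

```python
import math

def temperature(hour):
    """Hypothetical smooth daily temperature curve (F) standing in for analog data."""
    return 60 + 10 * math.sin(2 * math.pi * (hour - 9) / 24)

def worst_error(readings_per_day):
    """Take evenly spaced readings, join them with straight lines, and return the
    largest gap between that reconstruction and the true curve, checked each minute."""
    n = readings_per_day
    times = [24 * i / n for i in range(n + 1)]
    values = [temperature(t) for t in times]
    worst = 0.0
    for minute in range(24 * 60 + 1):
        t = minute / 60
        i = min(int(t * n / 24), n - 1)               # interval containing t
        frac = (t - times[i]) / (times[i + 1] - times[i])
        estimate = values[i] + frac * (values[i + 1] - values[i])
        worst = max(worst, abs(estimate - temperature(t)))
    return worst

# More readings means more data to store, but a smaller worst-case error.
for n in (1, 2, 12, 48):
    print(f"{n:>2} readings/day: worst error {worst_error(n):5.2f} F")
```

The error shrinks as readings are added but never reaches zero, while the amount of stored data grows in proportion to the number of readings.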