Chapter 3 Parallel Computing
As we discussed in the Processor module, computer speed has made great progress over the past few decades: a 20-million-fold increase over a fifty-year period. This is mainly because more and more transistors have been integrated into a single silicon chip, from a few to tens (SSI), to hundreds (MSI), to thousands (LSI), and on to billions (VLSI).
Moore's law
This phenomenon is nicely summarized by Moore's law: the number of transistors placed on a chip doubles every eighteen months. For example, the Intel 8086, a processor chip introduced by Intel in 1978, contained 29,000 transistors and ran at 5 MHz; the Intel Core 2 Duo, introduced in 2006, contained 291 million transistors and ran at 2.93 GHz. Thus, during those 28 years, the number of transistors went up by a factor of 10,034, which works out to a doubling about once every 25 months, or roughly every two years. That is somewhat slower than the eighteen-month figure, but it follows the same exponential trend.
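The doubling period in the example above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the figures quoted on this slide; the variable names are made up for illustration.

```python
import math

# Figures quoted in the text above.
transistors_1978 = 29_000        # Intel 8086
transistors_2006 = 291_000_000   # Intel Core 2 Duo
years = 2006 - 1978              # 28 years

growth = transistors_2006 / transistors_1978   # overall growth factor
doublings = math.log2(growth)                  # how many times the count doubled
months_per_doubling = years * 12 / doublings   # average time per doubling

print(f"growth factor:       {growth:,.0f}")
print(f"number of doublings: {doublings:.2f}")
print(f"months per doubling: {months_per_doubling:.1f}")
```

The count doubled a little over 13 times in 28 years, which is where the figure of roughly two years per doubling comes from.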
A picture is worth how many words?
More importantly, this increase in the number of transistors leads directly to an increase in computer speed. In this example, the clock speed went up by a factor of 586 (from 5 MHz to 2.93 GHz) over the same period. The following chart shows the increase in computer speed alongside the increase in the number of integrated transistors.
Not just the speed...
Besides processing speed, other capabilities of many digital electronic devices are also strongly connected to Moore's law: memory capacity, sensors, and even the number and size of pixels in digital cameras. As a result, all of these technologies have been improving at the same stunning exponential rate. Since Moore's law describes a driving force of technological and social change over the past thirty or so years, it has been used to guide long-term planning and to set targets for research and development.
A dead end?
Unfortunately, this era of steady and rapid growth in single-processor performance, which lasted over 30 years, is essentially over, for three reasons. First, doubling the number of transistors every eighteen months means making the wires about √2 times thinner every eighteen months. This has to come to an end at some point, since we can't make the wires infinitely thin. Second, although every transistor produces only a tiny bit of heat, when you pack billions of them into a tiny space the heat does add up, eventually to a level comparable to that at the surface of the Sun. Third, we have essentially dug out all the benefits of a complicated single-processor architecture.
What to do?
Fortunately, Moore's law is not completely out of the window yet. It is predicted to continue for another five years or so. These many transistors will no longer be used to build a single, more complicated processor, but to increase the number of independent processors on a single chip. We will then try to speed up the whole process by letting those independent processors work on the data in parallel. As an analogy, in ancient times we could cook only one thing at a time on an old-fashioned stove.
Nowadays, with a contemporary stove, we can cook many different dishes in parallel, i.e., at the same time, which certainly saves time. Similarly, we could cut a big problem into many smaller ones and run them in parallel on multiple processors. Could we?
They are happening everywhere...
Indeed, we can find many examples of things happening in parallel in our work and life: multiple galaxies moving through the Universe, multiple lanes on I-93, multiple gas pumps at most gas stations, etc.
It is difficult...
This all sounds good, but it is not that easy. In the cooking example, a good chef knows that she cannot always cook everything at the same time. To cook a dish of, e.g., Pepper, Onions and Pork, she has to fry the pepper and the pork first, which can be done at the same time; then she fries the onion, which is mixed with the partially fried pepper and pork. In the multiple-lane case, although the cars in different lanes can move forward in parallel, the cars in the same lane have to move forward in turn. The same idea applies to computing in parallel: you have to figure out which parts can be done in parallel, and which parts have to be done in sequence.
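The cooking example can be sketched in code to show which steps may overlap and which must wait. This is only an illustration under assumptions: `fry` is a hypothetical stand-in for a real step, and a thread pool from Python's standard library plays the role of the two burners.

```python
from concurrent.futures import ThreadPoolExecutor

def fry(ingredient):
    # Hypothetical stand-in for a real cooking step.
    return f"fried {ingredient}"

def cook_dish():
    with ThreadPoolExecutor(max_workers=2) as burners:
        # Independent steps: the pepper and the pork can be fried at the same time.
        pepper = burners.submit(fry, "pepper")
        pork = burners.submit(fry, "pork")
        # Wait until both are done before moving on.
        partially_fried = [pepper.result(), pork.result()]
    # Dependent step: the onion goes in only after the first two steps finish.
    return fry("onion") + " mixed with " + " and ".join(partially_fried)

print(cook_dish())
```

The two `submit` calls may run side by side, while the final step has to come after them, no matter how many burners are available.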
An example
We have been using computers to handle course registration for quite a few years now. When adding somebody to a class, a program has to make sure, among other things, that the total number of students in the class does not exceed the cap of that class, 25 for ours. If we process the adds sequentially, i.e., one by one, this is what the program does to add another student to this class:

    if the current number < 25 then add this student

Thus, before we add another student, we always check the cap.
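As a rough sketch, the sequential check-then-add could look like the following in Python. The names `CAP`, `roster`, and `add_student` are assumptions made for illustration and do not come from any real registration system.

```python
CAP = 25  # class cap quoted in the text

def add_student(roster, student, cap=CAP):
    """Add the student only if the class is below its cap; report success."""
    if len(roster) < cap:         # step 1: check the current number
        roster.append(student)    # step 2: add this student
        return True
    return False

# 24 students are already enrolled; two more requests arrive one after the other.
roster = [f"student{i}" for i in range(24)]
first = add_student(roster, "Alice")   # 24 < 25, so Alice is added
second = add_student(roster, "Bob")    # 25 is not < 25, so Bob is refused

print(first, second, len(roster))
```

Because each request finishes both steps before the next one starts, the class can never go over its cap.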
The parallel case
The add above consists of two steps, a check followed by an add. When we try to process multiple requests at the same time, we might get into trouble, since we don't know in what order the steps of the different requests will be interleaved. For example, suppose 24 students have signed up for this course, and two more students try to add it at the same time. What will happen?
This will.
If we do the adds in parallel, the steps of the two requests might happen to be interleaved as follows:

Request 1                      Time   Request 2
Check the number (still 24)    t1     -
-                              t2     Check the number (still 24)
Add in student (now 25)        t3     -
-                              t4     Add in student (now 26)

Thus, as the chart above shows, we end up adding more students than the cap allows.
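The interleaving at t1 through t4 can be replayed step by step. The sketch below deliberately uses no threads: it simply executes the four steps in the order shown above, so the outcome is the same on every run. The variable names are illustrative.

```python
CAP = 25
enrolled = 24

# t1: request 1 checks the number (still 24)
request1_may_add = enrolled < CAP

# t2: request 2 checks the number (still 24, nothing has been added yet)
request2_may_add = enrolled < CAP

# t3: request 1 adds its student (now 25)
if request1_may_add:
    enrolled += 1

# t4: request 2 adds its student based on its stale check (now 26)
if request2_may_add:
    enrolled += 1

print(enrolled)
```

Both checks pass because each one sees the count before either add has happened, which is exactly how the class ends up over its cap.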
Software is really hard
Although we have been working with parallel computer hardware for a long time, since the late 1960s, programming it is really difficult, because we have to take care of communication and coordination between the multiple processors. It is just like a conference call, where we want to make sure that only one person speaks at a time. In other words, the difficulty lies on the software side, even though we can come up with lots of cheap hardware parts.
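One common way to make sure that "only one person speaks at a time" is a lock. The following minimal sketch applies it to a course with a cap of 25 and 24 students already enrolled, using `threading.Lock` from Python's standard library; the `Course` class and its method names are hypothetical.

```python
import threading

CAP = 25

class Course:
    def __init__(self, enrolled):
        self.enrolled = enrolled
        self._lock = threading.Lock()

    def add_student(self):
        # Only one request at a time may run the check and the add together.
        with self._lock:
            if self.enrolled < CAP:
                self.enrolled += 1
                return True
            return False

course = Course(enrolled=24)
results = []

def request():
    results.append(course.add_student())

# Two add requests arrive at the same time.
threads = [threading.Thread(target=request) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results), course.enrolled)
```

Whichever request gets the lock first is accepted and the other is refused, so the count stops at the cap. The price is that the two requests no longer run fully in parallel: the locked part is executed one request at a time.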
How fast could it be?
The natural expectation is that the speed-up from parallelization would be linear: if you put in a two-lane highway, two cars can go through the toll booths at the same time, and if you put in four lanes, four cars can pay tolls in parallel. That is why we often put in multiple toll booths, e.g., at Exit 11 on I-93. Unfortunately, this does not carry over to parallel computing: very few parallel algorithms achieve linear speed-up. Most of them show near-linear speed-up for small numbers of processing elements, which then flattens out to a constant value for large numbers of processing elements.
Here is the limit
The potential speed-up of a parallel algorithm on a parallel computer is given by Amdahl's law, formulated in the 1960s by Gene Amdahl. When a big problem is cut into a bunch of smaller ones, some of them can run in parallel while the others have to run in sequence; it is the latter that determine the overall speed-up available from parallelization. This relationship is given by the equation $S = \frac{1}{1 - P}$, where S is the maximum speed-up of the program, as a factor of its original sequential runtime, and P is the fraction of the program that can be run in parallel.
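The formula above is the limiting case for an unlimited number of processors. For reference, the commonly stated form of Amdahl's law for a finite number of processors is sketched below; the symbol N (the number of processors) is not used on the slide and is introduced here only for illustration.

```latex
% Speed-up with N processors: the sequential part (1 - P) is unchanged,
% while the parallel part P is divided among the N processors.
S(N) = \frac{1}{(1 - P) + \dfrac{P}{N}}

% As N grows, P/N shrinks to zero and only the sequential part remains:
\lim_{N \to \infty} S(N) = \frac{1}{1 - P}
```

However many processors are added, the sequential fraction 1 - P stays in the denominator, which is why the speed-up levels off.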
An example
If we cut the problem into ten equal pieces, nine of which can run in parallel while one can't, then the parallel fraction is P = 90% and the sequential fraction is 1 - P = 10%. Amdahl's law then tells us that $S = \frac{1}{1 - 0.9} = \frac{1}{0.1} = 10$. In other words, we can speed the program up by a factor of 10 at most, no matter how many processors we throw in. This result puts an upper limit on the usefulness of adding more parallel execution units. One way to put it: the bearing of a child takes nine months, no matter how many women are assigned.
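To see how the speed-up approaches this limit, we can evaluate the usual finite-processor form of Amdahl's law, 1 / ((1 - P) + P / N), for a growing number of processors N. This is a minimal sketch: the function name `amdahl_speedup` and the particular values of N are chosen only for illustration.

```python
def amdahl_speedup(p, n):
    """Speed-up with n processors when a fraction p of the work is parallel."""
    return 1.0 / ((1.0 - p) + p / n)

P = 0.9  # nine of the ten pieces can run in parallel

for n in (1, 2, 10, 100, 1000):
    print(f"{n:5d} processors: speed-up {amdahl_speedup(P, n):.2f}")
```

With 10 processors the speed-up is only a little over 5, and even with 1,000 processors it stays below 10, because the one sequential piece always takes the same amount of time.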
Discussion topics
Do some further research on Amdahl's law, and share your findings with us in layman's language.
What are some successful applications of the multi-processing idea in parallel computing? Give some details: What is the application? Why is it done in parallel? What are the benefits, compared with sequential computing?
In your life, study and/or work, have you ever applied the multi-processing strategy, i.e., done multiple things at one time? If yes, give us some examples: What is the problem? How do you cut it into smaller problems? Can all of these smaller problems be run in parallel? If not, how do you coordinate them?