并行程序設(shè)計(jì)導(dǎo)論課件_第1頁(yè)
并行程序設(shè)計(jì)導(dǎo)論課件_第2頁(yè)
并行程序設(shè)計(jì)導(dǎo)論課件_第3頁(yè)
并行程序設(shè)計(jì)導(dǎo)論課件_第4頁(yè)
并行程序設(shè)計(jì)導(dǎo)論課件_第5頁(yè)
已閱讀5頁(yè),還剩38頁(yè)未讀 繼續(xù)免費(fèi)閱讀

下載本文檔

版權(quán)說(shuō)明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)

文檔簡(jiǎn)介

1、1并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Chapter 1Why Parallel Computing?An Introduction to Parallel ProgrammingPeter Pacheco2并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論RoadmapnWhy we need ever-increasing performance.nWhy were building parallel systems.nWhy we need to write parallel programs.nHow do we write parallel programs?nWhat well be doing.nCo

2、ncurrent, parallel, distributed!# Chapter Subtitle3并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Changing timesnFrom 1986 2002, microprocessors were speeding like a rocket, increasing in performance an average of 50% per year.nSince then, its dropped to about 20% increase per year.4并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論An intelligent solutionnInstead

3、 of designing and building faster microprocessors, put multiple processors on a single integrated circuit.5并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Now its up to the programmersnAdding more processors doesnt help much if programmers arent aware of themn or dont know how to use them.nSerial programs dont benefit from this ap

4、proach (in most cases).6并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Why we need ever-increasing performancenComputational power is increasing, but so are our computation problems and needs.nProblems we never dreamed of have been solved because of past increases, such as decoding the human genome.nMore complex problems are stil

5、l waiting to be solved.7并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Climate modeling8并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論P(yáng)rotein folding9并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Drug discovery10并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Energy research11并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Data analysis12并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Why were building parallel systemsnUp to now, performance increases have been attributable to incre

6、asing density of transistors.nBut there areinherent problems.13并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論A little physics lessonnSmaller transistors = faster processors.nFaster processors = increased power consumption.nIncreased power consumption = increased heat.nIncreased heat = unreliable processors.14并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Solu

7、tion nMove away from single-core systems to multicore processors.n“core” = central processing unit (CPU)nIntroducing parallelism!15并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Why we need to write parallel programsnRunning multiple instances of a serial program often isnt very useful.nThink of running multiple instances of your

8、 favorite game.nWhat you really want is forit to run faster.16并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Approaches to the serial problemnRewrite serial programs so that theyre parallel.nWrite translation programs that automatically convert serial programs into parallel programs.nThis is very difficult to do.nSuccess has been

9、 limited.17并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論More problemsnSome coding constructs can be recognized by an automatic program generator, and converted to a parallel construct.nHowever, its likely that the result will be a very inefficient program.nSometimes the best parallel solution is to step back and devise an entir

10、ely new algorithm.18并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論ExamplenCompute n values and add them together.nSerial solution:19并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Example (cont.)nWe have p cores, p much smaller than n.nEach core performs a partial sum of approximately n/p values.Each core uses its own private variablesand executes this block o

11、f codeindependently of the other cores.20并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Example (cont.)nAfter each core completes execution of the code, is a private variable my_sum contains the sum of the values computed by its calls to Compute_next_value.nEx., 8 cores, n = 24, then the calls to Compute_next_value return:1,4,3,

12、9,2,8, 5,1,1, 5,2,7, 2,5,0, 4,1,8, 6,5,1, 2,3,921并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Example (cont.)nOnce all the cores are done computing their private my_sum, they form a global sum by sending results to a designated “master” core which adds the final result.22并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Example (cont.)23并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Example

13、(cont.)Core01234567my_sum8197157131214Global sum8 + 19 + 7 + 15 + 7 + 13 + 12 + 14 = 95Core01234567my_sum9519715713121424并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論But wait!Theres a much better wayto compute the global sum.25并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Better parallel algorithmnDont make the master core do all the work.nShare it among th

14、e other cores.nPair the cores so that core 0 adds its result with core 1s result.nCore 2 adds its result with core 3s result, etc.nWork with odd and even numbered pairs of cores.26并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Better parallel algorithm (cont.)nRepeat the process now with only the evenly ranked cores.nCore 0 adds

15、result from core 2.nCore 4 adds the result from core 6, etc.nNow cores divisible by 4 repeat the process, and so forth, until core 0 has the final result.27并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Multiple cores forming a global sum28并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論AnalysisnIn the first example, the master core performs 7 receives and 7 ad

16、ditions.nIn the second example, the master core performs 3 receives and 3 additions.nThe improvement is more than a factor of 2!29并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Analysis (cont.)nThe difference is more dramatic with a larger number of cores.nIf we have 1000 cores:nThe first example would require the master to perfo

17、rm 999 receives and 999 additions.nThe second example would only require 10 receives and 10 additions.nThats an improvement of almost a factor of 100!30并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論How do we write parallel programs?nTask parallelism nPartition various tasks carried out solving the problem among the cores.nData p

18、arallelismnPartition the data used in solving the problem among the cores.nEach core carries out similar operations on its part of the data.31并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論P(yáng)rofessor P15 questions300 exams32并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論P(yáng)rofessor Ps grading assistantsTA#1TA#2TA#333并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Division of work data paralleli

19、smTA#1TA#2TA#3100 exams100 exams100 exams34并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Division of work task parallelismTA#1TA#2TA#3Questions 1 - 5Questions 6 - 10Questions 11 - 1535并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Division of work data parallelism36并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Division of work task parallelismTasksReceiving1)Addition 37并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Coo

20、rdinationnCores usually need to coordinate their work.nCommunication one or more cores send their current partial sums to another core.nLoad balancing share the work evenly among the cores so that one is not heavily loaded.nSynchronization because each core works at its own pace, make sure cores do

21、not get too far ahead of the rest.38并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論What well be doingnLearning to write programs that are explicitly parallel.nUsing the C language.nUsing three different extensions to C.nMessage-Passing Interface (MPI)nPosix Threads (Pthreads)nOpenMP39并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Type of parallel systemsnShare

22、d-memorynThe cores can share access to the computers memory.nCoordinate the cores by having them examine and update shared memory locations.nDistributed-memorynEach core has its own, private memory.nThe cores must communicate explicitly by sending messages across a network.40并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Type of parallel systemsShared-memoryDistributed-memory41并行程序設(shè)計(jì)導(dǎo)論并行程序設(shè)計(jì)導(dǎo)論Terminology nConcurrent computing a program is one in which multiple tasks can be in progress at any in

溫馨提示

  • 1. 本站所有資源如無(wú)特殊說(shuō)明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁(yè)內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒(méi)有圖紙預(yù)覽就沒(méi)有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 人人文庫(kù)網(wǎng)僅提供信息存儲(chǔ)空間,僅對(duì)用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對(duì)用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
  • 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請(qǐng)與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。

最新文檔

評(píng)論

0/150

提交評(píng)論