Sources of overhead in parallel programs
Parallel overhead is the amount of time required to coordinate parallel tasks, as opposed to performing useful work.

Parallel overhead may include factors such as:

1. Task start-up time
2. Synchronization
3. Data communication
4. Software overhead imposed by libraries, tools, parallel compilers, the operating system, etc.
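A minimal sketch of the first of these factors, assuming POSIX threads: the threads below do no useful work, so everything measured is start-up and tear-down overhead. The thread count and timer are illustrative choices.

    /* Estimate task start-up overhead by timing the creation and
     * joining of threads whose body is empty. Compile with -pthread. */
    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    #define NTHREADS 8

    static void *noop(void *arg) { (void)arg; return NULL; }

    static double now_sec(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void) {
        pthread_t tid[NTHREADS];
        double start = now_sec();
        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&tid[i], NULL, noop, NULL);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);
        /* All elapsed time is overhead: the threads perform no work. */
        printf("start-up + tear-down overhead: %.6f s\n", now_sec() - start);
        return 0;
    }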

Superlinear speedup can arise when a parallel formulation performs less work overall than its serial counterpart. Consider a depth-first search of a small tree in which the solution node is the rightmost leaf. A serial formulation of this problem based on depth-first tree traversal explores the entire tree, expanding 14 nodes in all, before reaching the solution. If it takes time tc to visit a node, the time for this traversal is 14tc. Now consider a parallel formulation in which the left subtree is explored by processing element 0 and the right subtree by processing element 1. If both processing elements explore the tree at the same speed, the parallel formulation explores only a fraction of the tree before the solution is found.

Notice that the total work done by the parallel algorithm is only nine node expansions, i.e., 9tc units of work, compared with the 14tc performed by the serial algorithm. The corresponding parallel time, assuming the root node expansion is serial, is 5tc: one root node expansion, followed by four node expansions by each processing element. The speedup on two processing elements is therefore 14tc/5tc = 2.8, which is superlinear. The cause of this superlinearity is that the work performed by the parallel and serial algorithms is different. Indeed, if the two-processor algorithm were implemented as two processes on the same processing element, the algorithmic superlinearity would disappear for this problem instance.
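A small numeric sketch of these counts; the subtree sizes (nine nodes on the left, four on the right, with the solution last in the right subtree's depth-first order) are assumptions consistent with the figures quoted above.

    /* Node-expansion counts for the tree-search example: serial DFS
     * visits the root, the whole left subtree, then the right subtree;
     * in the two-PE formulation, PE 1 reaches the solution after four
     * expansions while PE 0 expands four left-subtree nodes. */
    #include <stdio.h>

    int main(void) {
        const int left = 9, right = 4;        /* assumed subtree sizes */
        int serial_time   = 1 + left + right; /* 14 expansions -> 14tc */
        int parallel_time = 1 + right;        /* root + 4 each -> 5tc  */
        int parallel_work = 1 + 2 * right;    /* 9 expansions in total */

        printf("serial time   : %d tc\n", serial_time);
        printf("parallel time : %d tc\n", parallel_time);
        printf("parallel work : %d expansions\n", parallel_work);
        printf("speedup       : %.2f on 2 PEs (superlinear)\n",
               (double)serial_time / parallel_time);
        return 0;
    }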

Note that when exploratory decomposition is used, the relative amount of work performed by serial and parallel algorithms depends upon the location of the solution, and it is often not possible to find a serial algorithm that is optimal for all instances. As the examples above suggest, speedup by itself does not indicate how effectively the processing elements are used. Efficiency is a measure of the fraction of time for which a processing element is usefully employed; it is defined as the ratio of speedup to the number of processing elements.

In an ideal parallel system, speedup is equal to p and efficiency is equal to one. In practice, speedup is less than p and efficiency is between zero and one, depending on the effectiveness with which the processing elements are utilized.

We denote efficiency by the symbol E. Mathematically, it is given by E = S/p, where S is the speedup obtained on p processing elements. Consider, for example, the problem of detecting edges in an n x n image by applying a 3 x 3 template to each pixel. The process of applying the template corresponds to multiplying pixel values with the corresponding template values and summing across the template (a convolution operation). Since there are nine multiply-add operations for each pixel, if each multiply-add takes time tc, the entire operation takes time 9 tc n^2 on a serial computer.
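A minimal serial sketch of this step; the Laplacian-style template values and the step-edge test image are illustrative, and any 3 x 3 template fits the same cost analysis.

    /* Apply a 3x3 template to an N x N image: nine multiply-adds per
     * pixel, i.e., 9*tc per pixel and about 9*tc*N^2 overall. Boundary
     * pixels are skipped here for brevity. */
    #include <stdio.h>

    #define N 8 /* image dimension, matching the 8 x 8 example */

    static void apply_template(const double in[N][N], double out[N][N],
                               const double t[3][3]) {
        for (int i = 1; i < N - 1; i++)
            for (int j = 1; j < N - 1; j++) {
                double acc = 0.0;
                for (int di = -1; di <= 1; di++)     /* 3 x 3 = nine  */
                    for (int dj = -1; dj <= 1; dj++) /* multiply-adds */
                        acc += t[di + 1][dj + 1] * in[i + di][j + dj];
                out[i][j] = acc;
            }
    }

    int main(void) {
        double img[N][N], edges[N][N] = {{0}};
        const double laplacian[3][3] = { { -1, -1, -1 },
                                         { -1,  8, -1 },
                                         { -1, -1, -1 } };
        for (int i = 0; i < N; i++)      /* test image: a vertical  */
            for (int j = 0; j < N; j++)  /* step edge at column N/2 */
                img[i][j] = (j < N / 2) ? 0.0 : 1.0;
        apply_template(img, edges, laplacian);
        printf("response next to the edge: %g\n", edges[3][3]);
        return 0;
    }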

[Figure: example of edge detection. (a) An 8 x 8 image; (b) typical templates for detecting edges; and (c) partitioning of the image across four processors, with shaded regions indicating image data that must be communicated from neighboring processors to processor 1.]

A simple parallel algorithm for this problem partitions the image equally across the processing elements, and each processing element applies the template to its own subimage.

Note that for applying the template to the boundary pixels, a processing element must get data that is assigned to the adjoining processing element. On a message-passing machine, the algorithm executes in two steps: (i) exchange a layer of n pixels with each of the two adjoining processing elements; and (ii) apply the template to the local subimage.

The first step involves two n-word messages (assuming each pixel takes a word to communicate RGB data) and therefore takes time 2(ts + tw n), where ts is the message start-up time and tw is the per-word transfer time. The second step takes time 9 tc n^2 / p. The total parallel time is therefore

TP = 9 tc n^2 / p + 2(ts + tw n).
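A hedged sketch of step (i), assuming MPI and a one-dimensional strip partition across p processes; the buffer names and the helper function are illustrative, not part of the original formulation.

    /* Exchange one boundary layer (n pixels) with each of the two
     * adjoining processes; each exchange is an n-word message, so the
     * step costs 2(ts + tw*n) in the model above. */
    #include <mpi.h>

    void exchange_borders(double *left_halo, double *right_halo,
                          const double *left_edge, const double *right_edge,
                          int n, int rank, int p, MPI_Comm comm) {
        int left  = (rank > 0)     ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < p - 1) ? rank + 1 : MPI_PROC_NULL;

        MPI_Sendrecv(left_edge,  n, MPI_DOUBLE, left,  0,
                     right_halo, n, MPI_DOUBLE, right, 0,
                     comm, MPI_STATUS_IGNORE);
        MPI_Sendrecv(right_edge, n, MPI_DOUBLE, right, 1,
                     left_halo,  n, MPI_DOUBLE, left,  1,
                     comm, MPI_STATUS_IGNORE);
        /* Step (ii), not shown: apply the 3x3 template to the local
         * subimage, costing 9*tc*n^2/p. */
    }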

The cost of solving a problem on a parallel system is defined as the product pTP of the parallel runtime and the number of processing elements used; it reflects the sum of the time that each processing element spends solving the problem. Efficiency can also be expressed as the ratio of the execution time of the fastest known sequential algorithm for solving a problem to the cost of solving the same problem on p processing elements.
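A small numeric sketch tying these definitions together, using the edge-detection model above; the machine parameters (tc = 1 ns, ts = 1 us, tw = 10 ns) and the problem size (n = 1024, p = 4) are assumed values for illustration.

    /* Compute speedup S = TS/TP, efficiency E = S/p, and cost p*TP for
     * the model TP = 9*tc*n^2/p + 2*(ts + tw*n). */
    #include <stdio.h>

    int main(void) {
        const double tc = 1e-9, ts = 1e-6, tw = 1e-8; /* assumed */
        const double n = 1024.0;
        const int p = 4;

        double TS = 9.0 * tc * n * n;             /* serial time   */
        double TP = TS / p + 2.0 * (ts + tw * n); /* parallel time */
        double S  = TS / TP;                      /* speedup       */
        double E  = S / p;                        /* efficiency    */
        double C  = p * TP;                       /* cost = p*TP   */

        printf("S = %.3f, E = %.3f, cost = %.3e s (TS = %.3e s)\n",
               S, E, C, TS);
        return 0;
    }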

The cost of solving a problem on a single processing element is the execution time of the fastest known sequential algorithm. A parallel system is said to be cost-optimal if the cost of solving a problem on a parallel computer has the same asymptotic growth (in Θ terms), as a function of the input size, as the fastest known sequential algorithm on a single processing element.

Since efficiency is the ratio of sequential cost to parallel cost, a cost-optimal parallel system has an efficiency of Θ(1). Cost is sometimes referred to as work or processor-time product, and a cost-optimal system is also known as a pTP-optimal system. Consider, for example, adding n numbers on n processing elements: the sum can be computed in Θ(log n) time, for a cost of Θ(n log n). Since the serial runtime of this operation is Θ(n), the algorithm is not cost optimal.
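A short numeric sketch of the gap, assuming the usual Θ(log n) parallel reduction for adding n = 2^k numbers on n processing elements:

    /* For adding n numbers on n processing elements, TP grows as
     * log2(n), so cost = n*log2(n) while the serial runtime is n:
     * the ratio cost/serial grows as log2(n). */
    #include <stdio.h>

    int main(void) {
        for (int k = 10; k <= 30; k += 10) {
            double n = (double)(1UL << k); /* n = 2^k                   */
            double serial = n;             /* Theta(n) serial additions */
            double cost = n * k;           /* p*TP ~ n * log2(n)        */
            printf("n = 2^%-2d  serial = %.2e  cost = %.2e  ratio = %.0f\n",
                   k, serial, cost, cost / serial);
        }
        return 0;
    }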

Cost optimality is a very important practical concept, although it is defined in terms of asymptotics. We illustrate this using the following example. Consider a sorting algorithm that uses n processing elements to sort a list of n numbers in time (log n)^2. The pTP product of this algorithm is n (log n)^2. Since the serial runtime of a comparison-based sort is Θ(n log n), this algorithm is not cost optimal, but only by a factor of log n. Let us consider a realistic scenario in which the number of processing elements p is much less than n. An assignment of these n tasks to p processing elements, with each processing element emulating n/p of the original ones, yields a parallel time of n (log n)^2 / p and hence a speedup of only p / log n: the log n factor by which the algorithm misses cost optimality appears directly as a speedup that falls short of p by the same factor.

Idling

Processing elements in a parallel system may become idle for many reasons, such as load imbalance, synchronization, and the presence of serial components in a program.

In many parallel applications (for example, when task generation is dynamic), it is impossible or at least difficult to predict the size of the subtasks assigned to various processing elements. Hence, the problem cannot be subdivided statically among the processing elements while maintaining uniform workload.

If different processing elements have different workloads, some processing elements may be idle during part of the time that others are working on the problem. In some parallel programs, processing elements must synchronize at certain points during parallel program execution. If all processing elements are not ready for synchronization at the same time, then the ones that are ready sooner will be idle until all the rest are ready, as the sketch below illustrates.
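A minimal sketch of synchronization idling, assuming POSIX threads with barriers; the unequal sleep times stand in for imbalanced workloads, and all sizes are illustrative.

    /* Threads with unequal workloads meet at a barrier; the early
     * finishers idle until the slowest thread arrives. Compile with
     * -pthread. */
    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    #define NTHREADS 4
    static pthread_barrier_t barrier;

    static double now_sec(void) {
        struct timespec t;
        clock_gettime(CLOCK_MONOTONIC, &t);
        return t.tv_sec + t.tv_nsec * 1e-9;
    }

    static void *worker(void *arg) {
        int id = *(int *)arg;
        usleep((id + 1) * 100000);      /* unequal "work": 0.1 s .. 0.4 s */
        double before = now_sec();
        pthread_barrier_wait(&barrier); /* early finishers idle here      */
        printf("thread %d idled %.2f s at the barrier\n",
               id, now_sec() - before);
        return NULL;
    }

    int main(void) {
        pthread_t tid[NTHREADS];
        int id[NTHREADS];
        pthread_barrier_init(&barrier, NULL, NTHREADS);
        for (int i = 0; i < NTHREADS; i++) {
            id[i] = i;
            pthread_create(&tid[i], NULL, worker, &id[i]);
        }
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);
        pthread_barrier_destroy(&barrier);
        return 0;
    }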

Parts of an algorithm may be unparallelizable, allowing only a single processing element to work on them. While one processing element works on the serial part, all the other processing elements must wait.

Excess Computation

The fastest known sequential algorithm for a problem may be difficult or impossible to parallelize, forcing us to use a parallel algorithm based on a poorer but easily parallelizable (that is, one with a higher degree of concurrency) sequential algorithm.


