Instruction Pipelining


Parallel processing performs simultaneous data-processing tasks in order to increase the computational speed of a computer system. Rather than processing each instruction sequentially, a parallel processing system performs concurrent data processing to achieve faster execution time and higher throughput.

Parallel processing has clear advantages, but it also has drawbacks: the amount of hardware increases, and so does the cost of the system. Parallel processing is achieved by distributing the data among multiple functional units.

Flynn’s Classification

Flynn introduced a classification of parallel processing. It considers the organisation of a computer system in terms of the number of instruction and data items that are manipulated simultaneously.
  • The sequence of instructions read from the memory constitutes an instruction stream.
  • The operations performed on the data in the processor constitute a data stream.

Flynn’s classification divides computers into four major groups:
  • Single Instruction stream, Single Data stream (SISD)
  • Single Instruction stream, Multiple Data stream (SIMD)
  • Multiple Instruction stream, Single Data stream (MISD)
  • Multiple Instruction stream, Multiple Data stream (MIMD)

SISD: It represents the organisation of a single computer containing a control unit, a processor unit and a memory unit. Instructions are executed sequentially.
SIMD: It represents an organisation that includes many processing units under the supervision of a common control unit. All processors receive the same instruction from the control unit but operate on different items of data.
MISD: Its architecture contains n processor units, each receiving a distinct instruction stream but operating on the same data stream. The MISD structure is only of theoretical interest, since no practical system has been constructed using this organisation.
MIMD: Its organisation refers to a computer system capable of processing several programs at the same time.
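As a rough illustration of the SISD/SIMD distinction, the sketch below (a hypothetical Python/NumPy example, not part of the original classification) processes one data item per step in a sequential loop, and then applies a single vectorised operation to all the data items at once:

    import numpy as np

    a = np.array([1, 2, 3, 4])
    b = np.array([10, 20, 30, 40])

    # SISD-style: one instruction stream operating on one data item at a time
    c_sisd = np.empty_like(a)
    for i in range(len(a)):
        c_sisd[i] = a[i] + b[i]

    # SIMD-style: one instruction (a vector add) applied to multiple data items
    c_simd = a + b

    print(c_sisd, c_simd)   # both give [11 22 33 44]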
The operation of the CPU is usually described in terms of the Fetch-Execute cycle.
In order to appreciate the operation of a computer, we need to consider the organisation of the CPU in more detail.
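As a rough sketch of the fetch-execute cycle (a minimal Python model with a made-up three-instruction program; the opcodes and the accumulator are assumptions chosen for illustration only):

    # Minimal fetch-decode-execute loop over a toy instruction memory
    program = [("LOAD", 5), ("ADD", 3), ("HALT", None)]   # hypothetical instructions
    acc, pc, running = 0, 0, True

    while running:
        opcode, operand = program[pc]   # fetch (and implicitly decode) the instruction at PC
        pc += 1
        if opcode == "LOAD":            # execute
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "HALT":
            running = False

    print(acc)   # 8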

Pipelining

Pipeline processing is an implementation technique, where arithmetic suboperations or the phases of a computer instruction cycle overlap in execution. A pipeline can be visualised as a collection of processing segments through which information flows.

The overlapping of computation is made possible by associating a register with each segment in the pipeline. The registers provide isolation between each segment.

General Structure of 3-Segment Pipeline
  • Each segment consists of a combinational circuit Si that performs a suboperation on the data stream flowing through the pipe. The segments are separated by registers Ri that hold the intermediate results between the stages.
  • Information flows between adjacent stages under the control of a common clock applied to all the registers simultaneously.
  • The behaviour of a pipeline can be illustrated with a space-time diagram. This is a diagram that shows the segment utilisation as a function of time. The horizontal axis displays the time in clock cycles and the vertical axis gives the segment number.
  • The space-time diagram below shows a four-segment pipeline executing six tasks, T1 through T6.
    Clock cycle:   1    2    3    4    5    6    7    8    9
    Segment 1:    T1   T2   T3   T4   T5   T6
    Segment 2:         T1   T2   T3   T4   T5   T6
    Segment 3:              T1   T2   T3   T4   T5   T6
    Segment 4:                   T1   T2   T3   T4   T5   T6
  • Consider a K-segment pipeline with clock cycle time tp used to execute n tasks. The first task T1 requires a time = K tp.
  • The remaining (n –1) tasks emerge from the pipe at the rate of one task per clock cycle and they will be completed after a time = (n –1) tp.
  • Therefore, to complete n tasks using a K-segment pipeline requires K + (n – 1) clock cycles.
  • A non-pipelined unit performs the same operation and takes a time tn to complete each task. The total time required for n tasks is n tn.
  • The speedup (S) of pipeline processing over an equivalent non-pipeline processing is given by the ratio S = (n * tn) / ((K + n – 1) * tp).

Special Case: As the number of tasks n increases and becomes much larger than K – 1, the term K + n – 1 approaches n, and the speedup becomes S = tn / tp.
If we further assume tn = K * tp, then S = (K * tp) / tp = K, i.e. the maximum speedup equals the number of segments.
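A small worked example of these formulas (the numbers K = 4, tp = 20 ns and n = 100 are assumptions chosen only for illustration):

    K, tp, n = 4, 20, 100           # segments, clock cycle time (ns), number of tasks
    tn = K * tp                      # assumed non-pipelined time per task (80 ns)

    pipeline_time = (K + n - 1) * tp        # (K + n - 1) clock cycles in the pipeline
    nonpipeline_time = n * tn               # n tasks, tn each, without pipelining
    speedup = nonpipeline_time / pipeline_time

    print(pipeline_time)       # 2060 ns
    print(nonpipeline_time)    # 8000 ns
    print(round(speedup, 2))   # 3.88, approaching the limit K = 4 as n grows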

Instruction Pipeline

Pipeline processing can occur not only in the data stream but in the instruction stream. An instruction pipeline reads consecutive instructions from memory while previous instructions are being executed in other segments.
  • This causes the instruction fetch and execute phases to overlap and perform simultaneous operations. For example, consider a computer with an instruction fetch unit and an instruction execution unit designed to provide a two-segment pipeline.
  • Complex instructions require other phases in addition to fetch and execute to process an instruction completely. The instruction cycle is as follows (a small sketch of how these phases overlap appears after the list):
    • Fetch the instruction from memory
    • Decode the instruction
    • Calculate the effective address
    • Fetch the operands from memory
    • Execute the instruction
    • Store the result in the proper place.
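Below is the sketch referred to above: a simplified Python model that assumes every phase takes exactly one clock cycle and ignores hazards, and prints a space-time table showing how the phases of successive instructions overlap. The stage abbreviations are made up for this illustration.

    # One-cycle-per-stage model of the six-phase instruction cycle (ideal, no hazards)
    stages = ["FI", "DI", "CA", "FO", "EX", "SR"]  # fetch, decode, calc. address,
                                                   # fetch operands, execute, store result
    n_instructions = 4

    for s, name in enumerate(stages):
        # instruction i occupies stage s during clock cycle i + s + 1
        row = ["--" for _ in range(n_instructions + len(stages) - 1)]
        for i in range(n_instructions):
            row[i + s] = "I" + str(i + 1)
        print(name, " ".join(row))   # each column of the output is one clock cycle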

Difficulties in Instruction Pipeline

  1. Resource conflicts: These are caused by two segments accessing memory at the same time. Most of these conflicts can be resolved by using separate instruction and data memories.
  2. Data dependency conflicts: These arise when an instruction depends on the result of a previous instruction, but that result is not yet available.
  3. Branch difficulties: These arise from branch and other instructions that change the value of the PC.

Stalls: The periods in which the decode unit, execute unit, and the write unit are idle are called stalls. They are also referred to as bubbles in the pipeline.

Hazard: Any condition that causes the pipeline to stall is called a hazard. Three types of hazards are possible:
  • Data Hazard: A data hazard is any condition in which either the source or the destination operands of an instruction are not available at the time expected in the pipeline. As a result some operation has to be delayed, and the pipeline stalls (a small sketch of this situation appears after the list).
  • Instruction hazards: The pipeline may also be stalled because of a delay in the availability of an instruction. For example, this may be a result of a miss in the cache, requiring the instruction to be fetched from the main memory. Such hazards are often called control hazards or instruction hazards.
  • Structural hazard: Structural hazard is the situation when two instructions require the use of a given hardware resource at the same time. The most common case in which this hazard may arise is in access to memory.
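As a concrete illustration of a data hazard, the sketch below uses a hypothetical two-instruction sequence and a deliberately simplified model with no operand forwarding; the two-cycle result latency is an assumption for illustration only:

    # Each instruction: (text, destination register, source registers)
    instrs = [
        ("ADD R1, R2, R3", "R1", ("R2", "R3")),
        ("SUB R4, R1, R5", "R4", ("R1", "R5")),  # reads R1 -> depends on previous result
    ]

    RESULT_LATENCY = 2   # assumed cycles before a result can be read (no forwarding)

    for prev, curr in zip(instrs, instrs[1:]):
        if prev[1] in curr[2]:
            print("data hazard:", curr[0], "must stall", RESULT_LATENCY, "cycles")
        else:
            print("no hazard between", prev[0], "and", curr[0])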

Performance

  • Performance = 1 / Execution time
  • If Machine X is n times faster than Machine Y, then:
    • Performance of X = n * Performance of Y
    • Execution_time of X = (1/n) * Execution_time of Y
  • CPU Execution_time = (Number of CPU clock cycles required) * (Cycle time), or
  • CPU Execution_time = (Number of CPU clock cycles required) / (Clock rate)
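A short worked example of these formulas (the cycle counts and the 2 GHz clock rate are made-up numbers for illustration):

    cycles_x, clock_rate_x = 10e6, 2e9     # 10 million cycles on a 2 GHz machine X
    cycles_y, clock_rate_y = 30e6, 2e9     # 30 million cycles on a 2 GHz machine Y

    time_x = cycles_x / clock_rate_x       # Execution_time = cycles / clock rate
    time_y = cycles_y / clock_rate_y

    n = (1 / time_x) / (1 / time_y)        # Performance = 1 / Execution_time
    print(time_x, time_y, n)               # 0.005 s, 0.015 s, so X is 3.0 times faster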
Regards 
Amrut Jagdish Gupta
