Pipelining · Computer Organization · GATE CSE

Marks 1

An instruction format has the following structure:

Instruction Number: Opcode destination reg, source reg-1, source reg-2

Consider the following sequence of instructions to be executed in a pipelined processor:

I1: DIV R3, R1, R2

I2: SUB R5, R3, R4

I3: ADD R3, R5, R6

I4: MUL R7, R3, R8

Which of the following statements is/are TRUE?

GATE CSE 2024 Set 2

Consider a 5-stage pipelined processor with Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory Access (MEM), and Register Writeback (WB) stages. Which of the following statements about forwarding is/are CORRECT?

GATE CSE 2024 Set 1

Consider a 3-stage pipelined processor having a delay of 10 ns (nanoseconds), 20 ns, and 14 ns, for the first, second, and the third stages, respectively. Assume that there is no other delay and the processor does not suffer from any pipeline hazards. Also assume that one instruction is fetched every cycle.

The total execution time for executing 100 instructions on this processor is ___________ ns.

GATE CSE 2023

GATE CSE 2012

For a pipelined $$CPU$$ with a single $$ALU$$, consider the following situations
$$1.\,\,\,\,\,$$ The $$j+1$$ instruction uses the result of the $$j$$-$$th$$ instruction as an operand
$$2.\,\,\,\,\,$$ The execution of a conditional jump instruction
$$3.\,\,\,\,\,$$ The $$j$$-$$th$$ and $$j+1$$ instruction require the $$ALU$$ at the same time

Which of the above can cause a hazard?

GATE CSE 2003

Comparing the time $$T1$$ taken for a single instruction on a pipelined $$CPU$$ with time $$T2$$ taken on a non-pipelined but identical $$CPU,$$ we can say that

GATE CSE 2000

Marks 2

A 5-stage instruction pipeline has stage delays of $180,250,150,170$, and 250 , respectively, in nanoseconds. The delay of an inter-stage latch is 10 nanoseconds. Assume that there are no pipeline stalls due to branches and other hazards. The time taken to process 1000 instructions in microseconds is ________ . (Rounded off to two decimal places)

GATE CSE 2025 Set 2

An application executes $6.4 \times 10^8$ number of instructions in 6.3 seconds. There are four types of instructions, the details of which are given in the table. The duration of a clock cycle in nanoseconds is _________. (rounded off to one decimal place)

Instruction type	Clock cycles required per instruction (CPI)	Number of Instructions executed
Branch	2	$2.25\times10^8$
Load	5	$1.20\times10^8$
Store	4	$1.65\times10^8$
Arithmetic	3	$1.30\times10^8$

GATE CSE 2025 Set 2

A non-pipelined instruction execution unit operating at 2 GHz takes an average of 6 cycles to execute an instruction of a program P. The unit is then redesigned to operate on a 5-stage pipeline at 2 GHz. Assume that the ideal throughput of the pipelined unit is 1 instruction per cycle. In the execution of program P, 20% instructions incur an average of 2 cycles stall due to data hazards and 20% instructions incur an average of 3 cycles stall due to control hazards. The speedup (rounded off to one decimal place) obtained by the pipelined design over the non-pipelined design is ________

GATE CSE 2024 Set 2

The baseline execution time of a program on a 2 GHz single core machine is 100 nanoseconds (ns). The code corresponding to 90% of the execution time can be fully parallelized. The overhead for using an additional core is 10 ns when running on a multicore system. Assume that all cores in the multicore system run their share of the parallelized code for an equal amount of time.

$$ \text { The number of cores that minimize the execution time of the program is _______. } $$

GATE CSE 2024 Set 1

A processor X₁ operating at 2 GHz has a standard 5-stage RISC instruction pipeline having a base CPI (cycles per instruction) of one without any pipeline hazards. For a given program P that has 30% branch instructions, control hazards incur 2 cycles stall for every branch. A new version of the processor X₂ operating at same clock frequency has an additional branch predictor unit (BPU) that completely eliminates stalls for correctly predicted branches. There is neither any savings nor any additional stalls for wrong predictions. There are no structural hazards and data hazards for X₁ and X₂. If the BPU has a prediction accuracy of 80%, the speed up (rounded off to two decimal places) obtained by X₂ over X₁ in executing P is ____________.

GATE CSE 2022

Consider a pipelined processor with 5 stages, Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory Access (MEM), and Write Back (WB). Each stage of the pipeline, except the EX stage, takes one cycle. Assume that the ID stage merely decodes the instruction and the register read is performed in the EX stage. The EX stage takes one cycle for ADD instruction and two cycles for MUL instruction. Ignore pipeline register latencies.

Consider the following sequence of 8 instructions:

ADD, MUL, ADD, MUL, ADD, MUL, ADD, MUL

Assume that every MUL instruction is data-dependent on the ADD instruction just before it and every ADD instruction (except the first ADD) is data-dependent on the MUL instruction just before it. The speedup is defined as follows:

$$Speedup = \frac{{Execution{\:}time{\:}without{\:}operand{\:}forwarding}}{{Execution{\:}time{\:}with{\:}operand{\:}forwarding}}$$

The Speedup achieved in executing the given instruction sequence on the pipelined processor (rounded to 2 decimal places) is _______

GATE CSE 2021 Set 2

A five-stage pipeline has stage delays of 150, 120, 150, 160 and 140 nanoseconds. The registers that are used between the pipeline stages have a delay of 5 nanoseconds each.

The total time to execute 100 independent instructions on this pipeline, assuming there are no pipeline stalls, is ______ nanoseconds.

GATE CSE 2021 Set 1

Consider the following instruction sequence where register R1, R2 and R3 are general purpose and MEMORY[X] denotes the content at the memory location X.

Instruction	Semantics	Instruction Size (bytes)
MOV R1, (5000)	R1 ← MEMORY[5000]	4
MOV R2, (R3)	R2 ← MEMORY[R3]	4
ADD R2, R1	R2 ← R1 + R2	2
MOV (R3), R2	MEMORY[R3] ← R2	4
INC R3	R3 ← R3 + 1	2
DEC R1	R1 ← R1 – 1	2
BNZ 1004	Branch if not zero to the given absolute address	2
HALT	Stop	1

Assume that the content of the memory location 5000 is 10, and the content of the register R3 is 3000. The content of each of the memory locations from 3000 to 3010 is 50. The instruction sequence starts from the memory location 1000. All the numbers are in decimal format. Assume that the memory is byte addressable.

After the execution of the program, the content of memory location 3010 is ______

GATE CSE 2021 Set 1

Consider a non-pipelined processor operating at 2.5 GHz. It takes 5 clock cycles to complete an instruction. You are going to make a 5-stage pipeline out of this processor. Overheads associated with pipelining force you to operate the pipelined processor at 2 GHz. In a given program, assume that 30% are memory instructions, 60% are ALU instructions and the rest are branch instructions. 5% of the memory instructions cause stalls of 50 clock cycles each due to cache misses and 50% of the branch instructions cause stalls of 2 cycles each. Assume that there are no stalls associated with the execution of ALU instructions. For this program, the speedup achieved by the pipelined processor over the non-pipelined processor (round off to 2 decimal places) is _____.

GATE CSE 2020

The instruction pipeline of a $$RISC$$ processor has the following stages: Instruction Fetch $$(IF),$$ Instruction Decode $$(ID),$$ Operand Fetch $$(OF),$$ Perform Operation $$(PO)$$ and Writeback $$(WB).$$ The $$IF,$$ $$ID,$$ $$OF$$ and $$WB$$ stages take $$1$$ clock cycle each for every instruction. Consider a sequence of $$100$$ instructions. In the $$PO$$ stage, $$40$$ instructions take $$3$$ clock cycles each, $$35$$ instructions take $$2$$ clock cycles each, and the remaining $$25$$ instructions take $$1$$ clock cycle each. Assume that there are no data hazards and no control hazards.

The number of clock cycles required for completion of execution of the sequence of instructions is ______.

GATE CSE 2018

The stage delays in a $$4$$-stage pipeline are $$800, 500, 400$$ and $$300$$ picoseconds. The first stage (with delay $$800$$ picoseconds) is replaced with a functionally equivalent design involving two stages with respective delays $$600$$ and $$350$$ picoseconds. The throughput increase of the pipeline is percent.

GATE CSE 2016 Set 1

Suppose the functions $$F$$ and $$G$$ can be computed in $$5$$ and $$3$$ nanoseconds by functional units $${U_F}$$ and $${U_G},$$ respectively. Given two instances of $${U_F}$$ and two instances of $${U_G},$$ it is required to implement the computation $$F\left( {G\left( {{X_i}} \right)} \right)$$ for $$1 \le i \le 10.$$ Ignoring all other delays, the minimum time required to complete this computation is _____________ nanoseconds.

GATE CSE 2016 Set 2

Consider a $$3$$ $$GHz$$ (gigahertz) processor with a three-stage pipeline and stage latencies $${\tau _1},{\tau _2},$$ and $${\tau _3}$$ such that $${\tau _1} = 3{\tau _2}/4 = 2{\tau _3}.$$ If the longest pipeline stage is split into two pipeline stages of equal latency, the new frequency is ____________ $$GHz,$$ ignoring delays in the pipeline registers.

GATE CSE 2016 Set 2

Consider the following reservation table for a pipeline having three stages $${S_1},{S_2}$$ and $${S_3}.$$

The minimum average latency $$(MAL)$$ is ________.

GATE CSE 2015 Set 3

Consider a non-pipelined processor with a clock rate of 2.5 gigahertz and average cycles per instruction of four. The same processor is upgraded to a pipelined processor with five stages; but due to the internal pipeline delay, the clock speed is reduced to 2 gigahertz. Assume that there are no stalls in the pipeline. The speed up achieved in this pipelined processor is_________.

GATE CSE 2015 Set 1

Consider the sequence of machine instructions given below:

MUL	R5, R0, R1
DIV	R6, R2, R3
ADD	R7, R5, R6
SUB	R8, R7, R4

In the above sequence, $$R0$$ to $$R8$$ are general purpose registers. In the instructions shown, the first register stores the result of the operation performed on the second and the third registers. This sequence of instructions is to be executed in a pipelined instruction processor with the following $$4$$ stages: $$(1)$$ Instruction Fetch and Decode $$(IF), (2)$$ Operand Fetch $$(OF), (3)$$ Perform Operation $$(PO)$$ and $$(4)$$ Write back the result $$(WB).$$ The $$IF,$$ $$OF$$ and $$WB$$ stages take $$1$$ clock cycle each for any instruction. The $$PO$$ stage takes $$1$$ clock cycle for $$ADD$$ or $$SUB$$ instruction, $$3$$ clock cycles for $$MUL$$ instruction and $$5$$ clock cycles for $$DIV$$ instruction. The pipelined processor uses operand forwarding from the $$PO$$ stage to the $$OF$$ stage. The number of clock cycles taken for the execution of the above sequence of instructions is _______________________ .

GATE CSE 2015 Set 2

Consider the following processors ($$ns$$ stands for nanoseconds). Assume that the pipeline registers have zero latency.
$$P1:$$ Four-stage pipeline with stage latencies $$1$$ $$ns,$$ $$2$$ $$ns,$$ $$2$$ $$ns,$$ $$1$$ $$ns.$$
$$P2:$$ Four-stage pipeline with stage latencies $$1$$ $$ns,$$ 1$$.5$$ $$ns,$$ $$1.5$$ $$ns,$$ $$1.5$$ $$ns.$$
$$P3:$$ Five-stage pipeline with stage latencies $$0.5$$ $$ns,$$ $$1$$ $$ns,$$ $$1$$ $$ns,$$ $$0.6$$ $$ns,$$ $$1$$ $$ns.$$
$$P4:$$ Five-stage pipeline with stage latencies $$0.5$$ $$ns,$$ $$0.5$$ $$ns,$$ $$1$$ $$ns,$$ $$1$$ $$ns,$$ $$1.1$$ $$ns.$$

Which processor has the highest peak clock frequency?

GATE CSE 2014 Set 3

An instruction pipeline has five stages, namely, instruction fetch $$(IF),$$ instruction decode and register fetch $$(ID/RF)$$ instruction execution $$(EX),$$ memory access $$(MEM),$$ and register writeback $$(WB)$$ with stage latencies $$1$$ ns, $$2.2$$ $$ns,$$ $$2$$ $$ns,$$ $$1$$ $$ns,$$ and $$0.75$$ $$ns,$$ respectively ($$ns$$ stands for nanoseconds). To gain in terms of frequency, the designers have decided to split the $$ID/RF$$ stage into three stages $$(ID, RF1, RF2)$$ each of latency $$2.2/3$$ $$ns,$$ Also, the $$EX$$ stage is split into two stages $$(EX1, EX2)$$ each of latency $$1$$ ns. The new design has a total of eight pipeline stages. A program has $$20$$% branch instructions which execute in the $$EX$$ stage and produce the next instruction pointer at the end of the $$EX$$ stage in the old design and at the end of the $$EX2$$ stage in the new design. The IF stage stalls after fetching a branch instruction until the next instruction pointer is computed. All instructions other than the branch instruction have an average $$CPI$$ of one in both the designs. The execution times of this program on the old and the new design are $$P$$ and $$Q$$ nanoseconds, respectively. The value of $$P/Q$$ is _____________.

GATE CSE 2014 Set 3

Consider a $$6$$-stage instruction pipeline, where all stages are perfectly balanced. Assume that there is no cycle-time overhead of pipelining. When an application is executing on this $$6$$-stage pipeline, the speedup achieved with respect to non-pipelined execution if $$25$$% of the instructions incur $$2$$ pipeline stall cycles is________________.

GATE CSE 2014 Set 1

Consider an instruction pipeline with five stages without any branch prediction: Fetch Instruction $$(FI),$$ Decode Instruction $$(DI),$$ Fetch Operand $$(FO),$$ Execute Instruction $$(EI)$$ and Write Operand $$(WO).$$ The stage delays for $$FI, DI, FO, EI$$ and $$WO$$ are $$5$$ $$ns,$$ $$7$$ $$ns,$$ $$10$$ $$ns,$$ $$8$$ $$ns$$ and $$6$$ $$ns$$, respectively. There are intermediate storage buffers after each stage and the delay of each buffer is $$1$$ $$ns.$$ A program consisting of $$12$$ instructions $${{\rm I}_1},{{\rm I}_2},{{\rm I}_3},......,\,\,{{\rm I}_{12}}$$ is executed in this pipelined processor. Instruction $${{\rm I}_4}$$ is the only branch instruction and its branch target is $${{\rm I}_9}$$. If the branch is taken during the execution of this program, the time (in $$ns$$) needed to complete the program is

GATE CSE 2013

Consider an instruction pipeline with four stages $$\left( {S1,\,S2,\,S3,} \right.$$ and $$\left. {S4} \right)$$ each with combinational circuit only. The pipeline registers are required between each stage and at the end of the last stage. Delays for the stages and for the pipeline registers are as given in the figure.

What is the approximate speed up of the pipeline in steady state under ideal conditions when compared to the corresponding non-pipeline implementation?

GATE CSE 2011

A $$5$$-stage pipelined processor has Instruction Fetch $$(IF),$$ Instruction Decode $$(ID),$$ Operand Fetch $$(OF),$$ Perform Operation $$(PO)$$ and Write Operand $$(WO)$$ stages. The $$IF, ID, OF$$ and $$WO$$ stages take $$1$$ clock cycle each for any instruction. The $$PO$$ stage takes $$1$$ clock cycle for $$ADD$$ and $$SUB$$ instructions, $$3$$ clock cycles for $$MUL$$ instruction, and $$6$$ clock cycles for $$DIV$$ instruction respectively. Operand forwarding is used in the pipeline. What is the number of clock cycles needed to execute the following sequence of instructions?

GATE CSE 2010

Consider a $$4$$ stage pipeline processor. The number of cycles needed by the four instructions $${\rm I}1,$$ $${\rm I}2,$$ $${\rm I}3,$$ $${\rm I}4,$$ in stages $$S1, S2, S3, S4$$ is shown below.

What is the number of cycles needed to execute the following loop?
For $$\left( {i = 1} \right.$$ to $$\left. 2 \right)$$ $$\left\{ {{\rm I}1;{\rm I}2;{\rm I}3;{\rm I}4;} \right\}$$

GATE CSE 2009

The use of multiple register windows with overlap causes a reduction in the number of memory accesses for
$$1.\,\,\,\,$$ Function locals and parameters
$$2.\,\,\,\,$$ Register saves and restores
$$3.\,\,\,\,$$ Instruction fetches

GATE CSE 2008

The following code is to run on a pipelined processor with one branch delay slot
$$\eqalign{ & {{\rm I}_1}:\,\,ADD\,\,{R_2}\,\, \leftarrow \,\,{R_7} + {R_8} \cr & {{\rm I}_2}:\,\,SUB\,\,\,{R_4}\,\, \leftarrow \,\,{R_5} - {R_6} \cr & {{\rm I}_3}:\,\,ADD\,\,{R_1}\,\, \leftarrow \,\,{R_2} + {R_3} \cr & {{\rm I}_4}:\,\,STORE\,\,Memory\,\,\left[ {{R_4}} \right]\,\, \leftarrow \,\,{R_1} \cr & BRANCH\,\,to\,\,Label\,\,if\,\,{R_1} = = 0 \cr} $$

Which of the instructions $${{\rm I}_1},\,{{\rm I}_2},\,{{\rm I}_3}$$ or $${{\rm I}_4}$$ can legitimately occupy the delay slot without any other program modification?

GATE CSE 2008

In an instruction execution pipeline, the earliest that the data $$TLB$$ (Translation Look aside Buffer) can be accessed is

GATE CSE 2008

Which of the following are NOT true in a pipelined processor?
$$1.$$ Bypassing can handle all RAW hazards
$$2.$$ Register renaming can eliminate all register carried WAR hazards
$$3.$$ Control hazard penalties can be eliminated by dynamic branch prediction.

GATE CSE 2008

Consider a pipelined processor with the following four stages
$$\,\,\,\,\,$$$$IF:$$ Instruction Fetch
$$\,\,\,\,\,$$$$ID:$$ Instruction Decode and Operand Fetch
$$\,\,\,\,\,$$$$EX:$$ Execute
$$\,\,\,\,\,$$$$WB:$$ Write Back

The $$IF, ID$$ and $$WB$$ stages take one clock cycle each to complete the operation. The number of clock cycles for the $$EX$$ stage depends on the instruction. The $$ADD$$ and $$SUB$$ instructions need $$1$$ clock cycle and the $$MUL$$ instruction needs $$3$$ clock cycles in the $$EX$$ stage. Operand forwarding is used in the pipelined processor. What is the number of clock cycles taken to complete the following sequence of instructions?

GATE CSE 2007

A CPU has five stages pipeline and runs at $$1$$ $$GHz$$ frequency. Instruction fetch happens in the first stage of the pipeline. A conditional branch instruction computes the target address and evaluates the condition in the third stage of the pipeline. The processor stops fetching new instruction following a conditional branch until the branch outcome is known. A program executes $${10^9}$$ instructions out of which $$20$$% are conditional branches. If each instruction takes one cycle to complete on average, then total execution time of the program is

GATE CSE 2006

A $$5$$ stage pipelined $$CPU$$ has the following sequence of stages $$IF$$-Instruction fetch from instruction memory, $$RD$$-Instruction decode and register read, $$EX$$-Execute: $$ALU$$ operation for data and address computation, $$MA$$-Data memory access-for write access the register read and $$RD$$ stage it used, $$WB$$-Register write back.

Consider the following sequence of instructions:
$$\eqalign{ & {{\rm I}_1}:L\,R0,\,\,Loc1;\,R0 < \,\, = M\,[Loc1] \cr & {{\rm I}_2}:A\,R0,\,R0;\,\,\,\,\,\,R0 < \,\, = R0 + R0 \cr & {{\rm I}_3}:A\,R2,\,R0;\,\,\,\,\,\,R2 < \,\, = R2 - R0 \cr} $$

Let each stage takes one clock cycle.
What is the number of clock cycles taken to complete the above sequence of instructions starting from the fetch of $${{\rm I}_1}?$$

GATE CSE 2005

A $$4$$-stage pipeline has the stage delays as $$150, 120,160$$ and $$140$$ nano seconds respectively. Registers that are used between the stages have a delay of $$5$$ nanoseconds each. Assuming constant clocking rate, the total time taken to process $$1000$$ data items on this pipeline will be

GATE CSE 2004

The performance of a pipelined processor suffers if

GATE CSE 2002

Marks 5

An instruction pipeline has five stages where each stage takes $$2$$ nanoseconds and all instructions use all five stages. Branch instructions are not overlapped, i.e., the instruction after the branch is not fetched till the branch instruction is completed. Under ideal conditions.

(a) Calculate the average instruction execution time assuming that $$20$$% of all instruction executed are branch instructions. Ignore the fact that some branch instructions may be conditional.

(b) If a branch instruction is a conditional branch instruction, the branch need not be taken. If the branch is not taken, the following instructions can be overlapped. When $$80$$% of all branch instructions are conditional branch instructions, and $$50$$% of the conditional branch instructions are such that the branch is taken, calculate the average instruction execution time.

GATE CSE 2000

An instruction pipeline consists of $$4$$ stages: Fetch (F), Decode field (D), Execute (E), and Result-Write (W). The $$5$$ instructions in a certain instruction sequence need these stages for the different number of clock cycles as shown by the table below.

Find the number of clock cycles needed to perform the $$5$$ instructions

GATE CSE 1999