Intel® High Level Synthesis Compiler Pro Edition: Best Practices Guide

ID 683152
Date 3/28/2022
Public



3.3.3.1.2.1. Pipelining Loops Within A Component

Within a component, loops are the primary source of pipeline parallelism.

When the Intel® HLS Compiler pipelines a loop, it attempts to schedule the loop execution such that the next iteration of the loop enters the pipeline before the previous iteration has completed. This pipelining of loop iterations can lead to higher throughput.

The number of clock cycles between iterations of the loop is called the Initiation Interval (II).

For the highest performance, a loop iteration would start every clock cycle, which corresponds to an II of 1.
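To make the effect of II on throughput concrete, the following sketch estimates the cycle count of a pipelined loop. It is an idealized model that ignores stalls and memory arbitration. The function names and the `latency` parameter (the cycles one iteration spends in the pipeline) are assumptions introduced for illustration and are not part of the Intel® HLS Compiler.

```cpp
#include <cstdint>

// Idealized cycle count for a pipelined loop (illustrative model only).
// n       : number of loop iterations
// ii      : initiation interval, in clock cycles
// latency : clock cycles for one iteration to pass through the pipeline
std::uint64_t pipelined_loop_cycles(std::uint64_t n, std::uint64_t ii,
                                    std::uint64_t latency) {
    if (n == 0) return 0;
    // The last iteration starts at cycle ii * (n - 1) and needs `latency`
    // more cycles to finish.
    return ii * (n - 1) + latency;
}

// Cycle count if each iteration waits for the previous one to finish.
std::uint64_t unpipelined_loop_cycles(std::uint64_t n, std::uint64_t latency) {
    return n * latency;
}
```

Under this model, 100 iterations with a latency of 5 cycles take 104 cycles at an II of 1 and 203 cycles at an II of 2, compared with 500 cycles when the iterations do not overlap.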

Data dependencies that are carried from one loop iteration to another can affect the ability to achieve II of 1. These dependencies are called loop-carried dependencies.

The II of a loop must be high enough to accommodate all loop-carried dependencies.
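As an illustration of the difference, the sketch below contrasts a loop whose iterations are independent with a loop that has a loop-carried dependency. Both functions are hypothetical examples written in plain C++; they are not taken from this guide.

```cpp
#include <cstddef>

// No loop-carried dependency: each iteration reads and writes only index i,
// so no iteration has to wait for a result from an earlier one.
void add_arrays(const int *a, const int *b, int *out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Loop-carried dependency: iteration i needs the value of `state` produced
// by iteration i - 1 before it can compute its own result.
unsigned running_checksum(const unsigned *a, std::size_t n) {
    unsigned state = 0;
    for (std::size_t i = 0; i < n; ++i) {
        state = state * 31u + a[i];
    }
    return state;
}
```

In the second loop, the multiply and add on `state` must complete before the next iteration can use the result, so the delay of that computation bounds how closely consecutive iterations can be started.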

Tip: The II required to satisfy this constraint depends on the fMAX of the design. At a lower fMAX, each clock cycle is longer, so more of the dependency logic fits in a single cycle and a lower II might be achievable. Conversely, at a higher fMAX, a higher II might be required.
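The relationship between fMAX and II can be sketched with a simple timing model. This is an assumption made for illustration and not how the compiler's scheduler is implemented: it treats the loop-carried computation as a single path with a fixed delay and asks how many clock periods that delay spans. The function name and its parameters are hypothetical.

```cpp
#include <cstdint>

// Simplified model (illustrative assumption, not the compiler's scheduler):
// the smallest II that lets a loop-carried computation with a given delay
// finish before the next iteration needs its result.
// dependency_delay_ps : delay of the loop-carried path, in picoseconds
// fmax_mhz            : target clock frequency, in MHz
std::uint64_t min_ii_for_dependency(std::uint64_t dependency_delay_ps,
                                    std::uint64_t fmax_mhz) {
    // The clock period in ps is 1,000,000 / fmax_mhz, so
    // II = ceil(delay / period) = ceil(delay * fmax_mhz / 1,000,000).
    const std::uint64_t scaled = dependency_delay_ps * fmax_mhz;
    const std::uint64_t ii = (scaled + 999999) / 1000000;
    return ii == 0 ? 1 : ii;  // an II below 1 is not meaningful
}
```

In this model, a dependency path with a 6 ns delay fits in one cycle at 120 MHz (II of 1) but spans two cycles at 240 MHz (II of 2).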

The Intel® HLS Compiler automatically identifies these dependencies and tries to build hardware to resolve them while minimizing the II, subject to the target fMAX.

Naively generating hardware for the code in Figure 17 results in two loads: one from memory b and one from memory c. Because the compiler knows that c[i-1] was written in the previous iteration, it can reuse that value directly and optimize away the load from c[i-1].

Figure 17. Pipelining a Datapath with Loop Iteration


The dependency on the value stored to c in the previous iteration is resolved in a single clock cycle, so an II of 1 is achieved for the loop even though the iterations are not independent.
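The code for Figure 17 is not reproduced in this text. The sketch below assumes the loop has the form `c[i] = c[i-1] + b[i]`, which matches the description of the two loads, and shows a software picture of the load elimination. The function names are hypothetical, and the second function only models in C++ what the compiler does in hardware; you do not need to write your code this way.

```cpp
#include <cstddef>

// Assumed form of the loop: each element of c depends on the previous one.
void prefix_naive(const int *b, int *c, std::size_t n) {
    for (std::size_t i = 1; i < n; ++i) {
        c[i] = c[i - 1] + b[i];  // two loads per iteration: c[i-1] and b[i]
    }
}

// Software model of the optimized datapath: the value stored to c in the
// previous iteration is kept in `prev` instead of being loaded from c again.
void prefix_forwarded(const int *b, int *c, std::size_t n) {
    if (n == 0) return;
    int prev = c[0];             // single load of c before the loop
    for (std::size_t i = 1; i < n; ++i) {
        prev = prev + b[i];      // one load per iteration: b[i]
        c[i] = prev;             // store only; no load from c
    }
}
```

Both functions produce the same contents in `c`. In the second form, the only loop-carried value is `prev`, which corresponds to the single-cycle dependency described above.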

For additional information about pipelining loops, refer to Pipeline Loops.

When the Intel® HLS Compiler cannot initially achieve II of 1, it chooses from several optimization strategies:

The Intel® HLS Compiler applies these optimizations automatically. You can also control them through pragma statements in your design.
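As a minimal sketch of pragma-based control, the example below places the `ii` loop pragma on a simple accumulation loop to request an II of 1. The function name and loop body are hypothetical, and the function is written as plain C++ without component-specific declarations so that it also compiles with a standard C++ compiler, which ignores the pragma (possibly with a warning). Refer to the Intel® HLS Compiler reference documentation for the exact behavior of each loop pragma.

```cpp
#include <cstddef>

// Hypothetical example: sums an array with an explicit II request.
int sum_with_ii_request(const int *data, std::size_t n) {
    int sum = 0;
    // Loop pragma asking the Intel HLS Compiler for an II of 1 on the
    // loop that follows. Other C++ compilers ignore this pragma.
#pragma ii 1
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i];  // loop-carried dependency on `sum`
    }
    return sum;
}
```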