Intel® Agilex™ SEU Mitigation User Guide

ID 683128
Date 12/30/2022
Public

2.3. Scrubbing

Intel® Agilex™ devices support automatic CRAM error correction without reloading the original CRAM contents from an external copy of the original programming bit-stream.

Alternatively, you can scrub through partial reconfiguration by reloading the impacted advanced SEU detection (ASD) region reported by the Advanced SEU Detection Intel® FPGA IP. During the partial reconfiguration process, hold the logic in the partial configuration region in reset until the process completes.

Although scrubbing corrects the SEU error, the SEU error message queue keeps the SEU error message until you retrieve it.

Internal Scrubbing

The internal scrubbing feature automatically corrects single-bit errors.

Intel® recommends that you turn on internal scrubbing. If you do not enable internal scrubbing, the device turns off the SEU mitigation feature for a sector after an error occurs in that sector. From then on, the device no longer detects correctable or uncorrectable SEUs in the affected sector.

If you enable the internal scrubbing feature, you must still plan your recovery sequence. Although the scrubbing feature can restore the CRAM array to the intended configuration, a latency period exists between detection and correction of the soft error. During this latency period, the Intel® Agilex™ device may be operating with errors.

For uncorrectable errors, the SDM periodically inserts an error message into the error message queue. Each insertion reasserts the SEU_ERROR pin to alert you to the error.

Priority Scrubbing

You can assign portions of the design as high priority sectors for internal scrubbing; the unassigned portions become low priority sectors. The Intel® Agilex™ EDC circuitry detects and corrects errors that occur in the high priority sectors more frequently than the other sectors.

When priority scrubbing is enabled, the EDC operations for the priority sectors and the non-priority sectors run in parallel. The sectors in group 0 of both the priority sectors and the non-priority sectors run the EDC process at time T0. The sectors in group 1 of both sets follow at time T1, the sectors in group 2 at time T2, and so on through the remaining groups until the last available group. This sequence corresponds to one complete cycle of the EDC process.

Consider an Intel® Agilex™ device with a total of 30 sectors, where the maximum number of sectors allowed to run EDC concurrently is Smax = 5. Intel® Quartus® Prime determines the number of groups for the priority sectors, GP, and for the non-priority sectors, GN, based on the number of priority sectors, P, and the number of non-priority sectors, N. Each priority group is allocated at most Smax-1 sectors, so that at least one sector slot is left for a non-priority sector to run concurrently.

If P≤Smax-1, then GP = 1 and GN is determined by allocating N to the vacant sector slots. For example, P+N = 30, where P = 2 and N = 28, as shown on the left side of the following figure. Since P≤Smax-1, GP = 1. Two sector slots are occupied by the priority sectors and three sector slots are occupied by the non-priority sectors. Allocating the 28 non-priority sectors into three sector slots gives GN = 10 (28/3, rounded up). The priority sectors need only one unit of time to complete the EDC process, whereas the non-priority sectors need 10 units of time.

If P>Smax-1, then GP and GN are determined by dividing P equally into groups of at most Smax-1 sectors and allocating N to the remaining vacant sector slots. For example, P = 6 and N = 24, as shown on the right side of the following figure. Since P>Smax-1, GP = 2, with the priority sectors divided equally into two groups of three. Three sector slots are occupied by the priority sectors and two sector slots are occupied by the non-priority sectors, resulting in GN = 12. The priority sectors need only two units of time to complete the EDC process, whereas the non-priority sectors need 12 units of time.
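
The two allocation cases above can be sketched in a few lines of Python. This is a minimal illustration, not the Intel® Quartus® Prime implementation. The function name edc_groups and its parameters are hypothetical, and the rounding rules (rounding up when sectors do not divide evenly) are assumptions inferred from the two examples in this section.

```python
from math import ceil


def edc_groups(p, n, s_max):
    """Return (gp, gn) for p priority sectors and n non-priority sectors.

    Hypothetical helper: the rounding behavior is an assumption inferred
    from the two worked examples, not a documented Quartus algorithm.
    """
    if p < 1 or n < 0 or s_max < 2:
        raise ValueError("expected p >= 1, n >= 0, s_max >= 2")

    if p <= s_max - 1:
        # All priority sectors fit in a single group.
        gp = 1
        priority_slots = p
    else:
        # Split the priority sectors into groups of at most s_max - 1,
        # then spread them as evenly as possible across those groups.
        gp = ceil(p / (s_max - 1))
        priority_slots = ceil(p / gp)

    # Slots not used by priority sectors serve the non-priority sectors.
    free_slots = s_max - priority_slots
    gn = ceil(n / free_slots)
    return gp, gn


# The two examples from this section (30 sectors, Smax = 5).
print(edc_groups(2, 28, 5))  # left side of the figure
print(edc_groups(6, 24, 5))  # right side of the figure
```

Under these assumptions the sketch reproduces the values given in the text: GP = 1 and GN = 10 for P = 2, and GP = 2 and GN = 12 for P = 6.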

With priority scrubbing, priority sectors undergo SEU detection and correction more frequently than non-priority sectors. This provides faster SEU detection and correction for critical design modules. The interval time for priority sectors is always the minimum interval that the Intel® Agilex™ device is capable of, regardless of the Minimum SEU Interval setting in Device and Pin Options. For example, the interval time for one priority sector is always 2160 microseconds, regardless of whether the Minimum SEU Interval is set to 0 or 10000 microseconds.

Intel® Quartus® Prime issues a warning message when P exceeds a threshold beyond which the priority sectors require a longer time than the non-priority sectors to complete one cycle of the EDC process.
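
The exact threshold is not stated here. As a rough sketch, assuming the warning condition is simply GP > GN (the priority sectors need more time units per cycle than the non-priority sectors), the following Python snippet searches for the smallest P that meets the condition. The function name first_warning_p is hypothetical, and the grouping arithmetic uses rounding rules inferred from the examples in this section, not a documented Intel® Quartus® Prime algorithm.

```python
from math import ceil


def first_warning_p(total_sectors, s_max):
    """Return the smallest P for which GP > GN, or None if there is none.

    Assumption: the warning corresponds to GP > GN. The grouping rules
    are inferred from the worked examples, not taken from Quartus.
    """
    if s_max < 2:
        raise ValueError("s_max must be at least 2")

    for p in range(1, total_sectors):
        n = total_sectors - p
        # One group if the priority sectors fit in s_max - 1 slots,
        # otherwise enough groups of at most s_max - 1 sectors each.
        gp = 1 if p <= s_max - 1 else ceil(p / (s_max - 1))
        priority_slots = ceil(p / gp)
        gn = ceil(n / (s_max - priority_slots))
        if gp > gn:
            return p
    return None


# Example: the 30-sector device with Smax = 5 used in this section.
print(first_warning_p(30, 5))
```

Treat the result as an estimate for planning purposes only. The warning issued by Intel® Quartus® Prime during compilation is the authoritative indication.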

Figure 2. Priority Scrubbing