# An Energy Efficient Design Using CSAT Approach on Ultra Scale FPGA

Tanesh Kumar, Mohamed Hashim Minver, Teerath Das Department of Computer Science South Asian University Delhi, India tanesh.sitani@hotmail.com, mhminver@gmail.com, teerath.sitani@gmail.com

*Abstract*— In this paper, an approach called Capacitance Scaling Ambient Temperature (CSAT) is used to design an energy-efficient and thermally efficient GCD generator. Energy efficient means that the design dissipates less power, either I/O power or leakage power, than its traditional counterpart. Thermally efficient means that the junction temperature of the FPGA is lower than that of its traditional counterpart, where junction temperature is the highest temperature of the actual device or silicon die in the FPGA. Our work shows the variation of I/O power and junction temperature at different capacitance values under different frequencies. There is a 15.22% reduction in I/O power when we scale down capacitance from 100pF to 80pF, and a 29.31%, 44.83%, 58.62% and 74.14% reduction when we further scale down capacitance to 60pF, 40pF, 20pF and 0pF. When we scale down capacitance from 100pF to 80pF, 60pF, 40pF, 20pF and 0pF, there is a 3%, 6%, 9.14%, 12.14% and 15.19% reduction in junction temperature respectively at 100GHz device operating frequency. Virtex-6 is our target FPGA device for this design.

Keywords—CSAT, Greatest Common Divisor (GCD), Capacitance, Energy Efficient Design, I/O Power, Junction Temperature, FPGA.

## I. INTRODUCTION

Capacitance Scaling Ambient Temperature (CSAT) is an energy- and thermal-efficient technique in which we scale down capacitance from 100pF to 80pF, 60pF, 40pF, 20pF and finally 0pF at 25 °C ambient temperature. In the latest FPGAs, the output load can be set in pF (picofarads). Scaling down capacitance reduces both power dissipation and junction temperature. This approach is used to design an energy-efficient and thermally efficient GCD generator. Our target device is the Virtex-6 Field Programmable Gate Array (FPGA). An energy-efficient GCD dissipates less power, either I/O power or leakage power, than its traditional counterpart. A thermally efficient GCD is one whose junction temperature, after implementation on the FPGA, is lower than that of its traditional counterpart, where junction temperature is the highest temperature of the actual device or silicon die in the FPGA. Our work shows the variation of I/O power, leakage power and junction temperature at different capacitance values under different frequencies. The capacitance value is usually taken as the most important characteristic of a capacitor. However, any real capacitor has resistance and inductance in addition to capacitance. Inductance arises among the FPGA device, the capacitors and the voltage regulator. The parasitic inductance (ESL) is of equal or higher importance in power system applications. The amount of parasitic inductance can be estimated from the body size of the capacitor: physically large capacitors are considered to have more parasitic inductance than physically small ones. The parasitics of a real capacitor are shown in Figure 1.



Figure 1: Parasitic of Real, Non Ideal Capacitor [3]

The concept of capacitance is very useful in power reduction. Device capacitance requirements vary with CLB and I/O utilization. The relation between power and capacitance is shown in Figure 2.



Figure 2: Mathematical Expression for Power and Capacitance
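As a rough guide, and not necessarily the exact expression shown in Figure 2, a widely used first-order model for the dynamic power of a switching CMOS output is given below. The symbols $\alpha$ (switching activity), $C_L$ (load capacitance), $V_{DD}$ (supply voltage) and $f$ (switching frequency) are notation assumed here for illustration.

$$P_{dyn} = \alpha \, C_L \, V_{DD}^{2} \, f$$

Under this model, power grows linearly with both the load capacitance and the operating frequency, which is why lowering the output load is expected to lower I/O power.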

The highest temperature of the actual device or silicon die is known as the junction temperature, whereas the normal temperature of the surroundings is the ambient temperature. Junction temperature increases linearly with ambient temperature. The mathematical relation between junction and ambient temperature is as follows.

$$T_J = T_A + R_{\theta JA} \times Power$$

Where $T_J$ is the junction temperature,

$T_A$ is the ambient temperature,

$R_{\theta JA}$ is the junction-to-ambient thermal resistance.
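The relation between junction temperature, ambient temperature, thermal resistance and power can be sketched in a few lines of Python. The function name and the numeric values below (10 °C/W thermal resistance, 2 W total power) are hypothetical and chosen only for illustration; they are not taken from the Virtex-6 data used in this paper.

```python
def junction_temperature(ambient_c: float, theta_ja_c_per_w: float, power_w: float) -> float:
    """Return junction temperature in °C: ambient + thermal resistance * power."""
    return ambient_c + theta_ja_c_per_w * power_w


# Hypothetical operating point (assumed values, not measured data):
# 25 °C ambient, 10 °C/W junction-to-ambient thermal resistance, 2 W total power.
t_j = junction_temperature(ambient_c=25.0, theta_ja_c_per_w=10.0, power_w=2.0)
print(f"T_J = {t_j:.1f} °C")
```

Because the relation is linear, any reduction in dissipated power at a fixed ambient temperature lowers the junction temperature by the same proportion of the thermal resistance.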

## II. RELATED WORK

In [1], a standalone electrical capacitance tomography system is designed with the help of FPGA technology. It is the first use of reconfigurable hardware in the field of electrical tomography [1]. Compared with a DSP implementation, the FPGA-based system achieves better performance in terms of power and speed [1]. In [2], ALU power is reduced by a capacitance scaling technique and the design is implemented on a 28nm field programmable gate array (FPGA). The power reduction on a 40nm FPGA is compared with that on a 28nm FPGA under different operating frequencies and capacitances [2]. At an operating frequency of 10GHz, capacitances of 5pF and 4pF achieve 17.16% and 17.99% reduction in power respectively [2]. Reference [3] provides basic knowledge about PCB design and pin planning. Reference [4] describes a low-voltage, high-speed FPGA I/O cell design in a 90nm CMOS technology. In [7], the capacitance scaling system uses two components: a commercial impedance (inductance-capacitance-resistance) meter and a single-decade inductive voltage divider used as an impedance comparator. Four-terminal-pair capacitors in decade (10:1) steps from 10nF to 100nF are measured in [7]. Reference [8] describes a probabilistic methodology that considers the probable usage of a routing resource in order to estimate interconnect capacitance and dynamic power consumption. This model is also used to estimate the leakage power distribution of interconnect resources [8].

## III. BLOCK DIAGRAM OF GCD GENERATOR

## A. RTL Elaborated Schematic Design of GCD

The GCD generator takes two 4-bit inputs, Input1 and Input2. GCD\_Out is the final result, which is the GCD of the two inputs.





Figure 3 shows a 4-bit GCD with two instances: one is the data path and the other is the controller. There are 16 I/O ports and 23 nets. There are 38 look-up tables and 19 flip-flops. Of the 19 flip-flops, ten are D flip-flops with asynchronous clear (FDC), one is a D flip-flop with asynchronous preset (FDP) and eight are D flip-flops with clock enable and synchronous reset (FDRE). The design also includes one clock buffer, ten input buffers and five output buffers.

## B. Waveform of GCD



Figure 4: Verilog Test Fixture of GCD

In Figure 4, there are two inputs. One is 1000, i.e. 8, and the other is 1010, i.e. 10. We know that GCD(8,10) is 2, which is the value we see on GCD\_Out.
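The behaviour seen in the waveform can be reproduced with a small software model. The sketch below is a Python reference model only, not the Verilog used in this work; it assumes a subtraction-based GCD, which is a common choice for a data path plus controller design, but the paper does not state which algorithm the hardware uses. The function name `gcd_4bit` is hypothetical.

```python
def gcd_4bit(input1: int, input2: int) -> int:
    """Subtraction-based GCD of two 4-bit unsigned values (software model)."""
    for value in (input1, input2):
        if not 0 <= value <= 0b1111:
            raise ValueError("inputs must fit in 4 bits (0..15)")
    a, b = input1, input2
    # If either operand is zero, the GCD is the other operand.
    if a == 0 or b == 0:
        return a | b
    # Repeatedly subtract the smaller value from the larger until both match.
    while a != b:
        if a > b:
            a -= b
        else:
            b -= a
    return a


# Same stimulus as the test fixture: Input1 = 1000 (8), Input2 = 1010 (10).
gcd_out = gcd_4bit(0b1000, 0b1010)
print(f"GCD_Out = {gcd_out:04b} ({gcd_out})")
```

A model like this can serve as a golden reference when checking simulation results for other input pairs.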

## C. Report Utilization of GCD

| Resource            | Available | Estimation           |
|---------------------|-----------|----------------------|
| Register            | 279360    | 19 (<1% of available) |
| LUT                 | 139680    | 37 (<1% of available) |
| Global Clock Buffer | 32        | 3% of available      |
| IO                  | 240       | 7% of available      |

Figure 5: Report Utilization of GCD

Figure 5 shows that out of 279360 available registers, only 19 registers are used, which is less than 1% of the total. Only 37 look-up tables are used out of 139680 available LUTs. Of the available global clock buffers and IOs, only 3% and 7% respectively are utilized.
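The utilization percentages can be cross-checked with simple arithmetic. In the sketch below, the register and LUT counts come from the utilization report, while the counts of 1 global clock buffer and 16 IOs are assumptions taken from the RTL description in Section III.A (one clock buffer, 16 I/O ports) rather than from the report itself.

```python
AVAILABLE = {"Register": 279360, "LUT": 139680, "Global Clock Buffer": 32, "IO": 240}

# Register and LUT usage are from the report; the last two counts are assumed
# from the RTL description (one clock buffer, 16 I/O ports).
USED = {"Register": 19, "LUT": 37, "Global Clock Buffer": 1, "IO": 16}


def utilization_percent(resource: str) -> float:
    """Return used / available for one resource, as a percentage."""
    return USED[resource] / AVAILABLE[resource] * 100.0


for name in AVAILABLE:
    print(f"{name}: {utilization_percent(name):.2f}% of available")
```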

## IV. CAPACITANCE SCALING AMBIENT TEMPERATURE(CSAT)

CSAT is a combination of two well-tested energy- and thermal-efficient techniques: capacitance scaling and ambient temperature control. First we apply capacitance scaling to obtain an energy- and thermal-efficient GCD. For further reduction of power dissipation and junction temperature, we control the ambient temperature and obtain the most energy-efficient GCD. As the frequency increases, I/O power and leakage power also increase, and junction temperature rises with frequency as well. Here the ambient temperature is 25 °C, which is also called room temperature.

## A. Power and Junction Temperature Consumption at 100pF

| Frequency | Leakage Power (W) | I/O Power (W) | Junction Temperature (°C) |
|-----------|-------------------|---------------|---------------------------|
| 1GHz      | 1.295             | 0.058         | 53.8                      |
| 10GHz     | 1.314             | 0.581         | 56.2                      |
| 100GHz    | 1.544             | 5.809         | 79.9                      |
| 1THz      | 1.633             | 50.086        | 125.0                     |

Table 1: Power and Junction Temperature Consumption at 100pF

In Table 1, I/O power is 0.058W, 0.581W, 5.809W and 50.086W and leakage power is 1.295W, 1.314W, 1.544W and 1.633W at 1GHz, 10GHz, 100GHz and 1THz respectively. Junction temperature is 53.8 °C, 56.2 °C, 79.9 °C and 125.0 °C at the same frequencies. The results in Table 1 are for a 100pF output load.

## B. Power and Junction Temperature Consumption at 80pF

| Frequency | Leakage Power (W) | I/O Power (W) | Junction Temperature (°C) |
|-----------|-------------------|---------------|---------------------------|
| 1GHz      | 1.295             | 0.049         | 53.8                      |
| 10GHz     | 1.312             | 0.495         | 55.9                      |
| 100GHz    | 1.518             | 4.947         | 77.5                      |
| 1THz      | 1.633             | 49.466        | 125.0                     |

Table 2: Power and Junction Temperature Consumption at 80pF

## C. Power and Junction Temperature Consumption at 60pF

| Frequency | Leakage Power (W) | I/O Power (W) | Junction Temperature (°C) |
|-----------|-------------------|---------------|---------------------------|
| 1GHz      | 1.295             | 0.041         | 53.7                      |
| 10GHz     | 1.310             | 0.408         | 55.7                      |
| 100GHz    | 1.492             | 4.085         | 75.1                      |
| 1THz      | 1.633             | 40.846        | 125.0                     |

Table 3: Power and Junction Temperature Consumption at 60pF

In Table 3, at 1GHz, 10GHz, 100GHz and 1THz device operating frequency, leakage power is 1.295W, 1.310W, 1.492W and 1.633W, I/O power is 0.041W, 0.408W, 4.085W and 40.846W, and junction temperature is 53.7 °C, 55.7 °C, 75.1 °C and 125.0 °C respectively. The results in Table 3 are for a 60pF output load.

## D. Power and Junction Temperature Consumption at 40pF

| Frequency | Leakage Power (W) | I/O Power (W) | Junction Temperature (°C) |
|-----------|-------------------|---------------|---------------------------|
| 1GHz      | 1.294             | 0.032         | 53.7                      |
| 10GHz     | 1.308             | 0.322         | 55.4                      |
| 100GHz    | 1.466             | 3.223         | 72.6                      |
| 1THz      | 1.633             | 32.227        | 125.0                     |

Table 4: Power and Junction Temperature Consumption at 40pF

For an operating device, it is recommended that the junction temperature stay below 125 °C. In Table 4, at 40pF, junction temperature is 53.7 °C, 55.4 °C, 72.6 °C and 125.0 °C, leakage power is 1.294W, 1.308W, 1.466W and 1.633W, and I/O power is 0.032W, 0.322W, 3.223W and 32.227W at 1GHz, 10GHz, 100GHz and 1THz device operating frequencies respectively. The results in Table 4 are for a 40pF output load.

In Table 2, junction temperature is 53.8 °C, 55.9 °C, 77.5 °C and 125.0 °C, leakage power is 1.295W, 1.312W, 1.518W and 1.633W, and I/O power is 0.049W, 0.495W, 4.947W and 49.466W at 1GHz, 10GHz, 100GHz and 1THz respectively. The results in Table 2 are for an 80pF output load.

Gyancity Journal of Engineering and Technology Vol.1 No.1 January 2015 ISSN: 2456-0065 DOI: 10.21058/gjet.2015.1101

## E. Power and Junction Temperature Consumption at 20pF

| Frequency | Leakage Power (W) | I/O Power (W) | Junction Temperature (°C) |
|-----------|-------------------|---------------|---------------------------|
| 1GHz      | 1.294             | 0.024         | 53.7                      |
| 10GHz     | 1.306             | 0.236         | 55.2                      |
| 100GHz    | 1.492             | 2.361         | 70.2                      |
| 1THz      | 1.633             | 23.607        | 125.0                     |

Table 5: Power and Junction Temperature Consumption at 20pF

For a 20pF output load, Table 5 shows that leakage power is 1.294W, 1.306W, 1.492W and 1.633W, I/O power is 0.024W, 0.236W, 2.361W and 23.607W, and junction temperature is 53.7 °C, 55.2 °C, 70.2 °C and 125.0 °C at 1GHz, 10GHz, 100GHz and 1THz device operating frequency respectively.

## F. Power and Junction Temperature Consumption at 0pF

| Frequency | Leakage Power (W) | I/O Power (W) | Junction Temperature (°C) |
|-----------|-------------------|---------------|---------------------------|
| 1GHz      | 1.294             | 0.015         | 53.7                      |
| 10GHz     | 1.304             | 0.150         | 55.0                      |
| 100GHz    | 1.419             | 1.499         | 67.8                      |
| 1THz      | 1.633             | 14.987        | 125.0                     |

Table 6: Power and Junction Temperature Consumption at 0pF

In Table 6, I/O power is 0.015W, 0.150W, 1.499W and 14.987W, leakage power is 1.294W, 1.304W, 1.419W and 1.633W, and junction temperature is 53.7 °C, 55.0 °C, 67.8 °C and 125.0 °C at 1GHz, 10GHz, 100GHz and 1THz respectively for a 0pF capacitance.

## G. Comparison of I/O Power at Different Frequencies

| Frequency ↓ / Capacitance → | 100pF  | 80pF   | 60pF   | 40pF   | 20pF   | 0pF    |
|-----------------------------|--------|--------|--------|--------|--------|--------|
| 1GHz                        | 0.058  | 0.049  | 0.041  | 0.032  | 0.024  | 0.015  |
| 10GHz                       | 0.581  | 0.495  | 0.408  | 0.322  | 0.236  | 0.150  |
| 100GHz                      | 5.809  | 4.947  | 4.085  | 3.223  | 2.361  | 1.499  |
| 1THz                        | 50.086 | 49.466 | 40.846 | 32.227 | 23.607 | 14.987 |

Table 7: I/O Power (W) Comparison at Different Frequencies

Table 7 and Figures 6-7 show the variation of I/O power at different capacitance values under different frequencies. There is a 15.22% reduction in I/O power when we scale down capacitance from 100pF to 80pF, and a 29.31%, 44.83%, 58.62% and 74.14% reduction when we further scale capacitance to 60pF, 40pF, 20pF and 0pF.
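The percentage reductions can be recomputed directly from the I/O power values in Table 7. The sketch below is a minimal Python illustration with hypothetical helper names; because it works from the rounded table entries, the recomputed values may differ slightly from the percentages quoted in the text.

```python
CAPACITANCES_PF = [100, 80, 60, 40, 20, 0]

# I/O power in watts, copied from Table 7 (one row per operating frequency).
IO_POWER_W = {
    "1GHz": [0.058, 0.049, 0.041, 0.032, 0.024, 0.015],
    "10GHz": [0.581, 0.495, 0.408, 0.322, 0.236, 0.150],
    "100GHz": [5.809, 4.947, 4.085, 3.223, 2.361, 1.499],
    "1THz": [50.086, 49.466, 40.846, 32.227, 23.607, 14.987],
}


def percent_reduction(baseline: float, scaled: float) -> float:
    """Return the reduction from baseline to scaled as a percentage of baseline."""
    return (baseline - scaled) / baseline * 100.0


def reductions_vs_100pf(frequency: str) -> dict:
    """Map each scaled capacitance (pF) to its I/O power reduction versus 100pF."""
    row = IO_POWER_W[frequency]
    baseline = row[0]
    return {
        cap: percent_reduction(baseline, power)
        for cap, power in zip(CAPACITANCES_PF[1:], row[1:])
    }


for cap_pf, reduction in reductions_vs_100pf("1GHz").items():
    print(f"100pF -> {cap_pf}pF at 1GHz: {reduction:.2f}% less I/O power")
```

Calling `reductions_vs_100pf` with another key of `IO_POWER_W` gives the same comparison at that frequency.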




Figure 6: I/O Power for Different Capacitance and 100GHz, 1THz Frequency



Figure 7: I/O Power for Different Capacitance and 100GHz, 1THz Frequency

## H. Comparison of Junction Temperature at Different Frequencies

| Frequency ↓ / Capacitance → | 100pF | 80pF  | 60pF  | 40pF  | 20pF  | 0pF   |
|-----------------------------|-------|-------|-------|-------|-------|-------|
| 1GHz                        | 53.8  | 53.8  | 53.7  | 53.7  | 53.7  | 53.7  |
| 10GHz                       | 56.2  | 55.9  | 55.7  | 55.4  | 55.2  | 55.0  |
| 100GHz                      | 79.9  | 77.5  | 75.1  | 72.6  | 70.2  | 67.8  |
| 1THz                        | 125.0 | 125.0 | 125.0 | 125.0 | 125.0 | 125.0 |

Table 8: Junction Temperature (°C) Comparison at Different Frequencies

Junction temperature of the FPGA is measured in degrees Celsius. As shown in Table 8 and Figure 8, when we scale down capacitance from 100pF to 80pF, 60pF, 40pF, 20pF and 0pF, there is a 3%, 6%, 9.14%, 12.14% and 15.19% reduction in junction temperature respectively at 100GHz device operating frequency. At 1THz operating frequency the device will not work because the junction temperature reaches 125 °C.
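The thermal check described here can be expressed as a short script over the Table 8 values. This is a sketch with hypothetical helper names; it treats 125 °C as the limit that the junction temperature must stay below, as stated in this paper, and recomputes the reductions from rounded table entries, so small differences from the quoted percentages are possible.

```python
CAPACITANCES_PF = [100, 80, 60, 40, 20, 0]
JUNCTION_LIMIT_C = 125.0  # limit stated in this paper

# Junction temperature in °C, copied from Table 8.
JUNCTION_TEMP_C = {
    "1GHz": [53.8, 53.8, 53.7, 53.7, 53.7, 53.7],
    "10GHz": [56.2, 55.9, 55.7, 55.4, 55.2, 55.0],
    "100GHz": [79.9, 77.5, 75.1, 72.6, 70.2, 67.8],
    "1THz": [125.0, 125.0, 125.0, 125.0, 125.0, 125.0],
}


def within_thermal_limit(frequency: str) -> bool:
    """True if every capacitance setting keeps the junction below the limit."""
    return all(temp < JUNCTION_LIMIT_C for temp in JUNCTION_TEMP_C[frequency])


def temperature_reduction_percent(frequency: str) -> dict:
    """Map each scaled capacitance (pF) to its temperature reduction versus 100pF."""
    row = JUNCTION_TEMP_C[frequency]
    baseline = row[0]
    return {
        cap: (baseline - temp) / baseline * 100.0
        for cap, temp in zip(CAPACITANCES_PF[1:], row[1:])
    }


for freq in JUNCTION_TEMP_C:
    status = "within limit" if within_thermal_limit(freq) else "at or above limit"
    print(f"{freq}: {status}")
```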




Figure 8: Comparison of Junction Temperatures at Different Capacitances and Frequencies

## V. CONCLUSION

By applying the energy-efficient CSAT technique, we design an energy- and thermal-efficient GCD generator. We use Xilinx ISE 14.4 as the simulator and Verilog as the HDL. A 74.1% reduction in I/O power is observed when the capacitance is changed from 100pF to 0pF while the device operates at 1GHz, 10GHz and 100GHz. At 1THz, changing the capacitance from 100pF to 0pF gives a 70.07% reduction in I/O power. When the capacitance changes from 100pF to 0pF and the GCD generator operates at 1GHz, 10GHz and 100GHz, we get a 0.18%, 2.13% and 15.1% reduction in junction temperature respectively. The junction temperature of the device reaches 125 °C for every capacitance at 1THz operating frequency, and the device will no longer work at this junction temperature.

## VI. FUTURE SCOPE

Using the CSAT technique, this GCD generator is implemented on a 40nm 6-series FPGA. It remains open to implement this GCD generator on 28nm and 20nm UltraScale FPGAs. We have applied the CSAT technique to design an energy-efficient GCD generator; there is wide scope to use this technique on other processor components and arithmetic circuits in order to make them more power optimized.

## REFERENCES

- [1] W. A. Deabes, M. Abdallah, O. Elkeelany, M. A. Abdelrahman, "Reconfigurable wireless standalone platform for Electrical Capacitance Tomography", IEEE Symposium on Computational Intelligence in Control and Automation (CICA), pp. 112-116, 2009.
- [2] B. Pandey, J. Yadav, D. Singh, V. Parthiban, "Capacitance based Low Power ALU Design and Implementation on 28nm FPGA", International Journal of Scientific Engineering and Technology, Vol. 2, Issue 6, June 2013 (ISSN: 2277-1581).
- [3] 7 Series FPGAs PCB Design and Pin Planning Guide, http://www.xilinx.com/support/documentation/user\_guides/ug373.pdf
- [4] N. Zhang, X. H. Wang, H. Tang, A. Z. H. Wang, Wang, Z.Hua, Y. B. Chi, "Low-voltage and high-speed FPGA I/O cell design in 90nm CMOS", IEEE 8th International Conference on ASIC (ASICON '09), pp. 533-536, 2009.
- [5] Y. W. Kim et al., "Low Power CMOS Synchronous Counter with Clock Gating Embedded into Carry Propagation", IEEE Transactions on Circuits and Systems II: Express Briefs, Vol. 56, Issue 8, pp. 649-653, 2009.
- [6] Yohwan Yoon, Deog-Kyoon Jeong, "A Multidrop Bus Design Scheme With Resistor-Based Impedance Matching on Nonuniform Impedance Lines", IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 58, Issue 6, 2011.
- [7] S.A. Zamurovic, A.D. Koffman, B.C. Waltrip, Y. Wang, "Evaluation of a Capacitance Scaling System", IEEE Transactions on Instrumentation and Measurement, Vol. 56, Issue 6, pp. 2160-2163, 2007.
- [8] S. Bhoj, D. Bhatia, "Pre-Route Interconnect Capacitance and Power Estimation in FPGAs", International Conference on Field Programmable Logic and Applications, pp. 159-164, 2007.
- [9] T. Das, B. Pandey, Md A. Rahman, T. Kumar, "SSTL Based Green Image ALU Design on different FPGA", IEEE International Conference on Green Computing, Communication and Conservation of Energy (ICGCE), 2013.