CHAPTER 5
PERFORMANCE AND ANALYSIS
5.1 INTRODUCTION
Digital filters are an essential building block in signal-processing applications. They fall into two classes: finite impulse response (FIR) filters and infinite impulse response (IIR) filters. FIR filters are preferred here because they are inherently stable, readily provide a linear-phase response, and have low sensitivity to quantization word length. These desirable characteristics come with a drawback compared to their recursive IIR counterparts: an increased computational workload and hence higher power consumption. In the majority of digital signal processing (DSP) applications, the critical operation is addition, so an adder with low power and low delay is key to achieving a high-performance DSP system. FIR filters are widely used in various DSP applications. This chapter analyzes the optimized area efficiency and power-delay product for the FIR filter.
5.2 RESULTS AND DISCUSSION
5.2.1 ERROR COMPARISON
An important metric to evaluate the performance of approximate computing designs is the percentage of accuracy, which can be estimated in terms of error percentage.
Overall error (OE): OE = |Rc − Re|, the absolute difference between the correct result Rc and the result obtained by the adder Re. Percentage of error tolerance = (OE/Rc) × 100%.
Accuracy (ACC): It indicates how ‘correct’ the output of an adder is for the given input. ACC = (1 − OE/Rc) × 100%. Its value ranges from 0 to 100%.
Parameter | Standard CSLA | ET-CSLA | SAET-CSLA | Proposed EFA | Proposed FTFA 1 | Proposed FTFA 2 |
Rc | 41,775 | 41,775 | 41,775 | 41,775 | 41,775 | 41,775 |
Re | 41,775 | 41,071 | 41,839 | 41,775 | 41,771 | 41,743 |
OE | – | 704 | 64 | 0 | 4 | 32 |
OE tolerance (OE/Rc) | – | 0.0168 | 0.0015 | 0.0 | 0.000096 | 0.00077 |
Accuracy (%) = (1 − OE/Rc) × 100 | – | 98.32 | 99.85 | 100 | 99.99 | 99.92 |
Table 5.1 Error metrics estimation of proposed and previous approximate adder designs for sample inputs
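The error metrics defined above can be reproduced for the sample inputs with a few lines of Python. This is only a minimal sketch: the function name `error_metrics` is chosen here for illustration, and the Re values are copied from Table 5.1. The printed values may differ from the tabulated ones in the last digit because of rounding.

```python
def error_metrics(r_c, r_e):
    """Return (OE, OE/Rc, accuracy in %) for a correct result r_c and an adder output r_e."""
    oe = abs(r_c - r_e)            # overall error
    ratio = oe / r_c               # error relative to the correct result
    accuracy = (1 - ratio) * 100   # accuracy in percent
    return oe, ratio, accuracy


# Sample inputs used in the text; their exact sum is the correct result Rc.
r_c = 23445 + 18330

# Adder outputs Re copied from Table 5.1.
adder_outputs = {
    "ET-CSLA": 41071,
    "SAET-CSLA": 41839,
    "Proposed EFA": 41775,
    "Proposed FTFA 1": 41771,
    "Proposed FTFA 2": 41743,
}

for name, r_e in adder_outputs.items():
    oe, ratio, acc = error_metrics(r_c, r_e)
    print(f"{name:16s} OE={oe:4d}  OE/Rc={ratio:.6f}  ACC={acc:.2f}%")
```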
Adder design | Area | Delay ps | Power W | PDP* | Average error% | Maximum error |
Conv.CSLA[13] | 593 | 1764 | 27,537 | 48.6 | – | – |
ET-CSLA[22] | 404 | 1725 | 13,876 | 23.9 | 30.23 | 32,768 |
SAET-CSLA [23] | 481 | 1725 | 19,645 | 33.9 | 2.36 | 256 |
Proposed EFA | 461 | 1979 | 21,618 | 42.8 | 0.67 | 17 |
Proposed FTFA 1 | 426 | 1718 | 17,579 | 30.2 | 0.73 | 256 |
Proposed FTFA 2 | 385 | 1718 | 17,173 | 29.5 | 1.55 | 256 |
Table 5.2 Area, delay and power estimates of proposed and state-of-the-art designs for n = 16
Figure 5.1 Block diagram of the image blending system
Table 5.1 illustrates the output and the error metrics of the various adder designs for the sample inputs 23,445 and 18,330. Using the same methodology, the error metrics of the different designs are estimated over a random set of data, and the average error percentage, calculated from the accuracy, is shown in Table 5.2. It is seen from Table 5.2 that the average error of the proposed designs is significantly lower than that of all other designs, thanks to the faithful approximation logic.
Note that ET-CSLA demonstrates a 30.23% average error, which is considerably high for any error-tolerant application. This is due to the use of approximate FA cells in both the least and most significant parts of the adder. Although significance approximation is performed in the SAET-CSLA design, its error percentage is still higher than that of the proposed designs.
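The random-data evaluation described above can be sketched as follows. The approximate adder used here is a lower-part OR adder (LOA), a well-known textbook design that serves purely as a stand-in; it is not ET-CSLA, SAET-CSLA, or any of the proposed designs. The sample count, the seed, and the way the average is formed (mean of OE/Rc in percent) are assumptions made for illustration.

```python
import random


def loa_add(a, b, k=4):
    """Stand-in approximate adder (lower-part OR adder), k >= 1.

    The k low bits are OR-ed instead of added; the upper bits are added
    exactly, with a carry-in estimated from bit k-1 of both operands.
    """
    mask = (1 << k) - 1
    low = (a | b) & mask
    carry = (a >> (k - 1)) & (b >> (k - 1)) & 1
    high = (a >> k) + (b >> k) + carry
    return (high << k) | low


def error_statistics(adder, n_bits=16, samples=10000, seed=0):
    """Return (average error in %, maximum absolute error) over random operands."""
    rng = random.Random(seed)
    total_pct, max_err, used = 0.0, 0, 0
    for _ in range(samples):
        a = rng.getrandbits(n_bits)
        b = rng.getrandbits(n_bits)
        r_c = a + b                      # correct result
        if r_c == 0:
            continue                     # relative error undefined
        oe = abs(r_c - adder(a, b))      # overall error
        total_pct += oe / r_c * 100
        max_err = max(max_err, oe)
        used += 1
    return total_pct / used, max_err


avg_pct, max_err = error_statistics(loa_add)
print(f"average error = {avg_pct:.4f}%, maximum error = {max_err}")
```

Any other adder model with the same two-operand signature can be passed to `error_statistics` in place of `loa_add`.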
5.3 AREA, POWER AND DELAY COMPARISON
The proposed faithful parallel adder and the state-of-the-art designs used for comparison, viz., conventional CSLA, ET-CSLA and SAET-CSLA, are described in structural Verilog HDL and synthesized using Cadence Encounter with a 90 nm technology on the ASIC platform. The conventional CSLA architecture is used as the baseline for the performance comparison of all the error-tolerant approaches. The area, delay, power dissipation and power-delay product (PDP) of the adder designs are shown in Table 5.2 for n = 16. It is evident from Table 5.2 that ET-CSLA demonstrates the lowest PDP and a lower area than SAET-CSLA, proposed-EFA and proposed-FTFA1; this is due to the approximation of logic in both the least and most significant parts of the design. However, the maximum error of ET-CSLA is significantly larger than that of all other designs. The proposed designs, viz., proposed-EFA, proposed-FTFA1 and proposed-FTFA2, demonstrate an area reduction of 4, 11.4 and 19.9%, respectively, compared to SAET-CSLA, while the maximum error of each of these designs is limited to 2^(n/2). Also note that the PDP of proposed-EFA is higher than that of ET-CSLA and SAET-CSLA; in return, it restricts the maximum error to a negligible value.
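The derived figures quoted above can be checked directly from the entries of Table 5.2. The sketch below is illustrative only: the helper names are hypothetical, and the 1e-6 scale factor in `pdp` is inferred from the tabulated PDP values rather than stated in the text.

```python
# (area, delay in ps, power) copied from Table 5.2.
table_5_2 = {
    "Conv. CSLA": (593, 1764, 27537),
    "ET-CSLA": (404, 1725, 13876),
    "SAET-CSLA": (481, 1725, 19645),
    "Proposed EFA": (461, 1979, 21618),
    "Proposed FTFA 1": (426, 1718, 17579),
    "Proposed FTFA 2": (385, 1718, 17173),
}


def pdp(delay_ps, power):
    """Power-delay product in the units of Table 5.2.

    Assumption: the 1e-6 scale factor is inferred from the tabulated values.
    """
    return delay_ps * power * 1e-6


def reduction_pct(reference, value):
    """Percentage by which `value` is lower than `reference`."""
    return (reference - value) / reference * 100


saet_area = table_5_2["SAET-CSLA"][0]
for name, (area, delay_ps, power) in table_5_2.items():
    print(f"{name:16s} PDP={pdp(delay_ps, power):5.1f}  "
          f"area reduction vs SAET-CSLA={reduction_pct(saet_area, area):6.1f}%")
```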
5.4 FPGA IMPLEMENTATION IN DIGITAL IMAGE PROCESSING
To demonstrate the suitability of the proposed parallel adder for error-tolerant applications and to verify its driving capability in processing applications, image blending and filtering are implemented on the FPGA platform. The Verilog HDL models of the proposed adder and of the designs used for comparison are synthesized using the Xilinx ISE 14.2 tool, and the hardware for the application system is prototyped on a Spartan 6 FPGA (XC6SLX45-CSG324 device). Input images are fed from the MATLAB environment to the hardware on the FPGA using Xilinx-MATLAB co-simulation with the System Generator tool.
5.4.1 IMAGE BLENDING
Image blending is the process of linearly combining the pixel values of one image with the corresponding pixel values of another image of the same size. The pixel value of the resulting image is calculated using
$$G(x, y) = (1 - \alpha)\, f_1(x, y) + \alpha\, f_2(x, y)$$
where f1(x, y) and f2(x, y) are the pixel values of the input images and α is the ratio factor, which determines the contribution of each input image to the pixel intensity of the output image; an effective transition is obtained by varying the value of α from 0 to 1. Figure 5.1 shows the architecture for image blending. f1(x, y) and f2(x, y) are the input images of size 256 × 256, which are scaled by the factors 1 − α and α, respectively, before adding. The scaled images are added by the adder unit to produce the resultant blended image. The quality metrics MSE and structural similarity index (SSIM), computed between the output images of blending systems designed with error-tolerant and exact adders, are used to evaluate the performance of the proposed approximate designs.
$$\mathrm{MSE} = \frac{1}{ab} \sum_{x=1}^{a} \sum_{y=1}^{b} \left[ G(x, y) - G'(x, y) \right]^2$$
The equation above gives the MSE, where G(x, y) and G′(x, y) are the exact and error-tolerant system outputs, and a and b are the dimensions of the image.
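The blending and MSE computations can be sketched in a few lines, assuming the blending rule G = (1 − α)·f1 + α·f2 and the usual mean-of-squared-differences definition of MSE. The pluggable `add` argument models the adder unit; `truncating_add` is a hypothetical stand-in that simply clears the low bits of the sum and does not represent any of the adder designs compared in this chapter. The 2 × 2 images are toy data.

```python
def blend(f1, f2, alpha, add=lambda a, b: a + b):
    """Blend two equal-size grayscale images: G = (1 - alpha) * f1 + alpha * f2.

    Each pixel is scaled first, then the two scaled values are summed by `add`.
    """
    return [
        [add(round((1 - alpha) * p1), round(alpha * p2)) for p1, p2 in zip(row1, row2)]
        for row1, row2 in zip(f1, f2)
    ]


def mse(g_exact, g_approx):
    """Mean squared error between two images of size a x b."""
    a, b = len(g_exact), len(g_exact[0])
    total = sum(
        (p - q) ** 2
        for row1, row2 in zip(g_exact, g_approx)
        for p, q in zip(row1, row2)
    )
    return total / (a * b)


def truncating_add(a, b, k=2):
    """Hypothetical stand-in approximate adder: exact sum with the k LSBs cleared."""
    return ((a + b) >> k) << k


# Toy 2 x 2 images (illustrative data only).
f1 = [[100, 200], [50, 0]]
f2 = [[20, 40], [250, 255]]

g_exact = blend(f1, f2, alpha=0.2)
g_approx = blend(f1, f2, alpha=0.2, add=truncating_add)
print(g_exact, g_approx, mse(g_exact, g_approx))
```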
Figure 5.2 shows the standard input images and the blended output images obtained from the system implemented with the various adder designs for α = 0.8 and 0.2. Note that the SSIM values between the output images processed by the conventional adder system and by the systems implemented with the proposed adders are >0.97, while those of the ET-CSLA and SAET-CSLA based systems are significantly lower.
Figure 5.2 Input and output images processed by image blending system implemented with proposed and state-of-the-art adder designs. (a) Input images f1 and f2, (b)–(g) Output images processed by image blending system implemented with various adders, (b) Conventional CSLA, (c) ET-CSLA, (d) SAET-CSLA, (e) Proposed-EFA, (f) Proposed-FTFA1, (g) Proposed-FTFA2 for (i) α = 0.8 and (ii) α = 0.2.
Design | Area (no. of LUTs) | Power (mW) |
Conv. CSLA | 33 | 36.1 |
ET-CSLA | 24 | 18.1 |
SAET-CSLA | 24 | 26.1 |
Proposed EFA | 26 | 28.3 |
Proposed FTFA 1 | 22 | 23.4 |
Proposed FTFA 2 | 22 | 22.5 |
Table 5.3 Comparison of area and power of the image blending system on Spartan 6 FPGA hardware
It is also observed from Figure 5.2 that the visual quality of the blended images obtained from the proposed adder designs is almost identical to that of the image obtained from the standard adder system. In contrast, the output images obtained from the ET-CSLA and SAET-CSLA based blending systems show visible differences. This is due to the faithful approximation logic incorporated in the proposed adder designs. Table 5.3 shows the hardware area, in terms of the number of LUTs occupied, and the power consumption of the image blending system implemented with the proposed and prior parallel adder designs. Note that the ET-CSLA system demonstrates the lowest power dissipation, 50% lower than that of the standard CSLA system. The blending systems implemented with the proposed parallel adders, viz., proposed-EFA, proposed-FTFA1 and proposed-FTFA2, show 21.6, 35.2 and 37.7% power reduction, respectively, compared to the standard system. Also note that the area occupied by the proposed-FTFA1 and proposed-FTFA2 based systems (22 LUTs) is the lowest among all the adder implemented systems.
5.4.2 MEAN FILTER
To evaluate the performance of the proposed error-tolerant adder designs under repeated addition, they are implemented in a parallel mean filter. The mean filter smooths an image by reducing the intensity variation between one pixel and the next. The architecture of the parallel mean filter with a 3 × 3 input window is shown in Figure 5.3a and b. The proposed and prior approximate adder designs are implemented in the parallel mean filter. The quality metric PSNR is used to measure the filter efficiency, while the metrics MSE and SSIM are estimated between the output images processed by the mean filter implemented with the conventional CSLA and with the error-tolerant adder designs. The equation below shows the computation of PSNR.
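$$\mathrm{PSNR} = 20 \log_{10}\!\left(\frac{\mathrm{MAX}}{\sqrt{\mathrm{MSE}}}\right)$$
where MAX is the maximum possible pixel value (for example, 255 for 8-bit images).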
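A 3 × 3 mean filter and the PSNR metric can be sketched as below, assuming the common definition PSNR = 20·log10(MAX/√MSE). Several choices are assumptions of this sketch rather than details from the text: border pixels are dropped, the nine window values are summed sequentially (the hardware uses a parallel computation unit), the division by 9 is an integer division, and MAX defaults to 255. The `add` argument is a placeholder for whichever adder model is being evaluated.

```python
import math
from functools import reduce


def mean_filter_3x3(image, add=lambda a, b: a + b):
    """Apply a 3 x 3 mean filter to the interior pixels of a grayscale image.

    The nine window values are summed with the supplied `add` function,
    so an approximate adder model can be swapped in.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(1, rows - 1):
        out_row = []
        for c in range(1, cols - 1):
            window = [image[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out_row.append(reduce(add, window) // 9)
        out.append(out_row)
    return out


def psnr(reference, test, max_value=255):
    """PSNR in dB between two equal-size images; infinite if they are identical."""
    n = len(reference) * len(reference[0])
    mse = sum(
        (p - q) ** 2
        for row1, row2 in zip(reference, test)
        for p, q in zip(row1, row2)
    ) / n
    if mse == 0:
        return math.inf
    return 20 * math.log10(max_value / math.sqrt(mse))


# Toy 3 x 3 image (illustrative data only); the single interior pixel is filtered.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(mean_filter_3x3(image))
```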
Figure 5.4 shows the input image and the output images processed by the mean filter implemented with the conventional and the various approximate designs. Note from Figure 5.4 that the visual quality of the output images processed by the proposed approximate systems is almost identical to that of the conventional CSLA implemented filter system, while the output image obtained from the ET-CSLA based mean filter shows a few patches.
Figure 5.3 Mean filter (a) Block diagram, (b) Parallel mean computation unit.
Figure 5.4 Input and output images processed by parallel mean filter implemented with proposed and state-of-art adder designs. (a) Input image, (b) Output images processed by mean filter implemented with conventional CSLA, (c) ET-CSLA, (d) SAET-CSLA, (e) Proposed-EFA, (f) Proposed-FTFA1, (g) Proposed-FTFA2
This is due to the approximation of the FA cells used in the most and least significant parts of the adder. Also note from Figure 5.4 that the SSIM values of the processed images obtained from the proposed adder designs are significantly higher than those of the images obtained from the ET-CSLA and SAET-CSLA based mean filters. The MSE values of the output images processed by the ET-CSLA and SAET-CSLA based filters are also significantly higher than those of the mean filters implemented with the proposed designs. In terms of the PSNR metric, the proposed approximate designs perform better than the SAET-CSLA and ET-CSLA designs.
5.4.3 FIR FILTER
To verify the functionality of the proposed parallel adder in a digital signal processing application, it is implemented in a 27-tap finite impulse response (FIR) filter. The architecture of the FIR filter is shown in Figure 5.5. The coefficients of the FIR filter are selected using the remez command in MATLAB.
Figure 5.5 Block diagram of the FIR filter
Figure 5.6 Speech Signal processed by FIR filter implemented with conventional and proposed adder (a) Input speech signal, (b) Sampled values of bird’s eye view of a speech signal, (c) Outputs of FIR filter implemented with conventional & proposed-FTFA2 designs
The sampled version of a portion of the bird’s eye view of the input speech signal shown in Figure 5.6a is shown in Figure 5.6b and is applied as the input to the filter. The processed outputs of the FIR filter implemented with a standard adder and with the proposed-FTFA2 faithful adder are shown in Figure 5.6c. Note that only the proposed-FTFA2 design is used for the FIR filter implementation, since it has the highest error percentage among the three proposed designs. It is evident from Figure 5.6c that the proposed parallel approximate adder design performs almost identically to a standard adder in the FIR filter system, with a maximum error deviation of 3.3% at the highlighted area. This maximum error is tolerable for some signal and image processing applications.
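The comparison described above, the same FIR filter accumulated once with an exact adder and once with an approximate one, can be sketched as follows. Everything specific here is an assumption for illustration: the 27 triangular coefficients are not the remez-designed coefficients used in this work, the test signal is synthetic, and the lower-part OR adder (LOA) is a textbook stand-in rather than the proposed-FTFA2 design. Multiplications are kept exact; only the accumulation uses the supplied adder.

```python
def loa_add(a, b, k=4):
    """Stand-in approximate adder (lower-part OR adder), k >= 1; not FTFA2."""
    mask = (1 << k) - 1
    low = (a | b) & mask
    carry = (a >> (k - 1)) & (b >> (k - 1)) & 1
    return (((a >> k) + (b >> k) + carry) << k) | low


def fir_filter(samples, coeffs, add=lambda a, b: a + b):
    """Direct-form FIR filter: y[n] = sum_k h[k] * x[n - k], accumulated with `add`."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc = add(acc, h * samples[n - k])
        out.append(acc)
    return out


def max_deviation_pct(exact, approx):
    """Largest relative deviation (in %) over all non-zero exact outputs."""
    return max(abs(e - a) / e * 100 for e, a in zip(exact, approx) if e != 0)


# Hypothetical 27-tap triangular low-pass coefficients and 8-bit test signal.
coeffs = [1 + min(i, 26 - i) for i in range(27)]
samples = [(37 * n) % 256 for n in range(200)]

y_exact = fir_filter(samples, coeffs)
y_approx = fir_filter(samples, coeffs, add=loa_add)
print(f"maximum deviation = {max_deviation_pct(y_exact, y_approx):.3f}%")
```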
5.5 RESULTS AND DISCUSSION
The simulation result of the proposed exact full adder at 95.833 ns, with the signals pr[31:0], as[15:0] and bs[15:0], is shown in Figure 5.7a.
Figure 5.7a Schematic sequential circuit of proposed exact full adder
The simulation result of the proposed exact full adder at 139.482 ns, with the signals pr[31:0], as[15:0] and bs[15:0], is shown in Figure 5.7b.
Figure 5.7b Schematic sequential circuit of proposed exact full adder
The simulation result of the proposed fault-tolerant full adder (FTFA) design 1 at 90.130 ns, with the signals pr[31:0], as[15:0] and bs[15:0], is shown in Figure 5.8a.
Figure 5.8a Schematic sequential circuit of proposed FTFA design 1
The simulation result of the proposed FTFA design 1 at 140.145 ns, with the signals pr[31:0], as[15:0] and bs[15:0], is shown in Figure 5.8b.
Figure 5.8b Schematic sequential circuit of proposed FTFA design 1
The simulation result of the proposed FTFA design 2 at 90.995 ns, with the signals pr[31:0], as[15:0] and bs[15:0], is shown in Figure 5.9a.
Figure 5.9a Schematic sequential circuit of proposed FTFA design 2
The simulation result of the proposed FTFA design 2 at 139.208 ns, with the signals pr[31:0], as[15:0] and bs[15:0], is shown in Figure 5.9b.
Figure 5.9b Schematic sequential circuit of proposed FTFA design 2
The simulation result of the proposed array multiplier at 99.728 ns, with the signals z[31:0], x[15:0] and y[15:0], is shown in Figure 5.10a.
Figure 5.10a Schematic sequential circuit of proposed array multiplier
The simulation result of the proposed array multiplier at 138.993 ns, with the signals z[31:0], x[15:0] and y[15:0], is shown in Figure 5.10b.
Figure 5.10b Schematic sequential circuit of proposed array multiplier
The simulation result of the proposed Dadda multiplier at 100.050 ns, with the signals z[31:0], x[15:0] and y[15:0], is shown in Figure 5.11a.
Figure 5.11a Schematic sequential circuit of proposed Dadda multiplier
The simulation result of the proposed Dadda multiplier at 139.830 ns, with the signals z[31:0], x[15:0] and y[15:0], is shown in Figure 5.12b.
Figure 5.12b Schematic sequential circuit of proposed Dadda multiplier
The simulation result of the proposed Vedic multiplier at 89.763 ns, with the signals z[31:0], x[15:0] and y[15:0], is shown in Figure 5.13a.
Figure 5.13a Schematic sequential circuit of proposed Vedic multiplier
The simulation result of the proposed Vedic multiplier at 139.068 ns, with the signals z[31:0], x[15:0] and y[15:0], is shown in Figure 5.13b.
Figure 5.13b Schematic sequential circuit of proposed Vedic multiplier
The simulation result of the proposed Wallace multiplier at 89.915 ns, with the signals z[31:0], x[15:0] and y[15:0], is shown in Figure 5.14a.
Figure 5.14a Schematic sequential circuit of proposed Wallace multiplier
The simulation result of the proposed Wallace multiplier at 139.604 ns, with the signals z[31:0], x[15:0] and y[15:0], is shown in Figure 5.14b.
Figure 5.14b Schematic sequential circuit of proposed Wallace multiplier
5.6 SUMMARY
ET-CSLA demonstrates an average error of 30.23%, which is considerably high for any error-tolerant application. This is due to the use of approximate FA cells in both the least and most significant parts of the adder. Although significance approximation is performed in the SAET-CSLA design, its error percentage is still higher than that of the proposed designs.