info@itechprosolutions.in | +91 9790176891

VLSI 2015 Projects

Category Archives

Optimal Factoring of FIR Filters

Abstract:

New insights suggest that the most efficient FIR digital filters can be created by using a scaled sequence of stages, each representing a factor of the filter‘s transfer function. A crucial capability for building such filters concerns finding the best FIR filter factors, then carefully scaling and sequencing them. The efficiency of the resulting structure depends heavily upon obtaining such optimal factors. We offer an algorithm to find, scale and sequence optimally factored FIR filters.


Novel Design Algorithm for Low Complexity Programmable FIR Filters Based on Extended Double Base Number System

Abstract:

Coefficient multipliers are the stumbling blocks in programmable finite impulse response (FIR) digital filters. As the filter coefficients change either dynamically or periodically, the search for common subexpressions for multiplierless implementation needs to be performed over the entire gamut of integers of the desired precision, and the amount of shifts associated with each identified common subexpression needs to be memorized. The complexity of a quality search is thus beyond the existing design algorithms based on conventional binary and signed digit representations. This paper presents a new design paradigm for the programmable FIR filters by exploiting the extended double base number system (EDBNS). Due to its sparsity and innate abstraction of the sum of binary shifted partial products, the sharing of adders in the time-multiplexed multiple constant multiplication block of the programmable FIR filters can be maximized by a direct mapping from the quasi-minimum EDBNS. The multiplexing cost can be further reduced by merging double base terms. Logic synthesis results on more than one hundred programmable filters with filter taps ranging from 10 to 100 and coefficient word lengths of 8, 12, and 16 bits show that the average logic complexity and critical path delay of the programmable FIR filters designed by our proposed algorithm have been reduced by up to 47.81% and 14.32%, respectively over the existing design methods.


High-Throughput Finite Field Multipliers Using Redundant Basis for FPGA and ASIC Implementations

Abstract:

Redundant basis (RB) multipliers over Galois Field ( GF(2m)) have gained huge popularity in elliptic curve cryptography (ECC) mainly because of their negligible hardware cost for squaring and modular reduction. In this paper, we have proposed a novel recursive decomposition algorithm for RB multiplication to obtain highthroughput digit-serial implementation. Through efficient projection of signal-flow graph (SFG) of the proposed algorithm, a highly regular processor-space flow-graph (PSFG) is derived. By identifying suitable cut-sets, we have modified the PSFG suitably and performed efficient feed-forward cut-set retiming to derive three novel multipliers which not only involve significantly less time-complexity than the existing ones but also require less area and less power consumption compared with the others. Both theoretical analysis and synthesis results confirm the efficiency of proposed multipliers over the existing ones. The synthesis results for field programmable gate array (FPGA) and application specific integrated circuit (ASIC) realization of the proposed designs and competing existing designs are compared. It is shown that the proposed highthroughput structures are the best among the corresponding designs, for FPGA and ASIC implementation. It is shown that the proposed designs can achieve up to 94% and 60% savings of area-delay-power product (ADPP) on FPGA and ASIC implementation over the best of the existing designs, respectively.


Fault Tolerant Parallel Filters Based on Error Correction Codes

Abstract:

Digital filters are widely used in signal processing and communication systems. In some cases, the reliability of those systems is critical, and fault tolerant filter implementations are needed. Over the years, many techniques that exploit the filters‘ structure and properties to achieve fault tolerance have been proposed. As technology scales, it enables more complex systems that incorporate many filters. In those complex systems, it is common that some of the filters operate in parallel, for example, by applying the same filter to different input signals. Recently, a simple technique that exploits the presence of parallel filters to achieve fault tolerance has been presented. In this brief, that idea is generalized to show that parallel filters can be protected using error correction codes (ECCs) in which each filter is the equivalent of a bit in a traditional ECC. This new scheme allows more efficient protection when the number of parallel filters is large. The technique is evaluated using a case study of parallel finite impulse response filters showing the effectiveness in terms of protection and implementation cost.


Exact and Approximate Algorithms for the Filter Design Optimization Problem

Abstract:

The filter design optimization (FDO) problem is defined as finding a set of filter coefficients that yields a filter design with minimum complexity, satisfying the filter constraints. It has received a tremendous interest due to the widespread application of filters. Assuming that the coefficient multiplications in the filter design are realized under a shift-adds architecture, the complexity is generally defined in terms of the total number of adders and subtractors. In this paper, we present an exact FDO algorithm that can guarantee the minimum design complexity under the minimum quantization value, but can only be applied to filters with a small number of coefficients. We also introduce an approximate algorithm that can handle filters with a large number of coefficients using less computational resources than the exact FDO algorithm and find better solutions than existing FDO heuristics. We describe how these algorithms can be modified to handle a delay constraint in the shift-adds designs of the multiplier blocks and to target different filter constraints and filter forms. Experimental results show the effectiveness of the proposed algorithms with respect to prominent FDO algorithms and explore the impact of design parameters, such as the filter length, quantization value, and filter form, on the complexity and performance of filter designs.


An Efficient VLSI Architecture of a Reconfigurable Pulse-Shaping FIR Interpolation Filter for Multistandard DUC

Abstract:

This brief proposes a two-step optimization technique for designing a reconfigurable VLSI architecture of an interpolation filter for multistandard digital up converter (DUC) to reduce the power and area consumption. The proposed technique initially reduces the number of multiplications per input sample and additions per input sample by 83% in comparison with individual implementation of each standard’s filter while designing a root-raised-cosine finite-impulse response filter for multistandard DUC for three different standards. In the next step, a 2-bit binary common subexpression (BCS)-based BCS elimination algorithm has been proposed to design an efficient constant multiplier, which is the basic element of any filter. This technique has succeeded in reducing the area and power usage by 41% and 38%, respectively, along with 36% improvement in operating frequency over a 3-bit BCS-based technique reported earlier, and can be considered more appropriate for designing the multistandard DUC.


An Efficient Constant Multiplier Architecture Based on Vertical-Horizontal Binary Common Sub-expression Elimination Algorithm for Reconfigurable FIR Filter Synthesis

Abstract:

This paper proposes an efficient constant multiplier architecture based on verticalhorizontal binary common subexpression elimination (VHBCSE) algorithm for designing a reconfigurable finite impulse response (FIR) filter whose coefficients can dynamically change in real time. To design an efficient reconfigurable FIR filter, according to the proposed VHBCSE algorithm, 2-bit binary common subexpression elimination (BCSE) algorithm has been applied vertically across adjacent coefficients on the 2-D space of the coefficient matrix initially, followed by applying variable-bit BCSE algorithm horizontally within each coefficient. This technique is capable of reducing the average probability of use or the switching activity of the multiplier block adders by 6.2% and 19.6% as compared to that of two existing 2-bit and 3-bit BCSE algorithms respectively. ASIC implementation results of FIR filters using this multiplier show that the proposed VHBCSE algorithm is also successful in reducing the average power consumption by 32% and 52% along with an improvement in the area power product (APP) by 25% and 66% compared to those of the 2-bit and 3-bit BCSE algorithms respectively. As regards the implementation of FIR filter, improvements of 13% and 28% in area delay product (ADP) and 76.1% and 77.8% in power delay product (PDP) for the proposed VHBCSE algorithm have been achieved over those of the earlier multiple constant multiplication (MCM) algorithms, viz. faithfully rounded truncated multiple constant multiplication/accumulation (MCMAT) and multi-root binary partition graph (MBPG) respectively. Efficiency shown by the results of comparing the FPGA and ASIC implementations of the reconfigurable FIR filter designed using VHBCSE algorithm based constant multiplier establishes the suitability of the proposed algorithm for efficient fixed point reconfigurable FIR filter synthesis.


A High-Performance FIR Filter Architecture for Fixed and Reconfigurable Applications

Abstract:

Transpose form finite-impulse response (FIR) filters are inherently pipelined and support multiple constant multiplications (MCM) technique that results in significant saving of computation. However, transpose form configuration does not directly support the block processing unlike direct-form configuration. In this paper, we explore the possibility of realization of block FIR filter in transpose form configuration for area-delay efficient realization of large order FIR filters for both fixed and reconfigurable applications. Based on a detailed computational analysis of transpose form configuration of FIR filter, we have derived a flow graph for transpose form block FIR filter with optimized register complexity. A generalized block formulation is presented for transpose form FIR filter. We have derived a general multiplier-based architecture for the proposed transpose form block filter for reconfigurable applications. A low-complexity design using the MCM scheme is also presented for the block implementation of fixed FIR filters. The proposed structure involves significantly less area-delay product (ADP) and less energy per sample (EPS) than the existing block implementation of direct-form structure for medium or large filter lengths, while for the short-length filters, the block implementation of direct-form FIR structure has less ADP and less EPS than the proposed structure. Application-specific integrated circuit synthesis result shows that the proposed structure for block size 4 and filter length 64 involves 42% less ADP and 40% less EPS than the best available FIR filter structure proposed for reconfigurable applications. For the same filter length and the same block size, the proposed structure involves 13% less ADP and 12.8% less EPS than that of the existing direct-form block FIR structure.


Reverse Converter Design via Parallel-Prefix Adders: Novel Components, Methodology, and Implementations

Abstract:

In this brief, the implementation of residue number system reverse converters based on well-known regular and modular parallelprefix adders is analyzed. The VLSI implementation results show a significant delay reduction and area × time2 improvements, all this at the cost of higher power consumption, which is the main reason preventing the use of parallelprefix adders to achieve high-speed reverse converters in nowadays systems. Hence, to solve the high power consumption problem, novel specific hybrid parallelprefix-based adder components that provide better tradeoff between delay and power consumption are herein presented to design reverse converters. A methodology is also described to design reverse converters based on different kinds of prefix adders. This methodology helps the designer to adjust the performance of the reverse converter based on the target application and existing constraints.


Reliable Low-Power Multiplier Design Using Fixed-Width Replica Redundancy Block

Abstract:

In this paper, we propose a reliable lowpower multiplier design by adopting algorithmic noise tolerant (ANT) architecture with the fixedwidth multiplier to build the reduced precision replica redundancy block (RPR). The proposed ANT architecture can meet the demand of high precision, low power consumption, and area efficiency. We design the fixedwidth RPR with error compensation circuit via analyzing of probability and statistics. Using the partial product terms of input correction vector and minor input correction vector to lower the truncation errors, the hardware complexity of error compensation circuit can be simplified. In a 12×12 bit ANT multiplier, circuit area in our fixedwidth RPR can be lowered by 44.55% and power consumption in our ANT design can be saved by 23% as compared with the state-of-art ANT design.


Recursive Approach to the Design of a Parallel Self-Timed Adder

Abstract:

This brief presents a parallel single-rail selftimed adder. It is based on a recursive formulation for performing multibit binary addition. The operation is parallel for those bits that do not need any carry chain propagation. Thus, the design attains logarithmic performance over random operand conditions without any special speedup circuitry or look-ahead schema. A practical implementation is provided along with a completion detection unit. The implementation is regular and does not have any practical limitations of high fanouts. A high fan-in gate is required though but this is unavoidable for asynchronous logic and is managed by connecting the transistors in parallel. Simulations have been performed using an industry standard toolkit that verify the practicality and superiority of the proposed approach over existing asynchronous adders.


Level-Converting Retention Flip-Flop for Reducing Standby Power in ZigBee SoCs

Abstract:

In this paper, we propose a levelconverting retention flipflop (RFF) for ZigBee systems-on-chips (SoCs). The proposed RFF allows the voltage regulator that generates the core supply voltage (VDD,core) to be turned off in the standby mode, and it thus reduces the standby power of the ZigBee SoCs. The logic states are retained in a slave latch composed of thick-oxide transistors using an I/O supply voltage (VDD,IO) that is always turned on. Level-up conversion from VDD,core to VDD,IO is achieved by an embedded nMOS pass-transistor level-conversion scheme that uses a low-only signal-transmitting technique. By embedding a retention latch and level-up converter into the data-to-output path of the proposed RFF, the RFF resolves the problems of the static RAM-based RFF, such as large dc current and low readability caused by threshold drop. The proposed RFF does not also require additional control signals for power mode transitioning. Using 0.13-μm process technology, we implemented an RFF with VDD,core and VDD,IO of 1.2 and 2.5 V, respectively. The maximum operating frequency is 300 MHz. The active energy of the RFF is 191.70 fJ, and its standby power is 350.25 pW.


Graph-Based Transistor Network Generation Method for Super gate Design

Abstract:

Transistor network optimization represents an effective way of improving VLSI circuits. This paper proposes a novel method to automatically generate networks with minimal transistor count, starting from an irredundant sum-of-products expression as the input. The method is able to deliver both series–parallel (SP) and non-SP switch arrangements, improving speed, power dissipation, and area of CMOS gates. Experimental results demonstrate expected gains in comparison with related approaches.


Design of Self-Timed Reconfigurable Controllers for Parallel Synchronization via Wagging

Abstract:

Synchronization is an important issue in modern system design as systems-on-chips integrate more diverse technologies, operating voltages, and clock frequencies on a single substrate. This paper presents a methodology for the design and implementation of a selftimed reconfigurable control device suitable for a parallel cascaded flip-flop synchronizer based on a principle known as wagging, through the application of distributed feedback graphs. By modifying the endpoint adjacency of a common behavior graph via one-hot codes, several configurable modes can be implemented in a single design specification, thereby facilitating direct control over the synchronization time and the mean-time between failures of the parallel master-slave latches in the synchronizer. Therefore, the resulting implementation is resistant to process nonidealities, which are present in physical design layouts. This paper includes a discussion of the reconfiguration protocol, and implementations of both a sequential token ring control device, and an interrupt subsystem necessary for reconfiguration, all simulated in UMC 90-nm technology. The interrupt subsystem demonstrates operating frequencies between 505 and 818 MHz per module, with average power consumptions between 70.7 and 90.0 μW in the typical-typical case under a corner analysis.


Array-Based Approximate Arithmetic Computing: A General Model and Applications to Multiplier and Squarer Design

Abstract:

We propose a general model for arraybased approximate arithmetic computing (AAAC) to guide the minimization of processing error. As part of this model, the Error Compensation Unit (ECU) is identified as a key building block for a wide range of AAAC circuits. We develop theoretical analysis geared towards addressing two critical design problems of the ECU, namely, determination of optimal error compensation values and identification of the optimal error compensation scheme. We demonstrate how this general AAAC model can be leveraged to derive practical design insights that lead to optimal tradeoffs between accuracy, energy dissipation and area overhead. To further minimize energy consumption, delay and area of AAAC circuits, we perform ECU design simplification by introducing logic don’t cares. By applying this model and using a commercial 90 nm CMOS standard cell library, we propose an approximate 16 × 16 fixed-width Booth multiplier that consumes 44.85% and 28.33% less energy and area compared with theoretically the most accurate fixed-width Booth multiplier. Furthermore, it reduces average error, max error and mean squared error by 11.11%, 28.11%, and 25.00%, respectively, when compared with the most accurate reported approximate Booth multiplier and outperforms the same design significantly by 19.10% for the energy-delay-mean squared error product. Using the same approach, significant energy consumption, area and error reduction is achieved for a squarer unit. To further reduce error and cost by utilizing extra signatures and don’t cares, we demonstrate a 16-bit fixed-width squarer that improves the energy-delay-max error product by 15.81%.


An Accuracy-Adjustment Fixed-Width Booth Multiplier Based on Multilevel Conditional Probability

Abstract:

This brief proposes an accuracyadjustment fixedwidth Booth multiplier that compensates the truncation error using a multilevel conditional probability (MLCP) estimator and derives a closed form for various bit widths L and column information w. Compared with the exhaustive simulations strategy, the proposed MLCP estimator substantially reduces simulation time and easily adjusts accuracy based on mathematical derivations. Unlike previous conditionalprobability methods, the proposed MLCP uses entire nonzero code, namely MLCP, to estimate the truncation error and achieve higher accuracy levels. Furthermore, the simple and small MLCP compensated circuit is proposed in this brief. The results of this brief show that the proposed MLCP Booth multipliers achieve low-cost high-accuracy performance.


Aging-Aware Reliable Multiplier Design With Adaptive Hold Logic

Abstract:

Digital multipliers are among the most critical arithmetic functional units. The overall performance of these systems depends on the throughput of the multiplier. Meanwhile, the negative bias temperature instability effect occurs when a pMOS transistor is under negative bias (Vgs = -Vdd), increasing the threshold voltage of the pMOS transistor, and reducing multiplier speed. A similar phenomenon, positive bias temperature instability, occurs when an nMOS transistor is under positive bias. Both effects degrade transistor speed, and in the long term, the system may fail due to timing violations. Therefore, it is important to design reliable high-performance multipliers. In this paper, we propose an agingaware multiplier design with a novel adaptive hold logic (AHL) circuit. The multiplier is able to provide higher throughput through the variable latency and can adjust the AHL circuit to mitigate performance degradation that is due to the aging effect. Moreover, the proposed architecture can be applied to a columnor row-bypassing multiplier. The experimental results show that our proposed architecture with 16 × 16 and 32 × 32 column-bypassing multipliers can attain up to 62.88% and 76.28% performance improvement, respectively, compared with 16×16 and 32×32 fixed-latency column-bypassing multipliers. Furthermore, our proposed architecture with 16 × 16 and 32 × 32 row-bypassing multipliers can achieve up to 80.17% and 69.40% performance improvement as compared with 16×16 and 32 × 32 fixed-latency row-bypassing multipliers.


A High-Speed FPGA Implementation of an RSD-Based ECC Processor

Abstract:

In this paper, an exportable application-specific instruction-set elliptic curve cryptography processor based on redundant signed digit representation is proposed. The processor employs extensive pipelining techniques for Karatsuba–Ofman method to achieve high throughput multiplication. Furthermore, an efficient modular adder without comparison and a high-throughput modular divider, which results in a short datapath for maximized frequency, are implemented. The processor supports the recommended NIST curve P256 and is based on an extended NIST reduction scheme. The proposed processor performs single-point multiplication employing points in affine coordinates in 2.26 ms and runs at a maximum frequency of 160 MHz in Xilinx Virtex 5 (XC5VLX110T) field-programmable gate array.


A Combined SDC-SDF Architecture for Normal I/O Pipelined Radix-2 FFT

Abstract:

We present an efficient combined single-path delay commutator-feedback (SDCSDF) radix2 pipelined fast Fourier transform architecture, which includes log2 N – 1 SDC stages, and 1 SDF stage. The SDC processing engine is proposed to achieve 100% hardware resource utilization by sharing the common arithmetic resource in the time-multiplexed approach, including both adders and multipliers. Thus, the required number of complex multipliers is reduced to log4 N – 0.5, compared with log2 N – 1 for the other radix2 SDC/SDF architectures. In addition, the proposed architecture requires roughly minimum number of complex adders log2 N + 1 and complex delay memory 2N + 1.5log2 N – 1.5.


RECENT PAPERS