

**International Journal of Advanced Research in Computer Science** 

**RESEARCH PAPER** 

## Available Online at www.ijarcs.info

# Power Gating and Parity Bit Techniques for Low Power CAM

ASHOK KUPPUSAMY, Ph.D Scholar, Sejong University, Seoul, South Korea.

YUVARAJ KRISHNAN, PG Scholar, K.S.Rangasamy College of Technology, Namakkal, Tamil Nadu.

*Abstract:* Content addressable memory (CAM) is a special type of memory used in high-speed search devices. Its main feature is that it can search its entire contents in a single clock cycle, which makes it faster than RAM for search applications. CAM consumes more power, however, because of the parallel match-line comparison. High-speed, low-power match-line sense amplifiers are therefore highly sought after in CAM designs. A parity bit technique is well suited to reducing area and power consumption. Furthermore, in this paper we introduce a gated-power match-line sensing technique to reduce the average power consumption.

Keywords: Content addressable memory (CAM), Search-line (SL), Match-line (ML), ML sense amplifier.

### I. INTRODUCTION

CAM is a memory in which data are accessed by their contents rather than by their physical locations. Given a search word as input, it returns the matching word stored in the data bank [1]. In essence, it compares the input data with the stored data and produces the address of the matching data. This makes it an attractive solution for search devices and applications such as Google. The parallel search, however, makes the CAM circuit complex and requires extra hardware, so it is more expensive. A CAM performs three different operations: READ and WRITE, which use the input data, and COMPARE, which uses the search data [2].

Fig. 1 shows the block diagram of a traditional CAM, which combines a search data register with an output encoder. The compare operation starts when an n-bit search word is loaded into the search data register. The search data are then broadcast to the memory cells over *n* pairs of complementary search lines (SL and ~SL) and compared directly with every bit of each stored word by comparison circuits. Each stored word in the data bank has a match-line that connects its bits and carries the comparison result. The output encoder, an important part of the CAM, identifies the location of the matched word. In the pre-charge phase the match-lines are held at ground while the search line (SL) and the complementary search line (~SL) are at their pre-charge level. In the evaluation phase, complementary search data are applied to SL and ~SL. If any mismatch occurs in a CAM cell, transistors P3 and P4 turn on and charge the ML up to a higher voltage level.



Fig. 1. Block diagram of a conventional CAM.

The sense amplifier [3] senses the logic level on a search line, which represents a data bit stored in a CAM cell on the chip, and amplifies the small voltage change so that the data can be interpreted properly at the output terminal of the chip. To tolerate single-bit errors, the match-line sense amplifier (MLSA) can be modified so that an exact match or a single-bit mismatch counts as a successful search, while all other cases count as an unsuccessful search. If one or more cells in a word have a mismatch between the SL and the stored bit, a path is formed from the ML to ground. If all cells in the word match, no path exists between the ML and ground.

A sense amplifier detects the voltage change on the match-line and amplifies it to a full CMOS voltage output. If no mismatch occurs in a row of CAM cells, no charge-up path is formed, the match-line voltage does not change, and the row indicates a match. Finally, the ML encoder converts the MLSA outputs into a binary-encoded match result. Because the search data are compared with all stored data in parallel, the result of a search operation is obtained in a single clock cycle. CAM operations are therefore faster than other hardware search approaches. This fully parallel search, however, poses serious challenges for designing low-power, high-speed, high-capacity CAMs.
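The search flow described above can be sketched behaviourally in software. The following minimal Python model is an illustration only, not the circuit: the function names `match_lines` and `encode`, the 7-bit example words, and the choice of a priority encoder that returns the lowest matching address are assumptions made for this sketch.

```python
from typing import List, Optional


def match_lines(stored_words: List[int], search_word: int) -> List[bool]:
    """Compare the search word with every stored word.

    In hardware all rows are compared in parallel in one cycle; the list
    comprehension only models the result, one flag per match-line.
    """
    return [(word ^ search_word) == 0 for word in stored_words]


def encode(ml: List[bool]) -> Optional[int]:
    """Assumed priority encoder: lowest address whose match-line is set."""
    for address, hit in enumerate(ml):
        if hit:
            return address
    return None  # no stored word matched


# Hypothetical 7-bit CAM contents (addresses 0..4).
stored = [0b0000000, 0b0100001, 0b0000001, 0b0000101, 0b0001111]

ml = match_lines(stored, 0b0000101)
address = encode(ml)
print(ml, address)
```

Only the row at address 3 holds the search word, so only its match-line flag is set and the encoder reports that address.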



Fig. 2. Conventional pre-computation CAM.



Fig. 3. Simulation result of pre-computation CAM design.

For example, [4] proposed a match-line sensing technique that allocates less power to match decisions involving a larger number of mismatched bits. Since the majority of CAM words are mismatched, this scheme results in a substantial CAM power reduction. Ali Sheikholeslami and Oleksiy Tyshchenko introduced stability-based sensing [5], mainly to reduce the amount of energy delivered to mismatched MLs. In the area of search-word distribution, pipelined driving techniques reduce the average number of active CAM blocks.

In this design, a parity bit increases the search speed of the parallel CAM with less than 1% power and area overhead. A power-gated sense amplifier is designed to reduce the power of the CAM match-line comparison. It also decreases the peak turn-on current at the start of every search cycle. The rest of the paper is organized as follows. Section II introduces the CAM architecture based on a parity bit. Section III proposes the gated-power scheme. Section IV presents the performance analysis, and Section V concludes the paper.

### II. SEARCH SPEED IMPROVEMENT USING A PARITY BIT

The parity bit introduced here is similar in concept to the existing pre-computation technique, but it operates differently. The pre-computation CAM design is therefore presented first, before the proposed parity-bit based CAM design.



Fig. 4. Synthesis report for pre-computation design.

### a. Pre-Computation CAM Design

Pre-computation stores counting-bit information along with each word and uses it during the search operation to save power. These counting bits are derived from the stored word and are used in an initial search before the main word is searched. If the first search fails, the CAM skips the subsequent search to save power. The memory organization of a traditional CAM consists of the data memory and a valid bit field, where the valid bit field indicates the availability of stored data.

In a search operation, the input data are sent to the CAM and compared with all stored data, and the addresses of the matching entries are sent to the output. A pre-computation CAM uses additional counting bits to filter out some mismatched CAM words before the actual comparison. The counting bits are derived from the data bits and are used in the first comparison stage. In Fig. 2, the number of "1"s in the data bits is counted and kept in the counting bit segment.

When a search operation starts, the number of "1"s in the search data bits is counted. This counting-bit information is compared first, and only the words with the same number of "1"s are turned on for further comparison in the second sensing stage. Statistically, the pre-computation CAM design saves a significant amount of the power required for data comparison. The main idea is to trade additional silicon area and search delay for reduced power consumption. The ML sense amplifier distinguishes matched MLs from mismatched MLs. We therefore propose a new auxiliary bit that can at the same time increase the sensing speed of the ML. The simulation result of the pre-computation CAM design is given in Fig. 3. It shows that the ML of the matched data word is "1" and all others are "0". The final synthesis report for the pre-computation CAM design is given in Fig. 4. The elapsed time is 7.00 s and the total memory usage is 122004 KB.
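The two-stage pre-computation search can be illustrated with a short behavioural sketch. It is a simplified model under stated assumptions, not the circuit of Fig. 2: the helper names `ones_count`, `store`, and `search`, and the 7-bit example words, are hypothetical, and the number of full word comparisons is used only as a rough proxy for the comparison power that the first stage avoids.

```python
from typing import List, Tuple


def ones_count(word: int) -> int:
    """Counting bits: the number of 1s in the data word."""
    return bin(word).count("1")


def store(words: List[int]) -> List[Tuple[int, int]]:
    """Store each word together with its pre-computed ones count."""
    return [(ones_count(w), w) for w in words]


def search(entries: List[Tuple[int, int]], search_word: int) -> Tuple[List[bool], int]:
    """Stage 1 filters on the ones count; stage 2 compares surviving words."""
    target = ones_count(search_word)
    full_compares = 0
    ml = []
    for count, word in entries:
        if count != target:
            ml.append(False)      # filtered out, second stage skipped
            continue
        full_compares += 1        # second stage: full data comparison
        ml.append(word == search_word)
    return ml, full_compares


entries = store([0b0000000, 0b0100001, 0b0000001, 0b0000101, 0b0001111])
ml, full_compares = search(entries, 0b0000101)
print(ml, full_compares)
```

With these example words only two of the five rows share the ones count of the search word, so only those two reach the second comparison stage.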

| Match line  | Parity bit | Data bits     |
|-------------|------------|---------------|
| ML0         | 0          | 0 0 0 0 0 0 0 |
| ML1         | 0          | 0 1 0 0 0 0 1 |
| ML2         | 1          | 0 0 0 0 0 0 1 |
| ML3         | 0          | 0 0 0 0 1 0 1 |
| ML4         | 0          | 0 0 0 1 1 1 1 |
| Search data | 0          | 0 0 0 0 1 0 1 |

Fig. 5. Proposed parity-bit based CAM.



Fig. 6. Simulation result of parity bit based CAM.

### b. Parity Bit Based CAM

A parity bit is an extra bit appended to a binary code to indicate whether the number of "1"s in the string is even or odd. If the data word contains an odd number of "1"s, the parity generator produces logic "1" at its output; otherwise it produces logic "0". Fig. 5 shows the parity-bit based CAM design, which consists of the original data bit segment and a parity bit segment. The parity bits are derived from the original data bits: for each data word we determine whether the number of "1"s is even or odd, and the resulting parity bit is assigned to the corresponding parity bit segment on the same ML. The new architecture is thus the same as the conventional CAM with one extra bit added.
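The parity generation rule described above can be checked against the entries of Fig. 5 with a few lines of Python. This is a minimal sketch: the function name `parity_bit` and the dictionary `fig5_data` are assumptions introduced for illustration, and the 7-bit data words are read from the data bit segment of Fig. 5.

```python
def parity_bit(data: int) -> int:
    """Return 1 if the data word has an odd number of 1s, else 0."""
    return bin(data).count("1") & 1


# Data bit segments of the stored words, as read from Fig. 5.
fig5_data = {
    "ML0": 0b0000000,
    "ML1": 0b0100001,
    "ML2": 0b0000001,
    "ML3": 0b0000101,
    "ML4": 0b0001111,
}

parities = {name: parity_bit(word) for name, word in fig5_data.items()}
print(parities)
```

Only ML2 holds an odd number of "1"s, so it is the only word whose parity bit is "1".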

As in the conventional CAM, the search operation has only a single stage. If the data segments of the search word and the stored word match (e.g., ML3 in Fig. 5), their parity bits are also the same and the overall search result is a match. When a 1-bit mismatch occurs in the data segment (e.g., ML2), the numbers of "1"s in the stored word and the search word differ by 1. As a result, the corresponding parity bits are different, and the word has two mismatches overall. If two mismatches occur in the data segment (ML0, ML1, and ML4), the parity bits are the same and overall we again have two mismatches. Cases with more mismatches are not critical, so they can be ignored. The sense amplifier distinguishes the mismatched cases from the matched case. Since the driving capability of a mismatched word is now twice as strong, the proposed design greatly improves the search speed and power consumption of the CAM. The simulation result of the parity-bit based CAM design is given in Fig. 6; it shows that the ML of the matched data word is "1" and all others are "0". The synthesis report for the parity-bit design is given in Fig. 7. The total memory usage of the design is 129364 KB and the elapsed time is 9.00 s.
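The effect of the parity bit on the number of mismatching bits can be verified numerically. The sketch below is an illustrative check under assumptions, not part of the design: the names `parity_bit` and `mismatch_counts` are hypothetical, and the words are the 7-bit data segments of Fig. 5. It counts mismatches in the data segment and in the whole word (data plus parity) and confirms exhaustively that a whole-word mismatch count of exactly 1 never occurs.

```python
DATA_WIDTH = 7  # data bits per word, as in Fig. 5


def parity_bit(data: int) -> int:
    """1 for an odd number of 1s in the data word, else 0."""
    return bin(data).count("1") & 1


def mismatch_counts(stored: int, search: int):
    """Return (mismatches in data segment, mismatches including parity)."""
    data_mismatches = bin(stored ^ search).count("1")
    parity_mismatch = parity_bit(stored) ^ parity_bit(search)
    return data_mismatches, data_mismatches + parity_mismatch


stored_words = {
    "ML0": 0b0000000,
    "ML1": 0b0100001,
    "ML2": 0b0000001,
    "ML3": 0b0000101,
    "ML4": 0b0001111,
}
search_word = 0b0000101

totals = {name: mismatch_counts(word, search_word)[1]
          for name, word in stored_words.items()}

# Exhaustive check over all pairs of 7-bit words: with the parity bit
# included, a word is either fully matched or has at least two mismatches.
never_single = all(
    mismatch_counts(a, b)[1] != 1
    for a in range(1 << DATA_WIDTH)
    for b in range(1 << DATA_WIDTH)
)
print(totals, never_single)
```

The check reflects the argument in the text: an odd number of data mismatches always flips the parity bit, so the total is rounded up to the next even number.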

| Design statistics   | Count |
|---------------------|-------|
| # IOs               | 13    |
| **Cell usage**      |       |
| # BELS              | 1665  |
| # AND2              | 791   |
| # AND3              | 5     |
| # AND4              | 2     |
| # AND6              | 1     |
| # AND7              | 2     |
| # AND8              | 3     |
| # GND               | 1     |
| # INV               | 349   |
| # OR2               | 288   |
| # XOR2              | 223   |
| # FlipFlops/Latches | 42    |
| # FD                | 37    |
| # FDCE              | 5     |
| # IO Buffers        | 13    |
| # IBUF              | 8     |
| # OBUF              | 5     |

CPU: 9.20 / 9.38 s, Elapsed: 9.00 / 9.00 s

Total memory usage is 129364 kilobytes

Fig. 7. Synthesis report for Parity bit design.

### III. GATED-POWER ML SENSE AMPLIFIER DESIGN

### A. Operating Principle

The proposed CAM design is shown in Fig. 8. The CAM cells are organized into rows (words) and columns (bits). Each CAM cell contains the same number of transistors as the conventional P-type NOR CAM cell and uses a similar ML structure. Each bit in the SRAM is stored on four transistors (M1, M2, M3, and M4) that form two cross-coupled inverters. This storage cell has two stable states, "0" and "1". Two additional access transistors control access to the storage cell during read and write operations. The SRAM cell has three distinct states: standby, reading, and writing. Since the SRAM is operated in both read and write modes, it must provide both readability and write stability.

#### 1) Standby

If the word line is not asserted, the access transistors M5 and M6 disconnect the cell from the search lines. The two cross-coupled inverters formed by M1, M2, M3, and M4 continue to reinforce each other as long as they are connected to the supply.

#### 2) Reading

Assume that the content of the memory is "1", stored at Q. The read operation starts by pre-charging both search lines to logical "1" and then asserting the word line WL, which enables both access transistors (M5 and M6). In the second step, the values stored in Q and ~Q are transferred to the search lines: SL is left at its pre-charged value, and ~SL is discharged through M1 and M5 to logical "0". On the SL side, transistors M4 and M6 pull the search line toward $V_{DD}$, a logical "1".



Fig. 8. Proposed CAM architecture.



Fig. 9. Simulation result of power controller.

If instead the memory holds a "0", the opposite happens: ~SL is pulled toward "1" and SL toward "0". SL and ~SL then differ by only a small voltage, and the sense amplifier detects which line has the higher voltage and produces its output according to whether a "1" or a "0" is stored. An important benefit of the sense amplifier is a faster read operation.

#### 3) Writing

The write operation starts by applying the value to be written to the search lines, which act as the bit lines of the cell. To write a "0", SL is driven to "0" and ~SL to "1", i.e., the opposite of SL. A "1" is written by inverting the values on the two lines. The word line (WL) is then asserted and the value to be stored is latched in. The input drivers of the bit lines are designed to be much stronger than the relatively weak transistors in the cell, so they can easily override the previous state of the cross-coupled inverters, i.e., the SRAM in the CAM cell.
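The standby, read, and write behaviour of the storage cell can be summarized in a small logic-level model. The sketch is an abstraction under stated assumptions: the class name `SramCell` and its methods are hypothetical, voltages are reduced to logic values 0 and 1, and transistor sizing, timing, and the sense amplifier are not modelled.

```python
from typing import Optional, Tuple


class SramCell:
    """Logic-level model of the cross-coupled inverter storage cell."""

    def __init__(self, q: int = 0) -> None:
        self.q = q  # stored value at node Q; ~Q is its complement

    def standby(self) -> Optional[Tuple[int, int]]:
        """Word line not asserted: the cell is isolated from SL and ~SL."""
        return None

    def read(self) -> Tuple[int, int]:
        """Pre-charge both lines to 1, assert WL, return (SL, ~SL)."""
        sl, sl_bar = 1, 1          # pre-charge phase
        if self.q == 1:
            sl_bar = 0             # ~SL is discharged, SL stays high
        else:
            sl = 0                 # SL is discharged, ~SL stays high
        return sl, sl_bar

    def write(self, value: int) -> None:
        """Drive SL = value and ~SL = complement, then assert WL."""
        sl, sl_bar = value, 1 - value
        assert sl != sl_bar        # the two lines are always complementary
        self.q = sl                # strong drivers override the old state


cell = SramCell(q=1)
first_read = cell.read()
cell.write(0)
second_read = cell.read()
print(first_read, second_read)
```

Reading does not change the stored value in this model, and a write of "0" after a stored "1" flips which of the two lines is discharged on the next read.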

The comparison unit and the SRAM unit (the cross-coupled inverters), however, are powered by two separate metal rails, $V_{DDML}$ and $V_{DD}$, respectively. $V_{DDML}$ is controlled by a power transistor and a feedback loop that can automatically turn off the match-line (ML) current to save power. With only a single power rail in the CAM, the SRAM unit would be exposed to power disturbances; the separate rails isolate the SRAM unit from power disturbances during the compare cycle. In Fig. 8, the gated-power transistor is fully controlled by the feedback loop, denoted as "power control", which turns the power off automatically once the voltage on the ML reaches a certain threshold.

At the start of every compare cycle, the match-line (ML) is first initialized by a global control signal EN, which is set low so that the power transistor ($P_X$) is turned OFF. This sets the signals ML and C1 to ground and $V_{DD}$, respectively. The control signal EN is then set HIGH, which initiates the compare stage. If one or more mismatches occur in the CAM cells of a row, the ML is charged up. All the cells of a row share the limited current supplied by the power transistor $P_X$, regardless of the number of mismatches.

Once the ML voltage reaches the threshold voltage of M8, the NAND2 gate toggles after a short delay and the power transistor is turned off again. As a result, the ML is not fully charged to $V_{DD}$ but is limited to a voltage slightly above the threshold voltage of M8. The simulation result of the proposed controller is given in Fig. 9; its power consumption is 0.118 mW. Because of the power transistor $P_X$ introduced here, the driving strength of a mismatch is weaker than in the conventional design, and sensing is thus slower.
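The self-turn-off behaviour of the gated-power scheme can be illustrated with a coarse discrete-time model. Everything numeric here is a hypothetical placeholder chosen for illustration (a 1000 mV supply, a 400 mV threshold for M8, a 60 mV charge step, a 20-step compare window), and the function name `sense_row` is an assumption; the sketch only mirrors the sequence of events described above and is not a circuit simulation.

```python
def sense_row(num_mismatches: int,
              vdd_mv: int = 1000,      # hypothetical supply voltage
              vth_m8_mv: int = 400,    # hypothetical threshold of M8
              step_mv: int = 60,       # hypothetical ML rise per time step
              window_steps: int = 20):
    """Return (final ML voltage in mV, steps P_X was on, match flag)."""
    # EN low: ML initialised to ground, C1 to VDD, P_X off.
    ml_mv = 0
    c1_high = True
    # EN high: the compare phase starts and P_X conducts.
    px_on = True
    steps_on = 0
    for _ in range(window_steps):
        if not px_on:
            break
        steps_on += 1
        if num_mismatches > 0:
            # The row shares the limited current of P_X, so the charge
            # rate in this model does not scale with the mismatch count.
            ml_mv = min(vdd_mv, ml_mv + step_mv)
        if ml_mv >= vth_m8_mv:
            c1_high = False   # M8 conducts and pulls C1 low
            px_on = False     # feedback turns the power transistor off
    return ml_mv, steps_on, c1_high


match_case = sense_row(0)
one_mismatch = sense_row(1)
three_mismatches = sense_row(3)
print(match_case, one_mismatch, three_mismatches)
```

In this model a mismatched row stops charging as soon as the ML crosses the assumed threshold, so the ML ends slightly above that threshold rather than at the supply voltage, while a matched row leaves the ML at ground for the whole window.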

### B. CAM Cell Layout

The layout of the CAM cell, implemented in a 65 nm CMOS process, is given in Fig. 10. The new CAM cell has a topology similar to that of the conventional design. The two cell layouts have the same length but different heights. The simulation result of a single CAM cell is given in Fig. 11. The power consumption of the single CAM cell is 2.979 $\mu$W. The new CAM architecture thus achieves low-power, high-speed search operation.



Fig. 10. Layout of CAM cell.



Fig. 11. Simulation result of CAM cell.



Fig. 12. Total average energy consumption.

### **IV. PERFORMANCE COMPARISONS**

In this section we compare the performance of the proposed design with that of the conventional circuit and the designs in [6], [7]. These designs use a similar concept: the power consumption is limited by the amount of charge injected into the ML at the beginning of the search, and a positive feedback loop is used to boost the search speed.

#### A. Power Consumption

Once the output is obtained at the sense amplifier, the power-gated transistor is turned off, which reduces the average power consumption of the proposed design. This is essentially due to the reduced voltage swing on the match-line (ML). A further contribution is that the EN signal turns off transistor $P_X$ in every row, so the SL buses do not need to be pre-charged.

Fig. 12 illustrates the average energy consumption of the proposed design compared with two other benchmark designs, including all the power overhead of the control circuitry. Since [6], [7], and the proposed design do not pre-charge the SLs before each compare cycle, their SL energy consumption is only half of that of the conventional circuit. As for the ML energy, at a 1 V supply voltage the proposed design dissipates only 2.97 $\mu$W, which is lower than that of [6] (24.23 $\mu$W) and [7] (19.87 $\mu$W). In addition, as shown in Fig. 11, the proposed design is more robust against process and environment variations.
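The relative ML saving implied by the reported figures can be worked out directly. The short calculation below uses only the three values quoted in the text; the dictionary keys and the helper `saving_percent` are names introduced for this sketch, and the resulting percentages are derived here rather than reported results.

```python
# ML dissipation values quoted in the text, in microwatts.
ml_power_uw = {"proposed": 2.97, "ref6": 24.23, "ref7": 19.87}


def saving_percent(proposed: float, baseline: float) -> float:
    """Relative reduction of the proposed design versus a baseline, in %."""
    return 100.0 * (1.0 - proposed / baseline)


savings = {
    name: saving_percent(ml_power_uw["proposed"], value)
    for name, value in ml_power_uw.items()
    if name != "proposed"
}

for name, value in savings.items():
    print(f"{name}: {value:.1f} % lower ML dissipation")
```

For the quoted numbers this gives a reduction of roughly 88% relative to [6] and roughly 85% relative to [7].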

### **V. CONCLUSION**

A CAM architecture with a parity bit and a gated-power technique has two main advantages: it increases the search speed and reduces the average power consumption. These techniques are also more stable than other low-power CAM techniques. Compared with the conventional CAM, the stability of this design is reduced at extremely low supply voltages, but according to Monte Carlo simulations both designs are equally stable, with no sensing errors, at 1 V operating conditions. These techniques are well suited to high-capacity parallel CAMs in sub-65 nm CMOS technologies. One application is Asynchronous Transfer Mode (ATM) switching networks, where the CAM is used as a translation table and can act as an address translator.

### REFERENCES

- [1] K. Pagiamtzis and A. Sheikholeslami, "Content-addressable memory (CAM) circuits and architectures: A tutorial and survey," IEEE J. Solid-State Circuits, vol. 41, no. 3, pp. 712–727, Mar. 2006.
- [2] N. Mohan and M. Sachdev, "Low leakage storage cells for ternary content addressable memories," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 5, pp. 604–612, May 2009.
- [3] A. Hrisanthopoulos, Y. Moisiadis, Y. Tsiatouhas, and A. Arapoyanni, "Comparative study of different current mode sense amplifier in submicron CMOS technology," IEE Proc. Circuits, Devices and Systems, vol. 149, no. 3, pp. 154–158, June 2002.
- [4] I. Arsovski and A. Sheikholeslami, "A mismatch-dependent power allocation technique for match-line sensing in content-addressable memories," IEEE J. Solid-State Circuits, vol. 38, no. 11, pp. 1958–1966, Nov. 2003.
- [5] O. Tyshchenko and A. Sheikholeslami, "Match sensing using match-line stability in content addressable memory (CAM)," IEEE J. Solid-State Circuits, vol. 43, no. 9, pp. 1972–1981, Sep. 2008.
- [6] N. Mohan, W. Fung, D. Wright, and M. Sachdev, "A low-power ternary CAM with positive-feedback match-line sense amplifiers," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, no. 3, pp. 566–573, Mar. 2009.
- [7] S. Baeg, "Low-power ternary content-addressable memory design using a segmented match line," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 6, pp. 1485–1494, Jul. 2008.