CODING AND ANALYSIS OF SPEECH IN COCHLEAR IMPLANT: A REVIEW

About 5% of the world's population has hearing impairment, and the figure is 6.3% in India. Hearing impairment can be corrected with a hearing aid or a cochlear implant (CI). A hearing aid is the traditional approach, while a CI, which is surgically implanted, is the modern one. Compared with hearing aids, a CI improves speech intelligibility to a greater degree. However, speech intelligibility with a CI, although acceptable in noise-free environments, decreases in noise. With the advancement of technology over the past decades, newly emerging speech coding strategies have increased speech intelligibility. This paper reviews the basic speech processing strategies for speech coding, together with the techniques and algorithms that aim to increase speech intelligibility. Speech is analyzed using measures such as Analysis of Variance (ANOVA) and Root Mean Square Error (RMSE). The focus is on speech perception and intelligibility enhancement.


INTRODUCTION:
In the human auditory system, sound waves collected by the ear travel from the outer ear through the auditory canal and strike the eardrum, causing it to vibrate. The central part of the eardrum is connected to the small bones of the middle ear, and through this connection it transmits the vibrations along the chain of bones. This motion is passed on to the cochlea, a spiral structure that contains the receptor organs for hearing. The cochlea contains tiny hair cells that translate the sound vibrations into electrical impulses, which are carried to the brain through the auditory nerve [1]. The sound signals are interpreted and identified from the tonotopic and temporal characteristics of these electrical impulses.
Hearing loss is a disability in which a person's hearing is impaired. The severity of the impairment can be high, moderate or low. Hearing loss above 40 decibels (dB) in the better-hearing ear is considered disabling hearing loss. Hearing loss hinders the development of an individual, affects learning and is a cause of depression in many cases. Advances in science and technology have produced new methods and strategies to address it. A CI increases speech intelligibility for a hearing-impaired person and thus improves the patient's hearing ability.
Hearing impairment can be caused by several factors, such as certain infectious diseases, congenital causes, complications at birth, chronic ear infections, exposure to excessive noise, side effects of particular drugs, and ageing. Hearing loss may be congenital, or it may develop in a person who previously had normal hearing. The severity of hearing loss increases over time, as the hair cells keep deteriorating with age and in noisy environments. Because of this deterioration, conventional hearing aids may not restore speech intelligibility, so a CI is used.
The CI is the most popular man-made interface with the human brain. According to a survey, 324,200 people were registered for CI across the globe. It is an electronic device that is surgically implanted under the skin of the patient. It helps patients with severe to profound deafness gain speech intelligibility by simulating the normal hearing process through electrical stimulation of the auditory nerve [2], which supports communication and language development in children. Over the past few decades CI technology has improved greatly, enabling CI listeners to achieve good speech perception in quiet environments, although perception in noisy conditions remains challenging. Studies show that noise affects the speech perception of CI patients considerably more than that of Normal Hearing (NH) listeners. Even a small amount of noise can make them uncomfortable and cause them to lose the target sound entirely, whereas it may not be a problem for normal-hearing people. This difficulty is related to many abnormalities in sound perception [2] due to signal quality. The signal quality can be measured by evaluating noise reduction and speech distortion.
A CI is built from several modules. Acoustic signals from the environment are collected by a microphone and then processed in the CI processor using different strategies. These speech processing strategies are intended to increase speech intelligibility. The signal resulting from one or two such strategies is transmitted to the receiver implanted under the skin, and the signals are sent to the brain through the implanted cochlea, as shown in Fig: 1. This paper is divided into six parts: the introduction, a brief discussion of factors associated with CI performance, a brief explanation of sound processing strategies, sound production analysis, a comparison, and the conclusion.

FACTORS ASSOCIATED WITH CI PERFORMANCE
A post-implantation fitting procedure is carried out to adjust the stimulation levels of each channel between the minimum stimulation level and the maximum level of comfortable loudness. Kuczapski et al. [3] designed and implemented a method that connects to a CI and displays the generated pulses on a computer. This provides technical support for CI research. It detects faults during the fitting procedure and also provides simple auralization for demonstration purposes. The system is composed of a detector box, which transduces the received information into electrical signals; a data-acquisition module with 12 analog channels for real-time signals; and a computer connected to this module through USB for real-time auralization and visualization. It approximates and replays the perceived sound using the registered signals.
The data rate between the outer and inner parts of a CI is one of the challenges affecting CI performance. Mai et al. [4] proposed a cochlear system with an implanted DSP to address the data rate problem. The model has two parts, external and internal. The external part consists of a microphone, an A/D converter, an automatic gain controller (AGC) and a modulator, while the internal part contains a demodulator, the DSP, a D/A converter and the stimulating circuit. The external and implanted parts are connected through an inductive link, which transfers data and power between the two parts using a PWD scheme. A data transmission rate of 100 kbps for voice-band signals is used in this system to remove the data rate bottleneck. With optimized speech processing strategies, the power consumed by the system increased by less than 10% compared with a traditional system. Power transmission efficiency was raised above 40% at a bandwidth of more than 1 MHz.
There is a huge mismatch between the electrodes in a CI and the neurons of the human auditory system. This mismatch is responsible for the poor frequency resolution of the CI, and because of it CI performance has remained largely unchanged [5]. Minor improvements include pre-curved electrode arrays, which bring the stimulation sites closer to the cochlear modiolus. Some innovative research is also emerging in which neurotrophins or stem cells are injected to attract neurons to grow towards the electrodes. In addition, CI manufacturers such as Advanced Bionics, Cochlear, MED-EL and Nurotron are competing to address the performance gap.
The main objective of a CI is to process sound signals and convert them into electrical signals so as to obtain robust speech intelligibility. Speech intelligibility depends on temporal and spectral (tonotopic) features. Spectral information consists of low and high frequencies mapped to specific locations in the cochlea, while temporal information concerns the precise timing of activity across auditory nerve fibres. The speech envelope, the periodicity of the fundamental frequency and the temporal fine structure constitute the temporal information of the acoustic signal. Temporal fine structure refers to the fast fluctuations in a signal, which are used to localize sound, perceive pitch, and so on.
The quality of sound for a CI user varies with the user's environmental conditions. The presence of noise in the environment degrades CI performance. A large set of noise reduction algorithms exists for both single-channel and multichannel processing. For example, S. Arora [6] proposed coherence-based algorithms with spatial-filtering post-filters for speech enhancement. There are also a number of metrics for analyzing the speech production of CI users. For example, in [7] the background noise is analyzed using the signal-to-noise ratio (SNR) and long-term averaged spectra, while speech features are extracted using the glottal spectral slope and the fundamental frequency.
In addition to audiological factors, non-audiological factors should be considered in order to improve speech intelligibility. Hickson et al. examined these factors to determine how older people can make the best use of hearing aids [8]. Participants above 60 years of age were taken as the study sample, and their demographic and psychological factors were evaluated. A binomial multivariate logistic regression model was used in that study. The non-audiological factors proved to be important for the success of hearing aid use in older adults.

SOUND PROCESSING STRATEGIES:
Different sound processing strategies are employed to enhance the sound signals. These strategies can be divided into waveform strategies and feature-based strategies. The compressed-analog approach and Continuous Interleaved Sampling (CIS) are waveform strategies.

Compressed analog approach:
The compressed analog approach was widely used in early CIs. In this waveform strategy the signal is compressed by an automatic gain control and then filtered into four frequency bands with center frequencies of 0.5, 1, 2 and 3.4 kHz. The filtered waveforms pass through adjustable gain controls and are sent to the four electrodes of the CI. The waveforms are delivered to the electrodes simultaneously in analog form.
Dorman et al. reported a median score of 45% speech intelligibility for word identification in given sentences. This score decreased to 14% for monosyllabic and two-syllable words [9]. Furthermore, the compressed-analog approach yielded better speech intelligibility with multichannel than with single-channel implants [10].

Continuous Interleaved Sampling (CIS)
CIS is a waveform strategy and was one of the most common strategies used for speech processing. It addresses the problem of simultaneous channel interaction in the compressed-analog strategy: inter-channel interaction in the speech signal leads to spectral overlap and hence reduces speech quality, and CIS overcomes this by time multiplexing the channels. In CIS, the input signal is first pre-emphasized and then filtered by non-linear bandpass filters, one per channel, which also remove noise. Full-wave rectification follows, and lowpass filters then compute the envelope of each channel [11]. The envelope is then compressed, because the range of stimulation amplitudes available to implant listeners is small compared with the range of acoustic amplitudes in conversational speech. In [11], power-law and logarithmic compression functions are used for this purpose. Finally, the compressed envelopes modulate biphasic pulse trains that are time multiplexed across all channels. The computation of CIS is a multithreaded process. Ahmad et al. implemented CIS using polyphase filters in the frequency domain, and this proved to be computationally efficient.
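The per-channel steps described above (rectification, lowpass smoothing, compression, time multiplexing) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [11] or of any commercial processor: the function names, the one-pole smoothing factor `alpha`, and the amplitude and current ranges are all hypothetical, and the bandpass filtering stage is assumed to have been applied already.

```python
import math

def envelope(band_signal, alpha=0.05):
    """Full-wave rectify a band-limited signal, then smooth it with a
    one-pole lowpass filter (alpha is an assumed smoothing factor)."""
    state, env = 0.0, []
    for x in band_signal:
        state += alpha * (abs(x) - state)
        env.append(state)
    return env

def log_compress(a, a_min, a_max, t_level, c_level):
    """Map an acoustic amplitude in [a_min, a_max] logarithmically onto the
    electrical range [t_level, c_level] (threshold to comfortable level)."""
    a = min(max(a, a_min), a_max)  # clip to the assumed acoustic range
    frac = math.log(a / a_min) / math.log(a_max / a_min)
    return t_level + frac * (c_level - t_level)

def interleave(channel_levels, frame_period):
    """Give each channel its own time slot within every frame, so that no
    two electrodes are stimulated at the same instant.
    Returns a list of (time, channel, level) tuples."""
    n_channels = len(channel_levels)
    slot = frame_period / n_channels
    schedule = []
    for frame in range(len(channel_levels[0])):
        for ch in range(n_channels):
            t = frame * frame_period + ch * slot
            schedule.append((t, ch, channel_levels[ch][frame]))
    return schedule
```

The key design point is in `interleave`: because every pulse has a distinct onset time, the simultaneous-stimulation interaction of the compressed-analog approach cannot occur.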
To overcome the problem of spectral overlap among the electrodes of a CI, Sanketha et al. [12] used a Functional Delay Filter (FDF). For the multi-peak channels of a CI, methods such as the CIS algorithm and the compressed analog approach are used, but they can be limited by spectral overlap, and the FDF is suited to handling the overlap among electrodes. Spline and Lagrange interpolation were found to be suitable for designing the FDF, because their non-oscillating responses avoid signal interference.
A novel speech processing algorithm based on harmonicity cues in CI was investigated and compared with CIS by Wang et al. [13]. In this algorithm the processing is applied to the extracted temporal envelope and periodicity cues to improve the tonal information of speech. Compared with the CIS strategy, the algorithm showed consistently higher recognition rates for tone recognition.

Feature-based Strategies
In feature-based strategies, spectral information is extracted and used to stimulate the electrodes. These strategies basically depend on three features of the speech signal: the fundamental frequency F0 and the formants F1 and F2.

F0/F2
F0 is the fundamental frequency. The formant frequencies are the three peaks (F1, F2 and F3) in the frequency spectrum that are required for proper speech perception. In this strategy, F0 and F2 are estimated using zero-crossing detectors at the outputs of a 270 Hz lowpass filter and a 1000-4000 Hz bandpass filter, respectively. Among the 22 electrodes, the appropriate electrode is stimulated at a rate of F0 pulses per second.
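The zero-crossing estimate described above can be illustrated with a short sketch. It assumes the signal has already passed through the relevant filter (the filters themselves are not implemented here), and the function name and the synthetic 150 Hz test tone are hypothetical choices for illustration only.

```python
import math

def zero_crossing_rate_hz(samples, sample_rate):
    """Estimate the dominant frequency of an already-filtered signal by
    counting positive-going zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    duration = (len(samples) - 1) / sample_rate
    return crossings / duration

# Synthetic stand-in for the output of the 270 Hz lowpass filter:
# one second of a 150 Hz tone sampled at 16 kHz (assumed values).
sample_rate = 16000
tone = [math.sin(2 * math.pi * 150 * n / sample_rate + 0.3)
        for n in range(sample_rate)]
f0_estimate = zero_crossing_rate_hz(tone, sample_rate)
```

For a clean periodic input the estimate lies close to the true frequency. On real speech the accuracy depends on how well the preceding filter isolates a single dominant component.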

F0/F1/F2
This strategy adds the first formant F1 to the F0 and F2 already used. Here a zero-crossing detector estimates F1 at the output of a 280-1000 Hz bandpass filter. The strategy has an edge over F0/F2 because it stimulates apical and basal electrodes with F1 and F2 information, respectively. These formant strategies work well mainly for low-frequency signals and are not suitable for high-frequency signals, for which different strategies are used.

Spectral Maxima Sound Processor (SMSP)
In spectral maxima sound processing, the speech is passed through a bank of bandpass filters, which generally contains 16 filters. The output of each bandpass filter is rectified and lowpass filtered. Of the 16 outputs, around six with the largest amplitudes are selected and used to stimulate the corresponding electrodes. Here, instead of extracting features, the outputs with maximum amplitude are selected for stimulation.
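The selection step described above can be sketched in a few lines. This is only an illustration of picking the largest filter-bank outputs; the function name and the example envelope values are hypothetical, and the rectified, lowpass-filtered outputs are assumed to be available already.

```python
def select_maxima(envelopes, n_maxima=6):
    """Return the indices of the n_maxima channels with the largest
    envelope amplitude, listed in channel order."""
    ranked = sorted(range(len(envelopes)),
                    key=lambda ch: envelopes[ch], reverse=True)
    return sorted(ranked[:n_maxima])

# Hypothetical envelope amplitudes for one analysis frame of 16 channels.
frame = [0.10, 0.90, 0.30, 0.80, 0.20, 0.70, 0.05, 0.60,
         0.15, 0.50, 0.25, 0.40, 0.01, 0.02, 0.03, 0.04]
active_channels = select_maxima(frame)  # channels 1, 3, 5, 7, 9 and 11
```

Only the selected channels are stimulated in that frame, and the selection is repeated for every frame.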

Advanced Combination Encoder (ACE)
ACE is functionally similar to the existing SMSP strategy. The previously reviewed feature-based strategies rely mainly on two formants (F1/F2) and the fundamental frequency (F0). In strategies like SMSP the signal deteriorates in the presence of even a small amount of noise. Instead of estimating acoustic features explicitly, ACE conveys the salient aspects of the spectral shape of the signal. The modules of ACE are similar to those of many strategies such as CIS but differ in the channel selection module [14]. In comparison with many other strategies, ACE also enhances some spectral features for perception by CI users. Taft et al. [15] investigated the effect of across-frequency delays in the stimulation patterns of ACE on the speech perception of CI users, reporting, for mean word recognition in noise, a group improvement of 3% with no delay and an improvement of 20% with a 6 ms delay in one subject.

Spike-based Temporal Auditory representation (STAR)
STAR is a new CI speech processing strategy that uses fine-grained temporal information. The behavior of the hair cells and the auditory nerve is modeled using thresholds or zero crossings to obtain spikes. In the auditory system, speech is coded both by place and by timing. Grayden et al. [16] provided an enhanced representation of fine-grained temporal information with increased noise tolerance.

Multi-peak stimulation strategy (MPEAK)
This strategy is an advancement over the F0/F2 and F0/F1/F2 strategies. In it, high-frequency information is also extracted for stimulating the electrodes. F1 and F2 are determined as in the F0/F1/F2 strategy, with the F2 frequency range refined to 800-4000 Hz. The strategy improves the representation of the second formant, F2, and enhances the perception of high-frequency information in the signal, such as consonants.

Fine Structure Processing (FSP)
FSP processes the fine structure of the signal in addition to computing the envelope of the acoustic signal. Dillon et al. compared speech perception with two different sound processing strategies, Fine Structure Processing (FSP) and high definition CIS (HDCIS) [17]. Subjects were randomly assigned to listen with either FSP or HDCIS activated in the speech processors of their CIs. It was concluded that speech perception was almost the same with either strategy.
Boucherit et al. compared the performance of different auditory filters with that of conventional filters at different noise levels and conditions [18]. A data set of 30 sentences from speakers of both genders was used. The sentences were corrupted with different real-world noises, including babble, car, station and street noise, at different signal-to-noise ratio (SNR) levels. The Normalized Covariance Metric (NCM) was used for performance estimation; here an increase in SNR increases the NCM. No considerable differences were found between the different filters, and the model did not improve speech intelligibility in comparison with conventional filters.
With the aim of finding a better strategy, Fig: 3 shows the flow of some of the strategies discussed above, including strategy-specific elements.
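Corrupting clean sentences with noise at a prescribed SNR, as in evaluations of this kind, amounts to scaling the noise before adding it. The following is a minimal sketch of that arithmetic, using the standard definition SNR = 10 log10(P_speech / P_noise). It is not the procedure of [18], and the function names are hypothetical.

```python
import math

def mean_power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)

def snr_db(speech, noise):
    """Signal-to-noise ratio in dB for separate speech and noise signals."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))

def mix_at_snr(speech, noise, target_snr_db):
    """Scale the noise so that the speech-to-noise power ratio equals
    target_snr_db, then add it to the speech.
    Returns (mixture, scaled_noise)."""
    gain = math.sqrt(mean_power(speech) /
                     (mean_power(noise) * 10 ** (target_snr_db / 10)))
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled
```

Calling `mix_at_snr(speech, noise, 5.0)` returns a mixture whose noise component sits 5 dB below the speech power. Lower targets such as 0 dB give harder listening conditions.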

SOUND PRODUCTION ANALYSIS
The sound production of CI listeners depends on their speech intelligibility. Lee et al. analyzed the speech production of CI users with respect to environmental noise conditions. Four different noisy places were selected, and the background noise was analyzed using the long-term averaged spectrum and SNR. For the speech analysis, the fundamental frequency and the glottal spectral slope were analyzed [7]. Khing et al. investigated speech intelligibility under fast-acting Automatic Gain Control (AGC) as a function of presentation level [19]. Both high and low SNR conditions were considered. Only a little improvement in intelligibility was found.
Nagathil et al. proposed a linear regression model to predict the perceptual ratings of music given by CI listeners [20]. A listening test was conducted in which CI listeners were asked to rate music excerpts on different scales. A data set of eleven CI listeners was used; the listeners had bilateral or unilateral MED-EL implants coded with the FSP or CIS strategy. The proposed model, evaluated with cross-validation, yielded a root mean square error of 0.41 to 0.68, indicating good prediction accuracy.
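To make the evaluation measure concrete, the sketch below computes the root mean square error, RMSE = sqrt((1/N) * sum((predicted_i - observed_i)^2)), of a linear model under leave-one-out cross-validation. It is a simplified illustration with a single predictor and hypothetical function names. The features and validation scheme actually used in [20] are not specified here and are not reproduced.

```python
import math

def rmse(predicted, observed):
    """Root mean square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed))
                     / len(observed))

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def leave_one_out_rmse(xs, ys):
    """Predict each rating from a model fitted on all the other ratings,
    then report the RMSE of those held-out predictions."""
    preds = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(slope * xs[i] + intercept)
    return rmse(preds, ys)
```

Because every prediction is made for a rating the model has not seen, the resulting RMSE reflects generalization rather than goodness of fit. It is expressed in the same units as the rating scale.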
To increase speech intelligibility, researchers have employed different noise reduction mechanisms. J. Dingemanse et al. analyzed the effect of a transient noise reduction (TNR) method in CI and the interaction of TNR with a continuous noise reduction (CNR) algorithm [21]. TNR was found to reduce the annoyance caused by transient sounds significantly, and CNR was found to be beneficial for speech intelligibility. Considering the higher SNR required by CI listeners, Arora et al. studied two methods, one based on the coherence function and the other on spatial filtering [6]. In the spatial noise reduction method, beamformer post-filters were used for noise reduction and gave an improvement of up to 4.6 dB.
With the spatial noise reduction method, 50% speech intelligibility was maintained at higher noise levels. The coherence method was concluded to be better for speech intelligibility in the presence of more than one noise source.
Tiwari et al. studied the effects of spectral subtraction and multi-band frequency compression, with the aims of reducing computational complexity and reducing intra-speech spectral masking, which in turn reduces the noise in the intra-speech spectrum [22]. An improvement of 4-13 dB in signal-to-noise ratio was found. Yuan et al. modified the traditional spectral subtraction method into a statistical-model-based method that updates the signal-to-noise ratio in order to reduce speech distortion and musical noise, which the traditional spectral subtraction method is unable to reduce [2]. This method affected sentence recognition as [F (3,18) [23]. The novel method was found to be more efficient, reducing MSE by 60.7% for babble and impulsive noise and by 58% for AWGN. Considering factors other than noise reduction for speech enhancement, S. Jain et al. investigated the effect of manipulating speech signal parameters on speech intelligibility. They found that 79.6% of the variance in speech intelligibility is attributable to signal processing, and that pulse rate per channel is the parameter showing the maximum variance for speech processing [24].
Another method of enhancing speech uses neural networks to increase the speech intelligibility of CI users. The speech enhancement is done by feeding extracted auditory features to neural networks in order to estimate important perceptual information [25]. This information is used to retain the speech-dominated channels and attenuate the noise-dominated channels. A significant improvement in Speech Reception Threshold (SRT), ranging from 1.4 dB to 6.4 dB, was shown across 14 CI patients.
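For reference, the traditional spectral subtraction mentioned above can be sketched in its generic textbook form: an estimate of the noise power is subtracted from the noisy power in each frequency bin, and the result is limited from below by a spectral floor. This is not the method of [22] or the statistical-model-based variant of [2]. The function name, the over-subtraction factor and the floor value are assumptions chosen for illustration, and the power spectra are taken as given.

```python
def spectral_subtract(noisy_power, noise_power,
                      over_subtraction=1.0, floor=0.01):
    """Per-bin power spectral subtraction with a spectral floor.

    noisy_power and noise_power are power values per frequency bin for
    one frame. The floor stops the estimate from going negative."""
    cleaned = []
    for y, n in zip(noisy_power, noise_power):
        estimate = y - over_subtraction * n
        cleaned.append(max(estimate, floor * y))
    return cleaned

# Hypothetical three-bin frame with a flat noise estimate.
frame = spectral_subtract([4.0, 1.0, 0.5], [1.0, 1.0, 1.0])
```

Bins whose noisy power falls at or below the noise estimate are clamped to the floor. The isolated peaks that survive between such bins, changing from frame to frame, are what is perceived as musical noise, which motivates the modified methods discussed above.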

COMPARISON BASED ON DIFFERENT METHODS AND TECHNIQUES
Speech intelligibility is analyzed and compared on the basis of the different techniques and methods used, as shown in Table 1. Speech is analyzed using different analysis parameters such as signal-to-noise ratio (SNR), fundamental frequency (F0), Glottal Spectral Slope (GSS), PESQ score, Normalized Covariance Metric (NCM), Root Mean Square Error (RMSE), accuracy and standard deviation.

CONCLUSION:
Enhancing speech perception by increasing speech intelligibility with CI is a research area that has the potential to change the lives of millions of people, and improving speech coding in CI is a promising part of this field. Different speech processing strategies and algorithms, based on different aspects of the problem, have been reviewed in order to explore new research directions. These strategies are capable of good speech perception not only in quiet environments but also, with certain limitations, in different noisy environments. Newly emerging algorithms based on different aspects of sound, noise reduction mechanisms, gain control mechanisms and signal features have been analyzed with respect to the main objective of increasing speech intelligibility for CI patients, so as to bring them closer to the experience of natural hearing.