
Our Adaptation of Lyon’s Auditory Model for Python

In contrast to our usual theoretical posts, in this story we discuss our recent adaptation of Lyon's popular auditory model for Python.

The ability of the human auditory system to recognize speech in adverse and noisy conditions has prompted researchers to introduce features of human perception into speech recognition systems. The early 1980s saw a surge of computational models based on physiological measurements of the response of individual auditory nerve fibers. One of the models that emerged at that time was Lyon's passive longwave cochlear model.

Original Model

The original Lyon’s auditory model is a part of the auditory toolbox written for MATLAB. This toolbox is useful for researchers and speech and auditory engineers who want to see how the human auditory system represents sounds.

Richard F. Lyon, an American scientist and prominent inventor, described an auditory model showing the propagation of sound in the inner ear and the conversion of acoustical energy into neural representations followed by several stages of adaptation.

The model simulates the behavior of the cochlea, the most important part of the inner ear. In its essence, the cochlea is a non-linear filter bank: thanks to the variability of its stiffness, different places along it are sensitive to sounds with different spectral content. However, the model does not try to literally describe each structure in the cochlea but treats it as a “black box.” Just as sound entering the cochlea is converted into nerve firings that travel up the auditory nerve to the brain, the model outputs a vector proportional to the firing rate of neurons at each point in the cochlea.

Lyon’s filters

The cochlear model combines a series of filters that recreate the traveling pressure waves with Half Wave Rectifiers (HWR) to detect the energy in the signal and several stages of Automatic Gain Control (AGC).

This behavior is simulated by a cascade filter bank. The number of such filters depends on the sampling rate of the signals, the overlapping factor of the filter bands, the quality of the resonant part of the filters, and other factors. The more filters, the more accurate the model.
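To give a feel for this pipeline, here is a toy sketch of a cascade of band-pass sections followed by half-wave rectification and a crude gain control. The filter design, channel frequencies, and smoothing constants below are invented for the example; they are not Lyon's actual cochlear coefficients.

```python
import numpy as np
from scipy.signal import butter, sosfilt

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)  # a 1 kHz test tone

channels = []
for fc in [4000, 2000, 1000, 500]:  # toy channel frequencies, base to apex
    # Each stage is a band-pass section; its output feeds the next stage,
    # so low-frequency channels see the signal after more filtering.
    sos = butter(1, [0.7 * fc, 1.3 * fc], btype='band', fs=fs, output='sos')
    x = sosfilt(sos, x)
    hwr = np.maximum(x, 0.0)  # half-wave rectification detects energy
    # A one-pole running average of the rectified output drives a
    # crude multiplicative gain (a stand-in for the AGC stages).
    avg = np.empty_like(hwr)
    state = 0.0
    for i, v in enumerate(hwr):
        state = 0.999 * state + 0.001 * v
        avg[i] = state
    channels.append(hwr / (avg + 1e-4))

cochlea_like = np.stack(channels)  # (n_channels, n_samples)
print(cochlea_like.shape)  # (4, 16000)
```

Because each stage's output feeds the next, the cascade naturally mimics the wave traveling from the base of the cochlea (high frequencies) toward the apex (low frequencies).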

Cochleagram

The model outputs maps of auditory nerve firing rates, called cochleagrams. Cochleagrams are a variation of spectrograms: two-dimensional time-frequency representations used to better reveal spectral information.

While at a coarse temporal scale, cochleagrams and spectrograms look quite similar except for the scale of the frequency axis, cochleagrams can preserve more of the fine time scale structure of each sound component, e.g. the glottal pulses.
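Viewing a cochleagram amounts to rendering the firing-rate matrix as an image. A minimal matplotlib sketch might look like this; random data stands in for real model output here, with rows as cochlear channels and columns as time frames.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# Random data stands in for a real firing-rate matrix:
# 86 cochlear channels by 500 time frames.
rates = np.abs(np.random.randn(86, 500))

fig, ax = plt.subplots()
ax.imshow(rates, aspect='auto', origin='lower', cmap='magma')
ax.set_xlabel('Time frame')
ax.set_ylabel('Cochlear channel')
ax.set_title('Cochleagram')
fig.savefig('cochleagram.png')
```

With real model output, the fine time-scale structure mentioned above (e.g. individual glottal pulses) shows up as vertical striations in the image.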

Computer Model

Written in C and MATLAB, the computer adaptation of Lyon's passive longwave cochlear model implements multiple stages of multiplicative adaptive gain. Its input is a number of channels from a filter bank. An array of state filters, one per channel and per stage, measures a running average of the energy in each channel. These averages then drive a single multiplicative gain per stage per channel.
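The adaptation loop can be sketched as follows. This is a simplified recurrence for illustration, not the actual agc.c code, and the `epsilon` and `target` constants are invented for the example.

```python
import numpy as np

def agc_stage(x, epsilon=0.001, target=0.05):
    """One AGC stage: a per-channel running average of the output
    drives a multiplicative gain that shrinks as the average grows."""
    n_channels, n_samples = x.shape
    state = np.zeros(n_channels)          # one state value per channel
    out = np.empty_like(x)
    for t in range(n_samples):
        gain = np.maximum(1.0 - state / target, 0.0)
        out[:, t] = x[:, t] * gain        # single multiplicative gain
        state = (1 - epsilon) * state + epsilon * out[:, t]  # running average
    return out

def multistage_agc(x, n_stages=4):
    for _ in range(n_stages):             # cascade several stages
        x = agc_stage(x)
    return x

rates = np.full((8, 2000), 0.2)           # a sustained, constant input
adapted = multistage_agc(rates)
```

A sustained input passes through at full gain at onset and is then progressively attenuated as the averages build up, which mimics the neural adaptation the model is after.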

Our Adaptation for Python

In our speech recognition project, we experimented with waveform segmentation and ASR, and it turned out that Lyon’s model-based features outperformed standard MFE and MFCC features. However, we used Python for the project, and we could not find any previous implementation of Lyon’s model for Python, only the original C code from AuditoryToolbox. Moreover, from the global perspective, increasingly more AI-related projects are written in Python, so we found it useful to port Lyon’s model to this language.

To successfully call LyonPassiveEar(), we removed the MEX-related parts from soscascade.c, agc.c and sosfilters.c, made a ctypes wrapper for the soscascade(), agc() and sosfilters() calls, and translated the necessary files from MATLAB to Python.
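The ctypes pattern is sketched below on a stand-in function from the C math library; wrapping soscascade(), agc() and sosfilters() follows the same steps once the MEX-stripped sources are compiled into a shared library.

```python
import ctypes
import ctypes.util

# Load a shared library and declare the C signature before calling.
# libm's sqrt is a stand-in here; for the cochlear code, the library
# would be the one compiled from soscascade.c and friends.
libm = ctypes.CDLL(ctypes.util.find_library('m') or 'libm.so.6')
libm.sqrt.restype = ctypes.c_double      # C return type
libm.sqrt.argtypes = [ctypes.c_double]   # C argument types

print(libm.sqrt(16.0))  # prints 4.0
```

For array arguments, numpy.ctypeslib.ndpointer can be used in argtypes to pass NumPy buffers directly to the C functions.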

A complete description and the installation guide for the project can be found on the corresponding PyPI page.

Usage

If you want to test how it works, you can use our code to compute a cochleagram for a sample sound:

```python
import soundfile as sf  # or any other loader that returns a float waveform

from lyon import LyonCalc

calc = LyonCalc()
waveform, sample_rate = sf.read('audio/file/path.wav')
decimation_factor = 64
coch = calc.lyon_passive_ear(waveform, sample_rate, decimation_factor)
print(coch.shape)
```

The code above outputs the shape of the resulting auditory nerve response: [<number of samples / decimation_factor>, 86].
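For example, for a one-second recording sampled at 16 kHz with the decimation factor of 64 used above (illustrative numbers), the frame count works out to:

```python
# One second of audio at 16 kHz, decimated by a factor of 64:
sample_rate = 16000
decimation_factor = 64
n_samples = sample_rate * 1              # one-second recording
n_frames = n_samples // decimation_factor
print((n_frames, 86))  # (250, 86): time frames x cochlear channels
```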

We hope that this Python adaptation of a famous model will be a helpful addition to the language's range of speech and NLP tools and will support further work on speech processing.

Richard F. Lyon (b. 1952) is an influential American inventor and engineer. One of the key figures in signal processing, he has worked on both optical and auditory signals. He is the author of a cochlear model that serves as the basis of much auditory research, as well as of optical and integrated-circuit techniques that, for instance, improved the accuracy of digital color photography. Lyon was one of two people who independently invented the optical mouse. He also designed early Global Positioning System test transmitters and the first single-chip Ethernet device. In 2017, Lyon published his first book, Human and Machine Hearing: Extracting Meaning from Sound.