Artificial intelligence hearing

Digital processing and analysis of acoustic signals remains an open problem. Research in this field is ongoing, but the more problems are solved, the more new ones arise. The overall goal of the research is to process speech in a real-life sound environment, for example, when several people talk at the same time during a conversation. This goal has not yet been reached.

A group of researchers from Peter the Great St. Petersburg Polytechnic University (SPbPU) has proposed and developed a new approach based on simulating the sensory coding of sounds by modelling the auditory periphery. The current results of the study were published in the scientific article “Semi-supervised Classifying of Modelled Auditory Nerve Patterns for Vowel Stimuli with Additive Noise” [https://link.springer.com/chapter/10.1007/978-3-030-01328-8_28].

The human nervous system processes information in the form of neural responses. The peripheral nervous system, which includes the analyzers (in particular the visual and auditory ones), provides perception of the external environment. The analyzers are responsible for the initial transformation of external stimuli into a stream of neural activity, and the peripheral nerves ensure that this stream reaches the highest levels of the central nervous system. This lets a person reliably recognize the voice of a speaker even in an extremely noisy environment. At the same time, according to research, existing speech processing systems are not effective enough and require powerful computational resources.

Speech signal and its transformation into the reaction of the auditory nerve

To address this problem, scientists of the Measuring Information Technologies department at SPbPU conducted a study in which they developed methods for acoustic signal recognition based on peripheral coding. In essence, the scientists partially reproduced the processes that the nervous system performs when processing information, and then integrated them into a decision-making module that determines the type of the incoming signal.

Clustering of the reaction of the auditory nerve for two phonemes on the distance matrix

“The main goal is to give the machine a human-like hearing, that is, to achieve the corresponding level of machine perception of acoustic signals in the real-life environment,” says the project leader Anton Yakovenko. During the research, a large database of examples of neural activity evoked by vowels was created with the auditory nerve model. The data were processed by an algorithm that performed structural analysis to identify the neural activity patterns, which the model then used to recognize each phoneme. The proposed approach incorporates self-organizing neural networks and graph theory. According to the research, analysis of the response of the auditory nerve fibers made it possible to identify vowel phonemes correctly under significant noise exposure and outperformed the most common methods for parameterization of acoustic signals.
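The idea of grouping responses by their pairwise distances can be illustrated with a minimal sketch. It is not the researchers' algorithm, which the article does not describe in detail. It assumes, purely for illustration, that each modelled auditory nerve response is a short numeric vector, that Euclidean distance is a reasonable dissimilarity measure, and that two responses belong to the same group when their distance is below a hand-picked threshold. The synthetic data, the function names and the threshold value are all hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(patterns):
    """Full pairwise distance matrix for a list of vectors."""
    n = len(patterns)
    return [[euclidean(patterns[i], patterns[j]) for j in range(n)]
            for i in range(n)]


def connected_components(dist, threshold):
    """Label the connected components of the graph in which two
    patterns are linked when their distance is <= threshold."""
    n = len(dist)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and dist[i][j] <= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def make_patterns(center, count, noise, rng):
    """Synthetic stand-ins for responses: a prototype plus Gaussian noise."""
    return [[c + rng.gauss(0.0, noise) for c in center] for _ in range(count)]


rng = random.Random(0)
# Hypothetical prototypes for two vowel phonemes (not real model output).
vowel_a = make_patterns([1.0, 0.0, 0.0, 1.0], 10, 0.05, rng)
vowel_b = make_patterns([0.0, 1.0, 1.0, 0.0], 10, 0.05, rng)

patterns = vowel_a + vowel_b
dist = distance_matrix(patterns)
labels = connected_components(dist, threshold=0.6)  # threshold is an assumption
print(labels)
```

With these synthetic inputs, the ten patterns generated from each prototype fall into their own component, so the two hypothetical phonemes are separated without using any labels. A real system would replace the synthetic vectors with the output of an auditory periphery model and would likely need a more robust grouping step than a fixed threshold.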

“The algorithms for processing and analysing big data implemented within the research framework are universal and can be applied to tasks that are not related to acoustic signal processing,” says Anton Yakovenko. He adds that one of the proposed methods was successfully applied to network behavior anomaly detection.

The SPbPU researchers believe that the developed methods should help create a new generation of neurocomputer interfaces, as well as provide better human-machine interaction. In this regard, the study has great potential for practical application: in cochlear implantation (surgical restoration of hearing), separation of sound sources, and the creation of new bioinspired approaches to speech processing, recognition and computational auditory scene analysis based on machine hearing principles.