Advanced Microphone Array Processing (AMAP)
As depicted in Figure 1, Fortemedia's AMAP combines spatial signal processing, signal separation, and adaptive filtering to provide critical voice processing functions. AMAP supports both single-microphone and multiple-microphone configurations. The spatial diversity of multiple microphones makes it possible to separate the speech of interest from interference that arrives via a different acoustic path, regardless of the type of noise source, whether constant or time-varying. The advanced spatial filtering in the microphone array identifies the locations of the different sound sources and attenuates the paths of noise or interference. Combined with source signal separation algorithms and advanced statistical filtering techniques, this spatial signal processing is the first step in a complex integrated process that AMAP employs to perform ambient noise suppression, acoustic echo cancellation, and other voice processing functions.
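The document does not disclose AMAP's actual spatial filtering algorithm, but the core idea of exploiting spatial diversity can be illustrated with the simplest microphone-array technique, a delay-and-sum beamformer: channels are time-aligned for the target direction so the speech adds coherently while off-axis interference does not. All delays and signals below are hypothetical, for illustration only.

```python
import numpy as np

def delay_and_sum(mic_signals, delays_samples):
    """Align each microphone channel by its integer sample delay for the
    target direction of arrival, then average the aligned channels.

    mic_signals:    (num_mics, num_samples) array
    delays_samples: per-mic integer delays (illustrative values only)
    """
    num_mics, _ = mic_signals.shape
    out = np.zeros(mic_signals.shape[1])
    for channel, d in zip(mic_signals, delays_samples):
        out += np.roll(channel, -d)   # advance the channel by d samples
    return out / num_mics

# Toy two-mic scene: the target wavefront reaches mic 1 one sample after
# mic 0, so advancing mic 1 by one sample re-aligns the target coherently.
fs = 8000
t = np.arange(256) / fs
target = np.sin(2 * np.pi * 440 * t)
mics = np.stack([target, np.roll(target, 1)])   # mic 1 lags by 1 sample
enhanced = delay_and_sum(mics, delays_samples=[0, 1])
```

A real array processor would additionally steer nulls toward interferers (as the attenuation step above describes) and use fractional, frequency-dependent delays; this sketch only shows the coherent-alignment principle.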
The spatial filtering system incorporates a time-frequency processing system that generates cues and parameters, including a group of voice activity detectors that monitor speech activity across frequencies. Statistical variables for both noise and speech are also estimated at this stage. The time-frequency system combines voice pitch with direction-of-arrival information to separate the target speech from background echo and interference. Voice pitch extracted from the monaural input exhibits a unique spectral structure and a viable frequency range; interacting with the direction-of-arrival information and other cues enabled by the multiple-microphone input, it provides a footprint of the speech of interest. This information pinpoints far-end echo, ambient background noise, and spot interference, and helps segregate them from the target speech.
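The per-frequency voice activity detectors and noise statistics described above can be sketched with a minimal scheme: flag a bin as speech when its magnitude clearly exceeds a recursively tracked noise floor, and update the noise floor only in bins judged to be noise. The threshold and smoothing factor below are illustrative assumptions, not AMAP's values.

```python
import numpy as np

def update_noise_and_vad(mag, noise_est, ratio_threshold=3.0, alpha=0.9):
    """One frame of a per-band VAD with recursive noise-floor tracking.

    mag:       magnitude spectrum of the current frame, shape (num_bins,)
    noise_est: running per-bin noise-floor estimate
    A bin is flagged as speech when its magnitude exceeds the noise floor
    by `ratio_threshold`; the floor is updated only in non-speech bins.
    """
    speech_flags = mag > ratio_threshold * noise_est
    new_noise = np.where(speech_flags, noise_est,
                         alpha * noise_est + (1 - alpha) * mag)
    return speech_flags, new_noise

# Toy frame: near-unit noise everywhere, one strong "speech" bin at index 3.
noise_est = np.ones(8)
frame = np.full(8, 1.1)
frame[3] = 12.0
flags, noise_est = update_noise_and_vad(frame, noise_est)
```

A production detector would fuse this energy cue with the pitch and direction-of-arrival evidence the text describes, rather than rely on a single threshold.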
In the final stage of the AMAP process, the spectrum of the spatially enhanced main-microphone input is modified by gain factors across frequencies to enhance the speech elements and suppress the noise components. These spectrum modifications are performed sequentially on each concurrent output of the advanced statistical analysis filter, and the corresponding processed frequency bins are reconstructed by the synthesis filter as the final output.
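A common way to realize such per-frequency gain modification, offered here only as a plausible sketch of the step described, is a Wiener-style gain: bins dominated by speech pass nearly unchanged, bins dominated by noise are attenuated, with a gain floor to limit musical-noise artifacts. The noise-power values and the floor are illustrative assumptions.

```python
import numpy as np

def apply_spectral_gains(spectrum, noise_power, floor=0.1):
    """Wiener-style per-bin gains: attenuate noise-dominated bins,
    pass speech-dominated bins. `floor` caps the attenuation (an
    illustrative choice to reduce musical-noise artifacts)."""
    signal_power = np.abs(spectrum) ** 2
    speech_power = np.maximum(signal_power - noise_power, 0.0)
    gains = speech_power / np.maximum(signal_power, 1e-12)
    gains = np.maximum(gains, floor)
    return gains * spectrum

# Bin 0 is speech-dominated (magnitude 10 vs noise power 1) and is kept;
# bin 1 is pure noise and is attenuated down to the gain floor.
out = apply_spectral_gains(np.array([10.0 + 0j, 1.0 + 0j]),
                           noise_power=np.array([1.0, 1.0]))
```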
On these spectral signals, statistical filtering for noise suppression and echo cancellation is conducted using the voice activity statistics extracted previously. After the stages of linear and non-linear processing, the spectral signals are converted back to the time domain by a frequency-to-time mapping engine, yielding the clean speech output.
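The frequency-to-time mapping step is conventionally an inverse FFT per frame followed by overlap-add. A minimal sketch, using non-overlapping frames so the round trip is exact; a real system would use windowed, overlapping frames satisfying a constant-overlap-add condition:

```python
import numpy as np

def overlap_add_synthesis(spectra, frame_len, hop):
    """Reconstruct a time-domain signal from per-frame spectra via
    inverse real FFT and overlap-add (the frequency-to-time mapping)."""
    num_frames = len(spectra)
    out = np.zeros(hop * (num_frames - 1) + frame_len)
    for i, spec in enumerate(spectra):
        frame = np.fft.irfft(spec, n=frame_len)
        out[i * hop : i * hop + frame_len] += frame
    return out

# Round trip on a toy signal: analyze into non-overlapping frames,
# then synthesize; with hop == frame_len the reconstruction is exact.
x = np.random.default_rng(0).standard_normal(512)
frame_len, hop = 128, 128
spectra = [np.fft.rfft(f) for f in x.reshape(-1, frame_len)]
y = overlap_add_synthesis(spectra, frame_len, hop)
```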