Google AI can differentiate voices even in a crowded place

Search giant Google has come up with a technology that can pick out individual voices in a crowded place. Here is the full report.

By Karan

Technology has advanced rapidly since smartphones entered the global market. Smartphone cameras are now smart enough to let users focus on a single object among many. Soon we may also see technology that picks out individual voices in a crowd while suppressing other ambient sounds.


A new AI (Artificial Intelligence) system developed by Google researchers makes this possible. This matters because computers are not good at paying attention to a particular person in noisy surroundings. They usually get confused when more than one voice is speaking in the background, and give unexpected results.

"However, automatic speech separation -- separating an audio signal into its individual speech sources -- remains a significant challenge for computers," Inbar Mosseri and Oran Lang, software engineers at Google Research, wrote in a blog post this week.

"In this work, we are able to computationally produce videos in which speech of specific people is enhanced while all other sounds are suppressed," Mosseri and Lang said.

According to a news report, the Google researchers have demonstrated a deep learning audio-visual model for separating a single speech signal from a mixture of sounds.

The technique works on ordinary video recordings with a single audio track. The user only needs to select the face of the person being recorded. The technology then automatically focuses on that person's voice based on context.

Google researchers believe the technique could benefit a wide range of applications, from speech enhancement and recognition in videos to video conferencing and improved hearing aids, especially in situations where several people are speaking at once.

"A unique aspect of our technique is in combining both the auditory and visual signals of an input video to separate the speech," the researchers added.

"Intuitively, movements of a person's mouth, for example, should correlate with the sounds produced as that person is speaking, which in turn can help identify which parts of the audio correspond to that person," they explained.
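
The intuition in that quote can be illustrated with a toy sketch. This is not Google's model, which is a deep neural network working on a single mixed audio track. The sketch assumes, purely for illustration, that candidate audio sources are already available as per-frame loudness values and that a per-frame "mouth opening" measurement exists for the selected face. All names and numbers below are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / sqrt(var_x * var_y)


def best_matching_source(mouth_opening, source_energies):
    """Pick the audio source whose loudness best tracks the mouth movement.

    mouth_opening: hypothetical per-frame mouth-opening values for one face.
    source_energies: dict mapping a source name to per-frame loudness values.
    """
    scores = {
        name: pearson(mouth_opening, energy)
        for name, energy in source_energies.items()
    }
    return max(scores, key=scores.get), scores


# Made-up data: the selected speaker's mouth opens at frames 1-2 and 5-6.
mouth_opening = [0.1, 0.8, 0.9, 0.2, 0.1, 0.7, 0.8, 0.1]
source_energies = {
    "source_a": [0.2, 0.9, 1.0, 0.3, 0.2, 0.8, 0.9, 0.2],  # loud when mouth is open
    "source_b": [0.9, 0.2, 0.1, 0.8, 0.9, 0.1, 0.2, 0.9],  # loud when mouth is closed
}

best, scores = best_matching_source(mouth_opening, source_energies)
print(best, scores)
```

Here the source whose loudness rises and falls with the mouth movement scores highest, which is the kind of audio-visual cue the researchers describe.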
