Wouldn’t it be cool to have an AI that listens to someone’s voice for a few seconds and can then say different things in the same voice? Well, this kind of voice replication now exists.
Last month, researchers at Baidu’s Silicon Valley AI Lab developed a neural voice cloning system that needs only a few samples of a person’s voice to generate speech in that voice. Until now, neural networks required a large amount of recorded speech as training data to carry out the cloning process. Baidu’s voice cloning tool, Deep Voice, requires only 3.7 seconds of audio to clone a voice, a big improvement over the company’s own effort last year, which needed 30 minutes of audio to do the same.
Remember the movie “Terminator”, where Arnold Schwarzenegger’s character, in a dramatic scene inside a phone booth, imitates the voice of the kid standing next to him to find out whether the woman is alive? That futuristic scene has now become a kind of reality, and it raises the question of whether AI is a boon or a bane. The algorithm developed by Baidu’s researchers is said to let robots learn on their own, without the pre-programmed solutions used by existing voice recognition programs such as Siri. The cloned voice sounds more realistic when more voice data is used. The collective effort behind artificial voice generation seems to be accelerating rapidly, and it is quite surprising how fast the tech world is evolving.
However, every coin has two sides. Every new technology brings either a new ray of hope of making life more interesting or the possibility of turning it into misery. Whatever the debate may say, the truth is that everything has a flip side. Used appropriately, technology, like any other benefit, can do wonders; misused, it can wreak havoc.
Advantages of Personalized Voice Cloning and Internet of Things (IoT)
Perhaps the winning part of speech recognition, and now voice cloning, is that the technology is becoming good enough for companies to build exciting applications for users. For example, if someone wants to caption video content such as a lecture and make it accessible to everyone, deep learning can extract the spoken lines automatically and save the effort of typing them out by hand. In the future, it may be possible to do many such things with high quality. Today, hands-free interfaces in cars already make it safer to use technology while driving. Voice also makes mobile and home devices easier, more efficient, and more enjoyable to use. Typing letter by letter on the flat panels of mobile phones and tablets may become history, and people may soon stop using keyboards. With voice detection, individuals can enter text on screen by voice securely.
At present, someone who has recorded a podcast and wants a professional transcript with 100% accuracy has to produce it manually, investing time to listen and take notes from scratch. This limits how far the work can scale and keeps margins low. With the arrival of speech recognition technology, it will become easier to transcribe such complex speech accurately.
Companies working to improve their customers’ everyday experience can treat this invention as an opportunity to capitalize on. Google Assistant now offers celebrity voices for its assistant, and voice cloning could draw a similarly strong response from users. Baidu may likewise sell or add celebrity-voice assistants for its users in the future. The technology could also recreate late actors when a film calls for them in a specific role: visual imagery generated with motion capture, combined with an AI-cloned voice, could give audiences the most realistic experience yet. Additionally, listening to favorite singers perform different lyrics, or hearing stories in the voices of elderly relatives without disturbing their sleep, could become a reality.
The coming days will show whether AI-generated voice is a boon or a bane, but smart speakers are already set to explode in popularity as IoT changes the lifestyle of people all around the world. Thanks to AI voice interfaces such as Alexa and Google Home devices, customers now want to turn their homes into an ‘AI Heaven’. In addition, the governments of developed countries are discussing how to make cities smart with IoT over the next twenty years. The next step for AI technology will be reading facial expressions along with recognizing voices. One can imagine sensors built into a room that read a person’s expressions and suggest something good to eat to relieve stress. On the other side of the coin, such sensing may go deeper than intended, intrude on an individual’s medical condition, and bring along a whole range of sensors that never needed to be attached.
Is Voice-Cloning Entering the Dark Side of AI?
Baidu’s voice-cloning system produces voices convincing enough to fool a voice recognition system over 95% of the time. Some commentators are concerned that fraudsters may use it to trick voice-based authentication systems and gain access to private information. Robots that recognize patterns and then make predictions act very differently from humans. This may leave the world dependent on robots and whatever they do in the future, and it may leave the developed world restless, fearful, and caught in a never-ending search for peace.
In 1997, Deep Blue, the computer created by IBM, defeated World Chess Champion Garry Kasparov. This was one of the first major examples of AI in action.
Voice cloning technology feels intense because people are witnessing exponential growth in a very short time, and they might not be prepared to bring such technology into their lives. Let’s figure this out. The development of language is believed to be the biggest step in the evolution of Homo sapiens: it let people communicate with each other, and this basic and necessary creation helped drive their further evolution. Yet as diverse languages developed and were manipulated over time, people also fought each other for superiority. Likewise, many inventions so far have brought threats to society as well as changes. Truth be told, history is warning mankind again that whenever humanity has achieved something ahead of its time, people themselves have become a major threat to their own existence. So with the advent of voice cloning technology, its massive benefits can turn into potential drawbacks.
But not to worry: this is not the first time people have been concerned about something they cannot ignore. Technology is meant to be man’s best friend; misusing and abusing it is mankind’s failure. Though AI fraud is a present-day concern, the positive aspects of the technology outweigh the negativity. Only if people become aware and take proper action, however, can it be called true development.