In this day and age, smartphones are characterised not by email and GPS but by the voice assistants that come bundled with them. We have all asked Google Assistant whether we need to carry an umbrella on a given day, or played around with Siri, asking if it loved us back. Microsoft’s Cortana on most computers and Alexa, bundled into the Amazon Echo, have also grown relatively deep roots in the homes of many consumers. Bixby still doesn’t understand you, though. Despite having been on the market for quite some time now, these assistants are still restricted to fairly primitive functions on our smartphones, such as placing calls, fetching weather updates or taking notes. Could they, however, do more? Run our entire house like JARVIS, or become our better half for life, like Samantha from Her?
Today’s assistants at best produce scripted responses to queries, with no distinctive answers that depend on the person’s mood, tone or speech. What stands in our way before the next generation of intelligent computing can take the place of a human companion?
Contextual Speech Recognition
The biggest issue one faces while talking to an assistant is that instructions have to be posed explicitly, with intricate details, every time you place a query. This not only makes the experience strenuous for the user but also makes it difficult to handle multiple sub-tasks within the same instruction. Natural Language Processing is the branch of computer science that tries to bridge the gap between a computer system’s speech recognition and the human style of interaction. Its ability to generate results by filling in and inferring details the user has not provided plays a pivotal part in making the system ‘human-like’. The capability to understand an entire conversation as a whole, rather than treating every instruction as a separate interaction, is central to making assistants user-friendly and efficient. Humans use their prior knowledge of the world to grasp context and resolve word-sense ambiguities, while this still poses a challenge for computers.
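As a toy illustration of this context carry-over, the sketch below remembers details from earlier turns and uses them to fill in a later, under-specified query. The city list, slot names and responses are all invented for the example; no real assistant exposes an API like this.

```python
# Minimal sketch of contextual query handling: details mentioned in
# earlier turns (here, a city) are remembered and reused, instead of
# treating every instruction as a separate interaction.

class DialogueContext:
    def __init__(self):
        self.slots = {}  # details remembered across turns

    def interpret(self, utterance):
        words = [w.strip("?,.!") for w in utterance.lower().split()]
        # naive entity spotting: remember any known city mentioned
        for city in ("paris", "london", "delhi"):
            if city in words:
                self.slots["city"] = city.title()
        if "weather" in words:
            city = self.slots.get("city")
            if city is None:
                return "Which city?"  # ask for the missing detail
            return f"Fetching weather for {city}"
        return "Sorry, I didn't understand."

ctx = DialogueContext()
print(ctx.interpret("What's the weather in Paris?"))
print(ctx.interpret("And the weather tomorrow?"))  # city inferred from context
```

The second query never mentions Paris, yet it is answered correctly because the context object treats the conversation as one whole rather than two isolated commands.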
Training and Improvement
A voice recognition model relies heavily on continual updates and improvement using the massive pool of data it receives from across the globe. That is primarily why even the cheapest of phones today ship with Google Now, with no subscription fees: everything you say is saved as training data for future versions of the speech recognition algorithm, for bug fixes and improvements. Compared with other applications of neural networks, recognising speech additionally poses the obstacles of background noise and variation in accent, pitch and echo. It therefore employs deep neural networks, a form of artificial neural network with multiple hidden layers between the input and output, to model complicated non-linear relationships. Neural networks form the heart and soul of AI systems such as voice-enabled assistants, letting them improve with time and with every new input. During training, the error of the network’s predictions is measured and the weights are tweaked to minimise it, pushing accuracy as high as possible.
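The train-by-minimising-error loop described above can be shown at toy scale. This sketch trains a tiny two-layer network on XOR with plain gradient descent; the architecture, learning rate and iteration count are arbitrary choices for the demo, not anything used in a production speech system.

```python
# Toy sketch of how a network's prediction error is measured and
# minimised by gradient descent -- the same principle, in miniature,
# behind the deep networks used for speech recognition.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([[0], [1], [1], [0]], float)       # XOR: not linearly separable

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)  # hidden layer
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)  # output layer
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for step in range(5000):
    h = np.tanh(X @ W1 + b1)         # forward pass
    out = sigmoid(h @ W2 + b2)
    err = out - y                    # prediction error
    # backward pass: propagate the error and nudge weights downhill
    d_out = err * out * (1 - out)
    d_h = (d_out @ W2.T) * (1 - h ** 2)
    W2 -= 0.5 * h.T @ d_out; b2 -= 0.5 * d_out.sum(0)
    W1 -= 0.5 * X.T @ d_h;   b1 -= 0.5 * d_h.sum(0)

print(np.round(out.ravel(), 2))  # should approach [0, 1, 1, 0]
```

Real speech models are vastly larger and train on audio features rather than four binary rows, but the cycle is the same: predict, measure error, adjust, repeat.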
Emotional Intelligence
Almost all human behaviour is a consequence of emotion, not merely the knowledge of right and wrong. Concrete facts can be fed to a system to produce desired results, but how one reacts in situations with no pre-defined path brings in ambiguity for assistants. Emotional modulation is essential to give them a broader context of the physical world and qualities of empathy and social sensitivity. In humans, the sensation of touch and the drive to seek shape the course of action; assistants fundamentally lack these sentimental capabilities, making them ineffective in emotionally charged situations. Introducing sensory input is a significant step towards processing our emotional concerns, so that even the peculiar behaviour of each individual can be analysed and assessed. The valley between a modelled response and a proactive cognitive reaction is still too broad, and will have to narrow in the coming years.
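To see how far today’s “modelled responses” are from real empathy, consider a deliberately crude sketch: it picks a reply style from keyword sentiment cues alone. The word lists and replies are purely illustrative; genuine affective computing would also weigh tone, pitch and conversational history.

```python
# Toy sketch: choosing a response style from crude sentiment cues in
# the user's words. Everything here is hand-written and illustrative;
# it is a modelled response, not a cognitive reaction.

NEGATIVE = {"sad", "terrible", "awful", "lonely"}
POSITIVE = {"great", "happy", "wonderful", "excited"}

def respond(utterance):
    words = set(w.strip("?,.!") for w in utterance.lower().split())
    if words & NEGATIVE:
        return "I'm sorry to hear that. Want to talk about it?"
    if words & POSITIVE:
        return "That's great to hear!"
    return "Okay."  # no emotional cue detected: fall back to neutral

print(respond("I feel lonely today"))
```

The brittleness is the point: swap “lonely” for a phrase outside the list and the empathy vanishes, which is exactly the gap between scripted output and proactive emotional understanding.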
The aim of AI, however, is not to accelerate the technological singularity but to explore smarter versions of ourselves: to compensate for the shortcomings of the human race and to build systems that launch us to even greater heights.
Written By Abhishek Mishra