Ever since we first saw sci-fi movies such as Star Trek or 2001: A Space Odyssey, we have been enamored with the idea of a computer we can converse with. Then finally, in 2010, our dreams came true: Siri leaped out of our imaginations and into our cell phones and tablets.
Most iPhone users love Siri, the personal virtual assistant built into the phone itself. You can use Siri for almost anything: getting GPS directions, organizing your schedule, making a call, or sending a text, among many other tasks that have become much easier thanks to Siri’s speech recognition software. But what exactly is the technology behind Siri? Let’s take a closer look.
How Siri Works Technically
Siri relies on two main technologies: speech recognition and natural language processing (NLP). The first takes the words a person speaks and converts them into text. In practice, when you begin a sentence with the words “Hey, Siri,” you activate Apple’s speech recognition software, which transcribes your words into written form. This is not as simple as it sounds, because every person has a unique voice timbre and accent, which can vary from state to state and country to country.
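To make the first stage concrete, here is a minimal sketch of wake-phrase detection applied to a transcript that a speech recognizer has already produced. The function names are hypothetical, and a real system would spot the wake phrase in the audio itself rather than in text, but the idea is the same: check for the trigger phrase, then hand the rest of the utterance on for processing.

```python
# A minimal, hypothetical sketch of wake-phrase handling on a transcript.
# A real assistant detects the wake phrase acoustically, on-device.

def detect_wake_phrase(transcript: str, wake_phrase: str = "hey siri") -> bool:
    """Return True if the transcript begins with the wake phrase."""
    normalized = transcript.lower().strip().replace(",", "")
    return normalized.startswith(wake_phrase)

def extract_command(transcript: str, wake_phrase: str = "hey siri") -> str:
    """Strip the wake phrase, leaving the command to be processed further."""
    normalized = transcript.lower().strip().replace(",", "")
    return normalized[len(wake_phrase):].strip()
```

For example, `detect_wake_phrase("Hey, Siri, what's the weather?")` returns `True`, while an utterance without the trigger phrase is ignored entirely.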
Apple trains Siri’s speech recognition model on huge datasets made up of voice samples from many different people, which allows Siri to recognize all sorts of accents, inflections, and paces of speech.
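Models trained this way are typically evaluated with the word error rate (WER): the word-level edit distance between what the system heard and what was actually said, divided by the length of the true transcript. A short self-contained sketch of that metric, using the standard dynamic-programming edit distance:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the true transcript (reference)
    and the recognizer's output (hypothesis), divided by reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)
```

For instance, mishearing “the meeting” as “a meeting” in a five-word utterance is one substitution out of five words, a WER of 0.2.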
Over the past couple of years, there have been many advances in deep learning, and the error rate of such software has dropped below 10%. When you give Siri a command or ask a question, Siri transcribes your speech and then sends the converted text to Apple’s servers for additional processing. There, NLP algorithms work out the gist of your question or request. For example, there are many ways of asking Siri to remind you about a meeting: “Hey Siri, can you remind me about the meeting tomorrow at 11?” or “Can you give me a heads up about the meeting tomorrow at 11?” Siri first needs to figure out, from all the various ways of formulating a request or question, that you would like to be reminded about a meeting tomorrow at 11.
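To illustrate the idea of mapping many phrasings onto one intent, here is a toy sketch that uses regular expressions to recognize both of the reminder phrasings above and pull out the same structured result. This is only a simplification for illustration: the patterns and field names are invented here, and a real assistant uses statistical and neural language-understanding models rather than hand-written rules.

```python
import re

# Hypothetical hand-written patterns; a real system would learn these
# mappings from data rather than enumerate them.
REMINDER_PATTERNS = [
    r"remind me about (?P<subject>.+?) (?P<day>today|tomorrow) at (?P<time>\d{1,2})",
    r"heads up about (?P<subject>.+?) (?P<day>today|tomorrow) at (?P<time>\d{1,2})",
]

def parse_intent(command: str):
    """Return a structured intent dict, or None if no pattern matches."""
    text = command.lower()
    for pattern in REMINDER_PATTERNS:
        match = re.search(pattern, text)
        if match:
            return {"intent": "set_reminder", **match.groupdict()}
    return None
```

Both example requests then parse to the same structure: intent `set_reminder`, subject “the meeting”, day “tomorrow”, time “11”.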
If your phone is not connected to the internet, this can be a problem, since Siri does not process your speech on the phone itself. On the bright side, this design also has benefits. Offloading the lion’s share of the work to powerful servers saves your device’s limited resources, and the data that is collected is used to continuously improve Siri’s performance.
Such analysis of your intent requires an extraordinary amount of data to train NLP algorithms, which is why Apple hires many engineers who have previously worked with these technologies to train Siri’s algorithms. And let’s not forget that when Siri receives a response from Apple’s servers, it must convert that text back into speech. This is not as difficult as processing a user’s command, but it still requires effort on Siri’s part.
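The last step can be sketched as turning a structured intent back into the sentence to be spoken. In this toy continuation of the reminder example (the field names are the same invented ones as above), only the text is built; the final conversion of that text into audio would be handled by a text-to-speech engine and is omitted here.

```python
def build_spoken_response(intent: dict) -> str:
    """Turn a structured intent into the text the assistant would speak.
    The actual audio synthesis step (text-to-speech) is omitted."""
    if intent and intent.get("intent") == "set_reminder":
        return (f"OK, I'll remind you about {intent['subject']} "
                f"{intent['day']} at {intent['time']}.")
    return "Sorry, I didn't catch that."
```

Given the parsed reminder request, this yields “OK, I'll remind you about the meeting tomorrow at 11.”, which the text-to-speech engine would then read aloud.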