An Echo smart speaker sits on display inside an Amazon 4-star store in Berkeley, California. Image Credit: Bloomberg


Amazon’s chief technology officer has defended the company’s collection of audio recordings taken from people’s interactions with voice assistant Alexa, calling them a vital part of machine learning.

Alexa is Amazon's voice-enabled virtual assistant. She can be commanded to provide traffic updates, play specific songs, or purchase groceries, among many other skills.

A recent Bloomberg News report claimed that Amazon — one of the world’s most valuable companies — employs thousands of people to listen to audio recordings made when people talk to Alexa, in an effort to improve the service.

Now, the company’s CTO Werner Vogels has come out in defence of the practice.

“Privacy and security will be forever our number one priority, in all of our businesses,” Vogels said in response to a question from Gulf News.

“Whether that’s retail, whether that’s AWS (Amazon Web Services), or whether that’s voice.”

Most automated machine services like Alexa require some element of human training to help make sense of a user’s commands, which aren’t always plainly phrased. This way, Alexa can more easily understand context and colloquialisms, Vogels argued.

The thing about training machine models, Vogels said, “is that you need to be able to figure out what are the things that went wrong.”

“It’s not that our employees are listening in on conversations that people had with Alexa,” he added.

Vogels said that Amazon employees, who are reportedly based around the world from India to Costa Rica to Romania, mark all of the utterances that are given to Alexa and then label the ones she failed on. “We can mark the ones she didn’t understand,” he said.

But the executive, who has worked at Amazon since 2004 and is largely credited with developing AWS into the behemoth cloud computing company that it is today, said that those interactions with Alexa are anonymised, with all account access removed, before they are given to a human employee.

The account numbers associated with the recording are hashed, Vogels said, meaning workers cannot see who made the recording or where it was made.

“We’re talking here about a very, very small fraction of all the interactions with Alexa, and all those interactions that Alexa didn’t understand,” he said.

“Then we use humans for verification. That allows you to actually train [Alexa].”

Explaining Amazon’s choice to open multiple data centres in Bahrain, and not a more obvious choice such as the UAE or Saudi Arabia, Vogels said that it was a decision made simply based on the company’s so-called latency map.

Latency refers to the time it takes for data to leave its source and reach the user who is requesting it.

For example, if someone is trying to watch Netflix in Dubai, after they press play the request may have to be re-routed all the way through the US and back to the UAE before the video begins streaming.

“The choice for Bahrain is a latency map consideration,” Vogels said. “We need to make sure that everyone in the geographical area has low latency access.”

The company CTO said that instead of looking at geographical placement, Amazon looked at the network map in order to make a decision.

Many had speculated that Bahrain’s regulatory regime, or its proximity to Khobar, where Saudi Aramco is located, was behind the choice.

“We’re really careful in locating these regions such that, in general, everyone in the area has low latency access,” he said.

“That’s the driver for that.”