Do Alexa, Google and Siri listen to our conversations?

A viral video on social media shows how the Amazon speaker records voice fragments that do not include the activation word. How much can these devices capture?

Amazon’s Alexa device inside a home. MIKE BLAKE (Reuters / ContactoPhoto)

“Alexa, set the alarm for 8 a.m.”, “Alexa, play the movie Oppenheimer” or “Alexa, tell me what the weather will be like at Easter.” All of these interactions with the smart speaker are recorded and are available to any user who requests them from Amazon. And that’s exactly what criminologist María Aperador did. But she was surprised to discover that some of the recordings made by the device were not preceded by the activation word, “Alexa.” She reported her findings in a video on TikTok and Instagram that has since gone viral. How is this possible?

Amazon’s policy is clear: no audio is stored or sent to the cloud unless the device detects the activation word. The company confirms this, and adds that users can tell when Alexa is sending a request to the cloud by the blue light indicator or the alert sound from the speaker.

With this in mind, David Arroyo, a Spanish researcher specializing in cybersecurity and data, explains what might have happened: “The system they have is only activated when someone says the trigger word. But, for various reasons, it can produce false positives. What we would have to see is to what extent it is able to filter out the elements that disturb the interpretation of the wake word.”

Voice recognition systems that use machine learning, such as those used by Alexa, Google and Apple speakers, incorporate disparate elements to improve their performance. But still, it is not an easy task. “These systems are designed to identify all the elements of variability due to pronunciation,” says Arroyo, referring to different accents and ways of speaking, but also to changes in resonance or reverberation in the room in which the device is located. “We would need to know in detail the accuracy and false positive rate of the algorithm used by Amazon.”
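A back-of-the-envelope calculation helps show why even a very low false-positive rate can produce stray recordings like the ones Aperador found. All the numbers below are assumptions for illustration, not figures disclosed by Amazon or the researchers:

```python
# Illustrative arithmetic only: the check rate, listening hours and
# error rate are assumed values, not Amazon's published figures.
checks_per_second = 10          # assumed: how often the model scores the audio
seconds_per_day = 16 * 3600     # assumed: 16 hours of ambient household sound
false_positive_rate = 1e-6      # assumed: per-check chance of a misfire

expected_misfires = checks_per_second * seconds_per_day * false_positive_rate
print(f"Expected accidental activations per day: {expected_misfires:.2f}")
# ~0.58 per day, i.e. a few six-second stray recordings every week
```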

EL PAÍS spoke with María Aperador to learn more about the recordings, which last around six seconds each. They are fragments of casual conversations in which she or other people are talking at her home. The criminologist has not reviewed all of the more than 500 audio files that Amazon sent her, but in the 50 or so she has listened to, she found two that contained no trigger word.

A study by researchers from Ruhr University Bochum and the Max Planck Institute for Security and Privacy highlights the problem of accidental activations in smart speakers. After analyzing 11 devices from eight different manufacturers, the researchers found more than 1,000 involuntary activations. “We are talking about voice recognition systems, which depending on how they are implemented, can work better or worse,” says Josep Albors, director of Research and Awareness at the cybersecurity firm ESET Spain, in reference to false positives.

How speakers detect the activation word

To activate after hearing “Alexa,” “Ok, Google” or “Hey, Siri,” smart speakers have a system that constantly listens for those terms. “In the end, they are devices that are constantly listening. But smartphones or many intercoms also do this. It is not exclusive to Alexa,” says Albors.

Arroyo makes a similar assessment: “When you put the speaker on active standby, that means it is constantly absorbing what you are talking about. It doesn’t record it, but the algorithm is processing it, because it has to see what words are being spoken.”

This is an algorithm that works locally, on the device itself, searching for the acoustic patterns that correspond to the trigger word. Amazon claims that its technology relies only on information from sound waves to detect the wake word, and adds that the speaker can also be activated with a button, which avoids acoustic monitoring altogether. As for the recordings made once the device is activated, users can opt out of having them stored through their privacy settings.
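The control flow described above can be sketched in a few lines of Python. This is a toy simulation, not Amazon’s code: the frame capture and the detector are invented stand-ins, and the detector simply misfires at an assumed random rate to mimic false positives. What it illustrates is that audio sits in a short rolling buffer, is analyzed locally, and is discarded unless the trigger is (rightly or wrongly) detected:

```python
# A minimal, illustrative simulation of a wake-word loop (not Amazon's code).
import random
from collections import deque

BUFFER_FRAMES = 50           # rolling window of recent audio frames (~1 second)
FALSE_POSITIVE_RATE = 0.001  # assumed per-frame misfire rate, for illustration

def capture_frame(t):
    """Stand-in for a microphone read; returns a dummy frame label."""
    return f"frame-{t}"

def detect_wake_word(window):
    """Stand-in for a local keyword spotter. It fires at random with the
    assumed false-positive rate, since the point here is the control flow,
    not the acoustics."""
    return random.random() < FALSE_POSITIVE_RATE

window = deque(maxlen=BUFFER_FRAMES)  # old frames fall out: nothing is stored
for t in range(10_000):               # simulate a stretch of ambient audio
    window.append(capture_frame(t))   # constant local listening
    if detect_wake_word(window):      # pattern match runs on-device
        # Only at this point would a real speaker light up blue and
        # start streaming the request to the cloud.
        print(f"Activation at frame {t}: audio would now leave the device")
```

In a real speaker the detector would be a small model matching acoustic patterns rather than a random draw, but the privacy-relevant structure is the same: nothing leaves the device until the branch inside the loop fires.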

What’s the problem with a device constantly listening for the trigger word? The two cybersecurity specialists agree that if a person’s conversations were processed to extract data beyond the keyword search, it would be a very serious privacy issue. But they also agree that there is no evidence this is happening. “There are a lot of vested interests in making sure that doesn’t happen, because it would mean the loss of trust in all these devices and very considerable economic damage for these companies,” says Albors.

