AI can hear what you’re typing over Zoom with 93 per cent accuracy

An AI can detect what is being typed from the sounds that different keys make when they are pressed on a keyboard.

Be careful what you type during Zoom meetings: a deep learning AI algorithm can identify the keys pressed on a keyboard with 93 per cent accuracy, based on the sounds of your keystrokes.

Pressing different keys on a keyboard generates different sounds, which may enable an AI to detect what you’re typing
Luis Alvarez/Getty

Joshua Harrison at Durham University, UK, and his colleagues trained CoAtNet, a deep learning model most commonly used to classify images, to "hear" which letter or number key had been pressed. They did this by feeding it recordings of the sound each keystroke made, converted into spectrograms: image-like representations of how a sound's frequencies change over time.
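The general idea of learning keys from sound can be illustrated with a far simpler classifier than CoAtNet. The minimal Python sketch below is a toy built on stated assumptions, not the researchers' method. It invents synthetic "keystrokes" (a decaying tone at a made-up frequency for each key, plus random noise), turns each clip into a crude spectrogram made of short-time magnitude spectra, and labels new clips by finding the nearest average spectrogram, a technique known as a nearest-centroid classifier. All names, frequencies and parameters here (`synthetic_keystroke`, `spectrogram`, `evaluate`, `KEY_TONES`, the 8 kHz sample rate) are hypothetical.

```python
import cmath
import math
import random

SAMPLE_RATE = 8000  # assumed sample rate of the synthetic clips, in Hz
FRAME = 64          # samples per analysis frame (bin width = 8000 / 64 = 125 Hz)
HOP = 32            # samples between the starts of successive frames

# Hypothetical per-key "resonant" frequencies; real keys have no single tone.
KEY_TONES = {"a": 500, "s": 1250, "d": 2500}


def synthetic_keystroke(freq_hz, rng, length=256, noise=0.2):
    """Stand-in for a recorded keystroke: a decaying tone plus Gaussian noise."""
    return [math.exp(-n / 60) * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            + rng.gauss(0.0, noise) for n in range(length)]


def spectrogram(signal, frame=FRAME, hop=HOP):
    """Magnitude spectrum of each Hann-windowed frame, via a naive DFT."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame - 1)) for n in range(frame)]
    frames = []
    for start in range(0, len(signal) - frame + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame)]
        # Keep bins 0 .. frame/2 - 1; the upper half mirrors them for real input.
        frames.append([abs(sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame)
                               for n in range(frame)))
                       for k in range(frame // 2)])
    return frames


def features(signal):
    """Flatten the spectrogram into one feature vector."""
    return [v for spectrum in spectrogram(signal) for v in spectrum]


def train_centroids(examples):
    """examples maps key -> list of feature vectors; returns the mean vector per key."""
    return {key: [sum(col) / len(vecs) for col in zip(*vecs)]
            for key, vecs in examples.items()}


def classify(vec, centroids):
    """Pick the key whose centroid is nearest in squared Euclidean distance."""
    return min(centroids,
               key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, centroids[k])))


def evaluate(seed=0, clips_per_key=5):
    """Train on fresh synthetic clips, test on new ones, and return the accuracy."""
    rng = random.Random(seed)
    train = {k: [features(synthetic_keystroke(f, rng)) for _ in range(clips_per_key)]
             for k, f in KEY_TONES.items()}
    centroids = train_centroids(train)
    correct = total = 0
    for key, freq in KEY_TONES.items():
        for _ in range(clips_per_key):
            correct += classify(features(synthetic_keystroke(freq, rng)), centroids) == key
            total += 1
    return correct / total


if __name__ == "__main__":
    print(f"accuracy on synthetic clips: {evaluate():.2f}")
```

Real keystroke recordings are far messier than these idealised tones, which is where a deep network trained on many labelled examples earns its keep; the nearest-centroid rule is used here only because it makes the underlying idea, that each key leaves a distinguishable spectral fingerprint, easy to inspect.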

When put to the test, the model identified which keys were being pressed with 95 per cent accuracy when the keystrokes were recorded on a phone 17 centimetres from the laptop, and with a slightly lower 93 per cent accuracy when they were recorded over a Zoom call.

Harrison believes that Zoom’s noise suppression features, designed to tamp down background noise from calls, may account for the small difference in performance. Zoom didn’t respond to a request to comment on the researchers’ findings.

Because deep learning models make connections that aren't always clear to people, Harrison isn't certain exactly how this one works, but he believes it is picking up differences in sound that depend on which part of the keyboard is struck.

“If you think of a drum, if you hit different parts of the drum skin – whether it’s near the wall, whether it’s going to centre – it makes different sounds,” he says. “Similarly, with something like a laptop or a keyboard, the placement of the keys on that board could lead to the difference in the sounds that this model picks up.”
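Harrison's drum analogy can be made concrete with a textbook simplification: an ideal string, used here as a one-dimensional stand-in for a drum skin and not as a model of a keyboard, struck by a sharp impulse at a single point. In that idealised model, the n-th vibration mode is excited in proportion to sin(nπp)/n, where p is the strike position as a fraction of the string's length, so where you strike changes which overtones ring. The Python sketch below uses a hypothetical helper, `mode_amplitudes`, to compare a strike at the centre with one near the end.

```python
import math


def mode_amplitudes(strike_pos, n_modes=6):
    """Relative amplitudes of the first n_modes modes of an ideal unit-length
    string struck by a point impulse at strike_pos (0 < strike_pos < 1).

    Idealised model: mode n is excited in proportion to |sin(n*pi*p)| / n.
    Results are scaled so the strongest mode has amplitude 1.
    """
    if not 0 < strike_pos < 1:
        raise ValueError("strike_pos must lie strictly between 0 and 1")
    amps = [abs(math.sin(n * math.pi * strike_pos)) / n for n in range(1, n_modes + 1)]
    peak = max(amps)
    return [a / peak for a in amps]


if __name__ == "__main__":
    for label, pos in [("centre", 0.5), ("near the edge", 0.1)]:
        rounded = [round(a, 2) for a in mode_amplitudes(pos)]
        print(f"struck {label}: {rounded}")
```

In this model, striking the centre silences every even-numbered mode, while a strike near the edge excites a broad spread of overtones, so the two give different timbres. A real drumhead is two-dimensional and a laptop keyboard is a far more complicated structure, so the sketch shows only the general principle the analogy points to.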

This experiment looked at only one AI model, run on one computer and analysing one keyboard, but Harrison says the approach could probably be made to work on other devices. “The core takeaway is that this very high level of accuracy was achieved using completely open-source software [and] off the shelf devices,” he says. “This accuracy was best in class for this field of research.” Harrison adds that, given the advances in AI since the experiment was conducted, more recent models are likely to be even more accurate.

“I would tend to take this seriously,” says Eerke Boiten at De Montfort University in Leicester, UK, who was surprised by how accurate the model was. He believes the research will raise awareness of the risks of so-called side-channel attacks, which exploit information that a system leaks inadvertently. However, he isn't sure that the researchers' suggested defence, taking video calls in a room where no microphones are present, is practical.


arXiv DOI: 10.48550/arXiv.2308.01074
