The other night, while I was sitting, waiting for a blind XSS to pop, a post caught my eye. It sent me down a multi-hour wormhole on modern keylogging.
Certain smartphone features have become so ubiquitous that they’re not even used in advertising anymore, such as tilt control, shake control, and touch screens. Nowadays, these all seem like the very base features a device should have. We’ve been so spoiled by our tech overlords that you may never have questioned how your phone is able to dynamically adjust the screen brightness or warn you that moisture has been detected in the charging port (time to get the rice).
Inside your distraction rectangle, there are numerous sensors that generate the streams of data that are processed to provide these features, like a thermometer so you can be warned when your device overheats or a compass to help guide you to your destination.
In contrast to camera or microphone access, sensor data is innocuous. Right? Surely, you would be uncomfortable if your fitness app could eavesdrop on your conversations or snap a selfie at any moment. But who cares if that same app can monitor how many steps you’ve taken throughout the day? Right? Besides being passively shamed for being lazy, there’s no security concern. Right?
The platforms seem to agree with this sentiment, which is why you’ve never seen a prompt like this:
Since the data that sensors generate is generally considered insensitive, user permission grants haven’t been a major concern.
However, as unthreatening to privacy as sensor data may seem, over the years, multiple studies have demonstrated how sensor data can be leveraged to invade privacy and carry out attacks.
In what are known as “inference attacks”, available data is analyzed to extrapolate sensitive information that was not directly included. Since these attacks leverage statistics to detect trends or patterns, the accuracy of predictions scales with dataset size.
AKA weaponized guessing.
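To make the "more data, better guesses" point concrete, here's a toy simulation. It is not taken from any of the studies below. Everything in it is an assumption for illustration: a hidden yes/no attribute, a leaky signal that reveals the truth only 60% of the time, and an attacker who simply takes a majority vote over whatever observations they have.

```python
import random


def guess_hidden_bit(hidden_bit, n_observations, signal_accuracy, rng):
    """Guess a hidden 0/1 attribute by majority vote over noisy observations."""
    votes_for_one = 0
    for _ in range(n_observations):
        # Each observation reports the true bit with probability signal_accuracy.
        observed = hidden_bit if rng.random() < signal_accuracy else 1 - hidden_bit
        votes_for_one += observed
    # Ties (only possible for even n_observations) fall back to guessing 0.
    return 1 if votes_for_one * 2 > n_observations else 0


def attack_accuracy(n_observations, signal_accuracy=0.6, trials=2000, seed=1):
    """Fraction of simulated victims whose hidden bit is guessed correctly."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        hidden_bit = rng.randint(0, 1)
        guess = guess_hidden_bit(hidden_bit, n_observations, signal_accuracy, rng)
        if guess == hidden_bit:
            correct += 1
    return correct / trials


if __name__ == "__main__":
    for n in (1, 11, 101):
        print(f"{n:>3} observations -> accuracy {attack_accuracy(n):.2f}")
```

A single weak observation is barely better than a coin flip, but stacking up a hundred of them makes the guess very reliable. That is the whole trick.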
While there are many sensors embedded in modern devices, the inertial measurement units (IMUs) have been outed as the seemingly benign offenders against security.
IMUs measure acceleration, angular velocity, and magnetic fields.
These sensors are contained in small microelectromechanical systems (MEMS) chips that sense acceleration, rotation, vibration, displacement, and heading.
A gyroscope measures angular velocity (how quickly something rotates over time) around three relative axes:
An accelerometer, as you may guess, measures linear acceleration (the rate at which velocity changes over time) along three axes:
A magnetometer measures orientation by detecting the strength and direction of Earth’s surrounding magnetic field along three axes:
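If it helps to picture what an app actually receives, here's a rough sketch of a single IMU reading as three three-axis vectors. The class and field names, the units, and the heading convention (device lying flat, 0 degrees when the field points along the x axis) are my own assumptions for illustration, not any platform's API.

```python
import math
from dataclasses import dataclass


@dataclass
class Vector3:
    x: float
    y: float
    z: float

    def magnitude(self) -> float:
        """Length of the vector, independent of how the device is oriented."""
        return math.sqrt(self.x ** 2 + self.y ** 2 + self.z ** 2)


@dataclass
class ImuSample:
    timestamp_s: float
    accel: Vector3  # linear acceleration, m/s^2 (includes gravity)
    gyro: Vector3   # angular velocity, rad/s
    mag: Vector3    # magnetic field, microtesla


def flat_heading_degrees(mag: Vector3) -> float:
    """Heading in [0, 360) from the horizontal field components.

    Assumes the device lies flat; a real compass would also correct for tilt.
    """
    return math.degrees(math.atan2(mag.y, mag.x)) % 360.0


if __name__ == "__main__":
    sample = ImuSample(
        timestamp_s=0.0,
        accel=Vector3(0.0, 0.0, 9.81),  # resting flat: only gravity is measured
        gyro=Vector3(0.0, 0.0, 0.0),    # not rotating
        mag=Vector3(0.0, 30.0, -40.0),  # made-up field values
    )
    print(sample.accel.magnitude())
    print(flat_heading_degrees(sample.mag))
```

Every attack described below boils down to collecting a long stream of samples like this and looking for patterns in them.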
As far back as 2011, researchers at the University of California at Davis detailed how inference attacks can be carried out against touch screen smartphones in their publication, TouchLogger: Inferring Keystrokes on Touch Screen from Smartphone Motion.
To protect user input on a touch screen, the screen and its supporting firmware will only relay the coordinates of tap events to the currently opened view, to prevent background processes from reading user input.
However, by developing an Android application that collected and analyzed orientation data, the research team was able to correlate the distinct vibrations and device rotations caused by taps to their associated key coordinates.
For the numerical keyboard (the one displayed when making phone calls), TouchLogger was able to infer, with 71.5% accuracy, which digits were input.
This allowed the team to infer highly sensitive information that is commonly entered on the numerical keyboard, such as SSNs, dates of birth, credit card numbers, and PINs.
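The general idea of mapping tap-induced motion to keys can be sketched with a very simple classifier. To be clear, this is not TouchLogger's actual feature set or model. It's a nearest-centroid toy that assumes each tap has already been reduced to two hypothetical features (how far the device pitched and rolled during the tap), and the training numbers are invented.

```python
import math
from collections import defaultdict


def train_centroids(labelled_taps):
    """Average the (pitch_delta, roll_delta) features observed for each key."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for key, (pitch_delta, roll_delta) in labelled_taps:
        entry = sums[key]
        entry[0] += pitch_delta
        entry[1] += roll_delta
        entry[2] += 1
    return {key: (s[0] / s[2], s[1] / s[2]) for key, s in sums.items()}


def infer_key(centroids, tap):
    """Return the key whose average motion is closest to the observed tap."""
    return min(centroids, key=lambda key: math.dist(centroids[key], tap))


# Invented training data: corner keys tilt the device in different directions.
training = [
    ("1", (-0.8, -0.6)), ("1", (-0.7, -0.5)),
    ("3", (-0.8, 0.6)), ("3", (-0.7, 0.7)),
    ("7", (0.5, -0.6)), ("7", (0.6, -0.5)),
    ("9", (0.5, 0.6)), ("9", (0.6, 0.5)),
]
centroids = train_centroids(training)

if __name__ == "__main__":
    print(infer_key(centroids, (-0.75, 0.55)))  # prints 3
```

The attacker never sees the tap coordinates. They only need labelled examples of what each key "feels like" to the motion sensors, and then every new tap becomes a lookup.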
Smartphones are not the only type of modern device vulnerable to inference attacks. In 2016, researchers from Wichita State University presented their study on smartwatch-based inference attacks in the publication, Smartwatch-Based Keystroke Inference Attacks and Context-Aware Protection Mechanisms.
The team was able to demonstrate how a single smartwatch can be used to infer what a user is typing on an external QWERTY keyboard.
While observing the wrist movements made by an individual as they typed a fixed sequence of letters, they found a level of consistency that led them to believe an attacker could create a dictionary of commonly used words and map them to their corresponding wrist movement patterns.
Although this attack also required microphone access in order to detect when a keystroke was made, the researchers argue that a malicious application could justify requesting it by offering voice commands or dictation, a common feature in smartwatch devices.
In the first preliminary experiment on a single participant, the attack achieved 100% accuracy in classifying the left and right key regions and 95% accuracy in directional transitions. When attempting full sentence recovery from a small dictionary, 93.75% of words with four or more letters were successfully recovered.
In more realistic tests with 25 participants, a contextual dictionary built from news articles was used. In this case, only an average of 31.2% of the words typed were recovered, as the difference in typing speed and consistency between participants skewed the results. Users who typed at a speed similar to the training reference were more vulnerable than users who typed much faster or slower.
However, the accuracy of the first experiment against a single participant shows how individually tailored attacks are highly effective.
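Here's a stripped-down sketch of the dictionary idea: reduce every word to the pattern a watch might observe, then look up which words fit an observed pattern. It is a simplification and not the paper's model. I'm assuming a plain left/right split of a QWERTY keyboard and ignoring directional transitions entirely, and the word list is made up.

```python
from collections import defaultdict

# Assumed split: keys usually typed with the left hand on a QWERTY layout.
LEFT_KEYS = set("qwertasdfgzxcvb")


def word_to_pattern(word):
    """Encode a word as a string of L/R key regions, one symbol per letter."""
    return "".join("L" if ch in LEFT_KEYS else "R" for ch in word.lower())


def build_pattern_index(words):
    """Map each L/R pattern to every dictionary word that produces it."""
    index = defaultdict(list)
    for word in words:
        index[word_to_pattern(word)].append(word)
    return index


dictionary = ["the", "and", "phone", "watch", "sensor", "password"]
index = build_pattern_index(dictionary)

if __name__ == "__main__":
    print(index["LRL"])       # prints ['the', 'and']
    print(index["RLLLLRLL"])  # prints ['password']
```

Short words collide on the same pattern, while longer words tend to be unique. That fits with the recovery rate above being reported for words of four letters or more.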
More recently, in the 2019 publication Spearphone: A Lightweight Speech Privacy Exploit via Accelerometer-Sensed Reverberations from Smartphone Loudspeakers, the research team discovered that vibrations from a smartphone’s inbuilt loudspeaker can affect accelerometer readings.
Leveraging this discovery, the team developed Spearphone, an attack that uses accelerometer data, signal processing, and readily available machine learning tools to eavesdrop on voice calls and audio played through the speaker. Even though they had no direct access to the microphone, the attack was able to infer speech with alarming accuracy.
Spearphone successfully identified the gender of whoever was talking with 90% accuracy and correctly matched their identity 80% of the time.
In number recognition, the attack recognized the digits zero through nine, plus the word “oh” to account for the alternative pronunciation of zero, with 74% accuracy when spoken by a single person. Across the 58 words in the tested word set, Spearphone recognized the correct word 81% of the time.
In tests of how accurately Spearphone could reconstruct natural speech, 82% of words across full sentences were recognized. When excluding minor words such as “with” and “is” that had a minimal impact on understanding, the success rate increased to 96%.
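To give a feel for the signal processing side, here's a minimal sketch of the very first step such an attack might need: noticing when the speaker is shaking the accelerometer at all. This is not Spearphone's pipeline. It's a generic short-time energy detector, and the 400 Hz sample rate, the 120 Hz test tone, the frame size, and the threshold are all assumptions chosen for the demo.

```python
import math


def frame_rms(samples, frame_size):
    """RMS of each non-overlapping frame after removing that frame's mean.

    Subtracting the mean strips the constant gravity offset, leaving vibration.
    """
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        mean = sum(frame) / frame_size
        variance = sum((s - mean) ** 2 for s in frame) / frame_size
        energies.append(math.sqrt(variance))
    return energies


def active_frames(samples, frame_size, threshold):
    """Flag frames whose vibration energy exceeds the threshold."""
    return [rms > threshold for rms in frame_rms(samples, frame_size)]


# Synthetic z-axis readings: one second at rest, then one second of vibration.
SAMPLE_RATE_HZ = 400
rest = [9.81] * SAMPLE_RATE_HZ
buzz = [
    9.81 + 0.05 * math.sin(2 * math.pi * 120 * n / SAMPLE_RATE_HZ)
    for n in range(SAMPLE_RATE_HZ)
]
signal = rest + buzz

if __name__ == "__main__":
    print(active_frames(signal, frame_size=100, threshold=0.01))
```

A real attack would go on to extract features from the active frames and feed them to a trained classifier. Even this crude step, though, shows that "silent" motion data carries a trace of what the speaker is playing.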
We have grown so accustomed to certain smartphone features that we’ve paid them no mind. However, the components that make them possible can be used to leak some of our most sensitive information, with little to no means to opt out of their use.
As our devices become more sensor-rich and the capabilities of machine learning continue to improve, inference attacks are poised to become more accurate, more scalable, and less visible. Security and privacy are no longer just concerns reserved for camera and microphone access.
These sensors that produce “insensitive” data pose an even greater threat. If an application asked for permission to log your text messages, would you allow it to?
When we assume something is low-risk, we invite risk.