
Protecting Web Browsing Data From Hackers | MIT News


Malicious agents can use machine learning to launch powerful attacks that steal information in ways that are hard to prevent and often even harder to investigate.

Attackers can capture data that “leaks” between software programs running on the same computer. They then use machine learning algorithms to decode these signals, allowing them to obtain passwords or other private information. These are called “side channel attacks” because the information is acquired through a channel not intended for communication.

MIT researchers have shown that machine learning-assisted side-channel attacks are both extremely robust and poorly understood. The use of machine learning algorithms, which are often impossible to fully understand due to their complexity, is a particular challenge. In a new paper, the team investigated a documented attack that was thought to work by capturing leaked signals when a computer accesses memory. They found that the mechanisms behind this attack were misidentified, which would prevent researchers from designing effective defenses.

To study the attack, they removed all memory access and noticed that the attack became even more powerful. Next, they looked for information leak sources and discovered that the attack was actually monitoring events that interrupt other processes on a computer. They show that an adversary can use this machine learning-assisted attack to exploit a security hole and determine which website a user is browsing with near-perfect accuracy.

With this knowledge in hand, they have developed two strategies that can thwart this attack.

“The focus of this work is really on analysis to find the root cause of the problem. As researchers, we should really try to dig deeper and do more analytical work, rather than blindly using black-box machine learning tactics to demonstrate one attack after another. The lesson we’ve learned is that these machine learning-assisted attacks can be extremely misleading,” says senior author Mengjia Yan, the Homer A. Burnell Career Development Assistant Professor of Electrical Engineering and Computer Science (EECS) and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL).

The lead author of the paper is Jack Cook ’22, a recent computer science graduate. Co-authors include CSAIL graduate student Jules Drean and Jonathan Behrens PhD ’22. The research will be presented at the International Symposium on Computer Architecture.

A side-channel surprise

Cook started the project while taking Yan’s advanced seminar course. For a class assignment, he tried to replicate a machine learning-assisted side-channel attack from the literature. Previous work concluded that this attack counts the number of times the computer accesses memory when loading a website and then uses machine learning to identify the website. This is called a website fingerprinting attack.

He found that the previous work had relied on flawed, machine learning-based analysis that misidentified the source of the attack. Machine learning cannot prove causation in these types of attacks, Cook says.

“All I did was remove the memory access, and the attack still worked as well, or even better. So I wondered, what actually opens up the side channel?” he says.

This led to a research project in which Cook and his collaborators embarked on a careful analysis of the attack. They designed an almost identical attack, but without memory access, and studied it in detail.

They discovered that the attack actually records a computer’s timer values at fixed intervals and uses this information to infer which website is being accessed. Essentially, the attack measures how busy the computer is over time.

A fluctuation in the timer value means that the computer is processing a different amount of information in that interval. This is due to system interrupts. A system interrupt occurs when computer processes are interrupted by requests from hardware devices; the computer must interrupt what it is doing to process the new request.
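The measurement at the heart of the attack can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not the researchers’ JavaScript: it spins in a busy loop and counts how many iterations fit into each fixed-length window of a timer. If interrupts steal CPU time during a window, fewer iterations complete, so the sequence of counts traces how busy the machine was. The function name `collect_trace`, its `clock` parameter (injectable so the logic can be checked with a simulated clock), and the window length are hypothetical choices made for this example.

```python
import time
from typing import Callable, List


def collect_trace(
    n_windows: int,
    window_s: float,
    clock: Callable[[], float] = time.perf_counter,
) -> List[int]:
    """Count busy-loop iterations in each of `n_windows` fixed windows.

    Hypothetical sketch of a loop-counting measurement: a lower count in a
    window suggests the CPU was busy elsewhere (for example, handling
    interrupts) during that window.
    """
    trace: List[int] = []
    start = clock()
    for i in range(n_windows):
        deadline = start + (i + 1) * window_s  # end time of window i
        count = 0
        while clock() < deadline:
            count += 1  # one timer read per iteration
        trace.append(count)
    return trace
```

Calling `collect_trace(1000, 0.005)` would, for instance, record five seconds of activity in 5 ms windows; the window length is an arbitrary illustrative value, not one taken from the paper.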

When a website loads, it sends instructions to a web browser to run scripts, display graphics, load videos, and more. Each of these elements can trigger many system interrupts.

An attacker monitoring the timer can use machine learning to infer high-level information from these system interrupts to determine which website a user is visiting. This is possible because the interrupt activity generated by a website, like CNN.com, is very similar each time it loads, but very different from other websites, like Wikipedia.com, Cook explains.
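The classification step can be illustrated with a deliberately simple stand-in. The researchers used machine learning models; the sketch below swaps in a 1-nearest-neighbor rule, which labels a new trace with the website whose stored reference trace is closest in Euclidean distance. This only works because, as Cook notes, a given site produces similar interrupt activity on every load. The function names and the trace values in the usage note are made up for illustration, and real traces would be far longer and noisier.

```python
import math
from typing import Dict, List, Optional, Sequence


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length traces."""
    if len(a) != len(b):
        raise ValueError("traces must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(
    trace: Sequence[float],
    references: Dict[str, List[Sequence[float]]],
) -> str:
    """Return the label of the reference trace nearest to `trace` (1-NN)."""
    best_label: Optional[str] = None
    best_dist = math.inf
    for label, examples in references.items():
        for example in examples:
            d = distance(trace, example)
            if d < best_dist:
                best_label, best_dist = label, d
    if best_label is None:
        raise ValueError("no reference traces provided")
    return best_label
```

For example, with made-up four-window reference traces `{"cnn.com": [[900, 400, 350, 800]], "wikipedia.com": [[950, 940, 930, 960]]}`, the trace `[890, 420, 360, 790]` is classified as `"cnn.com"`, because it lies much closer to that reference.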

“One of the really scary things about this attack is that we wrote it in JavaScript, so you don’t need to download or install any code. All you have to do is open a website. Someone could embed that into a website and then theoretically be able to spy on other activity on your computer,” he says.

The attack is extremely successful. For example, when a computer is running Chrome on macOS, the attack was able to identify websites with 94% accuracy. Across all of the commercial browsers and operating systems they tested, the attack achieved more than 91% accuracy.

There are many factors that can affect a computer’s timer, so figuring out what led to an attack with such precision was like finding a needle in a haystack, Cook says. They conducted many controlled experiments, removing one variable at a time, until they realized the signal must be coming from system interrupts, which often cannot be processed separately from the attacker’s code.


Once the researchers understood the attack, they devised security strategies to prevent it.

First, they built a browser extension that generates frequent interrupts, for instance by pinging random websites to create bursts of activity. The added noise makes it much more difficult for the attacker to decode the signals. This dropped the attack’s accuracy from 96% to 62%, but it slowed down the computer’s performance.

For their second countermeasure, they modified the timer to return values that are close to, but not exactly, the actual time. This makes it much harder for an attacker to measure the computer’s activity over an interval, Cook says. This mitigation reduced the attack’s accuracy from 96% to just 1%.
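Returning “values close to the actual time” can be modeled as adding a small, bounded random offset to every reading. The sketch below is one assumed way to do this, not the researchers’ actual timer modification; `make_jittered_clock` and `max_jitter_s` are hypothetical names. Because each reading is perturbed independently, an attacker can no longer tell precisely where one measurement interval ends and the next begins, which blurs the per-interval activity counts the attack relies on.

```python
import random
import time
from typing import Callable, Optional


def make_jittered_clock(
    max_jitter_s: float,
    clock: Callable[[], float] = time.perf_counter,
    rng: Optional[random.Random] = None,
) -> Callable[[], float]:
    """Wrap `clock` so each reading is off by up to +/- max_jitter_s.

    Hypothetical sketch of a randomized timer. Readings are clamped so
    they never go backwards, assuming the underlying clock is monotonic.
    """
    rng = rng or random.Random()
    last = -float("inf")

    def jittered() -> float:
        nonlocal last
        noisy = clock() + rng.uniform(-max_jitter_s, max_jitter_s)
        last = max(last, noisy)  # keep the timer non-decreasing
        return last

    return jittered
```

The clamp with `max` is a design choice: without it, two consecutive readings could go backwards, which ordinary code may not expect. Clamping keeps the timer non-decreasing while every reading stays within `max_jitter_s` of the true time, provided the underlying clock is itself monotonic.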

“I was surprised by how such a small mitigation, like adding randomness to the timer, could be so effective. This mitigation strategy could really be put in use today. It doesn’t affect how you use most websites,” he says.

Building on this work, the researchers plan to develop a systematic analysis framework for machine learning-assisted side-channel attacks. This could help researchers find the root cause of more attacks, Yan says. They also want to see how they can use machine learning to discover other types of vulnerabilities.

“This paper introduces a new interrupt-based side-channel attack and demonstrates that it can be used effectively for website fingerprinting attacks, where previously such attacks were believed to be possible because of cache side channels,” says Yanjing Li, an assistant professor in the University of Chicago’s Department of Computer Science, who was not involved in this research. “I liked this paper immediately after I first read it, not only because the new attack is interesting and successfully challenges existing notions, but also because it highlights a key limitation of ML-assisted side-channel attacks: blindly relying on machine learning models without in-depth analysis can provide no understanding of the real causes or sources of an attack, and can even be misleading. This is very insightful, and I think it will inspire many future works in this direction.”

This research was funded, in part, by the National Science Foundation, the Air Force Office of Scientific Research, and the MIT-IBM Watson AI Lab.