Friday, January 29, 2010

cookies by many different names

Cookies are great, and everyone loves them (chocolate chip are my favorite) but if we leave the Internet to its own device it could potentially drive itself into a state of udder deception where other technologies are secretly used in place of cookies for tracking and identification purposes.

Spending the past two days submerged in various privacy discussions, I've started again deeply thinking about cookies and tracking. The fundamental privacy concerns about HTTP cookies (and other varieties like Flash LSOs) come from the fact that such a technology gives a web server too much power to connect my browsing dots. Third-party cookies exacerbate this problem -- as do features like DOM storage, google gears, etc.

Come to think of it, cookies aren't unique in their utility as dot-connectors: browsing history can also be used. A clever site can make guesses at a user's browsing history to learn things such as which online bank was recently visited. This is not an intended feature of browsing history, but it came about because such a history exists.

But wait, cookies, Flash LSOs, DOM storage, and browsing history aren't uniquely useful here either! Your browser's data cache can be used like cookies too! Cleverly crafted documents can be injected into your cache and then re-used from the cache to identify you.

In fact, all state data created or manipulated in a web browser by web sites has the potential to be a signal for tracking or other dot-connecting purposes. Even if the state change seems to be write-only there could be other features that open up the other direction (e.g., the CSS history snooping trick mentioned above -- or timing attacks).

Stepping Back and thinking about these dot-connecting "features" in the context of the last couple days' privacy discussions has got me wondering if there's not a way we can better understand client-side state changes in order to holistically address the arbitrary spewing of identifying information. I think the first step towards empowering users to protect themselves better online is to understand what types of data is generated by or transmitted by the browser, and what can be used for connecting the dots. After we figure that out, maybe we can find a way to reflect this to users so they can put their profile on a leash.

But while we want to help users maintain the most privacy possible while browsing, we can't forget that many of these dot-connecting features are incredibly useful and removing them might make the Web much less awesome. I like the Web, I don't want it to suck, but I want my privacy too. Is there a happy equilibrium?

How Useful is the web with cookies, browsing history and plug-ins turned off? Can we find a way to make it work? There are too many questions and not enough answers...