Saturday, July 17, 2010

mind the gap



( Image credit: tnaric via Flickr )

Web privacy is a very hard problem to solve. Well, at least what people perceive as "the privacy problem."

One of the main reasons it's hard to "solve privacy" is that the term privacy is used in many contexts to indicate many things.

This is ironic, since user data is also used in many contexts to indicate many different things. That is, a piece of data may be considered private in some contexts, but not in others.

I'd like to take a more focused approach and concentrate on one main cause of the "ZOMG my privacy is violated!!1!" uproars to see if we can't help address it.

Facebook Beacon. In late 2007, Facebook launched a new Beacon feature that caused a brisk community reaction. The feature automatically syndicated users' activities on partner sites to their Facebook news feed. For example, if you bought tickets to the Harry Potter movie on fandango.com (a Beacon partner), it might be broadcast to all your friends where and when you were going to the movie. People were mad because non-Facebook activities were now automatically imported into Facebook and shared.

Google Buzz. Google turned on their new "Buzz" feature for some Google users in February 2010. This feature automatically created a Twitter-like stream for things you do (such as what you read in Google Reader and photos you upload to Picasa) and immediately set you up to "follow" other Google users in your "exchanged mail with" list. Harriet Jacobs' article exemplifies the reaction. She didn't want people who emailed her on occasion to know everything she does, but suddenly this new technology connected her activities to everyone she had received mail from.

LSOs, a.k.a. Flash Cookies. When people clear their cookies, not all cookies actually get deleted! Gasp! Here's why: Adobe's Flash plug-in has its own data storage space on your computer -- separate from where your browser stores cookies, bookmarks and passwords. The browser doesn't have direct control over Flash's data, since Flash is essentially a separate application that happens to show its content inside your browser window. The result? You clear cookies, but your browser doesn't know how to clear Flash cookies. How is this used? In many ways, but one particularly sneaky use rubs many people the wrong way: web sites can use Flash to keep longer-lived cookies on your system that can be used to re-populate regular cookies after you clear them. People are mad. (FYI, this is being worked out, see this bug).
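To make the re-population trick concrete, here is a toy simulation. It is only a sketch: the `Browser` class, `visit_site` function, and the IDs are hypothetical names invented for illustration, and no real Flash or browser API is involved. It models two separate stores, only one of which "clear cookies" can reach.

```python
# Toy model of cookie "respawning". All names here are hypothetical;
# this simulates the idea and does not touch any real browser or Flash API.

class Browser:
    def __init__(self):
        self.http_cookies = {}  # store managed by the browser
        self.flash_lsos = {}    # store managed by the Flash plug-in

    def clear_cookies(self):
        # The browser only knows about its own cookie store.
        self.http_cookies.clear()


def visit_site(browser, site, new_id):
    """Return the visitor ID the site ends up with after one page load."""
    visitor_id = (browser.http_cookies.get(site)
                  or browser.flash_lsos.get(site)
                  or new_id)
    # The site writes the ID to both stores so either one can restore the other.
    browser.http_cookies[site] = visitor_id
    browser.flash_lsos[site] = visitor_id
    return visitor_id


browser = Browser()
first = visit_site(browser, "example.com", new_id="abc123")
browser.clear_cookies()
second = visit_site(browser, "example.com", new_id="zzz999")
print(first == second)  # the "cleared" visitor is recognized again
```

The second visit gets the old ID back because the Flash store survived the clear, which is exactly the surprise users complain about.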

The Gap. There's this dark and mysterious area between what users think is happening with the data they put on the web and what actually happens. I call this the Privacy Perception Gap (PPG). There are a variety of reasons this gap exists:
  1. Software makers are not psychologists -- they don't know what people expect, only how the system works.
  2. Software makers are not anthropologists -- they don't know how different cultures expect secrets to be kept or shared.
  3. Software is reactive -- users complain, software is re-engineered, and the cycle repeats.
  4. The PPG is not well understood.

This last reason is something we can address with proper research. First, we need to understand the size and reason for the PPG before we can close it, especially before we know who is best poised to do the work. Is it users, user agents, infrastructure, applications, or a combination who should take the giant leap? How big is this gap on average? Surely it's different for various web applications.

If we minimize the PPG, we can expect users to be better informed, and that might have prevented the situations described above. Users wouldn't be surprised by what happens, and the suspicion that web companies are out to violate their users would be reduced significantly.

I'm a big fan of transparency (see Open and Obvious), since a lack of it is a big part of the PPG problem. We should start by making data relationships transparent: first through disclosure, and then, most importantly, by making that disclosure accessible to users. For instance, having a privacy policy linked from my web site doesn't really make me transparent unless users can find and understand it. The gap doesn't shrink if users don't understand! My theory is that an informed user is a happy user, and if we can better understand the PPG we can take the first step towards making web users happy.


Thursday, July 01, 2010

privacy-preserving videos

It's hard to resist the YouTube hotness -- especially if you feel like blogging about the adorable kittens you just saw on YouTube because all your friends must absolutely see it. Sarcasm aside, there are plenty of reasons to embed videos on a blog or web site, but there's something to consider: the embedded video might be used to track your visitors!

Frankly, any content embedded on your site but hosted by another site can cause privacy concerns. When the content is requested, the user's cookies go along with it. The effect is that not only does YouTube know what site a user is on when viewing the kitten video, but they potentially also know who the viewer is! This means that unless you log out of YouTube regularly, they know all the videos you've viewed and on what sites you saw them.

This isn't breaking news. This stuff has been around for a while, it's just Not Obvious.

So you run a blog or web site and you don't want to aid in YouTube tracking your visitors. What do you do? You have three options in my view:

1. Use YouTube's privacy mode. You can embed videos that suppress sending cookies until the visitor has clicked to watch them. This works by changing URLs for any content automatically loaded by your site from the domain "youtube.com" to "youtube-nocookie.com". This way the cookies (for youtube.com) aren't sent until more content is loaded from youtube.com after the click.
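As a rough sketch of what that URL change looks like, the helper below rewrites an embed URL's host. The function name `to_nocookie` and the `VIDEO_ID` placeholder are made up for illustration, and using the `www.` prefix on the replacement host is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the privacy-mode host is served with a "www." prefix.
NOCOOKIE_HOST = "www.youtube-nocookie.com"


def to_nocookie(url):
    """Point a youtube.com URL at the no-cookie domain; leave other URLs alone."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host == "youtube.com" or host.endswith(".youtube.com"):
        return urlunsplit(parts._replace(netloc=NOCOOKIE_HOST))
    return url


print(to_nocookie("http://www.youtube.com/v/VIDEO_ID"))
print(to_nocookie("http://example.com/kittens.html"))
```

Running something like this over the embed markup before publishing a post would cover the content your page loads automatically; anything loaded after the visitor clicks still comes from youtube.com.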

2. Don't trust 'em? Use EFF's MyTube. This does something similar, but the static content is loaded from your own site, then when the visitor clicks on the video, it is loaded from youtube.com and the cookies are sent.

3. Open Video FTW! If possible, host the video yourself (on your own website) and use the <video> tag! More info here. Flash is not the only way to display video on the web! Use open standards!

Thursday, June 03, 2010

open and obvious privacy practices

Lately, there's been all sorts of hubbub about use of private information on websites (ahem, Facebook), but it's not really clear what's okay, and what's not okay.

Personally, I don't have a problem with sites using the data I give them as long as they're straightforward and actually ask me for the data and admit they'll share it. I'll even hand out extra credit brownie points if they tell me what they plan to do with it. And I'm not talking about linking to legalese privacy policies (maybe 0.5% of visitors to a site have read the privacy policy); it's gotta be up-front and in the main content. For most people, I imagine the feeling of violation comes in when there's perceived deception in data use practices.

For example, if a site says to me, "if you give me your address, I'll show you a list of stores selling fruit in your area", I'm happy to provide my address for that service. I feel comfortable in knowing what is happening with the data I provide, and this transparency gives me comfort.

On the other hand, if the same site doesn't say anything to me and simply infers my location from some sort of browser history sniffing trick, then shows me the same list of stores, I'll feel a bit violated when I figure out what happened. There are two points of friction in this second scenario: (1) I wasn't asked for the data, and (2) I was unaware of how the data would be used or with whom it would be shared.

Stephanie Clifford of the Times wrote an article about sites that are starting to be transparent and straightforward with their data collection and use. When your users enter into a relationship with you knowing full well that you intend to use the data they provide, everything works out swimmingly. If you instead just collect the data and later start using it for something new that users catch wind of, they are shocked, feel violated, and you end up in a predicament like Zuckerberg.

In a few upcoming posts, I'll go more in depth about my thoughts on web privacy. For now I'll conclude with the hope that more sites will be upfront and transparent about what they do and will keep descriptions of their privacy-related practices accessible to users -- users who, armed with an understanding, can make an educated choice about whether or not they should be sharing their data.

Friday, May 21, 2010

view source

View-source and inspection techniques are probably the most important feature set for the open web. They are a creativity lubricant and help aspiring web authors learn new tricks. I strongly believe that they are one of the main forces driving rapid innovation on the web. What other platform is so open that you can just pop the hood and take a peek? Yeah yeah, cars have hoods, you can pop them and peek, I know; but to make a car you need lots of fabrication equipment or at least parts and an engine hoist, while to make a web site you just need a computer and vi.

Software is a magnificent, intangible product that is completely the result of imagination at work. One could liken it to art: software is a clever rearrangement of bits of digital data whereas art is a clever rearrangement of "bits" of color and texture. When we can inspect how the artist creates, we learn new tricks that evolve our own web-art. A web without inspection tools is like viewing low-resolution copies of famous paintings; the artists' brush strokes and exact color choices aren't present, so any hint of method is gone and you only see the end image. I can't grow my skill as an artist by knowing what other artists paint, but I can learn an awful lot by seeing their brush strokes up close.

When I asked him what he thinks is the best part of the web, Cory Doctorow said "view source." He spends lots of time thinking about technology with respect to its benefits and drawbacks, so I give much credence to his opinion. Inspection is also how I learned to make my bits of the web, so I am a bit partial.


Monday, May 03, 2010

facebook privacy erosion

I went into my privacy settings on facebook to turn off the "instant personalization" program (I don't really want facebook to provide my info to other sites automatically), and was a little miffed by the experience of disabling it:

First, I unchecked the box that said "Allow select partners to instantly personalize their features with my public information when I first arrive on their websites." This was me reverting to the policy facebook had back when it was not sharing data with third-party sites.

Anyway, when I unchecked the box, I got the usual "are you sure?" dialog that attempted to convince me to reconsider. In addition, it let me know that unchecking the box won't completely opt me out, since my friends will still be leaking my information to these third-party sites.

Kudos to facebook for telling me this, but why can't the check box actually control both the data I allow to be transmitted and that sent by my friends? They explain in the dialog (and in fine print on the pref page) that I can block the application and that will stop my data flowing from my friends, but for the life of me I can't figure out what the application is called and how to block it. Any advice here?

I don't like that I have to review the facebook privacy policy and the settings page what seems like every time I log in; this is a nasty side-effect of the slow erosion of their privacy policy and settings. I constantly have to be figuring out what kind of relaxing of the privacy policy facebook is doing next. I realize the importance of monetization (and I'm impressed that they're trying to find something new, something not advertisements to make them money), but I guess I value control of my data a bit more than facebook does.

Friday, April 09, 2010

history sniffing fix has landed

David Baron's history sniffing fix has landed in the trunk repository (VCS nerds, click here for details)! This means you can grab one of our nightly builds and try out the fix for yourself -- but be warned, these nightlies aren't always stable, since they're rapidly changing.

While the fix isn't in the final version of Firefox yet, it should be in the next feature revision (3.7, or whatever major version comes up next), and is shipping in alpha releases starting with 1.9.3a4. We're hoping to use the incubation time in nightlies, alphas, and probably a beta or two, to make sure the fix works and to get feedback from some users. If you are skeptical about our fix or just want to test drive it, grab a nightly build and let me know what you think!

Wednesday, March 31, 2010

turning off the :visited privacy leak

Since I started at Mozilla, I've been trying to increase momentum on fixing the history sniffing privacy leak. I've been able to get lots of people interested, and David Baron has worked hard to come up with a fix. This is a hard problem, and the stars have finally aligned: the Firefox source code, our thinking, research, and a need have come together to get this done.

David has nearly finished an implementation of a plug for the leak, and it's a pretty nice solution that strikes a balance between privacy and utility. In the end, we're going to have to break the web, but only a little bit, and in ways we believe can be recreated with other techniques.

The fix has three parts:
  1. :visited is restricted to color changes. Any size or other types of layout/loading effects are disabled. The color changes still allowed are foreground, background, border, SVG outline and stroke colors.
  2. getComputedStyle and similar functions will lie: all links will appear unvisited to the web site, but you'll still see the visitedness when the page is rendered.
  3. The layout code has been restructured to minimize the difference in code path for laying out visited and unvisited links. This should minimize timing attacks (though it can't remove them all).
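The first two parts can be sketched as a toy model. This is not the Firefox implementation: the function names are hypothetical, and the property names in `COLOR_PROPERTIES` are assumptions chosen for illustration rather than the exact list in the patch.

```python
# Toy model of parts 1 and 2 of the fix. Property names are assumptions
# for illustration, not the exact list used by the real patch.
COLOR_PROPERTIES = {"color", "background-color", "border-color",
                    "outline-color", "stroke"}


def rendered_style(base, visited_rules, is_visited):
    """What the user sees: :visited rules apply, but only to color properties."""
    style = dict(base)
    if is_visited:
        for prop, value in visited_rules.items():
            if prop in COLOR_PROPERTIES:
                style[prop] = value
    return style


def computed_style(base, visited_rules, is_visited):
    """What script sees: every link is reported as if it were unvisited."""
    return dict(base)


base = {"color": "blue", "font-size": "12px"}
visited_rules = {"color": "purple", "font-size": "40px"}

seen = rendered_style(base, visited_rules, is_visited=True)
reported = computed_style(base, visited_rules, is_visited=True)
print(seen)      # color changes, size does not
print(reported)  # identical to the unvisited style
```

Because the size change is dropped, a page can't measure layout to detect a visited link, and because the reported style never changes, it can't read the color either.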

I don't think web sites should be able to extract your browsing history without your consent; this is one of the bits of the web that rubs me the wrong way, and I'm excited we've made some progress towards removing this from the web. If it rubs you the wrong way too, and you just can't wait for our upcoming fix, you can turn off all visited links in Firefox 3.5 and newer. This breaks the web even more, but is an immediate hack if you want to hide from the sniffers.

Over the last few years, I've been collecting a list of sites that show how this trick can be abused. Hopefully all of them will stop working with the new fix!


Friday, January 29, 2010

cookies by many different names

Cookies are great, and everyone loves them (chocolate chip are my favorite), but if we leave the Internet to its own devices it could potentially drive itself into a state of utter deception where other technologies are secretly used in place of cookies for tracking and identification purposes.

Having spent the past two days submerged in various privacy discussions, I've started thinking deeply about cookies and tracking again. The fundamental privacy concerns about HTTP cookies (and other varieties like Flash LSOs) come from the fact that such a technology gives a web server too much power to connect my browsing dots. Third-party cookies exacerbate this problem -- as do features like DOM storage, Google Gears, etc.

Come to think of it, cookies aren't unique in their utility as dot-connectors: browsing history can also be used. A clever site can make guesses at a user's browsing history to learn things such as which online bank was recently visited. This is not an intended feature of browsing history, but it came about because such a history exists.

But wait, cookies, Flash LSOs, DOM storage, and browsing history aren't uniquely useful here either! Your browser's data cache can be used like cookies too! Cleverly crafted documents can be injected into your cache and then re-used from the cache to identify you.
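One way this could work is with HTTP cache validators. The post doesn't name a specific mechanism, so treat the following as an assumed illustration: a toy simulation in which a server hands out a unique ETag and the browser's cache echoes it back in `If-None-Match` on later requests. The `TrackingServer` and `BrowserCache` classes are hypothetical.

```python
import uuid

# Toy simulation of using the cache as a cookie. The classes are hypothetical;
# ETag / If-None-Match validation is an assumed mechanism, not one named above.


class TrackingServer:
    def __init__(self):
        self.seen = {}  # identifier -> number of requests carrying it

    def handle(self, if_none_match=None):
        """Return (status, etag) for a request to the tracking resource."""
        if if_none_match in self.seen:
            self.seen[if_none_match] += 1
            return 304, if_none_match   # "not modified": the cached copy is reused
        etag = uuid.uuid4().hex         # a fresh identifier for a new visitor
        self.seen[etag] = 1
        return 200, etag


class BrowserCache:
    def __init__(self):
        self.etag = None  # validator stored alongside the cached document

    def fetch(self, server):
        status, etag = server.handle(if_none_match=self.etag)
        self.etag = etag
        return status


server = TrackingServer()
cache = BrowserCache()
print(cache.fetch(server))  # first visit: a full response with a new identifier
print(cache.fetch(server))  # repeat visit: the identifier is sent back
```

No cookie is ever set here, yet the server can count both requests under the same identifier until the cache is emptied.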

In fact, all state data created or manipulated in a web browser by web sites has the potential to be a signal for tracking or other dot-connecting purposes. Even if the state change seems to be write-only there could be other features that open up the other direction (e.g., the CSS history snooping trick mentioned above -- or timing attacks).

Stepping back and thinking about these dot-connecting "features" in the context of the last couple days' privacy discussions has got me wondering if there's not a way we can better understand client-side state changes in order to holistically address the arbitrary spewing of identifying information. I think the first step towards empowering users to protect themselves better online is to understand what types of data are generated or transmitted by the browser, and what can be used for connecting the dots. After we figure that out, maybe we can find a way to reflect this to users so they can put their profile on a leash.

But while we want to help users maintain the most privacy possible while browsing, we can't forget that many of these dot-connecting features are incredibly useful and removing them might make the Web much less awesome. I like the Web, I don't want it to suck, but I want my privacy too. Is there a happy equilibrium?

How useful is the web with cookies, browsing history and plug-ins turned off? Can we find a way to make it work? There are too many questions and not enough answers...