
Monday, June 03, 2019

Data Transparency: Revisited

With this academic year behind me, I had some time to think and reflect on what has brought me where I am.  In 2012, I had an opportunity to participate in an event that energized me and molded what ended up being my short but intense career in web tech.

In April 2012, the Wall Street Journal hosted a Data Transparency Weekend to bring together technologists, activists, journalists, and inventors from across the globe to work on the lack of transparency about how people are watched, profiled, and targeted online.  This NYC event connected me with allies and mentors who were all doing amazing work in online privacy: from the wicked smart reporting (and organizing) of Julia Angwin and Jennifer Valentino-DeVries, to folks like Dan Kaminsky, Danny Weitzner, Ed Felten, Alessandro Acquisti, Chris Hoofnagle, Peter Eckersley, and of course Ashkan Soltani, whose work has repeatedly inspired my own.  I cannot hope to name all the amazing folks who were there, and thinking back, it was incredible we all ended up in the same spot at the same time.  To all of you who spent this time with me: thank you.

Since 2012, the level of conversation about online data and tracking has skyrocketed, but not much has changed about how I'm tracked and targeted online; if anything, it has intensified.

Our everyday lives are being invaded by what I consider multi-modal harassment: we are all barraged with unwanted solicitations, phone calls, text messages, emails, and display advertisements.  We're being force-fed product info for things some "annoying brother" thinks we want.  Some of us pay for TV, and our programming still gets interrupted with ads.  The web is full of "free" sites, where you pay by allowing them to force-feed you ideas of other things you are supposed to want.  We end up spending money on things we never sought instead of on the things we seek out and intentionally use.  To me, it feels like I'm always walking up the street to my favorite pub, but against the wind of a severe storm, with a driving rain of advertisements in my face.

We also face a data collection problem: organizations like Amazon, Facebook, Google, and others are accumulating massive profiles of data on individuals.  They are often "innovative" (reckless) with the data once it is collected.  Secondary use is commonplace for "experimentation" and can lead to unanticipated violations of consumers' privacy.  Tools keep emerging that enable more collection and processing of data.  Facial recognition (FR) and machine learning (ML) are the new shiny things that everyone wants, and while they do interesting things, their far-reaching impact, and indeed the degree of "correctness" of their output, is not widely understood.  ML and FR can be used to make dumb decisions (like connecting porn stars to social media profiles, or the widespread tracking used to assign "social credit scores" in China).

How do we know whom to trust with information about us when it's not even obvious that they're collecting it?  How can we even make choices about whom *to* trust?  It is outright information theft when someone observes and measures me for their own un-shared profit.  It's worse when there are no incentives to protect the gathered data, since that exposes data subjects like me to unanticipated risk.

When Cyber becomes Physical

Our online presence is monitored and tracked in cyberspace using means that would not be tolerated in the physical world.  I'm concerned not only with the risk this collection exposes us to, but also that, as connected devices become pervasive, tracking in physical space becomes far more feasible.  This crossing-over of collection from the cyber realm to the physical one brings with it all the risks of the online data free-for-all.

Most of this "innovation" in tracking and data warehousing is driven by marketing.  I used to argue ruthlessly that the right solution was a collaborative effort between marketing firms and consumers.  After having seen the rise and fall of Do Not Track, I no longer believe collaboration can happen.  I now realize that the incentives are all wrong: ad tech cares only about the bottom line, and there is little cost in getting ad spreads in front of consumers.  This is wildly different from the physical world, where space, audience, and construction costs pressure ad firms to be much more careful about whom they target and how.

Where do we go from here?  

We need to solve two giant problems: advertisement inundation and reckless data collection. 

For years I've heard promises that we'll see better ads (and fewer of them!) if we allow firms to track us.  Neither has happened; I get crap calls and see crap ads online, and my eyes and ears are tired of it.  Consumers need more signal and less noise.  Disconnect, callblock, and adblockfast (all promising brainchildren of Brian Kennish) help attenuate the noise.  While it's disappointing that we need stuff like this, noise attenuation should be a feature of *all* mainstream software, not an add-on.  Consumers also need to get over the fear of directly paying for web sites and services, like we happily do with phone apps.  For those of us who want free stuff and will tolerate ads (as with broadcast TV), a fairer marketing scheme is critical, but that requires some big changes like the ones Brave is trying out.

In the long run, we need to think more about the consumers of our technology and train responsible engineers and architects.  These are the people who *must* consider the societal impacts of their work beyond what is fastest or generates the quickest dollar, which includes being transparent and respectful in how they treat people's data.  If we are to involve consumers in the trade of their data, the first necessary step remains the same as it was in 2012: Data Transparency.  Let's start with that.

Thursday, December 27, 2012

what is privacy?

Oftentimes when I find myself in a conversation about Privacy, there's a lack of clarity around what exactly we're discussing.  It's widely assumed that people who are experts on privacy all speak the same language and have the same goals.

I'm not so sure this is true.

This came up in a discussion with Jishnu yesterday, and we needed a common starting place.  So I'd like to take a little time to lay out what I'm thinking when I talk about Privacy, especially since I'm mainly focused on empowering individuals with control over data sharing and not so much on keeping secrets.
Privacy is the ability for an individual to have transparency, choice, and control over information about themselves.
At the risk of sounding too clichéd, I'm gonna use a pyramid to explain my thinking.  There are three parts to establishing privacy:

First, an organization's (or individual's) collection, sharing and use of data must be transparent.  This is crucial because choice and control cannot be realized without honesty and fairness.

Second, individuals must be provided choice.  This means data subjects (those people whose data is being collected, used or shared) must be able to understand what's going to happen with their data and have the ability to provide dissent or consent.

Third, when it's clear what's happening and individuals have an understanding about what they want, they must be given control over collection, sharing or use of the data in question.

This means control depends on choice, which depends on transparency.  You cannot make decisions unless you're given the facts.  You cannot make your desires reality unless you've decided what you want.

For the engineers out there (like me), these dependencies can be modeled like so:
[Transparency] = Awareness of Data Practices
[Choice] = [Transparency] + Individual's Wants
[Control] = [Choice] + Organizational Cooperation
Control is the goal, but it requires Transparency and Choice to work -- as well as some additional inputs.  Privacy is the whole thing: all three pieces acting together with support from both data controllers and data subjects to empower individuals with a say in how their data is used.
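
If it helps to see it run, here's the same model as a tiny executable sketch (Python; the function names and example "practices" are mine, purely for illustration):

def transparency(disclosed_practices):
    # [Transparency] = awareness of data practices
    return set(disclosed_practices)

def choice(transparency, wants):
    # [Choice] = [Transparency] + the individual's wants;
    # you can only consent to practices you actually know about
    return {p for p in wants if p in transparency}

def control(choice, org_cooperates):
    # [Control] = [Choice] + organizational cooperation;
    # a decision only takes effect if the organization honors it
    return choice if org_cooperates else set()

# Example: a site discloses two practices, I'm okay with only one,
# and the site honors my decision.
t = transparency({"analytics", "ad targeting"})
c = choice(t, wants={"analytics"})
print(control(c, org_cooperates=True))   # {'analytics'}

Flip org_cooperates to False and control returns nothing, no matter what you chose; that, in one line, is why the collapse of Do Not Track stung so much.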

The privacy perception gap is a symptom of ineffective transparency and choice; it is the result of people's inability to really understand what's going on, so they have no chance to establish positions about what is okay.  When transparency and choice are built into a system, the gap shrinks and people have most of what they need to regain control over their privacy.

What is privacy to you?

Thursday, October 11, 2012

ownership and transparency in social media

Les writes:
"You don’t own the spaces you inhabit on Facebook. You’re enjoying a party at someone’s house, and you barely know the guy. In fact, your content is the currency that pays for the booze (ie. the privilege of using their servers). That’s why it’s free-as-in-beer: You’ve given them what you post, instead of money. That’s valuable stuff, if they can ever quite figure out how to sell it."  [link]
It's not completely fair to expect FB users to realize that the data they so generously contribute to FB no longer belongs to them.  My hypothesis is that many people feel that no matter who has facts about you and prints them, they're still *yours*.  After all, companies have trademarks; can't things about me be mine and reserved for me?

On a smaller scale, the monetization of facts about me is not surprising; I give an interview to a magazine, they print it, it gets syndicated, no surprise.  On a large scale (lots of data, collected frequently), I think people lose track of with whom they are communicating and get immersed in the task at hand.  Is it my FB friends, or is it FB, who is helpfully telling my friends things?  This system is flexible, crazy, complex, shiny, and distracting!  Can I use it to video chat with my friends?  That's neat.  Oh, geez, I forgot FB is in the middle of all this communication...

People who sign up for FB are not signing up to contribute their lives to this stranger throwing a party.  They sign up assuming it is a tool they can use to communicate with their friends; it is a machine they've "bought" (for free, heh) to help them communicate.  Nobody reads the terms of service.  Nobody reads the privacy policy.  They accept them because other people have, and they read only what their friends write.  Many are in denial or do not realize that what they contribute to the site is just that: a contribution.

I think there is shared responsibility here; consumers should be a little bit wary--but this isn't their area of expertise.  As such, the site operator also has a duty to be more forthcoming about what's going on.  My communications tool is supposed to be a communications tool.  If you market it as a "free communications tool that sells my data," I am better informed than if it's marketed as just a "communications tool."