Monday, June 03, 2019

Data Transparency: Revisited

With this academic year behind me, I had some time to think and reflect on what has brought me where I am.  In 2012, I had an opportunity to participate in what served to energize me and mold what ended up being my short but intense career in web tech. 

In April 2012, the Wall Street Journal hosted a Data Transparency Weekend to bring together technologists, activists, journalists, and inventors from across the globe to work on the lack of transparency about how people are watched, profiled, and targeted online.  This NYC event connected me with allies and mentors who were all doing amazing work in online privacy.  From the wicked smart reporting (and organizing) of Julia Angwin and Jennifer Valentino-DeVries, to folks like Dan Kaminsky, Danny Weitzner, Ed Felten, Alessandro Acquisti, Chris Hoofnagle, Peter Eckersley, and of course Ashkan Soltani, whose work has repeatedly inspired my own.  I cannot hope to name all the amazing folks who were there, and thinking back it was incredible we all ended up in the same spot at the same time. To all of you who spent this time with me: thank you.

Since 2012, the level of conversation about online data and tracking has skyrocketed, but not much has changed about how I'm tracked and targeted online; if anything, it has intensified.

Our everyday lives are being invaded by what I consider multi-modal harassment: we are all barraged with unwanted solicitations, phone calls, text messages, emails, and display advertisements.  We're being force-fed product info for things that "annoying brother" thinks we want.  Some of us pay for TV and our programming still gets interrupted with ads.  The web is full of "free" sites, where we pay by allowing them to force-feed us ideas of other things we are supposed to want.  We end up spending money on things we don't actively seek instead of on the things we seek and intentionally use. To me, it feels like I'm always walking up the street to my favorite pub, but against the wind of a severe storm with a driving rain of advertisements in my face.

We also face a data collection problem: organizations like Amazon, Facebook, Google, and others are accumulating massive profiles of data on individuals.  They are often "innovative" (reckless) with the data once it is collected.  Secondary use is commonplace for "experimentation", and can lead to unanticipated violations of consumers' privacy.  Tools keep emerging that enable more collection and processing of data.  Facial recognition (FR) and machine learning (ML) are new shiny things that everyone wants, and while they do interesting things, the far-reaching impact and, in fact, the degree of "correctness" of using these tools are not widely understood.  ML and FR can be used to make dumb decisions (like connecting porn stars to social media profiles or widespread tracking used for assigning a "social credit score" in China).

How do we know who to trust with information about us when it's not obvious when they're collecting that data?  How can we even make choices about who *to* trust? This is outright information theft when someone observes and measures me for their own un-shared profit.  It's worse when there are no incentives to protect gathered data since it exposes the data subjects like me to unanticipated risk.

When Cyber becomes Physical

Our online presence is monitored and tracked in cyberspace using means that would not be tolerated in the physical world.  I'm not only concerned with the risk we're exposed to due to this collection, but as connected devices become so pervasive, tracking in physical space becomes much more feasible.  This crossing-over of collection from cyber- to physical- realms also brings with it all the risks of the online data free-for-all.

Most of this "innovation" in tracking and data warehousing is driven by marketing.  I used to ruthlessly argue that the right solution was a collaborative effort between marketing firms and consumers.  After having seen the rise and fall of Do Not Track, I no longer believe collaboration can happen.  I now realize that the incentives are all wrong: ad tech cares only about the bottom line and there is little cost in getting ad spreads in front of consumers.  This is wildly different from the physical world where space, audience, and construction costs pressure ad firms to be much more careful about who and how they target.

Where do we go from here?  

We need to solve two giant problems: advertisement inundation and reckless data collection. 

For years I've heard promises that we'll see better ads (and fewer of them!) if we allow firms to track us.  Neither of these has happened; I get crap calls and see crap ads online, and my eyes and ears are tired of it.  Consumers need more signal and less noise.  Disconnect, callblock, and adblockfast (all promising brain children of Brian Kennish) help attenuate noise.  While it is disappointing that we need stuff like this, noise attenuation should be a feature of *all* mainstream software, not an add-on.  Consumers also need to get over the fear of directly paying for web sites and services like we happily do with phone apps.  For those of us who want free stuff and will tolerate ads (like with broadcast TV), a fairer marketing scheme is critical, but that requires some big changes like the ones Brave is trying out.

In the long run, we need to think more about the consumers of our technology and train responsible engineers and architects.  These are the people who *must* consider societal impacts of their work beyond what is fastest or generates the quickest dollar, which includes being transparent and respectful with how we treat people's data.  If we are to involve consumers in the trade of their data, the first necessary step remains the same as it was in 2012: Data Transparency.  Let's start with that.