Data-Crunched Democracy

I spent the day at Data-Crunched Democracy, an excellent conference organized by Daniel Kreiss and Joe Turow focused on the increasingly important role of “big data”, quantitative data analysis, and formal modeling in US political campaigns.

It was a very rewarding day with many interesting discussions and presentations by campaign staffers, consultants, lawyers, and others who had been involved in the 2012 campaign cycle.

It’s often hard to follow what’s actually going on in this space without speaking to those involved because, as Lois Beckett of ProPublica, one of the few journalists who have covered this area, put it, “many campaign people lie to journalists about micro targeting and data use”. So, with that warning and caveat, a few take-aways from a rich day:

Where are well-resourced US campaigns at in terms of using data? As Rayid Ghani (Chief Scientist, Obama for America) reminded us, data-based modeling is probabilistic and mostly aimed at marginal improvements in how resources are allocated for messaging, mobilization, fundraising, etc. It’s not a magic bullet, not necessarily as powerful or nebulous as some would suggest, and generally not as developed as behavioral modeling is in much of the corporate world.

Ghani explained that big data-based modeling is hard to do in politics because of the low frequency of the behavior you are trying to model (voting, for example, is not something we do that often) and because the context is important and can change dramatically from election to election (2004 versus 2008 etc). Targeting is certainly getting better and better at predicting people’s political behavior, a point underlined by several speakers, including Carol Davidsen from Obama for America as well as Alex Lundry and Brent McGoldrick, who were both involved in the Romney campaign in various roles. But it remains probabilistic, and this is too often overlooked and/or misunderstood in public discussions surrounding the use of data by campaigns.

Modeling is also hard because, though much data is available in the US after more than a decade of database-building, by the standards of computer scientists it is not much. As Ghani put it (and he worked for Obama), “this is the smallest dataset I’ve worked with.” In insurance, banking, health, and many areas of marketing, the datasets are much bigger and more detailed. (And one can easily imagine why: the resources available in those sectors are bigger than even the biggest political campaigns, let alone more ordinary campaigns for Congress etc.)

Right now, campaigns still focus on modeling people’s (a) propensity to vote and (b) likelihood of supporting one or another candidate. Ghani suggested that in the future there will be more focus on modeling “persuadability”: predicting not only how people are likely to behave, but also how susceptible they are likely to be to specific kinds of communication from campaigns.
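To make the distinction concrete, here is a minimal sketch of how the two kinds of scores might feed different contact lists. Nothing here comes from the conference or from any campaign: the `Voter` fields, the made-up numbers, and the two scoring heuristics are all my own illustrative assumptions, and a real campaign's models would be far more involved.

```python
from dataclasses import dataclass


@dataclass
class Voter:
    name: str
    p_turnout: float  # modeled probability of voting (hypothetical score)
    p_support: float  # modeled probability of supporting the candidate (hypothetical)
    uplift: float     # assumed modeled gain in support if contacted ("persuadability")


def mobilization_value(v: Voter) -> float:
    # Illustrative heuristic: likely supporters who probably will not vote
    # are the most valuable targets for get-out-the-vote contact.
    return v.p_support * (1 - v.p_turnout)


def persuasion_value(v: Voter) -> float:
    # Illustrative heuristic: persuasion pays off only if the person is
    # likely to vote AND is likely to be moved by the message.
    return v.p_turnout * v.uplift


def rank(voters, score):
    # Names ordered from highest to lowest score.
    return [v.name for v in sorted(voters, key=score, reverse=True)]


# Entirely made-up records, for illustration only.
voters = [
    Voter("A", p_turnout=0.9, p_support=0.95, uplift=0.01),  # loyal, reliable voter
    Voter("B", p_turnout=0.3, p_support=0.85, uplift=0.02),  # supporter, rarely votes
    Voter("C", p_turnout=0.8, p_support=0.50, uplift=0.10),  # undecided, reliable voter
    Voter("D", p_turnout=0.2, p_support=0.10, uplift=0.00),  # opponent, rarely votes
]

mobilization_list = rank(voters, mobilization_value)
persuasion_list = rank(voters, persuasion_value)
```

With these invented numbers, the infrequent-voting supporter tops the mobilization list while the undecided but reliable voter tops the persuasion list, which is the point: the same file yields different priorities depending on which probability you model. Every score is an estimate, so the lists only shift the odds at the margin.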

Modeling will also, and this is something Carol Davidsen (Director, Integration and Media Targeting, Obama for America) talked about in particular, increasingly work across platforms and increasingly focus on evaluating the impact of the massive amounts of money spent on television advertising. Several campaign staffers and consultants underlined that television remains the biggest line item in campaign budgets, and also the least accountable and the least data-based activity. Data from set-top boxes, the rise of IPTV, etc. may change that in the future. Integration is the watchword here.

Before getting carried away in discussions of new digital sources of data from television, from social media, from cookies across the web etc, it is important to remember, as Eitan Hersh made clear in his very good talk, that “campaign targeting is largely a function of public data availability” (and of course what Alex Lundry called the “solid gold” of volunteer or paid canvassing/phonebank-generated IDs, the “who do you lean towards voting for”-type questions asked at the door and over the phone by field campaigns).

In terms of public data availability there are interesting cross-national differences between the US and, for example, the European Union, which has adopted a “comprehensive” approach to data protection and has privileged privacy protection, and where much of the information that enables “big data”-based microtargeting in American politics is simply not available. (Eitan was foreshadowing his forthcoming book Hacking the Electorate, which I’m very much looking forward to.)

The reliance on public records makes the use of data by political campaigns very susceptible to regulation. It challenges the stance of some speakers (that the rise of these tools is “a force of nature” that we simply have to adapt to) and makes clear that there are political choices to be made here.

In summary, the conference (tons of tweets under the hashtag #datapolitics with other people’s thoughts and observations) provided much information about what campaigns are actually doing today and what the main contemporary legal and political issues surrounding these practices are, but also underlined that

(1) fully articulated, cutting edge big-data modeling remains far more widespread and developed in the corporate world and parts of government than in the political world and

(2) the sophistication of such modeling is obviously linked to the resources (time, money, expertise) available to individual campaigns, so the 2012 Obama campaign was ahead of the 2012 Romney campaign, all other US campaigns are far less sophisticated than either, and most campaigns in most other countries (where there is less money in electoral politics and often less public data available) are even less sophisticated.

Is democracy then being “data-crunched”? There was no consensus in the room. Big data and increasingly sophisticated analysis help campaigns allocate their resources more effectively, and have enabled them to expand and refine their persuasion but also, importantly, their mobilization efforts, arguably increasing both volunteer participation and voter turnout. They have also increased the risk of electoral red-lining, of more fragmented public debates and segmented campaign communications, and have strengthened the hand of resource-rich incumbents relative to those with fewer resources (including insurgent campaigns as well as individual citizens).

UPDATE: a nice piece on the NYT Bits blog summarizing a talk by Kate Crawford outlining “six myths of Big Data”.

