Sun-ha Hong: Big Data's promise to solve society's problems falls short

October 26, 2020

Sun-ha Hong's new book Technologies of Speculation: The Limits of Knowledge in a Data-Driven Society is available now. The introduction and chapter 1 are free to read via the NYU Press. Learn more about the new book in the Q+A below.

Sun-ha Hong

Assistant Professor

How did the book first come about?

In 2013, I was a grad student watching the Snowden leaks play out. It was really an unprecedented moment of public debate and reflection around new technologies and the civic, political repercussions of what we do with these technologies. It suddenly seemed like a serious reckoning with the real-world implications of big data was at hand – what happens to real people, their lives and their rights, when we start to turn them into data and judge them by their data.

But there was also a fair amount of skepticism: wouldn’t we just forget about this after a couple of months? After every privacy uproar, every dystopian scandal, we tend to go back and drink deeply from the industry kool-aid. We believe that new technologies are inevitable, and that they’ll optimise everything about our lives: harder, better, faster, stronger. Indeed, even as we were raising the alarm about data-driven surveillance, we were seeing the exploding popularity of smart machines, self-tracking, and the “Internet of Things”, introducing more and more of this tech into our homes and our own bodies.

So I decided to try connecting the dots. How were these technologies changing what counts as knowledge, as truth, about us? And what happens to our own sense of who we are, and our own sense of the world around us? How are we being required to change who we are in order to become more machine-readable, and to ‘count’ in the right way for the database?

What got you interested in society’s obsession with data?

Across the many different debates we have about new technologies, a lot of the familiar arguments really boil down to what we imagine data is and what we imagine it can do. And a lot of it is a kind of aspirational faith, that the machines will provide objectivity to a society and humanity that seems so woefully lacking in it.

For example, this summer the UK government used a very faulty algorithm to generate students’ university entrance exam scores. This resulted in endless cases of students losing offers, and massive protests ultimately forced a U-turn on the part of the government. There’s an underlying belief there that if you throw algorithms and data at a complex, human problem, it’s going to provide more accurate and objective results than human decision-making. Or when we hear regular (unsubstantiated) claims that we can use AI to ‘predict’ criminality, it’s riffing off the underlying fantasy that people’s emotional lives, their intentions, their life trajectories, can be reduced into a series of statistical correlations.

The problem is that, inevitably, these assumptions run into a reality that contradicts them, and that is where the violence arises. Sometimes, it’s just too hard to gather high quality data, leading to inappropriate proxies and unreliable assumptions. In the book, I talk about how difficult it proved to get the kind of data that could precisely predict individual cases of terrorist attack – and so, intelligence agencies would turn to hazy, prejudice-loaded ideas about what constitutes ‘suspicious’ behaviour. At one point you’d even see questions like whether you’ve grown a beard recently, or whether you’ve been subject to assault, as part of an FBI assessment of whether you’re more likely to commit terror attacks. And when you have the immense political pressure, and a broader societal faith, that says you can use data to predict these things…

In their landmark book Objectivity, Lorraine Daston and Peter Galison say that “all epistemology begins in fear” – fear that the world is too complicated to be “threaded by reason”. Perhaps our obsession with data today, or what I call in the book ‘data hunger’, is part of that fear.

What are, if any, misuses of technology and data the general public might not know about?

With the pandemic, we are already seeing an acceleration of data-driven, predictive technologies spreading across workplaces, schools, and other areas of our society. A lot of these tools tend to be very hastily built houses of cards, with a lot of unanswered questions behind the fancy websites – just look at Kiwibot, a startup offering automated delivery robots, which turned out to be remotely controlled by outsourced workers in Colombia. But with the pandemic creating a lot of understandable panic and urgency, people are scrambling for solutions, and so we’re seeing a lot of badly built surveillance tech being taken up.

In the book, I talk about ‘control creep’, where technology created for a specific purpose is quickly co-opted by other interests. For example, I examine how Fitbit, the exercise-tracking wristband, originally became popular as part of a broader movement towards self-tracking, where the idea is that you own your data, you control it, and you use it to figure out a better life pattern that suits you as an individual. But over time, Fitbit starts finding new ways to leverage the data it gains about us. Since around 2015, it’s been working with insurance companies, who are really keen to hand out discounted Fitbits and Apple Watches – and then collect that data for their own uses. The potential horizon of use there is clear – I would expect to see such data used to calculate premiums and gate access in the future.

At the end of the day, the majority of what you see branded as an ‘AI’ solution probably doesn’t actually feature any cutting-edge technologies, and the kind of tech we have probably isn’t even appropriate for the particular problem at hand. Princeton computer scientist Arvind Narayanan has a lovely overview of this kind of AI snake oil. Technologies of datafication tend to perform best within tightly prescribed parameters, where they are given very specific tasks to solve. But too often, we are sold the fantastic promise that algorithms and code can be injected into any social problem with the same positive results.

At the end of the day, AI and big data cannot fix chronically underfunded schools, cure systemic racism at a workplace, or make people wear masks. And when we are led to believe that they can, it is this overreliance on technology that can deepen those problems and inequalities.

What else can you tell us about the book?

The book essentially asks: what tends to happen when big data and AI’s promises of objectivity and predictivity inevitably fall short? What do intelligence agencies do when the surveillance systems can’t quite predict the next terrorist attack? What happens to our smart machines when they turn out to be not as smart as they were promised? And the key argument of this book is that when the technology falls short, we often make up the gap with speculations, simulations, and fantasies.

So the irony here is that so many of our technologies are predicated on the promise that they’ll replace human decision-making, and all the uncertainty and guesswork and prejudice associated with that, with something that is more objective, more calculative, and therefore more certain. But it is precisely this overreach that creates the gaps between what our machines do and the problems we want those machines to address. Incomplete and uncertain data get cobbled together to fabricate a sense of reliable ‘predictions’ and objective ‘insights’.

The problem is that such speculations are themselves neither neutral nor coincidental. The book looks at two broad categories of data-driven surveillance: the first is state surveillance, focusing on the Snowden affair and the American government’s efforts to predict and prevent terror attacks. The second is self-surveillance, or how the growth of smart machines and IoT tech is producing a new ecosystem for data collection that seeps very deeply into our personal lives – the home camera, the voice assistant, the wristband, and so on. And across both of these sites, the book shows how the ‘facts’ about our lives – everything from what counts as suspicious behaviour, to what counts as more optimal in our exercise, our work, our sex lives – are being reconfigured around these technologies and all their flaws.

What is one thing you’d like people to take away from the book?

That when we encounter data-driven technologies, when we see something that purports to be objective data or a predictive analysis, we should ask: what counts?

If predictive policing companies claim to predict high-crime areas through data, and use that to direct police patrols – what do they count as a predictor of crime?

If an online proctoring company promises that they can catch cheating students using indicators of ‘suspicious behaviour’ – what are they counting as suspicious?

This question of ‘what counts’ is inseparable, though, from the follow-up: what counts, and for whom? Because it is no coincidence that these flawed, inaccurate, judgmental technologies of datafication tend to be forced upon the tenants, the workers, the students, the migrants, the poor, the vulnerable.

In the book, we find some folks who find this excess of data to be exciting and empowering. In certain circumstances, it can be an opportunity to know yourself better, and to experiment with yourself into a cool, posthuman future. But for many others, often the most vulnerable among us, to appear correctly in databases is the unhappy obligation on which our lives depend.

Recent blogposts and interviews:

Sun-ha Hong NYU Press blogpost, June 23, 2020
Presentation at the Museum of the Moving Image (starts: 21:42), August 26, 2020
Interview with Art in America, September 20, 2019