March 20, 2024

By Deborah Acheampong

Computing Science professor Nicholas Vincent has earned a 2024 ACM SIGCHI Outstanding Dissertation Award for his research in human-computer interaction (HCI).

The award is given annually to at most five recent doctoral recipients worldwide, based on evaluations of technical depth, the significance of the research contributions, potential impact on the field, and the quality of presentation.

"I was extremely honored and humbled to learn that my work received this recognition. The award was a wonderful reminder that the research in my dissertation was deeply collaborative, drawing on efforts from so many mentors and collaborators. I particularly enjoyed revisiting the Acknowledgments section of my dissertation, which includes some comments about how research relying on collaborative efforts resonates with the idea that impact on a machine learning model can only be achieved by a group acting together (a key theme across the dissertation chapters)."

Vincent’s research focuses on the relationship between human-generated data and modern computing technologies, including systems often referred to as “AI.” The overarching goal of this research agenda is to work towards an ecosystem of widely beneficial, highly capable AI technologies that mitigate inequalities in wealth and power rather than exacerbate them. His work touches on concepts such as “data dignity,” “data as labor,” “data leverage,” and “data dividends.”

We spoke with Professor Vincent about his research and what the future of equitable AI looks like.

Tell us about your research agenda and the relationship between human-generated data and AI technologies.

My research agenda involves projects focused on several avenues for supporting a healthy “ecosystem for AI data.” One approach involves conducting studies that seek to measure the value of specific datasets and data sources, in order to inform people about the value they already provide and the potential leverage they might bargain with. For instance, this involves work aiming to estimate how much a platform like Wikipedia contributes to the success of a search engine or large language model. Another approach involves building tools that empower individuals and communities to have more agency over their data. Examples on this front include projects that simulate collective action (what if a large group of people withheld their data from a company, or redirected their data to a new tech company?) and work in progress on social platforms for data sharing. Finally, a third approach involves understanding different policy regimes: what might happen if new laws changed how data is collected or retained, especially if those laws enforce stronger notions of data consent?

Underpinning all these solutions-focused avenues, I also work on projects that aim to define new frameworks for thinking about data, for instance defining the “dimensions of data labor.”

How do concepts like "data dignity," "data as labor," "data leverage," and "data dividends" shape your work?

Data dignity, to quote from the RadicalxChange Foundation’s web page on the topic, can be captured by the following argument: ‘Technology companies wield highly concentrated power over the way peoples’ data is used and make enormous profits from it. They can do this because we “bargain” for Big Tech services as if we were all isolated individuals, with “personal” datasets. In fact, the data we produce is always deeply social. Sharing it affects our friends, families, and communities as much as it affects us. People should be able to exert democratic collective bargaining power over their data, make joint decisions controlling its use, and negotiate appropriate compensation.’ My projects are related to supporting a data dignity paradigm. 

The data as labor concept suggests we should think of all kinds of data-creating activities – using social media, writing blogs, uploading photos, etc. – as a form of labor for tech companies. It can be thought of as a precursor to data dignity. 

Data leverage is the bargaining power available to data creators that derives from their potential ability to withhold or redirect data in the future, and even to delete past data.

Data dividends are one approach to redistributing the economic winnings of AI systems to data creators. We could imagine people using data leverage to bargain for direct or indirect payments, to avoid a future in which a small group of people collects all the profits of AI (even though those AI systems rely on our vast collective efforts).

What inspired your research area? Can you share examples of how your research contributes to fostering equitable AI technologies? 

When I began my research career, I was extremely excited about the benefits that new AI progress could bring, but concerned about the potential concentration of wealth and power and the negative societal effects that could result (e.g., the destabilization of many institutions).

I believe that by helping data creators (i.e., all members of the public!) bargain with technology companies, it is possible to directly involve the public in governing AI. I view this as a complementary means of participation, alongside other approaches like traditional political participation, internal employee activism, and other kinds of collective action.

How do you envision academia, industry, and policymakers collaborating to promote equitable AI technologies? 

I believe that supporting a healthy data ecosystem is strongly in the interests of all these parties. A major concern with the advent of generative AI is that new technologies may undermine the online platforms where valuable data creation occurs. The effect that something like ChatGPT will have on online question-and-answer platforms parallels early concerns that Google search might hurt Wikipedia. But I believe that informing people about their collective data value and empowering them to exert agency over their data is in everyone's interests here. A comparison might be made to how labor organizing and labor rights in many ways helped many industries become more sustainable in the long term. Excessive power concentration ultimately hurts everyone except the very few power holders, and much of my work is about applying this idea to the AI context.

What projects are you pursuing to advance this research agenda? 

One ongoing project involves building new platforms that let users opt in to sharing data that can help improve future large language models. Another involves building new interfaces that tell people about the potential power afforded by their data contributions (if they act collectively). This work can also be helpful in other situations where people want to organize collective action online. I am also working on several projects that seek to understand who is likely to experience economic harm first from LLMs, and how the tech industry might think about economic well-being more broadly.

What does the ACM SIGCHI recognition mean for your research?

"The award highlights a major shift in public attention towards new tensions between data creators and new AI systems. I appreciate the award committee noting that these topics are likely going to rise to the forefront of AI and society discussions – which means there’s a lot more work to do!"