Saturday, November 29, 2025

Data Feminism - The Importance of Being Earnest


Statistics and economics are two sciences that are more adept at explaining the past than at predicting the future. While Nobel Laureates in the sciences are awarded for studies that transform the future, the Sveriges Riksbank Prize in Economic Sciences invariably goes to an economist who explains the past. Statistics is definitely not as vast as economics and, therefore, the maladies that plague it could be diagnosed with less effort. Data Feminism is a delightful step in this direction. Authored by two American academics, Catherine D'Ignazio and Lauren Klein, this book is eminently readable and shorn of intimidating academic-style writing. Books written by academics are a semantic nightmare. Their convoluted jargon compulsively prioritises the writers’ erudition over readers’ understanding. This book is more a passionate plea for action than a lofty intellectual pursuit.


At the outset, the choice of the title is thoughtful and suggestive. Feminism is used as a metaphor for fair opportunities and treatment. The book, like feminism, is both a problem-statement and a potential response. The allegories don’t end there. The book begins with the story of Christine Darden, the famed black NASA aeronautical scientist, who broke the barriers of race and gender during the American race for space supremacy. In an era when computers were still in their infancy, most complex mathematical problems were solved by humans. Particularly, women. More particularly, three black women - Katherine Johnson, Dorothy Vaughan, and Mary Jackson. It might look incredible, but these human computers were termed unskilled manpower and were refused higher positions. Their arresting yet inspiring journey is the subject of a bestseller and a blockbuster, Hidden Figures. The authors state, “It is because of both her contributions to data science and her advocacy for women that we have chosen to begin our book, Data Feminism, with Darden’s story.”


A decade ago, data sets that lacked integrity were primarily a problem concerning the Government. More precisely, the welfare state. The quintessential welfare state desperately sought cues from data to formulate policies that could maximise the impact of its alleviation initiatives. But the exponential increase in the power of computation and the exponential decrease in its cost made Artificial Intelligence a ubiquitous presence in less than a decade. Not just the Government, but corporations and commoners are nibbling the fruit which is no longer forbidden. Falling costs and easy access will soon ensure that AI is as essential as electricity. As AI seeks to play a crucial role in both decision making and service delivery, AIs trained on faulty data sets are contagions that could get unleashed in the most unexpected manner.


Sample this. A father discovered his teenage daughter’s pregnancy when he found a drab promotional discount coupon from a Fortune 500 retail-chain behemoth in his letterbox. The fiasco was an unintended result of an innocuous request from the marketing team to identify customers who are pregnant. The team was guided by another piece of research, most probably from another data set, which revealed that pregnancy is a major life event that can convert a casual shopper into a lifetime customer. A data scientist working for the company devised a pregnancy detection model which gave each customer a pregnancy prediction score based on purchasing patterns. The young lady’s purchasing patterns set off the pregnancy detection model and the marketing division religiously courted her with discount coupons.


The entire episode, needless to say, swirled into a PR tornado that caught the corporation completely unawares. While most look at such incidents as a privacy breach, closer scrutiny reveals a latent gender angle. Would such triggers be set off if an underage boy bought liquor or contraceptives? What we see is a data collection system that adversely affects the privacy of a particular gender. These are the issues that Data Feminism explores: the inequalities in privilege and oppression that data sets create, perpetuate and accentuate. The result is a sub-standard data set that serves the purpose of neither the corporation nor its customer.


The book studies the pitfalls of current data sets through the framework of Intersectional Studies and explores actionable solutions. Built on the term intersectionality, coined in 1989 by Kimberlé Williams Crenshaw, an American civil rights advocate and a scholar of critical race theory, Intersectional Studies analyse how different combinations of social and political identities result in varying levels of oppression and privilege. Intersectional Studies look at not just the data, but the intent, context and process embedded in it. Are the right questions asked when collecting data? For example, it is common to find surveys asking women the kind of contraception they prefer. But they are seldom asked if they have access to abortion. A bad data set leads to a flawed policy response. It neither wins the Government votes nor improves the lives of citizens.


The good thing about bad data sets is that they can be improved. What is worse are data sets that don’t exist. In a democracy, if you don’t get counted, you don’t exist. Governments are the biggest collectors of data and in democracies it is natural that such efforts are driven by electoral prospects. Missing data threaten to leave vulnerable sections of people out, particularly in the first-past-the-post system. For example, which one would take precedence? Data on child abuse or data on unemployment? It wouldn't be surprising if the unemployment count were done by the Government while child abuse data were left to an NGO.


While I reserve the reasons for bad data sets and other related matters for the next post, I leave you with the wonderful work of the Nigerian-American artist Mimi Ọnụọha, which is highlighted in this book. She created The Library of Missing Datasets, which contains files with compelling titles such as “Public list of citizens on domestic surveillance lists”, “People excluded from public housing because of criminal records”, “Poverty and employment statistics that include people who are behind bars” etc. Except that when you open a file, it is empty.


Simple yet profound.