The myth of neutral data

Name: The Data Storyteller's Handbook
Author: Kat Greenbrook

Apr 23

Updated: 3 days ago

In meetings, you'll hear phrases like "the data speaks for itself" or "we're just looking at the facts." These statements carry a particular kind of authority. They suggest that data sits above interpretation, above bias, and above the messiness of human judgement.

But data comes from a world already shaped by history, institutions, and power, which have consistently advantaged some groups and disadvantaged others. When data reflects that world, it carries those patterns with it.

[Image: A cartoon figure in red pants walks towards the Beehive building in Wellington, set against a white background.]

What data actually reflects

Data captures outcomes. The group differences seen when measuring education, health, income, and incarceration reflect the conditions people live in and the decisions that created those conditions.

But analysts and policymakers commonly interpret these differences as the result of personal choices. Health, for example, is often framed as a matter of individual responsibility, even though the same pattern emerges across almost every condition: people living in under-resourced communities experience worse health outcomes.

This is one way that bad data communication can reinforce inequality. Data presented without context points audiences toward explanations that leave existing structures undisturbed.

The choices that look like neutrality

Every dataset is the product of decisions. Someone decided what to measure and what to leave out. Someone decided how to categorise people. Someone decided which questions were worth asking.

These decisions often reflect the priorities and assumptions of whoever was doing the measuring, and measurement has historically been dominated by particular groups, institutions, and worldviews. The result is that even apparently neutral analytical choices can be grounded in what are known as dominant-group norms.

Consider how people are grouped. In the United Kingdom, organisations long used the category "BAME" (Black, Asian, and Minority Ethnic) to group everyone who wasn't white. The category was defined by what people were not, rather than by shared characteristics or self-identification. The UK government stopped using it in 2021, recognising that grouping diverse communities under "not white" erased the distinct experiences and needs of each group.

Or consider what gets counted at all. Landlords use screening companies to track tenant payment history and evictions. Prospective tenants can be rejected based on this data. But there's no equivalent database tenants can use to check whether a landlord ignores repairs or violates housing codes. The data exists to help landlords screen tenants, not to help tenants screen landlords, because landlords have more power in the housing market.

These are examples of what happens when the systems producing data don't prioritise (or don't see) certain communities.

What happens when context is stripped away

When data is shared without the context of how it was produced, the most familiar interpretation fills the gap. And the most familiar interpretation is usually the dominant one—the one that has been reinforced by institutions, media, and policy over time.

In other words, presenting data "neutrally" (without explaining the conditions that shaped it) just makes the dominant perspective invisible. And when a perspective becomes invisible, it becomes the default. Analysts and audiences absorb it without realising they've made an interpretive choice.

This is why context makes data communication honest.

What this means for data practitioners

Data professionals need a willingness to ask harder questions about the data before communicating it.

A few worth starting with:

Who decided this data was worth creating, and why?
Who is included, and whose exclusion was treated as acceptable?
Whose experience is being treated as the default, and who decided that?
What historical decisions created the patterns I'm seeing?

These questions don't always have easy answers. Sometimes they don't have answers at all. But naming that uncertainty is itself a form of honest communication. Your goal is to be clear about what the data can and cannot say, and to resist the pull toward the most familiar interpretation when a more accurate one is available.

This is the focus of my current research and writing. If you'd like to follow along as these ideas develop, the best place to do that is my newsletter.

Kat Greenbrook is a data storytelling consultant, author, and workshop facilitator based in Wellington, New Zealand. She is the founder of Rogue Penguin and the author of The Data Storyteller's Handbook.