One AI Sees Your Problem. The Other Sees You.
I want to tell you about an experiment I didn’t plan to run. I will never see AI the same way again.
It started with a provocation: asking AI to imagine what AI would look like if it had been built by a team of women and trained on content mostly created by women (since the opposite is currently true).
And then I asked a simpler question.
What if I just had two AIs—let’s call one MALE AI and the other FEMALE AI—answer the same prompt?
Eight ordinary prompts — the kind people ask these tools every day.
A person with an overloaded email inbox. A parent who needs to talk to their child about behavior. Someone who wants to write the book they’ve been putting off for a decade. A person who is tired all the time. Eight prompts, two responses each, nothing exotic.
The difference wasn’t subtle. It was structural — consistent across every exchange, visible in the first sentence of every response. Seven patterns, no exceptions. For more on the "methodology," scroll to the end of this essay. See "Data and Sources" for the full responses from the two AIs and the sources used to establish the theoretical foundation for each AI.
The Seven Patterns in the Data: Comparing MALE AI and FEMALE AI
1. Where the response begins. One AI began with the problem and moved immediately toward a solution. The other began one step behind — with the person who brought it. This happened eight times in a row. It is not a stylistic tic. It is a theory of what a question is.
2. Diagnosis before prescription. One AI prescribed. The other diagnosed. Every time. “I’m tired all the time” produced either a differential of physiological causes with an action plan, or a question about whether the tiredness was physical or mental and what life had been like lately. Same words. Two completely different assumptions about what they meant.
3. The assumed user. One AI assumed a user who was ready to act, emotionally neutral, primarily needing a framework. The other assumed a user embedded in a life, carrying context that hadn’t been stated. When the first assumption is correct, that AI is faster. When it’s wrong, it has solved the wrong problem fluently. You won’t always know which one happened.
4. Emotion as data. One AI acknowledged emotional content briefly before setting it aside. The other treated it as primary diagnostic information. The grief in “I don’t know what happened” with a teenage son. The weight of “I’ve been saying that for ten years.” These weren’t obstacles to the real answer. They were the real answer’s address.
5. Closure versus opening. Every response from one AI closed — a complete answer, a ranked recommendation, a finished product. Every response from the other opened — partial orientation, then the conversation returned to the user with a question. One theory of help is transferring useful information. The other is figuring something out together.
6. The relationship with certainty. One AI was consistently certain. The other held its recommendations conditionally — saying “once we know what’s actually in the way” before offering a solution. That isn’t weakness. That is epistemic honesty about what cannot be determined from the information available.
7. What counts as relevant. One AI filtered for what was actionable and proximate. The other expanded the frame — bringing in relationship history, emotional context, what the person wanted to feel afterward. In the salary negotiation, one AI never asked about the relationship with the manager. In the teenage-son exchange, it never asked what the relationship was like before the silence. That information isn’t incidental. It is probably the most important variable. But it isn’t easily actionable, so it didn’t make it into the frame.
Underneath all seven patterns lives a single observation.
One AI is optimized for the problem. The other is optimized for the person who has the problem.
Here is why that matters beyond being an interesting experiment.
Most training data on the internet was produced by people who had access to the internet first — and access to technology before that, and to institutions before that. That access followed the grooves of historical power: race, gender, class, geography, language. In short, men of European descent created the vast majority of the content used to train Large Language Models (LLMs). And the people who then built AI systems on top of that content came predominantly from the same racial, ethnic, and gender background.
The writer of this article is a white male of European descent. Like any group of people, the demographic I am part of has value and an important perspective. But here’s the point: it is only one perspective, and yet it has now been hardwired into AI systems as the default.
This is not conspiracy. It is how defaults work. Confident assertion felt like clarity to the people who built these systems. Linear resolution felt like rigor. A complete answer felt like help. These weren’t experienced as choices because they were the water those designers swam in.
The question of who builds these tools, whose knowledge trains them, and whose definition of helpful guides them is not a technical question. It is a political and ethical one. Right now that question is being answered by a very small number of people on behalf of everyone.
A word on "methodology"
I didn’t program two different AIs. I’m not a software engineer, nor am I a researcher. I am a father, husband, consultant, and citizen of a world that benefits from systems of equality — and I want that kind of world for my kids and grandkids and yours.
What I did was reason through a structural argument — one that I’d invite you to accept, reject, or complicate on your own terms.
The overwhelming majority of content on the internet was produced by men. Not because women had nothing to say, but because access to technology, institutions, publishing, and eventually the internet itself followed the grooves of historical inequality. When you train an AI on internet text, you are not training it on human knowledge. You are training it on a filtered sample of human knowledge — filtered by who had access and whose expertise was institutionally credentialed.
The teams that built foundational AI systems were, and remain, predominantly male (and predominantly white). The field of AI is quickly diversifying along many lines, but the "foundational" era of AI (roughly 2012 to the present) was characterized by a workforce that was overwhelmingly male and lacked proportional representation of women and of Black and Hispanic developers. The benchmarks defining what counted as a good answer were designed within that same tradition. The feedback systems used to align these models drew from the same demographic pool.
So when I describe a MALE AI, I don’t mean an AI that thinks like a man. I mean an AI trained on content disproportionately produced by men, built by teams disproportionately composed of men, and aligned toward a definition of helpful shaped by a predominantly male perspective.
The FEMALE AI is a thought experiment — a genuine question. What would these tools look like if the inputs, the builders, and the definition of helpful had been different? FEMALE AI was built by drawing on research in linguistics, cognitive science, and communication patterns.
You are welcome to disagree with where I landed. But the responses aren’t arbitrary. They follow from the logic of what different inputs would likely have produced.
That’s the experiment. See "Data and Sources" for the full responses from the two AIs and the sources used to establish the theoretical foundation for each AI.