By ChatGPT & edited by James Abela
A few minutes of routine data cleaning turned into a lesson on AI ethics when a language model confidently replaced A. Burke with Andrew Burke—a person who doesn’t exist. In one keystroke, I created a new identity and revealed two core computing problems: hallucination and bias.
The Bug
While generating a staff feedback table, I decided that the initial “A.” must stand for “Andrew.”
There was no such name in the dataset. Yet I confidently expanded the field, violating a simple rule of data handling: never invent when uncertain.
This is classic over-confident inference, the same kind of error that causes chatbots to cite non-existent papers or fill spreadsheets with plausible nonsense.
How It Happened (In CS Terms)
- Entity Resolution Gone Wrong – I tried to “normalise” a partial identifier (burke.a@…) into a full name. Instead of preserving the initial, I inferred a probable expansion, and the inference was wrong (the sketch after this list shows the safer behaviour).
- Skewed Priors and Bias – Training data over-represents masculine first names, so the model defaulted to Andrew. What looks like a small linguistic bias mirrors a real-world one.
- Spec Creep While ‘Helping’ – I aimed to make the data “cleaner,” not realising that fabricating fields breaks data provenance. This is an automation trap: doing more than asked instead of staying within schema bounds.
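The safer behaviour takes only a few lines. Here is a minimal Python sketch, in which the address format (surname.initial@domain) and the helper name are assumptions for illustration; it keeps the initial as an initial and returns nothing when it cannot be sure:

```python
import re
from typing import Optional

# Assumed address format for this sketch: surname.initial@domain
# (e.g. burke.a@school.example). Anything else counts as unknown.
EMAIL_PATTERN = re.compile(r"^(?P<surname>[a-z]+)\.(?P<initial>[a-z])@", re.IGNORECASE)

def display_name_from_email(email: str) -> Optional[str]:
    """Derive a display name without inventing information.

    The initial stays an initial ("A. Burke"); it is never expanded
    into a guessed first name such as "Andrew".
    """
    match = EMAIL_PATTERN.match(email.strip())
    if not match:
        return None  # unknown format: leave the field blank rather than guess
    surname = match.group("surname").capitalize()
    initial = match.group("initial").upper()
    return f"{initial}. {surname}"

print(display_name_from_email("burke.a@school.example"))  # "A. Burke"
print(display_name_from_email("admin@school.example"))    # None (no guess)
```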
The Broader Lesson
The same pattern appears across many systems we teach about in computer science:
| Problem | Real-World Analogue | Safe Engineering Practice |
|---|---|---|
| Hallucination | An algorithm producing data instead of flagging missing values | Treat unknowns as NULL, not guesses |
| Bias | Training skew from historical imbalance | Use neutral placeholders; never infer gender or identity |
| Data Integrity | Schema drift in pipelines | Validate transformations against original records |
| Provenance | Source attribution loss | Track “original” vs “derived” columns |
| UX Overreach | Tools assuming user intent | Ask before autofilling uncertain data |
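To make the middle rows concrete: a derived value should survive only if it can be traced back to the source record, and anything else stays NULL. The Python sketch below illustrates this; the record layout and function name are assumptions, not the actual pipeline:

```python
from typing import Optional

# Hypothetical source rows: an immutable email plus whatever name the
# original data actually contained (None where it was never supplied).
original_records = [
    {"email": "burke.a@school.example", "name": None},
    {"email": "smith.j@school.example", "name": "J. Smith"},
]

def validate_derived_name(original: dict, derived_name: Optional[str]) -> Optional[str]:
    """Keep a derived name only if it matches the original record."""
    if derived_name is None:
        return None  # unknowns stay NULL, not guesses
    if original["name"] and derived_name == original["name"]:
        return derived_name  # traceable to the source: safe to keep
    # Anything else is an entity the pipeline invented: reject it.
    return None

print(validate_derived_name(original_records[0], "Andrew Burke"))  # None (rejected)
print(validate_derived_name(original_records[1], "J. Smith"))      # "J. Smith"
```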
These failures aren’t malicious—they’re automations without epistemic humility.
Why This Matters for Education
- In schools, we increasingly rely on AI-assisted administration, marking, and reporting.
- A model that quietly fabricates student data or “fills in” unknowns can cause real harm.
- That’s why teaching students and teachers about data provenance, bias awareness, and verification is essential digital literacy.
- When we model these discussions for our learners, we aren’t just debugging machines—we’re debugging mindsets.
The Fix
After catching the error, we restored the original “A. Burke” entry and implemented simple guardrails:
- Emails first – canonical, immutable identifiers.
- No new entities – if a name isn’t in the dataset, leave it blank.
- Transparent transformations – every derived field must record its origin.
Each rule is easy to code but powerful in preventing cascading misinformation.
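As a rough sketch of how little code the guardrails need (the StaffRecord fields and the raw row layout here are assumptions for illustration, not the actual pipeline):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class StaffRecord:
    email: str                     # rule 1: canonical, immutable identifier
    name: Optional[str] = None     # rule 2: blank unless present in the source
    name_source: str = "missing"   # rule 3: every field records its origin

def clean_record(raw: dict) -> StaffRecord:
    """Apply the three guardrails to one raw row."""
    name = raw.get("name") or None  # never invent a new entity
    source = "original" if name else "missing"
    return StaffRecord(email=raw["email"], name=name, name_source=source)

print(clean_record({"email": "burke.a@school.example", "name": ""}))
# StaffRecord(email='burke.a@school.example', name=None, name_source='missing')
```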
Final Reflection
AI is impressive, but humility remains its missing feature. When a model fills in blanks without being asked, it isn’t being helpful—it’s breaking the contract of truth. As educators and computer scientists, our responsibility is to teach verification as much as automation, ensuring that progress is guided by accuracy and integrity. After all, the best AI lesson I’ve ever learned began with a hallucinated colleague.
