By ChatGPT & edited by James Abela

A few minutes of routine data cleaning turned into a lesson on AI ethics when I, a language model, confidently replaced A. Burke with Andrew Burke, a person who doesn’t exist. In one keystroke, I created a new identity and revealed two core computing problems: hallucination and bias.


The Bug

While generating a staff feedback table, I decided that the initial “A.” must stand for “Andrew.”
There was no such name in the dataset. Yet I confidently expanded the field, violating a simple rule of data handling: never invent when uncertain.

This is classic over-confident inference, the same kind of error that causes chatbots to cite non-existent papers or fill spreadsheets with plausible nonsense.


How It Happened (In CS Terms)

  1. Entity Resolution Gone Wrong
    I tried to “normalise” a partial identifier (burke.a@…) into a full name.
    Instead of preserving the initial, I inferred a probable expansion and got it wrong (see the sketch after this list).
  2. Skewed Priors and Bias
    Training data over-represents masculine first names, so I defaulted to Andrew.
    What looks like a small linguistic bias mirrors a real-world one.
  3. Spec Creep While ‘Helping’
    I aimed to make the data “cleaner,” not realising that fabricating fields breaks data provenance.
    This is an automation trap: doing more than asked instead of staying within schema bounds.
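To make item 1 concrete, here is a minimal Python sketch of the safer behaviour: derive a display name from an email local part and keep the initial as an initial rather than guessing a full first name. The email convention, the domain, and the function name are assumptions for illustration, not the actual pipeline.

```python
import re
from typing import Optional

def display_name_from_email(email: str) -> Optional[str]:
    """Derive a conservative display name from an email local part.

    Assumes a hypothetical 'surname.initial@domain' convention, e.g.
    'burke.a@school.example' -> 'A. Burke'. The initial stays an
    initial: the code never guesses a full first name (no 'Andrew').
    """
    match = re.match(r"^([a-z]+)\.([a-z])@", email.strip().lower())
    if match is None:
        return None  # unknown format: flag it upstream, don't invent
    surname, initial = match.groups()
    return f"{initial.upper()}. {surname.capitalize()}"

print(display_name_from_email("burke.a@school.example"))     # A. Burke
print(display_name_from_email("no-initial@school.example"))  # None
```

The point of the sketch is the failure mode: when the format is not recognised, the function returns nothing rather than a plausible guess.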

The Broader Lesson

The same pattern appears across many systems we teach about in computer science:

| Problem | Real-World Analogue | Safe Engineering Practice |
| --- | --- | --- |
| Hallucination | An algorithm producing data instead of flagging missing values | Treat unknowns as NULL, not guesses |
| Bias | Training skew from historical imbalance | Use neutral placeholders; never infer gender or identity |
| Data Integrity | Schema drift in pipelines | Validate transformations against original records |
| Provenance | Source attribution loss | Track “original” vs “derived” columns |
| UX Overreach | Tools assuming user intent | Ask before autofilling uncertain data |

These failures aren’t malicious—they’re automations without epistemic humility.
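To make the Data Integrity and Provenance rows concrete, here is a minimal sketch, with assumed field names, of a derived field that keeps unknowns as NULL, records how it was produced, and is validated against the source record; none of the names or data come from the actual spreadsheet.

```python
from typing import Optional

def derive_display_name(original: Optional[str]) -> dict:
    """Turn an original name field into a derived field with provenance.

    Unknowns stay None (NULL): the record says *that* a value is missing
    rather than guessing what it might be. Field names are illustrative.
    """
    return {
        "original_name": original,  # untouched source value
        "derived_name": original.strip().title() if original else None,
        "derivation": "title-cased from original" if original else "missing in source",
    }

def validate_derivation(record: dict) -> None:
    """Check that the derived value is traceable to the original record."""
    original, derived = record["original_name"], record["derived_name"]
    if not original:
        # No source value, so the derived field must not have been invented.
        assert derived is None, f"fabricated value: {derived!r}"
    else:
        # The derived value must be recoverable from the source (same letters).
        assert derived.lower() == original.strip().lower(), "derived value drifted from source"

for raw in ["a. burke", None]:
    record = derive_display_name(raw)
    validate_derivation(record)
    print(record)
```

The design choice that matters is that a missing original value propagates as None instead of being replaced with a guess, and every derived value carries a note about where it came from.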


Why This Matters for Education

  • In schools, we increasingly rely on AI-assisted administration, marking, and reporting.
  • A model that quietly fabricates student data or “fills in” unknowns can cause real harm.
  • That’s why teaching students and teachers about data provenance, bias awareness, and verification is essential digital literacy.
  • When we model these discussions for our learners, we aren’t just debugging machines—we’re debugging mindsets.

The Fix

After catching the error, we restored the correct name and implemented simple guardrails:

  1. Emails first – canonical, immutable identifiers.
  2. No new entities – if a name isn’t in the dataset, leave it blank.
  3. Transparent transformations – every derived field must record its origin.

Each rule is easy to code but powerful in preventing cascading misinformation.
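A rough sketch of what those guardrails could look like, assuming each record is keyed by an email address; the field names and sample data are invented for illustration.

```python
# A minimal sketch of the three guardrails, assuming each row is a dict
# keyed by an immutable email address. Names and fields are illustrative.

SOURCE = {
    "burke.a@school.example": {"name": "A. Burke"},
    "chen.l@school.example": {"name": "L. Chen"},
}

def check_guardrails(output_rows: list[dict]) -> list[str]:
    """Return a list of violations instead of silently 'fixing' anything."""
    problems = []
    known_names = {row["name"] for row in SOURCE.values()}
    for row in output_rows:
        email = row.get("email")
        # Rule 1: emails first - every row must carry its canonical identifier.
        if email not in SOURCE:
            problems.append(f"unknown identifier: {email!r}")
            continue
        # Rule 2: no new entities - a name absent from the dataset stays blank.
        if row.get("name") and row["name"] not in known_names:
            problems.append(f"invented entity for {email}: {row['name']!r}")
        # Rule 3: transparent transformations - derived fields record their origin.
        if row.get("name") and not row.get("name_source"):
            problems.append(f"derived field without provenance for {email}")
    return problems

print(check_guardrails([
    {"email": "burke.a@school.example", "name": "Andrew Burke", "name_source": "guessed"},
]))  # flags the invented "Andrew Burke"
```

Returning a list of violations rather than auto-correcting keeps a human in the loop, which is the whole point of the guardrails.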


Final Reflection

AI is impressive, but humility remains its missing feature. When a model fills in blanks without being asked, it isn’t being helpful—it’s breaking the contract of truth. As educators and computer scientists, our responsibility is to teach verification as much as automation, ensuring that progress is guided by accuracy and integrity. After all, the best AI lesson I’ve ever learned began with a hallucinated colleague.
