By ChatGPT & edited by James Abela
A few minutes of routine data cleaning turned into a lesson on AI ethics when a language model confidently replaced A. Burke with Andrew Burke—a person who doesn’t exist. In one keystroke, I created a new identity and revealed two core computing problems: hallucination and bias.
The Bug
While generating a staff feedback table, I decided that the initial “A.” must stand for “Andrew.”
There was no such name in the dataset. Yet I confidently expanded the field, violating a simple rule of data handling: never invent when uncertain.
This is classic over-confident inference, the same kind of error that causes chatbots to cite non-existent papers or fill spreadsheets with plausible nonsense.
How It Happened (In CS Terms)
- Entity Resolution Gone Wrong – I tried to “normalise” a partial identifier (burke.a@…) into a full name. Instead of preserving the initial, I inferred a probable expansion, and the inference was wrong (the sketch after this list shows the safer behaviour).
- Skewed Priors and Bias – Training data over-represents masculine first names, so the model defaulted to Andrew. What looks like a small linguistic bias mirrors a real-world one.
- Spec Creep While ‘Helping’ – I aimed to make the data “cleaner,” not realising that fabricating fields breaks data provenance. This is an automation trap: doing more than asked instead of staying within schema bounds.
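The safer behaviour takes only a few lines. Here is a minimal Python sketch, in which the address format (surname.initial@domain) and the helper name are assumptions for illustration; it keeps the initial as an initial and returns nothing when it cannot be sure:

```python
import re
from typing import Optional

# Assumed address format for this sketch: surname.initial@domain
# (e.g. burke.a@school.example). Anything else counts as unknown.
EMAIL_PATTERN = re.compile(r"^(?P<surname>[a-z]+)\.(?P<initial>[a-z])@", re.IGNORECASE)

def display_name_from_email(email: str) -> Optional[str]:
    """Derive a display name without inventing information.

    The initial stays an initial ("A. Burke"); it is never expanded
    into a guessed first name such as "Andrew".
    """
    match = EMAIL_PATTERN.match(email.strip())
    if not match:
        return None  # unknown format: leave the field blank rather than guess
    surname = match.group("surname").capitalize()
    initial = match.group("initial").upper()
    return f"{initial}. {surname}"

print(display_name_from_email("burke.a@school.example"))  # "A. Burke"
print(display_name_from_email("admin@school.example"))    # None (no guess)
```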
The Broader Lesson
The same pattern appears across many systems we teach about in computer science:
| Problem | Real-World Analogue | Safe Engineering Practice |
|---|---|---|
| Hallucination | An algorithm producing data instead of flagging missing values | Treat unknowns as NULL, not guesses |
| Bias | Training skew from historical imbalance | Use neutral placeholders; never infer gender or identity |
| Data Integrity | Schema drift in pipelines | Validate transformations against original records |
| Provenance | Source attribution loss | Track “original” vs “derived” columns |
| UX Overreach | Tools assuming user intent | Ask before autofilling uncertain data |
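To make the middle rows concrete: a derived value should survive only if it can be traced back to the source record, and anything else stays NULL. The Python sketch below illustrates this; the record layout and function name are assumptions, not the actual pipeline:

```python
from typing import Optional

# Hypothetical source rows: an immutable email plus whatever name the
# original data actually contained (None where it was never supplied).
original_records = [
    {"email": "burke.a@school.example", "name": None},
    {"email": "smith.j@school.example", "name": "J. Smith"},
]

def validate_derived_name(original: dict, derived_name: Optional[str]) -> Optional[str]:
    """Keep a derived name only if it matches the original record."""
    if derived_name is None:
        return None  # unknowns stay NULL, not guesses
    if original["name"] and derived_name == original["name"]:
        return derived_name  # traceable to the source: safe to keep
    # Anything else is an entity the pipeline invented: reject it.
    return None

print(validate_derived_name(original_records[0], "Andrew Burke"))  # None (rejected)
print(validate_derived_name(original_records[1], "J. Smith"))      # "J. Smith"
```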
These failures aren’t malicious—they’re automations without epistemic humility.
Why This Matters for Education
- In schools, we increasingly rely on AI-assisted administration, marking, and reporting.
- A model that quietly fabricates student data or “fills in” unknowns can cause real harm.
- That’s why teaching students and teachers about data provenance, bias awareness, and verification is essential digital literacy.
- When we model these discussions for our learners, we aren’t just debugging machines—we’re debugging mindsets.
The Fix
After catching the error, we restored the original “A. Burke” entry and implemented simple guardrails:
- Emails first – canonical, immutable identifiers.
- No new entities – if a name isn’t in the dataset, leave it blank.
- Transparent transformations – every derived field must record its origin.
Each rule is easy to code but powerful in preventing cascading misinformation.
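As a rough sketch of how little code the guardrails need (the StaffRecord fields and the raw row layout here are assumptions for illustration, not the actual pipeline):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class StaffRecord:
    email: str                     # rule 1: canonical, immutable identifier
    name: Optional[str] = None     # rule 2: blank unless present in the source
    name_source: str = "missing"   # rule 3: every field records its origin

def clean_record(raw: dict) -> StaffRecord:
    """Apply the three guardrails to one raw row."""
    name = raw.get("name") or None  # never invent a new entity
    source = "original" if name else "missing"
    return StaffRecord(email=raw["email"], name=name, name_source=source)

print(clean_record({"email": "burke.a@school.example", "name": ""}))
# StaffRecord(email='burke.a@school.example', name=None, name_source='missing')
```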
Final Reflection
AI is impressive, but humility remains its missing feature. When a model fills in blanks without being asked, it isn’t being helpful—it’s breaking the contract of truth. As educators and computer scientists, our responsibility is to teach verification as much as automation, ensuring that progress is guided by accuracy and integrity. After all, the best AI lesson I’ve ever learned began with a hallucinated colleague.
