Brand voice checks, hallucination detection, quality scoring, and pre-send checklists for teams and freelancers using ChatGPT or Claude.
Somewhere between "AI wrote this in 8 seconds" and "this is ready to send to a client," something has to happen. That something is review — and most people skip it, rush it, or don't have a consistent system for it.
The result: emails that sound generic, blog posts with made-up statistics, proposals with hallucinated feature lists, social posts that don't sound like the brand. Not because AI is bad. Because AI output without a review process is a first draft presented as a final one.
This guide is a practical system for reviewing AI-generated content before it goes out — for individuals, freelancers, and teams. Not abstract advice. Actual checklists.
AI models write with the same confident tone whether they're right or wrong. A paragraph explaining a real process and a paragraph inventing a process that doesn't exist read identically. That's not a bug that will be fixed — it's a fundamental property of how language models work.
Your review system has to compensate for this. It can't assume the model is right just because the output reads well.
A review process that catches four things catches most of what matters: off-brand voice, hallucinated facts, weak overall quality, and avoidable pre-send mistakes.
Before checking facts, check voice. If the content doesn't sound like you, the facts don't matter — you'll rewrite it anyway.
The fastest way to do this is a brand voice rubric: a one-page document that defines what your brand sounds like in concrete terms.
| Dimension | Example (Agentcy.services) | Your Brand |
|---|---|---|
| Tone | Direct, no-filler, slightly dry | |
| Vocabulary level | Plain English; no jargon unless client uses it | |
| Sentence length | Short to medium; long sentences only for complex ideas | |
| What we never say | "Leverage synergies," "game-changer," "holistic approach" | |
| POV | Second person (you/your); we = the agency | |
| Energy level | Calm authority; not excited; not corporate | |
If AI output violates two or more of these dimensions, rewrite before fact-checking. You're going to change it anyway.
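Parts of the rubric can be spot-checked before a human ever reads the draft. A minimal sketch in Python — the banned-phrase list comes from the example rubric above, and the 20-word sentence average is an illustrative way to make "short to medium" concrete, not a standard:

```python
# Minimal voice spot-check: flags banned phrases and overlong sentences.
# Phrase list is from the example rubric; the 20-word cap is an assumption.
import re

BANNED_PHRASES = ["leverage synergies", "game-changer", "holistic approach"]
MAX_AVG_SENTENCE_WORDS = 20

def voice_flags(text: str) -> list[str]:
    flags = []
    lowered = text.lower()
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            flags.append(f'banned phrase: "{phrase}"')
    # Rough sentence split; good enough for a spot-check, not for parsing.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if sentences:
        avg = sum(len(s.split()) for s in sentences) / len(sentences)
        if avg > MAX_AVG_SENTENCE_WORDS:
            flags.append(f"average sentence length {avg:.0f} words")
    return flags

draft = "Our holistic approach will leverage synergies across your stack."
print(voice_flags(draft))
```

A script like this catches the mechanical violations; tone, POV, and energy level still need a human read.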
Score each piece 0–1 on these 15 dimensions. Anything below 10/15 goes back for revision.
| # | Dimension | Score (0–1) |
|---|---|---|
| 1 | Answers the brief exactly (no scope drift) | ___ |
| 2 | Leads with the reader's problem, not your solution | ___ |
| 3 | Specific over generic (names, numbers, examples) | ___ |
| 4 | Correct brand voice throughout | ___ |
| 5 | All facts verified or flagged | ___ |
| 6 | No hallucinated citations or tools | ___ |
| 7 | No generic filler (could only we have written this?) | ___ |
| 8 | Call to action is specific and clear | ___ |
| 9 | Length is appropriate — no padding, no abrupt cutoff | ___ |
| 10 | No contradictions within the document | ___ |
| 11 | Audience-appropriate vocabulary (not too technical, not too simple) | ___ |
| 12 | Formatting matches platform conventions | ___ |
| 13 | Nothing legally or reputationally risky | ___ |
| 14 | Reads naturally aloud (no robotic rhythm) | ___ |
| 15 | Passes the "would I sign my name to this?" test | ___ |
| | Total | ___ / 15 |
For client-facing emails specifically (proposals, updates, outreach), run a dedicated pre-send checklist as a final gate before anything reaches a client's inbox.
If you're regularly getting outputs that fail the checklist, the problem is almost always the prompt — not the model. The model does what you tell it to. Vague prompts produce vague output.
| Bad Prompt Pattern | Better Version |
|---|---|
| "Write a blog post about AI automation" | "Write a 1,000-word blog post for small business owners who've never used automation. Lead with the problem of manual follow-up costing them leads. Use plain language, no jargon. End with a CTA to book a free audit." |
| "Write a proposal for this client" | "Write a 400-word proposal executive summary for a dental practice with 3 locations. Their problem: no-show rate is 18%, costing ~$8,400/month. Our solution: automated reminders via SMS + email. Outcome: reduce no-shows by 40–60%." |
| "Make this sound better" | "Edit this for clarity and directness. Remove filler words, passive voice, and corporate jargon. Sentences should average under 20 words. Keep all specific numbers and facts. Don't add new claims." |
When output fails on the same dimension repeatedly (always too long, always off-brand, always generic), add a constraint to the prompt that addresses that specific failure.
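One way to make that habit stick is to keep your standing constraints in a reusable list and append them to every prompt, so a recurring failure only has to be fixed once. A sketch — the constraint wording below is illustrative, not prescribed by the guide:

```python
# Append standing constraints to a base prompt so recurring failures
# (too long, off-brand, generic) are addressed up front.
STANDING_CONSTRAINTS = [
    "Keep it under 1,000 words.",
    "Plain English; no corporate jargon.",
    "Use specific names, numbers, and examples; no generic filler.",
]

def build_prompt(base: str, constraints: list[str] = STANDING_CONSTRAINTS) -> str:
    rules = "\n".join(f"- {c}" for c in constraints)
    return f"{base}\n\nConstraints:\n{rules}"

print(build_prompt(
    "Write a blog post about AI automation for small business owners."
))
```

When a new failure pattern shows up in review, add one line to the list instead of rewriting every prompt from scratch.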
For agencies or teams where multiple people are producing AI-assisted content:
Set aside 20 minutes per week to review a random sample of AI-generated content that went out. Not to fix it — to learn from it. What patterns of error keep appearing? What prompts are producing consistently strong output? What content types still need more human time?
Most teams skip this. The ones that do it consistently get better results from AI faster than the ones that don't — because they're compounding learning, not just compounding volume.
All 7 tools in one pack: brand voice rubric template, hallucination detection checklist, 15-point quality scorecard, email pre-send checklist, prompt improvement guide with before/after examples, weekly audit template, and AI usage policy template for teams.
AI Output Review System — $27 →