AI SOP Template: Customer Support Handling

Most support teams already use AI somewhere in the stack. The problem is the AI runs without rules. One agent uses ChatGPT for refund language, another lets Intercom Fin auto-resolve, a third just copies the canned macro because they do not trust the bot. The result is wildly inconsistent CSAT, PII leaked into public model APIs, and a mess of escalations that never should have happened.

A real AI SOP fixes that. Below is a battle-tested template you can drop into Notion, ClickUp, or Waybook today and adapt for your team in under an hour.

Definition

An AI customer support SOP is a written standard operating procedure that defines exactly when, how, and by whom AI is used across the support workflow, including model selection, prompt patterns, human review gates, and escalation triggers.

TL;DR

A working AI support SOP covers four pillars: triage, drafting, resolution, and quality review
Tier 1 deflection with a properly governed AI agent typically resolves 35 to 55 percent of tickets without a human
Every AI step needs a named owner, a frequency, an exact prompt or tool config, and a fallback rule
The single most-skipped section is the "do not let AI do this" list — write it first
Plan a quality audit of at least 100 AI-handled tickets every week until accuracy stabilizes above 92 percent

Purpose, scope, and ownership

The first page of the SOP is boring on purpose. It tells anyone reading what this document covers and who is on the hook.

Purpose: Standardize the use of AI across all customer support channels (email, live chat, in-app, social DMs, voice transcription) to ensure brand-consistent, compliant, and accurate responses.
Scope: Applies to every ticket touched by the support org, including those handled fully autonomously by AI agents, partially drafted by AI for human approval, and human-only escalations where AI is used only for summarization.
Owner: Head of Customer Support (accountable). Support Operations Manager (responsible for maintenance). AI Governance Lead (responsible for model and prompt approvals). Reviewed quarterly.

A RACI table belongs here. The most common failure I see is "AI Operations" being a phantom role nobody actually does. Pick a real person.

Tools and model approval list

This is where you stop the chaos. List the only AI tools allowed in the workflow, what each is approved for, and who sets them up.

Intercom Fin or Zendesk AI Agents — Tier 1 ticket deflection. Approved for password resets, order status, refund eligibility checks, shipping ETAs.
Claude Sonnet 4.5 via internal proxy — Reply drafting, ticket summarization, sentiment tagging. Never connected to PII without the redaction layer.
GPT-5.1 via Azure tenant — Translation and tone rewriting. Allowed because data does not leave the tenant.
Gong or Fireflies — Voice transcription with PII redaction enabled.
Internal RAG knowledge base — Required source for any product-specific answer. Hallucinations = automatic ticket re-open.

Forbidden: free public ChatGPT, Gemini consumer, or any browser extension that ships ticket text to third parties. Violations are a documented HR matter.
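
One way to make the approval list enforceable is to keep it as data that tooling can check before a model call goes out. The sketch below is a minimal illustration, assuming hypothetical tool keys (tier1_agent, llm_drafter, llm_translator, transcription) and task labels copied from the list above. It is not tied to any vendor's API.

```python
# Hypothetical allowlist: tool key -> tasks it is approved for.
# Keys and task labels are illustrative and mirror the list in this SOP.
APPROVED_TOOLS = {
    "tier1_agent": {"password reset", "order status",
                    "refund eligibility check", "shipping ETA"},
    "llm_drafter": {"reply drafting", "ticket summarization", "sentiment tagging"},
    "llm_translator": {"translation", "tone rewriting"},
    "transcription": {"voice transcription"},
}

def is_approved(tool: str, task: str) -> bool:
    """Return True only when the tool is listed and approved for this task."""
    return task in APPROVED_TOOLS.get(tool, set())

print(is_approved("llm_translator", "translation"))     # True
print(is_approved("browser_extension", "translation"))  # False
```

Anything not in the map is rejected by default, which matches the spirit of the forbidden list: unlisted tools are out until someone approves them.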

Triage workflow with AI

When a ticket lands, the SOP defines a deterministic path. The AI does the boring sorting; the human decides who owns it.

Auto-classify within 30 seconds. The classifier model tags channel, language, intent (billing, technical, account, complaint, feature request), urgency (P0 to P3), and customer tier (free, paid, enterprise).
Auto-route based on tags. P0 enterprise tickets bypass AI deflection entirely and ping the on-call lead in Slack. Free-tier P3 questions go straight to the AI agent.
Auto-draft for the assigned human. Even tickets routed to a person get a Claude-drafted reply attached, with the citation trail from the knowledge base inline.

Owner: Support Ops. Frequency: continuous. Failure mode: if classification confidence is below 0.7, the ticket defaults to human triage.
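
The routing rules above can be sketched as a small function. This is a minimal illustration, assuming a hypothetical Ticket record and made-up queue names (human_triage, oncall_lead, ai_agent, human_queue). Only the 0.7 confidence floor, the P0 enterprise bypass, and the free-tier P3 rule come from the SOP text.

```python
from dataclasses import dataclass

# Hypothetical ticket record; field names are illustrative, not a helpdesk API.
@dataclass
class Ticket:
    intent: str        # billing, technical, account, complaint, feature request
    urgency: str       # "P0" to "P3"
    tier: str          # "free", "paid", "enterprise"
    confidence: float  # classifier confidence, 0.0 to 1.0

CONFIDENCE_FLOOR = 0.7  # below this, the ticket defaults to human triage

def route(ticket: Ticket) -> str:
    """Return the name of the queue that should own the ticket."""
    # Checked first: if the tags are not trustworthy, no tag-based rule should fire.
    if ticket.confidence < CONFIDENCE_FLOOR:
        return "human_triage"
    if ticket.urgency == "P0" and ticket.tier == "enterprise":
        return "oncall_lead"   # bypasses AI deflection entirely
    if ticket.urgency == "P3" and ticket.tier == "free":
        return "ai_agent"      # goes straight to the AI agent
    return "human_queue"       # human owner, with an AI draft attached

print(route(Ticket("billing", "P3", "free", 0.91)))  # ai_agent
```

Putting the confidence check first is a design choice in this sketch: a low-confidence "P0 enterprise" tag goes to a human triager rather than paging the on-call lead on a guess.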

The standard prompt patterns

Centralize prompts in a single repo (Git, Notion, or your prompt management tool of choice). Forbid agents from inventing their own. Here are the four patterns every team needs.

Reply drafter prompt skeleton:

You are a support specialist for ACME. Your job is to draft a reply to the customer message below. Use only facts from the SOURCES block; if the answer is not there, write "ESCALATE: insufficient context" and stop. Tone: warm, direct, no exclamation marks. Length: under 120 words unless the issue is technical. End with one clear next step.
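
To show how the skeleton might be wired up, here is a minimal sketch that assembles the prompt with a numbered SOURCES block and checks the model output for the escalation sentinel. The function names and the "CUSTOMER MESSAGE" label are assumptions for illustration, and the actual model call is deliberately left out.

```python
ESCALATE_MARKER = "ESCALATE: insufficient context"  # sentinel from the prompt above

def build_draft_prompt(system_prompt: str, sources: list[str], message: str) -> str:
    """Assemble the drafter prompt with a numbered SOURCES block."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(sources, start=1))
    return (f"{system_prompt}\n\nSOURCES:\n{numbered}"
            f"\n\nCUSTOMER MESSAGE:\n{message}")

def needs_escalation(model_reply: str) -> bool:
    """True when the model returned the sentinel instead of a drafted reply."""
    return model_reply.strip().startswith(ESCALATE_MARKER)

prompt = build_draft_prompt(
    "You are a support specialist for ACME.",
    ["Refunds are processed within 5 business days."],
    "Where is my refund?",
)
print(prompt)
```

Numbering the sources gives the drafter something concrete to cite, which is what makes the citation trail in the triage step possible.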

Summarizer prompt skeleton:

Summarize this ticket thread in three sections: Customer issue (1 sentence), Actions taken (bullet list), Outstanding questions (bullet list). Flag PII references with [PII] tags.

Sentiment and risk tagger:

Score the customer message on (1) frustration 0 to 10, (2) churn risk 0 to 10, (3) escalation likelihood 0 to 10. Return strict JSON.
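
"Strict JSON" only helps if the caller actually rejects anything else. A minimal validation sketch follows, assuming the three scores come back under the hypothetical keys frustration, churn_risk, and escalation_likelihood (the prompt above does not fix the key names, so pin them in your own version).

```python
import json

# Assumed key names; the prompt skeleton does not specify them.
REQUIRED_KEYS = ("frustration", "churn_risk", "escalation_likelihood")

def parse_scores(raw: str) -> dict:
    """Parse tagger output; raise ValueError unless it is exactly three 0-10 integers."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or set(data) != set(REQUIRED_KEYS):
        raise ValueError(f"expected exactly the keys {REQUIRED_KEYS}")
    for key in REQUIRED_KEYS:
        value = data[key]
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, int) or not 0 <= value <= 10:
            raise ValueError(f"{key} must be an integer from 0 to 10")
    return data

print(parse_scores('{"frustration": 8, "churn_risk": 3, "escalation_likelihood": 6}'))
```

A ValueError here is a reasonable trigger for the same fallback as low classifier confidence: send the ticket to a human rather than guess.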

Knowledge gap detector:

Compare the agent's reply to the cited sources. List any claim in the reply that is not supported by a source. If none, return "OK".

Tip

Version every prompt. Treat them like code: pull request, review, changelog. When CSAT swings, the first question is "what changed in the prompt last week?" — and you need a real answer.
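
One lightweight way to answer "what changed?" is to stamp every AI-handled ticket with a fingerprint of the exact prompt text that produced the reply. The sketch below is an assumption about how a team might do this, with hypothetical function and field names; it complements the Git changelog rather than replacing it.

```python
import hashlib

def prompt_fingerprint(prompt_text: str) -> str:
    """Short, stable ID for a prompt's exact text; any wording change alters it."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()[:12]

def log_entry(ticket_id: str, prompt_name: str, prompt_text: str) -> dict:
    """Record which prompt version handled a ticket (field names are illustrative)."""
    return {
        "ticket_id": ticket_id,
        "prompt": prompt_name,
        "prompt_sha": prompt_fingerprint(prompt_text),
    }

print(log_entry("T-1001", "reply_drafter", "You are a support specialist for ACME."))
```

When CSAT swings, grouping tickets by prompt_sha shows whether the dip lines up with a prompt change.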

Human review gates

AI-drafted replies do not just send themselves. The SOP defines exactly when a human must approve.

Auto-send allowed: Tier 1 deflection bot answers where confidence is above 0.85 AND the customer is on a free or starter plan AND the ticket category is on the pre-approved list (password reset, order status, basic how-to).
Mandatory human review: Any reply containing a refund commitment, any P0 or P1 ticket, any enterprise customer, any reply where the AI flagged "ESCALATE", any ticket with a frustration score above 7.
Senior review: Anything mentioning legal, regulatory, security incident, data breach, public statement, or media. The SOP names a single Slack channel where these go and requires a human reply within 30 minutes.

Owner: Support Lead per shift. Frequency: every queued ticket. Audit: random 5 percent sample reviewed weekly by Support Ops.
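
The three gates can be expressed as one decision function. This is a sketch under stated assumptions: the Draft record and its field names are hypothetical, and the keyword match for senior review is a crude stand-in for whatever classifier or tagging your helpdesk provides. The thresholds and category lists come from the rules above.

```python
import re
from dataclasses import dataclass

AUTO_SEND_CATEGORIES = {"password reset", "order status", "basic how-to"}
AUTO_SEND_PLANS = {"free", "starter"}
# Whole-word match so "media" does not fire on "immediately".
SENIOR_TERMS = re.compile(
    r"\b(legal|regulatory|security incident|data breach|public statement|media)\b",
    re.IGNORECASE,
)

@dataclass
class Draft:  # hypothetical record; field names are illustrative
    text: str            # customer message plus AI draft
    category: str
    plan: str
    urgency: str
    confidence: float
    frustration: int
    refund_commitment: bool = False

def review_gate(d: Draft) -> str:
    """Return 'senior_review', 'human_review', or 'auto_send' for an AI draft."""
    if SENIOR_TERMS.search(d.text):
        return "senior_review"
    if (d.refund_commitment or d.urgency in {"P0", "P1"} or d.plan == "enterprise"
            or "ESCALATE" in d.text or d.frustration > 7):
        return "human_review"
    if (d.confidence > 0.85 and d.plan in AUTO_SEND_PLANS
            and d.category in AUTO_SEND_CATEGORIES):
        return "auto_send"
    return "human_review"  # default to a human when no rule clearly allows auto-send

print(review_gate(Draft("We will fix this immediately.", "order status", "free", "P3", 0.9, 2)))
```

The final fallback matters: a draft that matches none of the rules goes to a human, so gaps in the rule set fail safe.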

Escalation matrix

Without a written escalation path the AI either ducks issues or hands them off to nobody. Spell it out.

Trigger	Owner	SLA	Channel or action
Refund above $500	Support Lead	1 hour	Slack #refunds
Enterprise P0	On-call Engineer + CSM	15 minutes	PagerDuty
Security or data concern	Security on-call	30 minutes	Slack #sec-incidents
Threatened legal action	General Counsel	2 hours	Email + Slack
Press or social viral	Comms Director	30 minutes	Slack #comms-war-room
AI hallucination caught post-send	Support Ops	4 hours	Recall, apologize, log
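
If the matrix lives only in a document, nobody can compute a deadline from it. A minimal sketch of the same table as data, assuming hypothetical trigger keys (the owners, SLAs, and channels are copied from the rows above):

```python
from datetime import datetime, timedelta

# Trigger key -> (owner, SLA, channel or action). Keys are illustrative.
ESCALATIONS = {
    "refund_above_500": ("Support Lead", timedelta(hours=1), "Slack #refunds"),
    "enterprise_p0": ("On-call Engineer + CSM", timedelta(minutes=15), "PagerDuty"),
    "security_or_data_concern": ("Security on-call", timedelta(minutes=30), "Slack #sec-incidents"),
    "threatened_legal_action": ("General Counsel", timedelta(hours=2), "Email + Slack"),
    "press_or_social_viral": ("Comms Director", timedelta(minutes=30), "Slack #comms-war-room"),
    "hallucination_post_send": ("Support Ops", timedelta(hours=4), "Recall, apologize, log"),
}

def escalate(trigger: str, opened_at: datetime) -> dict:
    """Look up the owner and compute the response deadline for a trigger."""
    # A KeyError for an unlisted trigger is deliberate: unknown triggers need a human decision.
    owner, sla, channel = ESCALATIONS[trigger]
    return {"owner": owner, "channel": channel, "respond_by": opened_at + sla}

print(escalate("enterprise_p0", datetime(2025, 1, 6, 9, 0)))
```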

Quality assurance loop

Trust requires evidence. Every week, a Support Ops analyst pulls a random sample of 100 AI-handled tickets and scores them on five dimensions: factual accuracy, brand tone, completeness, compliance (PII handling), and customer outcome.

Below 92 percent accuracy: pause auto-send, switch the affected category to human-review-required, root-cause within 48 hours.
Above 95 percent for two consecutive weeks: expand AI scope to one new category.
Any single hallucination that produced customer-visible damage: incident report within 24 hours, prompt or knowledge base update within 72 hours.
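
The sampling and the first two threshold rules above are simple enough to automate. The following is a minimal sketch, assuming accuracy is tracked per week as a fraction and that "hold" is an acceptable name for the no-change case; the function names are hypothetical.

```python
import random

ACCURACY_FLOOR = 0.92   # below this: pause auto-send for the category
EXPAND_ABOVE = 0.95     # above this for two consecutive weeks: add one category

def weekly_sample(ticket_ids, size=100, seed=None):
    """Draw a random QA sample without replacement (the whole set if it is smaller)."""
    ids = list(ticket_ids)
    return random.Random(seed).sample(ids, min(size, len(ids)))

def qa_action(weekly_accuracy):
    """Map accuracy history (fractions, most recent week last) to the next step."""
    if not weekly_accuracy:
        raise ValueError("need at least one week of QA results")
    if weekly_accuracy[-1] < ACCURACY_FLOOR:
        return "pause_auto_send"
    if len(weekly_accuracy) >= 2 and min(weekly_accuracy[-2:]) > EXPAND_ABOVE:
        return "expand_scope"
    return "hold"

print(qa_action([0.93, 0.96, 0.97]))  # expand_scope
```

Passing a seed makes a given week's sample reproducible, which helps when an audit result is disputed.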

Track CSAT, AHT (average handle time), first-contact resolution, and AI deflection rate. Publish the weekly scorecard company-wide. Hidden metrics rot.

What AI should never do (the explicit blocklist)

This is the most-skipped section and the most important. Spell out, in writing, every action AI is forbidden from taking.

Never issue a refund above $50 without a human approval click
Never confirm or deny security incidents
Never make pricing or contract commitments
Never apologize on behalf of the company in writing for systemic outages without comms approval
Never close a ticket marked "complaint" without a human read
Never send the customer a link to anything that was not in the approved sources list
Never store, log, or transmit raw card numbers, full SSNs, or health records, even temporarily

Print this list and put it above every support desk. It saves careers.
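
Two items on the blocklist are mechanical enough to check in code before a reply goes out: the $50 refund limit and the approved-links rule. The sketch below is illustrative only, with a hypothetical function name and a deliberately simple URL pattern; the other items still need human judgment or upstream controls.

```python
import re

# Simple URL matcher for illustration; it will not catch every link format.
URL_RE = re.compile(r"https?://[^\s<>\"')]+")

def blocklist_violations(reply: str, approved_links: set,
                         refund_amount: float = 0.0,
                         human_approved: bool = False) -> list:
    """Return a list of blocklist problems found in an outgoing AI reply."""
    problems = []
    if refund_amount > 50 and not human_approved:
        problems.append("refund above $50 without human approval")
    for url in URL_RE.findall(reply):
        clean = url.rstrip(".,;")  # drop trailing sentence punctuation
        if clean not in approved_links:
            problems.append(f"unapproved link: {clean}")
    return problems

approved = {"https://help.acme.example/reset"}
print(blocklist_violations("Try https://bit.ly/xyz instead.", approved))
```

An empty list means the reply passed these two checks, not that it is safe to send; the review gates still apply.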

Onboarding and training plan

A fresh agent should be productive on the SOP in their first week.

Day 1: Read the SOP end-to-end. Shadow three live AI-deflected tickets. Write one summary using the standard prompt.
Days 2 to 5: Handle 20 tickets with mandatory peer review on every reply. Spend 30 minutes per day reviewing the prompt repo.
Week 2: Solo with random review on 20 percent of tickets. Pass a five-question SOP quiz.
Monthly thereafter: 30-minute "what changed" briefing covering any SOP or prompt updates since the last session.

The SOP itself is reviewed at least quarterly and updated immediately after any incident.

FAQs

Should the AI agent reply directly to customers or always go through a human?

It depends on category and risk. Pre-approved low-risk categories (order status, password reset, shipping ETA) can auto-send when the model confidence is above 0.85. Anything touching money, security, enterprise accounts, or a frustrated customer should always have a human gate. Most teams land on roughly 35 to 55 percent fully automated and the rest human-reviewed.

How do we keep customer PII out of public AI models?

Three layers. First, route all model calls through an internal proxy that strips emails, phone numbers, card data, and SSNs before the prompt leaves your network. Second, only allow tenancy-isolated APIs (Azure OpenAI, AWS Bedrock, Anthropic enterprise) for any payload that might contain regulated data. Third, the SOP must explicitly forbid pasting ticket content into consumer ChatGPT, Gemini, or browser extensions, with HR consequences for violations.
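
For the first layer, here is a rough sketch of what the proxy's redaction step might look like. The regular expressions are assumptions written for illustration: they are deliberately simple, US-centric, and will over-match (some dates and order numbers look like phone numbers to them). Production redaction usually calls for a dedicated library or service plus testing on real ticket data.

```python
import re

# Order matters: more specific patterns run before the broad phone pattern.
PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("CARD", re.compile(r"\b(?:\d[ -]?){13,16}\b")),
    ("PHONE", re.compile(r"\+?\d[\d\s().-]{8,}\d")),
]

def redact(text: str) -> str:
    """Replace likely emails, SSNs, card numbers, and phone numbers with tags."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Reach me at jane.doe@example.com or +1 (415) 555-0132."))
```

Erring toward over-redaction is the safer failure mode here: a tagged order number costs a little context, while a leaked card number is an incident.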

What metrics prove the AI SOP is working?

Four numbers, tracked weekly. Deflection rate (target 35 to 55 percent), AI reply accuracy from the QA sample (target above 92 percent), AHT reduction on human-handled tickets (target 25 to 40 percent improvement vs. pre-AI baseline), and CSAT held flat or improved. If deflection climbs but CSAT drops, the bot is bullying customers into closure rather than resolving.
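
The four numbers and the warning sign above reduce to a few lines of arithmetic. A minimal sketch, assuming you can export weekly counts from your helpdesk (all parameter and key names here are hypothetical):

```python
def weekly_scorecard(total_tickets, ai_resolved, qa_correct, qa_sampled,
                     aht_minutes, baseline_aht_minutes, csat):
    """Compute the four headline numbers; rates are fractions (0.42 = 42 percent)."""
    return {
        "deflection_rate": ai_resolved / total_tickets,
        "qa_accuracy": qa_correct / qa_sampled,
        "aht_reduction": (baseline_aht_minutes - aht_minutes) / baseline_aht_minutes,
        "csat": csat,
    }

def deflection_is_suspect(previous, current):
    """True when deflection rose week over week while CSAT fell."""
    return (current["deflection_rate"] > previous["deflection_rate"]
            and current["csat"] < previous["csat"])

last_week = weekly_scorecard(1000, 400, 94, 100, 6.0, 8.0, csat=4.5)
this_week = weekly_scorecard(1000, 480, 93, 100, 6.0, 8.0, csat=4.2)
print(deflection_is_suspect(last_week, this_week))  # True
```

A single week of diverging numbers can be noise, so treat the flag as a prompt to read transcripts, not as proof.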

How often should we update the prompts and the SOP itself?

Prompts are reviewed monthly and updated whenever a hallucination or accuracy dip is traced back to prompt language. The SOP document is reviewed quarterly and updated immediately after any incident. Both live in version control with a visible changelog so anyone can see what changed and when.

Who owns the SOP if we do not have a dedicated AI Operations role?

Default it to the Support Operations Manager with the Head of Customer Support as accountable owner. If you have a Director of CX or VP Support, they sign off on changes. Avoid making the SOP "everyone's job" — that means it is no one's job and it will go stale within two months.