Zarif Automates

How to Build an AI Research Assistant Using ChatGPT API

Zarif
Updated May 4, 2026

Most "AI research assistants" you see online are a single ChatGPT prompt with extra steps. A real one searches the live web, reads the sources, cites them, and remembers what you asked yesterday. I have built three different versions of this for clients and the playbook is finally clean. Here is the architecture, the API endpoints, the cost math, and the failure modes.

Definition

An AI research assistant is a software agent that takes a research question, searches authoritative sources, summarizes the findings with citations, and stores the results so you can build on them across sessions.

TL;DR

  • A working assistant takes about 4 hours to build with the OpenAI Responses API and a vector store.
  • GPT-5-mini ($0.25 input / $2.00 output per 1M tokens) answers 80 percent of queries; reserve GPT-5 for hard ones.
  • Always force the model to cite source URLs verbatim or hallucination rates climb above 20 percent.
  • Use a vector memory layer so the assistant gets smarter every week instead of starting from zero.
  • Brave Search API costs $5 per month for 20,000 queries on the Basic plan and beats SerpAPI on price.

Why you should build this instead of using Perplexity

Perplexity Pro is $20 a month and it is excellent. So why build your own? Three reasons. First, you control the source list, so you can restrict to your industry's primary sources and skip the SEO sludge. Second, you keep the data, which means you can layer it into your own RAG system, your CRM, or a public site. Third, the per-query cost on a custom build is roughly one cent with the GPT-5-mini and Tavily stack costed below, and you pay only for the queries you actually run.

The bar for "worth building" is whether you run more than 50 research queries a week and care about provenance. If you do, every hour you save compounds.

The architecture in plain English

The assistant has six moving parts:

  1. A query parser that classifies the question (factual, comparison, summarization, opinion)
  2. A search layer that hits the live web through Brave or Tavily
  3. A fetcher that pulls the actual page content, not just titles
  4. A synthesizer that uses GPT-5-mini or GPT-5 to write the answer with citations
  5. A memory layer backed by a vector database for prior research
  6. An output formatter that returns markdown with linked sources

That memory layer is the difference between a toy and a tool. Without it, you are just running a fancier Google search.
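The six parts above can be wired together as a plain function pipeline. This is a minimal skeleton, not my production code: the `Source` type, every function name, and the keyword heuristic in `classify` are illustrative assumptions. The search, fetch, synthesis, and memory stages are passed in as callables so you can plug in Tavily, Brave, or the Responses API later.

```python
from dataclasses import dataclass


@dataclass
class Source:
    url: str
    title: str
    text: str  # full page content, not just the title


def classify(question: str) -> str:
    """Toy query parser: a keyword heuristic standing in for a real classifier."""
    q = question.lower()
    if " vs " in q or "compare" in q:
        return "comparison"
    if q.startswith(("summarize", "summarise")):
        return "summarization"
    if q.startswith("should") or "opinion" in q:
        return "opinion"
    return "factual"


def format_markdown(answer: str, sources: list[Source]) -> str:
    """Output formatter: answer text followed by a linked Sources section."""
    lines = [answer.strip(), "", "## Sources"]
    lines += [f"- [{s.title}]({s.url})" for s in sources]
    return "\n".join(lines)


def research(question, search, fetch, synthesize, recall, remember) -> str:
    """Run one research pass through all six stages."""
    kind = classify(question)                 # 1. query parser
    prior = recall(question)                  # 5. memory lookup (read side)
    urls = search(question)                   # 2. search layer
    sources = [fetch(url) for url in urls]    # 3. fetcher
    answer = synthesize(question, kind, sources, prior)  # 4. synthesizer
    remember(question, answer)                # 5. memory (write side)
    return format_markdown(answer, sources)   # 6. output formatter
```

Passing the stages in as arguments also makes the pipeline easy to test with stubs before any API key is involved.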

Step 1: Pick your model and your search provider

For the model, the realistic May 2026 shortlist is:

  • GPT-5-mini at $0.25 per 1M input and $2.00 per 1M output tokens. Default choice.
  • GPT-5 at $1.25 per 1M input and $10.00 per 1M output. Use for synthesis of contradictory sources.
  • Claude Sonnet 4.6 at $3 input / $15 output per 1M if you want stronger long-context reasoning.
  • Claude Haiku 4.5 at $1 input / $5 output per 1M as a cheap second-opinion model.

For search, the shortlist:

  • Brave Search API Basic plan at $5/month for 20,000 queries (the perpetual free tier was retired in February 2026; all tiers now require a paid plan). Best price-to-quality ratio.
  • Tavily at $0.008 per credit on Pay-As-You-Go (basic search = 1 credit, advanced = 2). Built-in content extraction means one fewer fetch step.
  • OpenAI hosted web_search tool (Responses API) at roughly $10 per 1,000 calls plus the input tokens for retrieved content (gpt-4.1-mini and gpt-5-mini bill search content as a fixed 8,000-token block per call).
  • SerpAPI at $50 per 5,000 searches. Use only if you specifically need Google SERP HTML.

I default to GPT-5-mini plus Tavily because Tavily already returns cleaned page content, which saves you a fetch step.

Step 2: Set up your OpenAI Responses API call

The Responses API is now the recommended default for new builds — the legacy Assistants API was deprecated on August 26, 2025 and shuts down on August 26, 2026. The endpoint is POST https://api.openai.com/v1/responses. Your minimum payload looks like this:

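from openai import OpenAI

# Reads OPENAI_API_KEY from the environment.
client = OpenAI()

# Pinned snapshot rather than the floating "gpt-5-mini" alias.
MODEL = "gpt-5-mini-2025-08-07"

# Placeholders: use the full prompt from Step 3 and your real question.
SYSTEM_PROMPT = "You are a senior research analyst. Cite a source URL for every claim."
user_query = "What are the trade-offs between pgvector and Pinecone?"

response = client.responses.create(
    model=MODEL,
    input=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_query},
    ],
    # Hosted search tool; omit it if you call Tavily or Brave yourself.
    tools=[{"type": "web_search"}],
)

# output_text joins the text parts of the response into one string.
print(response.output_text)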

The hosted web_search tool lets the model call OpenAI's hosted search directly, which is the simplest path. If you want full control over sources or budget, skip the built-in tool and call Tavily or Brave yourself.
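If you skip the hosted tool, the search results have to reach the model some other way. One common pattern, sketched here under assumptions, is to number the results and append them to the user message. The `SearchResult` fields and the `build_input` helper are hypothetical; map them onto whatever Tavily or Brave actually returns. The returned list is shaped for the `input` parameter used in the call above.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    url: str
    title: str
    content: str  # cleaned page text from your search provider or fetcher


def build_input(system_prompt, user_query, results, max_chars=2000):
    """Build Responses API input messages from your own search results."""
    blocks = []
    for i, r in enumerate(results, start=1):
        snippet = r.content[:max_chars]  # cap each source to control token spend
        blocks.append(f"[{i}] {r.title}\nURL: {r.url}\n{snippet}")
    context = "\n\n".join(blocks) if blocks else "(no search results)"
    user_content = f"Question: {user_query}\n\nSearch results:\n{context}"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]
```

Putting the URL next to each snippet gives the model the exact string to cite, which matters once you force verbatim source URLs in the system prompt.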

Tip

Always pin the model version explicitly (e.g. gpt-5-mini-2025-08-07, not gpt-5-mini). OpenAI rotates default aliases and your prompt may behave differently overnight without warning. Anthropic does the same — pin to claude-sonnet-4-6-20260214 rather than the alias.

Step 3: Write the system prompt that forces citations

This is the prompt I run in production, with names changed:

You are a senior research analyst. For every claim you make, you must
cite a source URL in markdown link format. If a source does not directly
support a claim, do not make the claim. If you cannot find authoritative
sources for a question, say "I do not have a confident answer" rather
than guessing. Output as markdown with an H2 "Sources" section at the end.

Three details that matter. The phrase "directly support" cuts hallucinations by roughly half because the model treats correlation as insufficient. The explicit "do not have a confident answer" escape hatch is critical because LLMs default to guessing. The Sources section at the end is the audit trail.
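A prompt is a request, not a guarantee, so it can help to check the output mechanically before showing it to anyone. The sketch below is one way to do that, with hypothetical helper names and a deliberately simple regex that only handles ordinary `[text](https://...)` links. It confirms the Sources heading exists and that at least one link is present, and it lets the escape-hatch answer through.

```python
import re

# Matches [label](http://... or https://...) and captures label and URL.
LINK_RE = re.compile(r"\[([^\]]+)\]\((https?://[^\s)]+)\)")


def extract_links(markdown: str) -> list[tuple[str, str]]:
    """Return (label, url) pairs for every markdown link in the text."""
    return LINK_RE.findall(markdown)


def check_answer(markdown: str) -> list[str]:
    """Return a list of problems; an empty list means the answer looks compliant."""
    if "I do not have a confident answer" in markdown:
        return []  # escape hatch used, so there is nothing to cite
    problems = []
    if not re.search(r"^## Sources\s*$", markdown, flags=re.MULTILINE):
        problems.append("missing '## Sources' section")
    if not extract_links(markdown):
        problems.append("no markdown links found")
    return problems
```

This only checks form. Whether a cited page actually supports the claim is a separate check.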

Step 4: Wire in the vector memory layer

Without memory, your assistant is amnesiac. You have three viable paths in 2026:

  • OpenAI hosted vector stores (used by the Responses API file_search tool). Storage is $0.10 per GB per day after the first 1 GB free. No separate query fee — you pay model token rates on retrieved content.
  • pgvector on Postgres (Supabase free tier or your own). Lowest friction if you already run Postgres.
  • Pinecone or Weaviate if you need >10M vectors with low-latency filtering.

I default to pgvector because it is a single Postgres extension and you avoid another vendor.

The flow:

  1. After every research session, embed the question and final answer with text-embedding-3-small ($0.02 per 1M tokens, or $0.01 with the Batch API)
  2. Store the embedding plus the raw text in a research_log table
  3. On every new query, search the table for top 3 semantically similar prior queries
  4. Inject those into the system prompt as "you previously researched..."

This is roughly 30 lines of code and it transforms the assistant from a search wrapper into a research partner.
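Here is a rough sketch of that flow. To keep it runnable without a database, an in-memory list stands in for the research_log table and a brute-force cosine ranking stands in for the vector search; with pgvector the same lookup would be an ORDER BY on a distance operator such as `<=>`. `ResearchLog`, `with_memory`, and the `embed` callable are hypothetical names, and `embed` is where a call to text-embedding-3-small would go.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ResearchLog:
    """In-memory stand-in for a research_log table with an embedding column."""

    def __init__(self, embed):
        self.embed = embed  # callable: str -> list[float]
        self.rows = []      # (embedding, question, answer)

    def remember(self, question: str, answer: str) -> None:
        # Steps 1 and 2: embed question plus answer, store with the raw text.
        self.rows.append((self.embed(f"{question}\n{answer}"), question, answer))

    def recall(self, query: str, k: int = 3):
        # Step 3: rank prior sessions by similarity to the new query.
        qv = self.embed(query)
        ranked = sorted(self.rows, key=lambda r: cosine(qv, r[0]), reverse=True)
        return [(q, a) for _, q, a in ranked[:k]]


def with_memory(system_prompt: str, prior) -> str:
    # Step 4: inject prior research into the system prompt.
    if not prior:
        return system_prompt
    notes = "\n".join(f"- Q: {q}\n  A: {a}" for q, a in prior)
    return f"{system_prompt}\n\nYou previously researched:\n{notes}"
```

Brute-force ranking is fine for a few thousand rows. Past that, let the database index do the work.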

Step 5: Add the citation verifier

Models lie about citations. They invent URLs. They quote pages that say the opposite. You need a verifier that fetches each cited URL, checks the status code, and ideally checks the cited claim against the page content.

The cheap version: HEAD request every cited URL and drop any that return 404 or redirect to a homepage. That alone removes about 70 percent of hallucinated citations.

The expensive version: fetch each page, extract the text, and do a second LLM call asking "does this page support claim X?" with a yes-no answer. Adds about $0.001 per citation but kills almost all fabrications.
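The cheap version fits in the standard library. The sketch below is one possible shape for it, with hypothetical function names. Two choices are my assumptions rather than rules: unreachable or malformed URLs are dropped along with 404s, and "redirect to a homepage" is detected as a URL that had a path and ended up without one. The `head_fn` parameter exists so the filter can be tested without touching the network.

```python
import urllib.error
import urllib.request
from urllib.parse import urlparse


def head(url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Send a HEAD request; return (status, final_url). Status 0 means unreachable."""
    try:
        req = urllib.request.Request(
            url, method="HEAD", headers={"User-Agent": "research-assistant/0.1"}
        )
        # urlopen follows redirects; geturl() reports where we ended up.
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.geturl()
    except urllib.error.HTTPError as err:
        return err.code, url
    except (urllib.error.URLError, TimeoutError, ValueError):
        return 0, url


def redirected_to_homepage(original: str, final: str) -> bool:
    """True if the cited URL had a path but the final URL is a bare homepage."""
    had_path = urlparse(original).path.strip("/") != ""
    lost_path = urlparse(final).path.strip("/") == ""
    return had_path and lost_path


def verify_citations(urls, head_fn=head):
    """Keep only URLs that resolve and do not bounce to a homepage."""
    kept = []
    for url in urls:
        status, final = head_fn(url)
        if status in (0, 404):  # assumption: treat unreachable like a 404
            continue
        if redirected_to_homepage(url, final):
            continue
        kept.append(url)
    return kept
```

Some servers reject HEAD outright, so a stricter filter that drops every non-200 response would throw away real sources.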

Step 6: Choose your interface

You have three viable options:

  • Slack bot — best for team use. Use the Bolt SDK and post threaded responses.
  • CLI tool — best for solo developers. A Python script with typer is 50 lines.
  • Web app — best for sharing with non-technical users. Next.js plus the Vercel AI SDK gets you a streaming chat UI in an afternoon.

I run mine as a Slack slash command (/research) because that is where my team already lives. Context switching is the enemy.

Step 7: Set rate limits and a daily budget

The fastest way to bankrupt a side project is to leave the API key unprotected. Hard rules:

  1. Set a monthly hard cap in the OpenAI dashboard (under Settings then Limits)
  2. Wrap every call in a per-user token bucket — I use 50 queries per day per user
  3. Log every request with model, tokens, and cost to a Postgres table for audit
  4. Alert via webhook if daily spend exceeds $5

OpenAI's billing dashboard is delayed by about 12 hours, so do not rely on it as your only safety net.
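Rules 2 and 4 can be sketched in a few dozen lines of plain Python. Treat this as an illustration under assumptions: the class names are hypothetical, state lives in process memory (a real deployment would keep it in Postgres or Redis), and the bucket refills continuously at 50 tokens per day instead of resetting at midnight. `SpendAlert.record` returns True exactly once per day, which is the moment to fire your webhook.

```python
import time


class TokenBucket:
    """Classic token bucket: holds up to `capacity` tokens, refilled at a steady rate."""

    def __init__(self, capacity=50, refill_per_sec=50 / 86_400, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerUserLimiter:
    """One bucket per user id, created on first use."""

    def __init__(self, **bucket_kwargs):
        self.bucket_kwargs = bucket_kwargs
        self.buckets = {}

    def allow(self, user_id: str) -> bool:
        if user_id not in self.buckets:
            self.buckets[user_id] = TokenBucket(**self.bucket_kwargs)
        return self.buckets[user_id].allow()


class SpendAlert:
    """Tracks spend per day; signals once when a day's total passes the limit."""

    def __init__(self, daily_limit_usd=5.0):
        self.daily_limit_usd = daily_limit_usd
        self.by_day = {}
        self.alerted = set()

    def record(self, day: str, cost_usd: float) -> bool:
        self.by_day[day] = self.by_day.get(day, 0.0) + cost_usd
        if self.by_day[day] > self.daily_limit_usd and day not in self.alerted:
            self.alerted.add(day)
            return True  # caller should send the webhook now
        return False
```

Call `allow(user_id)` before every model request and `record(day, cost)` after it, using the token counts you already log for audit.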

Warning

Never expose your OpenAI API key in a frontend. Always proxy through a backend that you control. Browser-side keys get scraped within hours of going public — this is not theoretical.

Step 8: Test on a known-answer set before shipping

Build a 20-question evaluation set where you already know the right answer. Run your assistant on it weekly and track accuracy as a percentage. If accuracy drops below 85 percent, something broke — usually a prompt regression or a model version flip. Without an eval set you are flying blind.

I keep mine in a Google Sheet with columns for question, expected answer, model used, actual answer, and pass-fail. Five minutes a week to maintain.
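If the sheet outgrows manual checking, the same loop is easy to script. The sketch below assumes a hypothetical `ask` callable that wraps your assistant and grades by case-insensitive substring match. That grading rule is a simplification I am assuming for illustration; anything subtler needs a human or a second model call.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    question: str
    expected: str  # short known-good answer, e.g. a name or a number


def passes(expected: str, actual: str) -> bool:
    """Crude grader: the expected answer must appear somewhere in the output."""
    return expected.strip().lower() in actual.lower()


def run_eval(cases, ask, threshold=0.85):
    """Run every case through `ask` and summarize accuracy against the threshold."""
    results = [(c, passes(c.expected, ask(c.question))) for c in cases]
    accuracy = sum(ok for _, ok in results) / len(results) if results else 0.0
    return {
        "accuracy": accuracy,
        "ok": accuracy >= threshold,
        "failures": [c.question for c, ok in results if not ok],
    }
```

The list of failing questions tells you where to look first when the number drops.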

What this costs in production

For a solo user running 100 queries a day:

  • OpenAI API: roughly $8 per month on GPT-5-mini at an average of ~3K input + 1K output tokens per query (3,000 queries × ($0.25/1M × 3K + $2.00/1M × 1K) ≈ $8.25)
  • Tavily search: ~$24 per month at 100 basic searches/day on Pay-As-You-Go ($0.008 × 3,000)
  • Postgres with pgvector on Supabase free tier: $0
  • Hosting (Vercel or Railway free tier): $0

Total: about $30 to $35 per month. That is more than one Perplexity Pro seat ($20/month) but less than two, so it comes out ahead once two or more people share the assistant, and you own the data.

For a team of 10 users each running 100 queries a day, the same math scales to about $320 per month. That is more than 10 Perplexity Pro seats at $200, so at that volume the case rests on full data ownership and the freedom to swap models, not on price.
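The arithmetic in this section is easy to reproduce and rerun with your own numbers. The sketch below uses only the figures quoted here (GPT-5-mini token prices, Tavily Pay-As-You-Go, 100 queries a day, a 30-day month) and assumes one basic search per query; `monthly_cost` is a hypothetical helper.

```python
QUERIES_PER_DAY = 100
DAYS_PER_MONTH = 30
INPUT_TOKENS, OUTPUT_TOKENS = 3_000, 1_000   # average per query
INPUT_PRICE, OUTPUT_PRICE = 0.25, 2.00       # USD per 1M tokens, GPT-5-mini
TAVILY_PER_CREDIT = 0.008                    # USD; basic search = 1 credit


def monthly_cost(users: int = 1) -> dict:
    """Monthly LLM and search cost in USD, assuming one basic search per query."""
    queries = users * QUERIES_PER_DAY * DAYS_PER_MONTH
    tokens_usd = INPUT_TOKENS * INPUT_PRICE + OUTPUT_TOKENS * OUTPUT_PRICE
    llm = queries * tokens_usd / 1_000_000
    search = queries * TAVILY_PER_CREDIT
    return {
        "queries": queries,
        "llm": round(llm, 2),
        "search": round(search, 2),
        "total": round(llm + search, 2),
    }


print(monthly_cost(1))   # {'queries': 3000, 'llm': 8.25, 'search': 24.0, 'total': 32.25}
print(monthly_cost(10))  # {'queries': 30000, 'llm': 82.5, 'search': 240.0, 'total': 322.5}
```

Search is about three quarters of the bill in both cases, so the search provider is the first thing to revisit if the total matters.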

Common pitfalls I have hit personally

  • The model sometimes returns sources behind paywalls. Filter against a blocklist or your readers will hate you.
  • The model occasionally cites the URL of the search results page instead of the actual source. Strip those in post-processing.
  • Some sources rate-limit aggressively when fetched at scale. Add a 1-second random jitter between fetches.
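All three fixes are small enough to sketch together. The host lists below are placeholders I made up for illustration, not a recommended blocklist, and the helper names are hypothetical. I read "1-second random jitter" as a random pause of up to one second.

```python
import random
import time
from urllib.parse import urlparse

PAYWALLED_HOSTS = {"paywalled.example"}  # placeholder; fill in your own blocklist
SEARCH_HOSTS = {"google.com", "bing.com", "duckduckgo.com", "search.brave.com"}


def _host(url: str) -> str:
    """Lowercased hostname without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def clean_sources(urls, blocklist=PAYWALLED_HOSTS):
    """Drop paywalled hosts and links that point at a search results page."""
    return [
        url for url in urls
        if _host(url) not in blocklist and _host(url) not in SEARCH_HOSTS
    ]


def polite_pause(max_jitter: float = 1.0, sleep=time.sleep) -> float:
    """Sleep for a random interval up to max_jitter seconds; return the delay used."""
    delay = random.uniform(0, max_jitter)
    sleep(delay)
    return delay
```

Call `polite_pause()` between page fetches and run `clean_sources` on the cited URLs before formatting the Sources section.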

FAQ

What is the cheapest way to build an AI research assistant?

Use GPT-5-mini ($0.25/$2.00 per 1M tokens) with Tavily search ($0.008 per credit) or Brave Basic ($5/month for 20K queries) and host on Vercel's free tier. For 100 queries per day you will spend about $30 to $35 per month total. The cost driver is search API calls, not the LLM itself.

Can I build this without writing code?

Partially. n8n or Make.com can wire together OpenAI, Tavily, and a vector store with no code, and you can ship a usable assistant in a day. You will hit limits on the citation verifier and the eval set, where custom code is faster than visual nodes.

How do I prevent the assistant from hallucinating sources?

Three layers. Force the model to cite URLs in the system prompt. Run a verifier that HEAD-requests every URL and drops 404s. For high-stakes use, run a second LLM call that checks each cited claim against the actual page content.

Do I need a vector database for a simple assistant?

No, you can ship a v1 without memory. But once you cross 20 queries a week, the lack of memory becomes painful because you re-research the same topics. pgvector on existing Postgres is the lowest-friction upgrade path.

Should I use the OpenAI Assistants API instead?

No. OpenAI announced the Assistants API deprecation on August 26, 2025 and the API shuts down on August 26, 2026. Use the Responses API plus the Conversations API for new builds. The Responses API gives you better streaming, cleaner tool calling, hosted vector stores via file_search, the new web_search tool, MCP support, and computer-use — all the Assistants features rolled in.

You do not need a research team to do real research. You need a $30-a-month pipeline that searches, cites, and remembers. Build it once and your future self will thank you every Monday morning.

Zarif

Zarif is an AI automation educator helping thousands of professionals and businesses leverage AI tools and workflows to save time, cut costs, and scale operations.