Agentic AI · Tax Technology

Building a Tax Copilot: Trustworthy AI for Tax Questions

Earnr

UK tax and finance app for the self-employed and their accountants

RAG + HYDE

Retrieval grounded in hypothetical answers, not raw ambiguous questions

Tool augmentation

Calculators as tools: structured outputs replace freehand model arithmetic

Modern language models can converse fluently, but trustworthy answers to tax questions require more than eloquence. This case study outlines a pragmatic, production-oriented approach to building a tax copilot that answers questions and performs calculations reliably. It is based on the approach we implemented for Earnr, a finance and tax app for the self-employed and their accountants in the UK. It focuses on the technology and the philosophy behind the system, not specific implementation details.

Why a Tax Copilot?

Tax rules are nuanced. People ask natural, messy questions. They combine multiple incomes, special allowances, edge cases, and what-ifs. A tax copilot should:

Architecture at a Glance

At a high level, the copilot either executes a tool (calculator) when it detects a computational task, or retrieves relevant context and explains the answer, optionally using a browsing LLM for live web-grounded responses.

[Diagram nodes: User Question; Rephrase Question; Decide Path; Hypothetical Answer (HyDE); Function Calling (Calculators); Vector Search; Full-text Search; Ensemble Rank; Semantic search (Google); Web LLM (Perplexity); Context Builder; LLM Generation; Answer Streamed; decision: Web Needed?; group label: Retrieval]
Figure 1. Overall architecture: the copilot routes between tool execution and retrieval-augmented generation depending on intent.

Retrieval-Augmented Generation (RAG)

RAG reduces hallucinations by giving the model a curated context. Documents (help articles, tax guides, FAQs) are embedded into vectors and stored in a vector database. At question time, the most similar snippets are retrieved to build a context prompt, and the model is asked to answer using that context and cite references where possible.
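
The retrieve-then-prompt step described above can be sketched in a few lines. This is a minimal illustration, not Earnr's implementation: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database, and the snippets, source names, and prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge-base snippets; real ones would come from help articles and guides.
DOCS = [
    {"source": "expenses-guide", "text": "Allowable expenses reduce your taxable profit."},
    {"source": "self-assessment-faq", "text": "Self-employed people report income through a Self Assessment tax return."},
    {"source": "vat-guide", "text": "VAT registration is required once turnover passes a threshold."},
]

def embed(text: str) -> Counter:
    """Toy bag-of-words vector standing in for a real embedding model (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, docs: list, k: int = 2) -> list:
    """Return the k snippets most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(query_vector, embed(d["text"])), reverse=True)
    return ranked[:k]

def build_prompt(question: str, snippets: list) -> str:
    """Assemble a context prompt that asks the model to cite numbered sources."""
    context = "\n".join(
        f"[{i}] ({s['source']}) {s['text']}" for i, s in enumerate(snippets, start=1)
    )
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

if __name__ == "__main__":
    question = "Which expenses can I claim?"
    print(build_prompt(question, retrieve(question, DOCS)))
```

In a production system the embedding call and the similarity search would be delegated to an embedding model and a vector store; only the shape of the flow is meant to carry over.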

Practical implementation choices:

Hypothetical Document Embeddings (HYDE)

HYDE improves retrieval by embedding not just the question but also a hypothetical answer for vector search. The question is rephrased to be standalone, a short hypothetical answer is generated, and documents similar to that hypothetical answer are retrieved.

Matching in "answer space" tends to work better than matching against the short, ambiguous questions a typical customer asks, because a hypothetical answer resembles the stored documents more closely than the question does. HYDE is a technique from the research literature, implemented here using standard LLMs for the rephrase and answer steps.
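
The three HYDE steps (rephrase, draft a hypothetical answer, search with it) might look roughly like the sketch below. Everything named here is an assumption for illustration: `fake_llm` returns canned text in place of a real model call, the prompts are invented, the documents are hypothetical, and the bag-of-words similarity stands in for real embeddings. The canned answer is placeholder text, not tax guidance.

```python
import math
import re
from collections import Counter
from typing import Callable

# Hypothetical document store: id -> text.
DOCS = {
    "vat-registration": "When to register for VAT",
    "working-from-home": "Claiming heating and broadband costs when you work from home",
    "pensions": "Pension contributions and tax relief",
}

def embed(text: str) -> Counter:
    """Toy bag-of-words vector standing in for a real embedding model (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query: str) -> list:
    """Rank document ids by similarity to the query text."""
    query_vector = embed(query)
    return sorted(DOCS, key=lambda doc_id: cosine(query_vector, embed(DOCS[doc_id])), reverse=True)

def hyde_query(question: str, history: str, llm: Callable[[str], str]) -> str:
    """Build the retrieval query from the standalone question plus a hypothetical answer."""
    standalone = llm(f"Rewrite as a standalone question.\nHistory: {history}\nQuestion: {question}")
    hypothetical = llm(f"Write a short, plausible answer to: {standalone}")
    return f"{standalone}\n{hypothetical}"

def fake_llm(prompt: str) -> str:
    """Canned responses in place of a real model call (illustrative only)."""
    if prompt.startswith("Rewrite"):
        return "Can a self-employed person claim home office costs as an expense?"
    return "Home office costs such as heating and broadband may count as allowable expenses."

if __name__ == "__main__":
    query = hyde_query("What about that?", "User mentioned working from home.", fake_llm)
    print(search(query))
```

The raw follow-up "What about that?" shares no vocabulary with any document, whereas the rephrased question plus hypothetical answer overlaps heavily with the relevant one, which is the effect HYDE relies on.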

Vector Database and Full-Text Search
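
The architecture figure shows vector search and full-text search feeding an ensemble rank step. The case study does not say how that ranking is done; one common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the method used here. The function name, the document ids, and the constant `k = 60` (a conventional default) are illustrative.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Merge several ranked lists of document ids into one.

    Each document scores 1 / (k + rank) for every list it appears in,
    so items ranked well by both searches rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

if __name__ == "__main__":
    vector_hits = ["a", "b", "c"]      # hypothetical vector-search ranking
    full_text_hits = ["b", "d", "a"]   # hypothetical full-text ranking
    print(reciprocal_rank_fusion([vector_hits, full_text_hits]))
```

RRF needs only rank positions, so it avoids having to normalise similarity scores from two different search engines onto a common scale.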

Optional Web-Grounded Answers

Sometimes the best answer is on the open web, for example brand-new guidance relevant for the next tax year. Here we either call a browsing LLM that fetches sources and synthesises an answer, or run a lightweight search, fetch, and extract pipeline over a curated set of trusted domains.
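
For the pipeline variant, restricting fetches to a curated set of trusted domains can be as simple as a host check before any page is requested. The sketch below is an assumption about how such a filter could look; the allowlist contents and the function name are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would curate its own.
TRUSTED_DOMAINS = frozenset({"gov.uk"})

def is_trusted(url: str, trusted=TRUSTED_DOMAINS) -> bool:
    """Accept only HTTPS URLs whose host is a trusted domain or one of its subdomains."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    # Match on a dot boundary so "notgov.uk" or "gov.uk.example.com" do not pass.
    return any(host == domain or host.endswith("." + domain) for domain in trusted)

if __name__ == "__main__":
    candidates = [
        "https://www.gov.uk/self-assessment-tax-returns",
        "https://gov.uk.example.com/guide",
    ]
    print([url for url in candidates if is_trusted(url)])
```

Checking the parsed hostname rather than searching the URL string avoids look-alike hosts that merely contain a trusted name.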

This path is used selectively:

Tool-Augmented Generation (Calculators)

For tax, doing the math reliably is crucial, and relying on the model to calculate is risky. Instead, calculators are defined as tools (functions) with strict JSON schemas covering inputs, constraints, and descriptions. The model chooses and calls a tool when the user asks for a computation; the tool returns structured outputs (band splits, tax subtotals, totals), which are then post-processed into human-readable explanations.

Example tools (illustrative):

The model is instructed to explain results from tool outputs only, without resorting to freehand arithmetic. Inputs are validated against schemas, errors are handled gracefully, and clarifying questions are asked when required inputs are missing or ambiguous.
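
As one illustration of a calculator exposed as a tool, the sketch below pairs a JSON-Schema-style definition with a function that validates inputs and returns a structured result. The tool name `banded_tax`, its schema, and the band limits and rates are hypothetical placeholders: they are not Earnr's tools and not real UK tax rates. Validation is hand-written here to keep the example dependency-free.

```python
from decimal import Decimal

# Hypothetical tool definition in JSON Schema style, as it might be passed to an LLM.
BANDED_TAX_TOOL = {
    "name": "banded_tax",
    "description": "Split taxable income across tax bands and total the tax due.",
    "parameters": {
        "type": "object",
        "properties": {
            "taxable_income": {
                "type": "number",
                "minimum": 0,
                "description": "Annual taxable income in pounds.",
            },
        },
        "required": ["taxable_income"],
        "additionalProperties": False,
    },
}

# Placeholder bands as (upper limit or None, rate). NOT real tax rates.
EXAMPLE_BANDS = [
    (Decimal("10000"), Decimal("0")),
    (Decimal("40000"), Decimal("0.20")),
    (None, Decimal("0.40")),
]

def validate_args(args: dict) -> list:
    """Check arguments against the schema above; return a list of error messages."""
    params = BANDED_TAX_TOOL["parameters"]
    errors = [f"missing required input: {name}" for name in params["required"] if name not in args]
    for name, value in args.items():
        spec = params["properties"].get(name)
        if spec is None:
            errors.append(f"unexpected input: {name}")
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{name} must be a number")
        elif value < spec["minimum"]:
            errors.append(f"{name} must be at least {spec['minimum']}")
    return errors

def banded_tax(args: dict) -> dict:
    """Return per-band amounts and tax plus a total, or errors the model can ask about."""
    errors = validate_args(args)
    if errors:
        return {"ok": False, "errors": errors}
    income = Decimal(str(args["taxable_income"]))
    lower, total, bands = Decimal("0"), Decimal("0"), []
    for upper, rate in EXAMPLE_BANDS:
        top = income if upper is None else min(income, upper)
        amount = max(top - lower, Decimal("0"))
        tax = (amount * rate).quantize(Decimal("0.01"))
        bands.append({"rate": str(rate), "amount": str(amount), "tax": str(tax)})
        total += tax
        if upper is None or income <= upper:
            break
        lower = upper
    return {"ok": True, "bands": bands, "total_tax": str(total)}

if __name__ == "__main__":
    print(banded_tax({"taxable_income": 50000}))
    print(banded_tax({}))
```

Returning errors as data rather than raising lets the orchestrator turn a missing or invalid input into a clarifying question, and using `Decimal` keeps the arithmetic exact to the penny.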

Orchestration Loop

  1. Rephrase the user's question and remove chat dependencies.
  2. If the question implies a calculation, select an appropriate tool and execute it.
  3. If the question seeks explanation, retrieve context using HYDE and hybrid search.
  4. If web freshness is necessary, consider a web-grounded path.
  5. Generate and stream the answer, including citations when available.
  6. If anything is missing or ambiguous, ask for clarification.
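
The loop above can be expressed as a small routing function. This is a sketch under stated assumptions: each step is injected as a plain callable so the control flow is visible, all names (`Steps`, `answer`, the intent labels) are hypothetical, and streaming is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Steps:
    """Hypothetical bundle of the components the orchestrator calls."""
    rephrase: Callable[[str, list], str]     # make the question standalone
    detect_intent: Callable[[str], str]      # "calculate", "explain", or anything else
    run_tool: Callable[[str], dict]          # returns {"ok": bool, ...}
    retrieve: Callable[[str], list]          # HYDE plus hybrid search
    needs_web: Callable[[str], bool]         # is fresh web content required?
    web_answer: Callable[[str], list]        # web-grounded snippets with citations
    generate: Callable[[str, list], str]     # final answer from question and context

CLARIFY = "Could you tell me a bit more about what you would like to know?"

def answer(question: str, history: list, steps: Steps) -> str:
    standalone = steps.rephrase(question, history)            # step 1
    intent = steps.detect_intent(standalone)
    if intent == "calculate":                                 # step 2
        result = steps.run_tool(standalone)
        if not result.get("ok"):
            # Missing or invalid inputs become a clarifying question (step 6).
            return "Could you clarify: " + "; ".join(result.get("errors", []))
        return steps.generate(standalone, [result])           # step 5
    if intent == "explain":                                   # step 3
        context = list(steps.retrieve(standalone))
        if steps.needs_web(standalone):                       # step 4
            context += steps.web_answer(standalone)
        if context:
            return steps.generate(standalone, context)        # step 5
    return CLARIFY                                            # step 6

if __name__ == "__main__":
    demo = Steps(
        rephrase=lambda q, h: q,
        detect_intent=lambda q: "calculate" if "how much" in q.lower() else "explain",
        run_tool=lambda q: {"ok": True, "total_tax": "10000.00"},
        retrieve=lambda q: ["snippet"],
        needs_web=lambda q: False,
        web_answer=lambda q: [],
        generate=lambda q, ctx: f"answer built from {len(ctx)} context item(s)",
    )
    print(answer("How much tax would I pay?", [], demo))
```

Passing the steps in as callables keeps the routing logic testable with stubs, independent of any particular model or search backend.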

Observability, Guardrails, and Costs

Privacy and Security

End-to-End Flow

[Sequence diagram. Participants: User; Orchestrator; Retrieval (Vector + FTS); Web LLM (optional); Tools (Calculators); Model. Messages: Ask a tax question; Rephrase + detect intent. Alternative "Needs calculation": Call calculator with JSON args; Structured result, bands, totals; Format explanation from tool output. Alternative "Needs explanation": HyDE retrieval, vector + FTS + ensemble; Top context snippets; optional "Freshness needed": Web-grounded answer request; Live citations + synthesis; then Generate with context and citations. Finally: Stream tokens; Streamed answer + follow-ups.]
Figure 2. End-to-end sequence: the orchestrator routes between calculator tools and retrieval-augmented generation, streaming the final answer to the user.

Closing Thoughts

A credible tax copilot blends generative language with grounded knowledge and reliable tools. RAG reduces hallucination, HYDE improves retrieval, hybrid search increases recall, and calculators turn natural language into precise math. Add careful orchestration, observability, and principled guardrails, and you have an assistant that is helpful, trustworthy, and fast enough to feel responsive.

HYDE refers to Hypothetical Document Embeddings, a technique from research in which any modern LLM generates the hypothetical answer used for retrieval. A browsing LLM denotes an API that integrates search and page fetching during generation.