Can we just use Microsoft Copilot or Google’s version?

You can, and sometimes that is the right call. The trade-off is their terms, their data-residency choices, and their pricing roadmap. If compliance, confidentiality or cost-predictability is load-bearing for you, private deployment is the cleaner answer.

Back to VP Lab

Private AI · how it actually works

The shape of a real private AI engagement.

Q: What happens when a better model is released next month?

You swap it. That is a day’s work, not a migration. Hardware and integration stay the same — only the model artefact changes. Open weights mean you never chase a proprietary vendor’s roadmap.

No marketing fluff. This page walks through the architecture, the stack, the timeline and the honest cost of deploying the same class of AI you tried on VP Lab — except running entirely on your own hardware.

Most of the firms I deploy for are London SMBs in legal practice, accountancy, financial services, insurance or healthcare— businesses where a client document sitting in a public AI tool is a regulatory problem, not a productivity win. If that’s you, you’re in the right place.

01 · Architecture

Seven layers, your hardware, your network.

Each layer below sits entirely inside your infrastructure — there’s no point in the stack where data hops to a third-party API.

Your team / integrations

Staff access the system via a web UI, chat interface, or direct API calls from existing tools (Outlook, Slack, your CRM).

Access & identity

Zero-Trust access, SSO with your existing identity provider (Azure AD / Entra / Google Workspace / Okta). No inbound ports exposed to the internet.

Application layer

Next.js / Node / Python apps tailored to your workflows. Document Q&A, contract review, meeting-notes assistant — whichever VP Lab demos map to your day-to-day.

Your documents, searchable by the AI

Your contracts, policies and procedures get indexed into a searchable store that lives next to the model. When someone asks a question, the system pulls the relevant passages from your own material and hands them to the AI as context — so the answers are grounded in your business, not the internet.

The techOpen-source vector database (Qdrant or Postgres with pgvector); local embedding models — no external indexing APIs.

The AI model itself

Runs the same class of model behind ChatGPT-style quality, chosen to fit your hardware. When a better model lands next month, we swap it in a day — your hardware and integrations don’t change. No vendor lock-in, no proprietary roadmap to chase.

The techOpen-weight models such as Llama 3.3, Mistral, Qwen or DeepSeek, served locally and sized to your GPU.

Reproducible deployment

Every piece of the stack is packaged so it starts the same way every time. You can rebuild the whole system from scratch on new hardware in under a day if you ever need to. Version-pinned, easy rollback when an update misbehaves.

The techDocker + docker-compose for most deployments; only scaled-out orchestration where the size of the estate demands it.

Your hardware

Anywhere from a single GPU workstation in your server room to a rack of H100s. Spec is sized to the models you need, not picked from a catalogue.

02 · Timeline

4 to 8 weeks. Here’s where the time goes.

Weeks 1–2
Discovery & spec
Workshop your use-cases, pick the model(s) that fit, spec the hardware accordingly. Written proposal, fixed fee, no surprises.
Weeks 2–4
Infrastructure
Hardware install, OS & Docker, model runtime, security hardening, internal networking. Your team sees progress in your own environment.
Weeks 4–6
Integration
Ingest your documents into the RAG layer, wire the app to your SSO, connect to the tools your team actually uses (Outlook, SharePoint, Slack, CRM).
Weeks 6–8
Handover
Training workshops, runbooks, full documentation. You walk away able to run and swap models yourself — no lock-in.

Smaller deployments (a single team, one use-case) can land in 4 weeks. Larger ones (multi-department, complex integrations) lean toward 8+.

03 · Cost

Honest ballparks — not cat-and-mouse quoting.

Exact pricing depends on your model size, data volume, and integrations. These ranges cover the typical London SMB engagement.

Hardware (one-off)

£3,000 – £15,000

A single GPU workstation for a 7B–13B model runs ~£3–5k. A small server with an H100 for 70B–class models climbs to £10–15k. Bring your own hardware to drop this to zero.

Setup & integration (project)

£5,000 – £20,000

Simple document-Q&A on your files: lower end. Multi-team deployment with SSO, custom app, and ingestion pipelines: higher end. Fixed-fee; no hourly.

Ongoing retainer (optional)

£600 – £2,000 / month

Model updates, RAG re-ingestion, performance tuning, on-call. Optional — you can absolutely run it yourselves after handover.

Running cost (per month)

£0 in API fees

The point of private AI. No per-token billing, no usage caps, no surprise invoice if your team hammers the system. Electricity → your existing overheads.

05 · The usual questions

Straight answers to the things everyone asks.

Is this less capable than ChatGPT / Claude?+

For general-knowledge trivia, slightly. For your own documents, procedures and terminology — which is what you actually care about — no. An open-weight 70B model fine-tuned on your corpus out-performs ChatGPT out-of-the-box for internal tasks almost every time.

What happens when a better model is released next month?+

You swap it. That’s a day’s work, not a migration. Hardware and integration stay the same — only the model artefact changes. Open weights mean you never chase a proprietary vendor’s roadmap.

What about scaling if the team grows?+

Add more GPUs for parallel inference, or add nodes. The stack is built to horizontally scale from day one — you’re not re-architecting later.

We don’t have a server room. Can we still do this?+

Yes. A private AI can live in a colocation facility, a UK-based sovereign cloud, or a secure server room we source. The "private" bit is legal/data-control, not literally about your building.

Can we just use Microsoft Copilot / Google’s version?+

You can, and sometimes that’s the right call. The trade-off: their terms, their data-residency choices, their pricing roadmap. If compliance, confidentiality, or cost-predictability is load-bearing for you, private deployment is the cleaner answer.

Ready to scope yours?

Book a free 20-minute call. Bring a rough idea of your use-case; walk away with a sensible spec, a budget range, and a realistic timeline — whether or not you end up working with me.

Book a free scoping call Try the demos first