Coming soon

Every machine,
one mesh.

A single binary turns your Apple Silicon Macs, NVIDIA workstations, and spare hardware into a shared LLM cluster. Zero config, zero cloud, zero data leaks.

Get started →

macOS  ·  Linux  ·  Windows

Live mesh
macmini@192.168.1.63
mistral:7b
deepseek-r1:8b
workstation@192.168.1.51
gemma3:12b
nemotron:4b
ubuntu@192.168.1.42
qwen3-coder:30b
llama3.1:8b
Request flow
macmini@192.168.1.63 — parley status
$parley status
parley mesh: ok
  node:  macmini@192.168.1.63
  peers: 2

  macmini@192.168.1.63     [ok] inflight=0
    * mistral:7b
    - deepseek-r1:8b
  workstation@192.168.1.51 [ok] inflight=1
    * gemma3:12b
    - nemotron-3-nano:4b
  ubuntu@192.168.1.42      [ok] inflight=0
    * qwen3-coder:30b
    - llama3.1:8b

license: free tier
  nodes:      3/3
  concurrent: 2

aliases:
  parley:code   best available code model
  parley:fast   smallest loaded model
  parley:best   largest / most capable model
  parley:reason best available reasoning model

Real output. Ships a web UI for mesh status and chat — any OpenAI-compatible frontend works too.

0
nodes free forever
0
config files needed
<1ms
mesh discovery
0
data stays local

Why Parley

Your machines already
have the hardware.

Parley connects them into a single inference pool. Every model on every node is reachable from every desk. No orchestration to maintain, no containers to deploy.

01 — Mesh
One pool, not N machines
Models loaded on any node are usable from every node. Peer-to-peer model transfer means one download serves the whole team.
02 — Routing
Smart routing
Requests go to the node that already has the model in memory. If two nodes have it, the one with the shortest queue wins.
03 — Engine
Spindll, batteries included
Ships its own llama.cpp-based engine with GGUF, HuggingFace pulls, Metal on Apple Silicon (MLX backend on M-series), CUDA / Vulkan on NVIDIA. Nothing else to install.
04 — API
Drop-in compatible
OpenAI and Ollama API compatible. Set one env var and your existing tools, scripts, and IDE integrations just work.
05 — Scheduling
Fair scheduling
Per-user request queuing. No single script or developer can monopolize the cluster. The user with the fewest in-flight requests goes next.
06 — Security
Encrypted by default
TLS between every node. Per-machine certificates generated on first boot. Identity-pinned peers — changed keys trigger warnings, not silent compromise.
07 — Aliases
Address capability, not models
Call parley:code or parley:best. The mesh resolves to whatever is loaded and capable, so you don't break when a model is unloaded.
08 — Tools
Drops into your IDE
One command points Cursor, codex, opencode and friends at the mesh: parley launch codex. Their traffic stays on your network.
parley:code
Best available code model
parley:fast
Smallest loaded model
parley:best
Largest / most capable
parley:reason
Best reasoning model

Use cases

Built for teams that
already own the hardware.

The shapes where Parley fits cleanly — offices with mixed compute, regulated environments, and any team that wants to stop maintaining N copies of the same model.

Engineering teams
Point IDEs at one mesh
The Mac Studio in the corner hosts a 70B coding model on its unified memory; every laptop on the floor uses it without copying weights. New hires install one binary and they're productive.
Privacy-sensitive R&D
Inference inside the perimeter
Customer code, contracts, clinical notes, and internal docs never leave the network. The same models you'd hit via a hosted API run on machines you own — compliance becomes a question of physical hardware.
Mixed-hardware shops
Use what you already have
Mac Minis, NVIDIA workstations, and the old laptop in someone's drawer all join one cluster. Metal where it helps, CUDA where it helps, CPU for small models. The mesh routes to whichever has capacity.
Air-gapped environments
One download, peer transfers after
Once a single node has a model, every other node pulls it peer-to-peer across the LAN. No outbound connections required — works for defence, finance, and shop floors with no internet egress.

Comparison

How Parley fits in.

Other tools solve single-machine inference or cloud API routing. Parley is the missing layer: multi-machine pooling for teams that own their hardware.

Parley Ollama LM Studio Hosted APIs
Multi-machine mesh
Auto-discovery
Fair scheduling varies
Peer model transfer
OpenAI API compatible
Built-in web UI varies
TLS encryption
Data stays local
Team licensing business

FAQ

Common questions.

Ollama runs on one machine. Parley pools many machines into one cluster with smart routing, fair scheduling, an API gateway, a web UI, and team-grade trust and licensing. If Ollama is already installed on a node, Parley can use it as a backend; otherwise, Parley ships its own inference engine.
LM Studio is a polished single-machine desktop app. Parley is for teams — it pools multiple machines into one shared cluster with fair scheduling, automatic discovery, and encrypted inter-node communication. The moment a second machine enters the picture, you need Parley.
Any modern machine. Apple Silicon Macs (M1 through M4 Ultra) are first-class — Metal-accelerated with unified memory that lets you run larger models than discrete GPUs. NVIDIA workstations with CUDA work great. CPU-only machines work for smaller models. Mix freely.
Yes. Set OPENAI_BASE_URL to any Parley node and your OpenAI SDK code routes to the cluster. Ollama API clients work too — set OLLAMA_HOST. Coding tools like Cursor, codex, and opencode can be connected with parley launch <tool>.
Your data never leaves your network. Parley doesn't phone home. No telemetry, no usage reporting, no outbound connections except model downloads. All inference happens on machines you own. For air-gapped environments, peer-to-peer model transfer means you can operate with zero internet access after the initial download.
LAN latency is negligible compared to inference time. A 7B response that takes 4 seconds to generate doesn't notice a 1ms hop. The mesh prefers nodes that already have the model loaded, so cross-machine routing only happens when there's a clear capacity reason.
No. Parley detects existing Ollama installs and can use them as a backend; otherwise Parley's embedded Spindll engine handles inference. parley import --from-ollama reuses models already on disk so you don't re-download.
A complex request can be split across multiple models — code work routed to a code model, reasoning to a reasoning model — and the results merged before streaming back. The mesh handles coordination so the caller sees a single response. Opt out per-request with orchestrate=false.

One binary.
Every machine.

Install on each machine that has a GPU or fast CPU. They find each other automatically.*

macOS DMG
Apple Silicon (M1–M4 Ultra) and Intel. Drag to Applications — Parley appears in your menu bar. Signed and notarized.
Linux Binary
Single static binary for x86_64. Run parley serve, join the mesh.
Windows NSIS
CUDA and Vulkan GPU builds with per-architecture artifacts. System tray launcher matching the macOS menu bar app.

* Nodes must be on the same LAN or subnet. No internet required.

Or from the command line
# start the mesh (web UI + inference + gateway)
parley serve

# pull a model on any node (others get it peer-to-peer)
parley pull qwen3-coder:30b

# point your existing OpenAI SDK at the mesh
export OPENAI_BASE_URL=http://localhost/v1

# or drop your IDE in:
parley launch codex --model parley:code

# see who's online
parley status