A single binary turns your Apple Silicon Macs, NVIDIA workstations, and spare hardware into a shared LLM cluster. Zero config, zero cloud, zero data leaks.
$parley status parley mesh: ok node: macmini@192.168.1.63 peers: 2 macmini@192.168.1.63 [ok] inflight=0 * mistral:7b - deepseek-r1:8b workstation@192.168.1.51 [ok] inflight=1 * gemma3:12b - nemotron-3-nano:4b ubuntu@192.168.1.42 [ok] inflight=0 * qwen3-coder:30b - llama3.1:8b license: free tier nodes: 3/3 concurrent: 2 aliases: parley:code best available code model parley:fast smallest loaded model parley:best largest / most capable model parley:reason best available reasoning model
Real output. Ships a web UI for mesh status and chat — any OpenAI-compatible frontend works too.
Why Parley
Parley connects them into a single inference pool. Every model on every node is reachable from every desk. No orchestration to maintain, no containers to deploy.
parley:code
or parley:best.
The mesh resolves to whatever is loaded and capable, so you don't break when a model is unloaded.parley launch codex.
Their traffic stays on your network.Use cases
The shapes where Parley fits cleanly — offices with mixed compute, regulated environments, and any team that wants to stop maintaining N copies of the same model.
Comparison
Other tools solve single-machine inference or cloud API routing. Parley is the missing layer: multi-machine pooling for teams that own their hardware.
| Parley | Ollama | LM Studio | Hosted APIs | |
|---|---|---|---|---|
| Multi-machine mesh | ✓ | — | — | — |
| Auto-discovery | ✓ | — | — | — |
| Fair scheduling | ✓ | — | — | varies |
| Peer model transfer | ✓ | — | — | — |
| OpenAI API compatible | ✓ | ✓ | ✓ | ✓ |
| Built-in web UI | ✓ | — | ✓ | varies |
| TLS encryption | ✓ | — | — | ✓ |
| Data stays local | ✓ | ✓ | ✓ | — |
| Team licensing | ✓ | — | business | ✓ |
FAQ
OPENAI_BASE_URL to any Parley node and your OpenAI SDK
code routes to the cluster. Ollama API clients work too — set OLLAMA_HOST. Coding tools
like Cursor, codex, and opencode can be connected with parley launch <tool>.parley import --from-ollama
reuses models already on disk so you don't re-download.orchestrate=false.
Get started
Install on each machine that has a GPU or fast CPU. They find each other automatically.*
* Nodes must be on the same LAN or subnet. No internet required.