macmini@192.168.1.63 — parley status

$parley status
parley mesh: ok
  node:  macmini@192.168.1.63
  peers: 2

  macmini@192.168.1.63     [ok] inflight=0
    * mistral:7b
    - deepseek-r1:8b
  workstation@192.168.1.51 [ok] inflight=1
    * gemma3:12b
    - nemotron-3-nano:4b
  ubuntu@192.168.1.42      [ok] inflight=0
    * qwen3-coder:30b
    - llama3.1:8b

license: free tier
  nodes:      3/3
  concurrent: 2

aliases:
  parley:code   best available code model
  parley:fast   smallest loaded model
  parley:best   largest / most capable model
  parley:reason best available reasoning model

Real output. Ships a web UI for mesh status and chat — any OpenAI-compatible frontend works too.

Why Parley

Your machines already
have the hardware.

Big Tech is burning through $700 billion a year on AI infrastructure, and free cash flow is collapsing under the weight. Parley connects the machines you already own into a single inference pool — every model on every node, reachable from every desk. No orchestration to maintain, no cloud bill to watch climb.

01 — Mesh

One pool, not N machines

Models loaded on any node are usable from every node. Peer-to-peer model transfer means one download serves the whole team.

02 — Routing

Smart routing

Requests go to the node that already has the model in memory. If two nodes have it, the one with the shortest queue wins.

03 — Engine

Spindll, batteries included

Ships its own Rust inference engine. GGUF via llama.cpp matches Ollama tok/s on every OS with 10–15% faster time-to-first-token. On Apple Silicon, the native MLX backend pushes 20%+ higher throughput — a speed tier Ollama can't reach. CUDA and Vulkan on NVIDIA. Nothing else to install.

04 — API

Drop-in compatible

OpenAI and Ollama API compatible. Set one env var and your existing tools, scripts, and IDE integrations just work.

05 — Scheduling

Fair scheduling

Per-user request queuing. No single script or developer can monopolize the cluster. The user with the fewest in-flight requests goes next.

06 — Security

Encrypted by default

TLS between every node. Per-machine certificates generated on first boot. Identity-pinned peers — changed keys trigger warnings, not silent compromise.

07 — Aliases

Address capability, not models

Call parley:code or parley:best. The mesh resolves to whatever is loaded and capable, so you don't break when a model is unloaded.

08 — Tools

Drops into your IDE

One command points Cursor, codex, opencode and friends at the mesh: parley launch codex. Their traffic stays on your network.

parley:code

Best available code model

parley:fast

Smallest loaded model

parley:best

Largest / most capable

parley:reason

Best reasoning model

Use cases

Built for teams that
already own the hardware.

The shapes where Parley fits cleanly — offices with mixed compute, regulated environments, and any team tired of watching cloud AI bills grow faster than revenue.

Engineering teams

Point IDEs at one mesh

The Mac Studio in the corner hosts a 70B coding model on its unified memory; every laptop on the floor uses it without copying weights. New hires install one binary and they're productive.

Privacy-sensitive R&D

Inference inside the perimeter

Customer code, contracts, clinical notes, and internal docs never leave the network. The same models you'd hit via a hosted API run on machines you own — compliance becomes a question of physical hardware.

Mixed-hardware shops

Use what you already have

Mac Minis, NVIDIA workstations, and the old laptop in someone's drawer all join one cluster. Metal where it helps, CUDA where it helps, CPU for small models. The mesh routes to whichever has capacity.

Air-gapped environments

One download, peer transfers after

Once a single node has a model, every other node pulls it peer-to-peer across the LAN. No outbound connections required — works for defence, finance, and shop floors with no internet egress.

Cost control

Inference costs that don't scale with usage

Inference now accounts for 55% of AI infrastructure spending and runs 15–20× training costs over time. With Parley, your inference runs on hardware you already paid for — usage goes up, the bill stays flat.

AI ROI

Ship AI without the CFO conversation

Cloud AI spending is growing 60–80% year-over-year while revenue grows 16%. Parley eliminates the largest line item by turning existing machines into the platform — no per-request pricing, no surprise invoices.

Comparison

How Parley fits in.

Cloud API costs scale with every request. Single-machine tools cap out at one GPU. Parley is the missing layer: multi-machine pooling that turns hardware you already own into a shared inference platform — flat cost, no ceiling.

	Parley	Ollama	LM Studio	Hosted APIs
Multi-machine mesh	✓	—	—	—
Auto-discovery	✓	—	—	—
Fair scheduling	✓	—	—	varies
Peer model transfer	✓	—	—	—
OpenAI API compatible	✓	✓	✓	✓
Built-in web UI	✓	—	✓	varies
TLS encryption	✓	—	—	✓
Data stays local	✓	✓	✓	—
Team licensing	✓	—	business	✓

Pricing

Simple, transparent pricing.

Free

Forever

Up to 3 nodes in your mesh
2 concurrent requests
Local LLM inference
Automatic model discovery
Community support

Get started

Custom

for your scale

Custom nodes
Higher concurrency limits
Priority support
Custom SLAs

FAQ

Common questions.

Ollama runs on one machine. Parley pools many machines into one cluster with smart routing, fair scheduling, an API gateway, a web UI, and team-grade trust and licensing. Parley ships its own inference engine — it doesn't use Ollama as a backend.

LM Studio is a polished single-machine desktop app. Parley is for teams — it pools multiple machines into one shared cluster with fair scheduling, automatic discovery, and encrypted inter-node communication. The moment a second machine enters the picture, you need Parley.

Any modern machine. Apple Silicon Macs (M1 through M4 Ultra) are first-class — Metal-accelerated with unified memory that lets you run larger models than discrete GPUs. NVIDIA workstations with CUDA work great. CPU-only machines work for smaller models. Mix freely.

Yes. Set OPENAI_BASE_URL to any Parley node and your OpenAI SDK code routes to the cluster. Ollama API clients work too — set OLLAMA_HOST. Coding tools like Cursor, codex, and opencode can be connected with parley launch <tool>.

Your data never leaves your network. Parley doesn't phone home. No telemetry, no usage reporting, no outbound connections except model downloads. All inference happens on machines you own. For air-gapped environments, peer-to-peer model transfer means you can operate with zero internet access after the initial download.

LAN latency is negligible compared to inference time. A 7B response that takes 4 seconds to generate doesn't notice a 1ms hop. The mesh prefers nodes that already have the model loaded, so cross-machine routing only happens when there's a clear capacity reason.

No. Parley uses its own inference engine, Spindll, and doesn't depend on Ollama at all. parley import --from-ollama reuses models already on disk so you don't re-download.

Get started

One binary.
Every machine.

Install on each machine that has a GPU or fast CPU. They find each other automatically.*

macOS DMG

Apple Silicon (M1–M4 Ultra) with native Metal + MLX. Drag to Applications — Parley appears in your menu bar.

Download — Metal + MLX

            CPU-only — Intel
            Lite — Apple Silicon
            Lite — Intel
          

Windows NSIS

One installer for every NVIDIA GPU — GTX 10 series through RTX 50 series. System tray launcher matching the macOS menu bar app.

Download — CUDA

            CPU-only
            Lite
          

Linux AppImage

One AppImage for every NVIDIA GPU — GTX 10 series through RTX 50 series. chmod +x and run; nothing else to install.

Download — CUDA

            CPU-only — x86_64
            Lite — x86_64
            Lite — arm64
          

GPU (CUDA / Metal + MLX) — full local inference, hardware-accelerated. CPU-only — full local inference on the processor, for Xeon/EPYC-class servers or machines without a supported GPU. Lite — tiny client with no local inference: it joins the mesh and routes every request to your other machines — perfect for old laptops, Raspberry Pi, and Intel Macs.

* Nodes must be on the same LAN or subnet. No internet required.

Or from the command line

            # start the mesh (web UI + inference + gateway)
            
            parley serve
            
            # pull a model on any node (others get it peer-to-peer)
            
            parley pull qwen3-coder:30b
            
            # point your existing OpenAI SDK at the mesh
            
            export OPENAI_BASE_URL=http://localhost/v1
            
            # or drop your IDE in:
            
            parley launch codex --model parley:code
            
            # see who's online
            
            parley status

Every machine,one mesh.

Your machines alreadyhave the hardware.

Built for teams thatalready own the hardware.

How Parley fits in.

Simple, transparent pricing.

Common questions.

One binary.Every machine.

Get in touch

Every machine,
one mesh.

Your machines already
have the hardware.

Built for teams that
already own the hardware.

One binary.
Every machine.