Manual

Getting Started

Parley is a distributed LLM inference mesh. Run the same binary on multiple machines to create a cluster.

Start a node

parley serve

This opens the web UI at http://localhost (or at http://localhost:3000).

Multi-node mesh

Run parley serve on each machine. The nodes discover each other automatically via mDNS.

Model Management

Pull a model

parley pull llama3.1:8b
parley pull TheBloke/Mistral-7B-Instruct-v0.2-GGUF --quant Q4_K_M

List models

parley list

Delete a model

parley rm llama3.1:8b

Configuration

Edit ~/.parley.yaml:

default_model: llama3.1:8b
mesh_port: 4000

aliases:
  parley:code: [qwen2.5-coder:32b]
  parley:fast: [qwen2.5:0.5b]
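
To make the alias mapping concrete, here is a minimal Python sketch of how a requested name might be turned into a concrete model. It is an illustration only, not Parley's actual resolution code. It assumes that each alias lists candidate models in order of preference, that an unknown name is treated as a literal model name, and that a missing name falls back to default_model. The CONFIG dictionary, the resolve_model function, and the installed set are hypothetical names introduced for this example.

```python
# Hypothetical in-memory mirror of the ~/.parley.yaml example above.
CONFIG = {
    "default_model": "llama3.1:8b",
    "aliases": {
        "parley:code": ["qwen2.5-coder:32b"],
        "parley:fast": ["qwen2.5:0.5b"],
    },
}


def resolve_model(name, config, installed):
    """Return the first installed candidate for `name`, or None.

    Assumed semantics (not confirmed by the manual):
    - name is None  -> use config["default_model"]
    - name is alias -> try its candidates in listed order
    - otherwise     -> treat name as a literal model name
    """
    if name is None:
        name = config["default_model"]
    candidates = config.get("aliases", {}).get(name, [name])
    for model in candidates:
        if model in installed:
            return model
    return None


if __name__ == "__main__":
    installed = {"llama3.1:8b", "qwen2.5-coder:32b"}
    print(resolve_model("parley:code", CONFIG, installed))
    print(resolve_model(None, CONFIG, installed))
```

Under these assumptions, an alias with several entries would act as a preference list, so a node that lacks the first model could still serve the request with a later one.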

Web UI

Routing & Fair Scheduling

Mesh Mode vs Local Mode

Mesh mode (default): Uses the cluster when one is available and falls back to the local machine otherwise.

Local mode: Uses only this machine, which avoids network overhead.
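
The difference between the two modes can be sketched as a small decision function. This is a simplified illustration in Python, not Parley's scheduler. The choose_target function, the mode strings, and the peers list are hypothetical, and "available" is assumed to mean that at least one peer node is currently known.

```python
def choose_target(mode, peers):
    """Decide where a request runs under the two modes described above.

    mode  -- "mesh" or "local" (hypothetical string values)
    peers -- list of currently reachable peer nodes (hypothetical)

    Returns "cluster" or "local".
    """
    if mode == "local":
        # Local mode never uses the network, even if peers exist.
        return "local"
    if mode == "mesh":
        # Mesh mode prefers the cluster and falls back to this machine.
        return "cluster" if peers else "local"
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    print(choose_target("mesh", ["node-b", "node-c"]))
    print(choose_target("mesh", []))
    print(choose_target("local", ["node-b"]))
```

A single-machine setup behaves the same in both modes under this sketch; the modes only diverge once at least one peer has been discovered.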

Keyboard Shortcuts

Security