Manual

Getting Started

Parley is a distributed LLM inference mesh. Run the same binary on multiple machines to create a cluster.

Start a node

parley serve

This opens the web UI at http://localhost (or at http://localhost:3000).

Multi-node mesh

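Run parley serve on each machine. The nodes discover one another automatically via mDNS. Because mDNS is link-local, the machines generally need to be on the same local network.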

Model Management

Pull a model

parley pull llama3.1:8b
parley pull TheBloke/Mistral-7B-Instruct-v0.2-GGUF --quant Q4_K_M

List models

parley list

Delete a model

parley rm llama3.1:8b

Configuration

Edit ~/.parley.yaml:

default_model: llama3.1:8b
mesh_port: 4000

aliases:
  parley:code: [qwen2.5-coder:32b]
  parley:fast: [qwen2.5:0.5b]
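
Each alias maps a name to a list of models. This manual does not say how Parley chooses among the listed models, so the sketch below is only an illustration: it assumes the first listed model that is available wins. The function resolve_model and the CONFIG dictionary are hypothetical and are not part of Parley.

```python
from typing import Dict, List, Optional, Set

# The example ~/.parley.yaml from above, written out as plain Python data.
CONFIG: Dict = {
    "default_model": "llama3.1:8b",
    "aliases": {
        "parley:code": ["qwen2.5-coder:32b"],
        "parley:fast": ["qwen2.5:0.5b"],
    },
}


def resolve_model(requested: Optional[str], config: Dict, available: Set[str]) -> str:
    """Hypothetical helper: turn a requested name or alias into a concrete model.

    Assumption (not confirmed by the manual): an alias resolves to the first
    model in its list that is currently available.
    """
    # No model requested: fall back to default_model.
    name = requested or config["default_model"]
    # An alias expands to its candidate list; a plain name is its own candidate.
    candidates: List[str] = config.get("aliases", {}).get(name, [name])
    for model in candidates:
        if model in available:
            return model
    raise LookupError(f"no available model for {name!r}")
```

With only qwen2.5-coder:32b pulled, resolve_model("parley:code", CONFIG, {"qwen2.5-coder:32b"}) returns that model, and a request with no model name resolves to default_model if it is available.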

Web UI

Graph tab — Visualize mesh topology and models
Chat tab — Send prompts and see responses
Model selector — Choose which model to use
Gear icon — Download/manage models
Status indicator — Shows mesh/local mode

Routing & Fair Scheduling

Requests are routed to the node that already has the requested model loaded in memory
Per-user fairness: requests are scheduled fairly across users, so one user's requests cannot starve another's
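
The two rules above can be illustrated with a small sketch. It is not Parley's implementation: the manual does not describe the scheduling algorithm, so round-robin across users is assumed here as one simple way to get per-user fairness. The names pick_node and FairQueue are hypothetical.

```python
from collections import OrderedDict, deque
from typing import Deque, Dict, Optional, Set, Tuple


def pick_node(model: str, loaded: Dict[str, Set[str]]) -> Optional[str]:
    """Return a node that already has `model` loaded in memory, or None."""
    for node, models in loaded.items():
        if model in models:
            return node
    return None


class FairQueue:
    """Assumed policy: serve users in round-robin order, one request per turn."""

    def __init__(self) -> None:
        # Maps user -> pending requests; dict order is the turn order.
        self._queues: "OrderedDict[str, Deque[str]]" = OrderedDict()

    def submit(self, user: str, request: str) -> None:
        self._queues.setdefault(user, deque()).append(request)

    def pop(self) -> Optional[Tuple[str, str]]:
        """Take one request from the user whose turn it is."""
        if not self._queues:
            return None
        user, queue = next(iter(self._queues.items()))
        request = queue.popleft()
        # Move the user to the back of the turn order if they have more work.
        del self._queues[user]
        if queue:
            self._queues[user] = queue
        return user, request
```

If one user submits three requests and a second user then submits one, the second user's request is served second rather than last, so a busy user cannot starve the others.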

Mesh Mode vs Local Mode

Mesh mode (default): Uses the cluster if one is available and falls back to the local machine otherwise

Local mode: Uses only this machine (no network overhead)
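
The difference between the two modes comes down to where a request may run. The sketch below shows that decision under stated assumptions: the function choose_target, the mode strings, and the "local" return value are hypothetical, and how Parley detects an available cluster is not covered here.

```python
from typing import List


def choose_target(mode: str, mesh_nodes: List[str]) -> str:
    """Hypothetical: decide where a request runs for a given mode.

    mesh_nodes is the list of currently reachable cluster nodes (assumed input).
    """
    if mode == "local":
        # Local mode never touches the network.
        return "local"
    if mode == "mesh":
        # Mesh mode prefers the cluster and falls back to this machine.
        return mesh_nodes[0] if mesh_nodes else "local"
    raise ValueError(f"unknown mode: {mode!r}")
```

In mesh mode with no reachable nodes, the request runs locally, which matches the fallback described above.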

Keyboard Shortcuts

Enter — Send message
Shift+Enter — New line
Cmd/Ctrl+C — Cancel streaming response

Security

Mesh communication is encrypted by default
Self-signed certificates are generated automatically
All data stays on your network; nothing leaves your infrastructure