Parley is a distributed LLM inference mesh. Run the same binary on multiple machines to create a cluster.
parley serve
Opens the web UI at http://localhost (or http://localhost:3000).
Run parley serve on each machine. The nodes discover each other automatically via mDNS.
parley pull llama3.1:8b
parley pull TheBloke/Mistral-7B-Instruct-v0.2-GGUF --quant Q4_K_M
parley list
parley rm llama3.1:8b
Edit ~/.parley.yaml:
default_model: llama3.1:8b
mesh_port: 4000
aliases:
  parley:code: [qwen2.5-coder:32b]
  parley:fast: [qwen2.5:0.5b]
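To make the alias entries more concrete, here is a minimal Python sketch of how an alias lookup could behave. It is an illustration only, not Parley's implementation: the `resolve_model` function, the `CONFIG` dictionary, and the rule "pick the first candidate that is available" are all assumptions. This document does not say how Parley actually resolves aliases or the default model.

```python
from typing import Mapping, Optional, Set

# Hypothetical in-memory mirror of the ~/.parley.yaml example.
# Parley's real config loader is not shown in this document.
CONFIG = {
    "default_model": "llama3.1:8b",
    "mesh_port": 4000,
    "aliases": {
        "parley:code": ["qwen2.5-coder:32b"],
        "parley:fast": ["qwen2.5:0.5b"],
    },
}


def resolve_model(
    requested: Optional[str],
    config: Mapping,
    available: Set[str],
) -> Optional[str]:
    """Return a concrete model name for a request, or None if nothing matches.

    Assumed semantics (not confirmed by the source):
    - an empty request falls back to default_model;
    - an alias maps to an ordered list of candidate models;
    - the first candidate present in `available` is chosen.
    """
    name = requested or config["default_model"]
    # A name that is not an alias is treated as a model name itself.
    candidates = config.get("aliases", {}).get(name, [name])
    for model in candidates:
        if model in available:
            return model
    return None
```

For example, under these assumptions `resolve_model("parley:code", CONFIG, {"qwen2.5-coder:32b"})` returns the coder model, while an alias whose candidates are all missing returns `None`.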
Mesh mode (default): Uses cluster if available, falls back to local
Local mode: Uses only this machine (no network overhead)
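The difference between the two modes can be sketched as a small decision function. This is a hedged illustration of the behavior described above, not Parley's code: the `choose_backend` name, the mode strings, and the idea that "cluster available" means "at least one peer is known" are assumptions.

```python
from typing import Sequence


def choose_backend(mode: str, peers: Sequence[str]) -> str:
    """Decide where a request runs: "cluster" or "local".

    Assumption: the cluster counts as available when at least one
    peer is known. The source does not define availability.
    """
    if mode == "local":
        # Local mode never touches the network, even if peers exist.
        return "local"
    if mode == "mesh":
        # Mesh mode prefers the cluster and falls back to this machine.
        return "cluster" if peers else "local"
    raise ValueError(f"unknown mode: {mode!r}")
```

With these assumptions, `choose_backend("mesh", [])` falls back to `"local"`, and `choose_backend("local", ["node-b"])` stays `"local"` regardless of peers.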
Enter: Send message
Shift+Enter: New line
Cmd/Ctrl+C: Cancel streaming response