Help & Troubleshooting
Common Issues
"No model selected" error
Click the gear icon and download a model (e.g., llama3.1:8b). The download takes 1-5 minutes, depending on model size.
Mesh shows "local mode" instead of "connected"
- Are other nodes running on the same network?
- Check firewall allows port 45892 (mDNS)
- Ensure all nodes use the same cookie: check ~/.parley.yaml
- Try restarting nodes: pkill parley, then parley serve
Response is very slow
- Model may still be loading (first inference loads model into memory)
- Check node status in graph view
- Run a smaller model: use the parley:fast alias
- If using remote node, check network latency
High memory usage
- Each loaded model consumes memory (7B model ≈ 4-6GB, 13B ≈ 8-10GB)
- Unload unused models via the models panel (gear icon)
- Check /api/ps to see what's loaded
- Use smaller models or quantized versions
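The memory figures above follow roughly from parameter count and weight precision. A minimal sketch of that arithmetic in Python; the function name and the 1.2 overhead factor (for KV cache and runtime buffers) are assumptions for illustration, not values reported by Parley:

```python
def estimate_model_memory_gb(params_billions: float, bits_per_weight: float,
                             overhead: float = 1.2) -> float:
    """Rough memory footprint of a loaded model in GB (1 GB = 1e9 bytes).

    overhead is an assumed multiplier for KV cache and runtime buffers;
    real usage depends on context length and the runtime.
    """
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9


if __name__ == "__main__":
    # Compare a few common size/precision combinations.
    for name, params, bits in [("7B Q4", 7, 4), ("13B Q4", 13, 4), ("7B FP16", 7, 16)]:
        print(f"{name}: ~{estimate_model_memory_gb(params, bits):.1f} GB")
```

Under these assumptions a 4-bit 7B model lands at the low end of the 4-6GB range quoted above, while the same model in FP16 needs several times more, which is why quantized versions help when memory is tight.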
"Connection refused" when accessing from another machine
- By default, Parley binds to localhost only
- For network access, Parley must be listening on your machine's IP (the address is shown on startup)
- Check firewall allows port 80 or 3000
- Try direct IP: http://192.168.x.x:3000
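To tell a firewall or binding problem apart from a browser issue, it can help to test the TCP port directly from the other machine. A small sketch using only the Python standard library; the helper name can_connect is hypothetical, and the host and port in the usage block are placeholders to replace with the address shown on startup:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder values: substitute the IP and port of the Parley machine.
    host, port = "127.0.0.1", 3000
    print(f"{host}:{port} reachable: {can_connect(host, port)}")
```

If this returns False from a remote machine but True on the host itself, the likely causes are the localhost-only binding or a firewall rule.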
Model download fails or is stuck
- Check internet connection and GitHub/HuggingFace access
- Cancel and retry: click the "cancel" button in the models panel
- Try a smaller model first to test download
- Check disk space (models can be 5-50GB each)
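The disk space check can be scripted before starting a large download. A minimal sketch with the Python standard library; the helper names and the 2 GB safety margin are assumptions, and the home directory stands in for wherever Parley actually stores models, which this page does not specify:

```python
import shutil
from pathlib import Path


def free_space_gb(path: str) -> float:
    """Free space on the filesystem containing path, in GB (1 GB = 1e9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def has_room_for(path: str, required_gb: float, margin_gb: float = 2.0) -> bool:
    """True if the filesystem has required_gb plus an assumed safety margin free."""
    return free_space_gb(path) >= required_gb + margin_gb


if __name__ == "__main__":
    # Assumption: models live somewhere under the home directory.
    target = str(Path.home())
    print(f"Free space at {target}: {free_space_gb(target):.1f} GB")
    print(f"Room for a 5 GB model: {has_room_for(target, 5)}")
```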
Performance Tips
- GPU acceleration — Use CUDA (NVIDIA) or Metal (Apple Silicon) for faster inference
- Model quantization — Q4 models are 3-4x faster than FP16, minimal quality loss
- Keep models local — Avoid repeated downloads by using parley import --from-ollama
- Monitor loaded models — Mesh graphs show which models are on which nodes
Getting Help
- Check documentation — See the docs pages above
- View logs — ~/.parley/logs/
- GitHub Issues — Report bugs at github.com/Iito/parley
- Status endpoint — curl http://localhost:4000/health
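The status endpoint can also be checked from a script, for example before filing an issue. A hedged sketch in Python: the helper name is_healthy is hypothetical, and because the response body format is not documented here, it only looks at the HTTP status code:

```python
import urllib.error
import urllib.request


def is_healthy(url: str = "http://localhost:4000/health", timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 within timeout."""
    # Ignore proxy environment variables so local addresses are reached directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False


if __name__ == "__main__":
    print("Parley healthy:", is_healthy())
```

A False result while the process is running is worth mentioning in a bug report, together with the relevant log files.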
Keyboard Shortcuts
Enter — Send message
Shift+Enter — Newline in message
Cmd/Ctrl+C — Cancel streaming response