Help & Troubleshooting

Common Issues

"No model selected" error

Click the gear icon and download a model (e.g., llama3.1:8b). The download takes 1-5 minutes depending on model size.

Mesh shows "local mode" instead of "connected"

Check that other nodes are running on the same network
Check that the firewall allows port 45892 (mDNS)
Ensure all nodes use the same cookie: check ~/.parley.yaml
Try restarting the nodes: run pkill parley, then parley serve
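
The cookie comparison above can be done without copying the secret between machines. This is a minimal sketch, assuming the cookie sits on a line containing the word "cookie" in ~/.parley.yaml (the exact key name is an assumption); the helper name parley_cookie_hash is hypothetical and not part of Parley.

```shell
#!/bin/sh
# Print a checksum of the cookie line(s) so nodes can be compared
# without pasting the secret itself.
# Assumption: the config has a line containing "cookie".
parley_cookie_hash() {
    config="${1:-$HOME/.parley.yaml}"
    if [ ! -f "$config" ]; then
        echo "missing: $config" >&2
        return 1
    fi
    # cksum prints "<checksum> <byte count>"; keep only the checksum.
    grep -i 'cookie' "$config" | cksum | cut -d ' ' -f 1
}

# Run on every node; the printed numbers should match:
#   parley_cookie_hash
# Then restart each node:
#   pkill parley
#   parley serve
```

If the numbers differ between nodes, the cookie lines differ (possibly only in whitespace), so compare the files by hand before restarting.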

Response is very slow

The model may still be loading (the first inference loads the model into memory)
Check node status in the graph view
Run a smaller model: use the parley:fast alias
If using a remote node, check network latency

High memory usage

Each loaded model consumes memory (7B model ≈ 4-6GB, 13B ≈ 8-10GB)
Unload unused models via the models panel (gear icon)
Check /api/ps to see what's loaded
Use smaller models or quantized versions
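
To see what is loaded from a terminal, /api/ps can be queried with curl. This is a sketch under one loud assumption: the base URL http://localhost:4000 is borrowed from the health endpoint mentioned elsewhere on this page, and /api/ps may be served on a different port on your node. PARLEY_URL, parley_api_url, and parley_ps are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Build a full URL for an API path.
# Assumption: the API lives at http://localhost:4000 unless PARLEY_URL
# says otherwise (the port is taken from the health endpoint).
parley_api_url() {
    printf '%s/%s\n' "${PARLEY_URL:-http://localhost:4000}" "${1#/}"
}

# Fetch the list of loaded models; the response format is not
# documented here, so it is printed as-is.
parley_ps() {
    curl -fsS --max-time 5 "$(parley_api_url /api/ps)"
}

# Usage:
#   parley_ps
#   PARLEY_URL=http://192.168.x.x:4000 parley_ps   # a remote node
```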

"Connection refused" when accessing from another machine

By default, Parley binds to localhost only
To access it from the network, Parley must be listening on your machine's IP (shown on startup)
Check that the firewall allows port 80 or 3000
Try direct IP: http://192.168.x.x:3000
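
A quick way to separate a bind or firewall problem from a browser problem is to probe the address from the other machine. The helper below is a hypothetical sketch, not a Parley command; it assumes curl is installed and defaults to port 3000 only because that is the port in the example URL above.

```shell
#!/bin/sh
# Report whether anything answers HTTP on host:port.
# Run this from the *other* machine, passing the IP shown on startup.
parley_reachable() {
    if [ -z "$1" ]; then
        echo "usage: parley_reachable HOST [PORT]" >&2
        return 2
    fi
    host="$1"
    port="${2:-3000}"   # default taken from the example URL above
    if curl -sS -o /dev/null --max-time 5 "http://$host:$port/"; then
        echo "reachable: http://$host:$port/"
    else
        echo "not reachable: http://$host:$port/ (check bind address and firewall)" >&2
        return 1
    fi
}

# Usage (replace with the real address):
#   parley_reachable 192.168.x.x 3000
```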

Model download fails or is stuck

Check your internet connection and access to GitHub/HuggingFace
Cancel and retry: click the "cancel" button in the models panel, then start the download again
Try a smaller model first to test the download
Check disk space (models can be 5-50GB each)
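
The disk space check can be scripted with standard tools. This sketch makes no assumption about where Parley stores models, which is not stated here: it measures the filesystem holding your home directory by default, so pass the real model directory if it lives elsewhere. The function names are illustrative.

```shell
#!/bin/sh
# Print free space, in whole GB, on the filesystem holding a directory.
free_gb() {
    dir="${1:-$HOME}"
    # -P forces one output line per filesystem, -k reports 1024-byte
    # blocks; column 4 is the available space.
    df -Pk "$dir" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Fail with a warning when fewer than NEED_GB gigabytes are free.
check_space() {
    need_gb="$1"
    have_gb=$(free_gb "$2")
    if [ "$have_gb" -lt "$need_gb" ]; then
        echo "Only ${have_gb}GB free, ${need_gb}GB wanted." >&2
        return 1
    fi
    echo "${have_gb}GB free"
}

# Usage: models can be 5-50GB each, so check against the upper end.
#   check_space 50
```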

Performance Tips

GPU acceleration — Use CUDA (NVIDIA) or Metal (Apple Silicon) for faster inference
Model quantization — Q4-quantized models run roughly 3-4x faster than FP16, with minimal quality loss
Keep models local — Avoid repeated downloads by using parley import --from-ollama
Monitor loaded models — Mesh graphs show which models are on which nodes

Getting Help

Check documentation — See the docs pages above
View logs — ~/.parley/logs/
GitHub Issues — Report bugs at github.com/Iito/parley
Status endpoint — curl http://localhost:4000/health
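
When filing a bug report, it helps to attach the health response and the tail of the newest log. A minimal sketch follows; log file names are not documented here, so the helper simply picks the most recently modified file in ~/.parley/logs/. PARLEY_LOG_DIR and latest_log are hypothetical names, not Parley settings.

```shell
#!/bin/sh
# Log directory from this page; PARLEY_LOG_DIR is only a convenience
# override for this script.
LOG_DIR="${PARLEY_LOG_DIR:-$HOME/.parley/logs}"

# Print the name of the most recently modified file in a directory.
latest_log() {
    dir="${1:-$LOG_DIR}"
    ls -t "$dir" 2>/dev/null | head -n 1
}

# Usage:
#   curl -fsS http://localhost:4000/health
#   tail -n 50 "$LOG_DIR/$(latest_log)"
```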

Keyboard Shortcuts

Enter — Send message
Shift+Enter — Newline in message
Cmd/Ctrl+C — Cancel streaming response