Mac 上的本地 LLM

本文档的完整中文翻译正在进行中。

概述

在 Mac 上设置本地 LLM。

快速开始

bash

hermes help
hermes config
hermes skills

获取帮助

如需帮助，请运行 hermes doctor 或访问 GitHub Issues。

原文档内容：

Run Local LLMs on Mac

This guide walks you through running a local LLM server on macOS with an OpenAI-compatible API. You get full privacy, zero API costs, and surprisingly good performance on Apple Silicon.

We cover two backends:

Backend	Install	Best at	Format
llama.cpp	`brew install llama.cpp`	Fastest time-to-first-token, quantized KV cache for low memory	GGUF
omlx	omlx.ai	Fastest token generation, native Metal optimization	MLX (safetensors)

Both expose an OpenAI-compatible /v1/chat/completions endpoint. Hermes works with either one — just point it at http://localhost:8080 or http://localhost:8000.

Apple Silicon only

This guide targets Macs with Apple Silicon (M1 and later). Intel Macs will work with llama.cpp but without GPU acceleration — expect significantly slower performance.

Choosing a model

For getting started, we recommend Qwen3.5-9B — it's a strong reasoning model that fits comfortably in 8GB+ of unified memory with quantization.

Variant	Size on disk	RAM needed (128K context)	Backend
Qwen3.5-9B-Q4_K_M (GGUF)	5.3 GB	~10 GB with quantized KV cache	llama.cpp
Qwen3.5-9B-mlx-lm-mxfp4 (MLX)	~5 GB	~12 GB	omlx

Memory rule of thumb: model size + KV cache. A 9B Q4 model is ~5 GB. The KV cache at 128K context with Q4 quantization adds ~4-5 GB. With default (f16) KV cache, that balloons to ~16 GB. The quantized KV cache flags in llama.cpp are the key trick for memory-constrained systems.

For larger models (27B, 35B), you'll need 32 GB+ of unified memory. The 9B is the sweet spot for 8-16 GB machines.

Option A: llama.cpp

llama.cpp is the most portable local LLM runtime. On macOS it uses Metal for GPU acceleration out of the box.

Install

bash

brew install llama.cpp

This gives you the llama-server command glo...

[完整翻译即将推出]

Mac 上的本地 LLM ​

概述 ​

快速开始 ​

相关链接 ​

获取帮助 ​

Run Local LLMs on Mac ​

Choosing a model ​

Option A: llama.cpp ​

Install ​