If you’re using text-generation-webui on a Mac with Apple Silicon (M1, M2, or M3), and you’re trying to run a GGUF model via the llama.cpp
loader, chances are you’ve hit this error:
ModuleNotFoundError: No module named 'llama_cpp_binaries'
This bug blocked me for hours—and I’m writing this so you don’t lose your mind too.
🧠 What’s Going On?
Here’s what’s really behind the issue:
- The `llama_cpp_binaries` module is only built for Windows and Linux.
- On macOS, you're supposed to use `llama-cpp-python`, which provides native bindings for Apple Silicon.
- Unfortunately, some builds of text-generation-webui still try to route llama.cpp through a loader that depends on `llama_cpp_binaries`. That's where things break (you can confirm it with the quick check below).
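To see whether this is what's happening on your machine, here's a minimal check you can run from the same Python environment that text-generation-webui uses. It only assumes the two package names mentioned above:

```python
# Quick diagnostic: which llama.cpp backend is actually importable here?
try:
    import llama_cpp  # installed by the llama-cpp-python package
    print("llama-cpp-python is available:", llama_cpp.__version__)
except ImportError:
    print("llama-cpp-python is NOT installed; install it with pip")

try:
    import llama_cpp_binaries  # Windows/Linux-only package the default loader expects
    print("llama_cpp_binaries is available (unusual on macOS)")
except ImportError:
    print("llama_cpp_binaries is NOT available, which is the expected state on Apple Silicon")
```

If the first import fails and the second succeeds, you have a different problem. If it's the other way around, read on.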
✅ The Fix (Step-by-Step)
The solution is to remap the loader so it uses `llama-cpp-python` directly, bypassing the server-based loader that depends on `llama_cpp_binaries`.
1. Open `modules/models.py`
Find this block:
```python
load_func_map = {
    'llama.cpp': llama_cpp_server_loader,
    # ...
}
```
2. Replace It
Change it to:
```python
load_func_map = {
    'llama.cpp': llama_cpp_loader,  # Point to the new local loader defined in step 3
    # ...
}
```
3. Add the New Loader Function
Paste this somewhere in the same file (outside any class or existing function):
```python
def llama_cpp_loader(model_name):
    try:
        from llama_cpp import Llama
    except ImportError:
        raise ImportError("llama-cpp-python is not installed. Please install it with 'pip install llama-cpp-python'.")

    from pathlib import Path

    import modules.shared as shared
    from modules.logging_colors import logger

    # Resolve the model path: either a single file or a directory containing a .gguf file
    path = Path(f'{shared.args.model_dir}/{model_name}')
    if path.is_file():
        model_file = path
    else:
        model_file = sorted(path.glob('*.gguf'))[0]

    logger.info(f"llama.cpp weights detected: '{model_file}' (using llama-cpp-python)")

    try:
        model = Llama(model_path=str(model_file), n_ctx=shared.args.ctx_size, n_threads=shared.args.threads)
        return model, model  # Return the model as both model and tokenizer
    except Exception as e:
        logger.error(f"Error loading the model with llama-cpp-python: {str(e)}")
        return None, None
```
This loader uses `llama-cpp-python` directly, which works natively on macOS.
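Before restarting the UI, you can sanity-check the same code path on its own. This is a minimal sketch using llama-cpp-python's completion API; the model path and settings are placeholders, so substitute your own:

```python
from llama_cpp import Llama

# Hypothetical model path and settings; adjust to match your own setup.
llm = Llama(
    model_path="models/mistral-7b-instruct.Q4_K_M.gguf",
    n_ctx=4096,
    n_threads=6,
)

# Plain text-completion call
out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])
```

If this prints a sensible completion, the bindings are working and the web UI should load the model through the new loader without issue.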
4. Restart the Web UI
Once you've saved everything, start the server again:

```
python3 server.py
```
Then reload the model in the UI like you normally would.
💡 Why This Works
- You're using the correct backend for macOS (`llama-cpp-python`).
- You're bypassing the default `llama_cpp_server_loader`, which expects `llama_cpp_binaries` (a Linux/Windows-only module).
- The fix is native, clean, and future-proof as long as the bindings stay compatible.
💜 Final Checks
Make sure `llama-cpp-python` is installed and up to date:

```
pip install --upgrade llama-cpp-python
```
Still stuck? Double-check:
- The model path is correct.
- You're using `.gguf` models.
- Your `n_ctx` and `n_threads` values in `shared.args` make sense for your machine (see the sketch below).
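For those last two values, here's a rough starting point. The numbers are assumptions rather than anything from the text-generation-webui docs, so tune them for your own hardware and model size:

```python
import os

# Thread count: roughly the number of performance cores is a sensible default on Apple Silicon.
# os.cpu_count() reports all cores (performance + efficiency), so halving it is a conservative guess.
n_threads = max(1, (os.cpu_count() or 8) // 2)

# Context window: 4096 is comfortable for most 7B GGUF quantizations on 16 GB of RAM.
# Larger values increase memory use and slow down prompt processing.
n_ctx = 4096

print(f"Suggested starting point: n_threads={n_threads}, n_ctx={n_ctx}")
```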
🎉 Wrapping Up
This workaround finally got my GGUF model up and running on Apple Silicon using text-generation-webui. Hopefully it saves you time too.
If it helped, feel free to share it or drop a note on GitHub discussions where others might be blocked.
Happy prompting, Mac warriors.
– Walter