Install Meta-Llama-3.1-8B-Instruct locally on your MacBook
Apr 9, 2025
There are multiple ways to install Meta-Llama-3.1-8B-Instruct. In this article, we’ll focus on installing the model via Hugging Face.
I’ll be using the Llama-3.1-8B-Instruct model instead of the base Llama-3.1-8B, as my use case is more conversational. However, from an installation standpoint, the process is the same for both models.
Hugging Face
If you already have a Hugging Face account and your access request for the gated Meta-Llama-3.1-8B-Instruct repository has been approved, great! If not, create an account and request access here.
Setting up the Mac
Install Xcode Command Line Tools (if not already installed)
These tools are essential for compiling packages from source.
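If you don’t have them yet, the standard installer is:

```bash
xcode-select --install
```

A dialog will appear; confirm it to download and install the tools.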
Install Homebrew and Required Tools (if not already installed)
Homebrew is the package manager for macOS (and Linux) that makes it easy to install and manage software from the command line.
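The install command from brew.sh:

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```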
Then restart your terminal or run:
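```bash
eval "$(/opt/homebrew/bin/brew shellenv)"
```

This assumes an Apple Silicon Mac, where Homebrew installs under /opt/homebrew; on Intel Macs the binary lives under /usr/local instead.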
Once Homebrew is set up, install the following (the command appears after this list):
coreutils: provides nproc and other GNU tools
pyenv: a Python version manager that lets you easily install, switch between, and manage multiple Python versions
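Both can be installed with a single brew command; the extra echo line (which the pyenv installer will also suggest) makes sure the pyenv-managed Python takes precedence in new shells:

```bash
brew install coreutils pyenv

# Hook pyenv into new shells (zsh is the macOS default)
echo 'eval "$(pyenv init -)"' >> ~/.zshrc
```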
Install Python and the required libraries (if you don’t already have them)
I’m using pyenv to install Python; you can also do it via Homebrew or the official installer from python.org.
I’m using Python 3.11.8 because the sentencepiece package currently requires a Python version lower than 3.13.
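With pyenv, that looks like:

```bash
pyenv install 3.11.8
pyenv global 3.11.8
```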
This should install Python 3.11.8; you can verify the version:
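```bash
python --version
# Python 3.11.8
```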
Install the required Python packages:
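The exact set depends on your script. For the sketch shown below, these cover model loading (torch, transformers, plus accelerate for device_map), tokenization (sentencepiece), and authentication (huggingface_hub):

```bash
pip install torch transformers accelerate sentencepiece huggingface_hub
```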
Download the model programmatically
Create a Python file named install-llama-3.1-8b.py with the following code.
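The sketch below shows one way to write it: it loads the instruct model through Transformers and runs a single chat-templated prompt. The prompt text, dtype, and generation settings are illustrative choices, not the only options.

```python
# install-llama-3.1-8b.py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# The first run downloads the weights into the local Hugging Face cache;
# subsequent runs load them from there.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # half precision so the 8B model fits in memory
    device_map="auto",           # picks Apple's MPS backend when available
)

# Instruct models expect a chat-style prompt
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain in two sentences why the sky is blue."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=200)

# Strip the prompt tokens and print only the model's reply
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```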
Log in to your Hugging Face account and generate an access token here with user and repository read permissions.
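With the token in hand, one way to authenticate is from the terminal, using the CLI that ships with huggingface_hub; paste the token when prompted:

```bash
huggingface-cli login
```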
Run the script:
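```bash
python install-llama-3.1-8b.py
```

Expect the first run to take a while: the 8B model weighs in at roughly 16 GB of downloads.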
Upon successful execution, the script will:
Download the model from the Hugging Face repository into the local cache (/Users/<username>/.cache/huggingface). From the next run onwards, the model is loaded from the local cache.
Send a prompt to the model and display the response
Conclusion
In this guide, you’ve learned how to set up and run the Meta-Llama-3.1-8B-Instruct model locally on a macOS machine using Hugging Face Transformers and PyTorch. Running LLMs locally gives you more control, privacy, and customisation power.
If you’ve followed the steps successfully, you should now be able to:
Load and run Llama 3.1 using a simple Python script
Handle large models efficiently with reduced precision or quantization
Generate text responses using instruct-tuned prompts
Next Steps
Build a chatbot or command-line assistant using this model
Explore prompt engineering to optimize results
Experiment with multi-turn conversations