
Install Meta-Llama-3.1-8B-Instruct locally on your MacBook

Apr 9, 2025


There are multiple ways to install Meta-Llama-3.1-8B-Instruct. In this article, we’ll focus on installing the model via Hugging Face.

I’ll be using the Llama-3.1-8B-Instruct model instead of the base Llama-3.1-8B, as my use case is more conversational. However, the installation process is the same for both models.

Hugging Face

Assuming you already have a Hugging Face account and your access request for the gated Meta-Llama-3.1-8B-Instruct repository has been approved, you’re all set. If not, create an account and request access here.

Setting up the Mac

Install Xcode Command Line Tools (if not already installed)

These tools are essential for compiling packages from source.

xcode-select --install
Install Homebrew and Required Tools (if not already installed)

Homebrew is the package manager for macOS (and Linux) that makes it easy to install and manage software from the command line.

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Then restart your terminal or run:

echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"

Once Homebrew is set up, install the following:

brew install pkg-config coreutils cmake pyenv xz

coreutils — provides nproc and other GNU tools
pyenv — a Python version manager that lets you easily install, switch between, and manage multiple Python versions

Install Python and the needed libraries (if you don’t have them)

I am using pyenv to install Python; you can also do it via Homebrew or the official installer from python.org.

pyenv install 3.11.8
pyenv global 3.11.8

I am using Python 3.11.8 because the sentencepiece package is currently only compatible with Python versions lower than 3.13.

This installs Python 3.11.8 and sets it as the default version; you can verify it with:

python --version
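As a quick sanity check from inside the interpreter, you can also confirm that the active version is below the sentencepiece ceiling (a minimal sketch; the 3.13 limit is as noted above):

```python
import sys

# sentencepiece currently requires Python < 3.13 (see note above)
ok = sys.version_info < (3, 13)
print(f"Python {sys.version_info.major}.{sys.version_info.minor}: "
      f"{'compatible with sentencepiece' if ok else 'too new for sentencepiece'}")
```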

Install the required Python packages:

pip install --upgrade pip setuptools wheel
pip install sentencepiece accelerate torch torchvision torchaudio transformers
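After installation, a short script can confirm the packages import cleanly and whether PyTorch sees Apple’s Metal (MPS) backend, which the model-loading code later in this guide can use for GPU acceleration on Apple silicon (a hedged sketch; it only prints diagnostics):

```python
import torch
import transformers

# Report the installed versions and whether the Metal (MPS) backend
# is available; on Apple silicon, models can be placed on it for
# GPU-accelerated inference.
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("MPS available:", torch.backends.mps.is_available())
```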


Download the model programmatically

Create a Python file named install-llama-3.1-8b.py with the following code:

from huggingface_hub import login
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch

# Login to Hugging Face
access_token_read = "<your hugging face access token>"
login(token=access_token_read)

# Model ID
model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# Load model (simpler version, no quantization)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16  # Use bfloat16 or float16 if supported
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Create text generation pipeline
text_gen = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    pad_token_id=tokenizer.eos_token_id
)

# Test the pipeline
response = text_gen("what is the capital of France", max_new_tokens=100)
print(response[0]['generated_text'])

Log in to your Hugging Face account and generate an access token here with user and repository read permissions.
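Hardcoding the token works for a quick test, but it is safer to read it from an environment variable; huggingface_hub also recognises HF_TOKEN on its own. A minimal sketch of the substitution in the script above:

```python
import os

# Read the Hugging Face token from the environment instead of
# hardcoding it in the script (export HF_TOKEN=... in your shell)
access_token_read = os.environ.get("HF_TOKEN", "")
if not access_token_read:
    print("HF_TOKEN is not set; login will fail")
```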

Run the script:

python install-llama-3.1-8b.py

Upon successful execution, the script will:

  • Download the model from the Hugging Face repository into the local cache (/Users/<username>/.cache/huggingface). From the next run onwards, the model is loaded from the local cache.

  • Send a prompt to the model and display the response
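To see which models are already cached, you can list the hub cache directory (a small sketch; the default path follows Hugging Face’s documented cache layout and can be overridden with the HF_HOME environment variable):

```python
from pathlib import Path

# Default Hugging Face hub cache; models are stored one directory
# per repository, e.g. models--meta-llama--Meta-Llama-3.1-8B-Instruct
cache_dir = Path.home() / ".cache" / "huggingface" / "hub"

if cache_dir.exists():
    for entry in sorted(cache_dir.iterdir()):
        print(entry.name)
else:
    print(f"No cache yet at {cache_dir}")
```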

Conclusion

In this guide, you’ve learned how to set up and run the Meta-Llama-3.1-8B-Instruct model locally on a macOS machine using Hugging Face Transformers and PyTorch. Running LLMs locally gives you more control, privacy, and customisation power.

If you’ve followed the steps successfully, you should now be able to:

  • Load and run LLaMA 3.1 using a simple Python script

  • Run the 8B model in half precision (float16) to keep memory usage manageable

  • Generate text responses using instruct-tuned prompts

Next Steps

  • Build a chatbot or command-line assistant using this model

  • Explore prompt engineering to optimize results

  • Experiment with multi-turn conversations
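For multi-turn conversations, the instruct model expects messages rendered with its chat template (tokenizer.apply_chat_template does this for you). As a rough illustration, here is a simplified, hand-rolled sketch of the Llama 3.1 prompt format; the exact template applied by the tokenizer is authoritative:

```python
# Hedged sketch: render a multi-turn conversation into the Llama 3.1
# special-token prompt format (simplified illustration only).
def render_llama31_prompt(messages):
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
                     f"{m['content']}<|eot_id|>")
    # Cue the model to generate the assistant's next turn
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
    {"role": "user", "content": "And its population?"},
]
prompt = render_llama31_prompt(messages)
print(prompt)
```

In practice, pass the messages list to tokenizer.apply_chat_template with add_generation_prompt=True rather than building the string by hand.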

We engineer reliable, scalable, and intelligent digital systems that help businesses modernize, automate, and grow

A40, ITHUM Towers, B-308,

Sector 62 Noida-201301

+91 8750701919

I Cube Systems • All Rights Reserved 2025
