On April 18, 2024, Meta released the Llama 3 Large Language Models (LLMs): pretrained and instruction-tuned generative text models in 8B and 70B parameter sizes. The Llama 3 instruction-tuned models are optimized for dialogue use cases. The first version of the Llama models was released in February 2023 as one of the first open weight large language models, and Meta followed with Llama 2 in July 2023.
Open weight models are fundamentally different from open source models. Meta has not released the source code or the dataset used to train the models; it published only the weights and the inference code. An open source release, by contrast, would include the training source code, the weights (which can be compared to software executables or binaries), and the inference code that allows developers to use the model. Even so, the open weights combined with a permissive distribution policy ( https://llama.meta.com/llama3/license/ ) for commercial use make these models attractive to ML researchers seeking to create new variants.
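To make this concrete, here is a minimal sketch of loading the published weights through the open source Hugging Face transformers library. The meta-llama/Meta-Llama-3-8B repository ID is the gated Hugging Face mirror of the weights, and access assumes you have accepted the license above.

```python
# Minimal sketch: the published weights plus open inference code are
# all a researcher needs to inspect or run the model.
# Assumes access to the gated meta-llama/Meta-Llama-3-8B repository.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "meta-llama/Meta-Llama-3-8B"

config = AutoConfig.from_pretrained(model_id)        # architecture metadata
print(config.num_hidden_layers, config.hidden_size)  # 32 layers, 4096 hidden

model = AutoModelForCausalLM.from_pretrained(model_id)  # downloads the weights
print(sum(p.numel() for p in model.parameters()))       # ~8B parameters
```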
Llama 3 model attributes
Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences for helpfulness and safety.
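As a rough illustration of the SFT step, the toy snippet below computes the standard next-token cross-entropy loss while masking out the prompt tokens, so gradients come only from the desired response. The tensor values, the -100 masking convention, and the omission of the usual one-position label shift are illustrative simplifications, not Meta's training code.

```python
# Toy sketch of the SFT objective: next-token cross-entropy computed
# only on response tokens; prompt positions are masked with -100.
# (The usual one-position shift between logits and labels is omitted.)
import torch
import torch.nn.functional as F

vocab_size = 32
logits = torch.randn(1, 6, vocab_size)             # model outputs, 6 positions
labels = torch.tensor([[-100, -100, 5, 9, 2, 7]])  # prompt masked, response kept

loss = F.cross_entropy(
    logits.view(-1, vocab_size), labels.view(-1), ignore_index=-100
)
print(loss.item())
```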
Meta’s Llama 3 comes in two sizes, 8B and 70B parameters, with both models trained on over 15 trillion tokens from publicly available sources. Both the 8B and 70B versions use Grouped-Query Attention (GQA) for improved inference scalability. These are static models trained on an offline dataset, with a knowledge cutoff of March 2023 for the 8B model and December 2023 for the 70B model.
Figure 1: Model architecture (Source: Meta)
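A simplified sketch of the GQA idea in PyTorch follows: several query heads share a single key/value head, which shrinks the key/value cache at inference time. The head counts below are illustrative, not Llama 3's actual configuration.

```python
# Simplified Grouped-Query Attention: expand each KV head so a group
# of query heads shares it, then run standard attention.
import torch
import torch.nn.functional as F

batch, seq_len, head_dim = 1, 16, 64
n_q_heads, n_kv_heads = 8, 2            # illustrative: 4 query heads per KV head
group = n_q_heads // n_kv_heads

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Share each KV head across its group of query heads.
k = k.repeat_interleave(group, dim=1)   # -> (batch, n_q_heads, seq_len, head_dim)
v = v.repeat_interleave(group, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```

Only the smaller set of n_kv_heads keys and values needs to be cached during generation, which is where the inference scalability gain comes from.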
The Llama 3 models accept text input only and generate text and code as output. Llama 3 supports a context length of 8K tokens and is intended for commercial and research use in English. The instruction-tuned models are intended for assistant-like chat, whereas the pretrained models can be adapted for a variety of natural language generation tasks.
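For assistant-like chat, the instruction-tuned checkpoints expect a specific prompt format. The sketch below uses the chat template bundled with the Hugging Face tokenizer; the meta-llama/Meta-Llama-3-8B-Instruct repository ID and gated access are assumptions about your setup.

```python
# Hedged sketch of assistant-style chat with the instruction-tuned
# 8B model, using the tokenizer's built-in chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain grouped-query attention in one sentence."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```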
Llama 3 has been evaluated with CyberSecEval, Meta’s cybersecurity safety evaluation suite. The suite measures Llama 3’s propensity to suggest insecure code when used as a coding assistant, and its propensity to comply with requests to help carry out cyber-attacks, where attacks are defined by the industry-standard MITRE ATT&CK ontology. Llama 3 performed in the same range as, or safer than, models of equivalent coding capability.
Llama 3 models were trained on Nvidia H100-80GB GPUs. The two variants consumed about 7.7 million GPU hours, which puts the estimated training cost at more than $16 million.
Figure 2: Resource Consumption (Source: Meta)
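As a back-of-the-envelope check on that figure, assume a nominal rental rate of roughly $2 per H100 GPU-hour (an assumed cloud price, not Meta's actual cost):

```python
# Rough cost estimate: GPU hours times an assumed hourly rate.
gpu_hours = 7.7e6          # combined 8B + 70B pretraining, per Meta
usd_per_gpu_hour = 2.10    # assumed H100 cloud rate

print(f"${gpu_hours * usd_per_gpu_hour / 1e6:.1f}M")  # -> $16.2M
```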
Llama 3 Model Benchmarks
A base pretrained model is a transformer-based model that has been pretrained on a vast corpus of text data to understand and generate human-like text. These pretrained models serve as excellent starting points for various natural language processing (NLP) tasks, including text generation, summarization, and translation.
Figure 3: Base pretrained model (Source: Meta)
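In practice, using a base pretrained model is plain text completion: the model simply continues whatever prompt it is given, with no chat formatting. A short sketch with the Hugging Face pipeline API (gated repository access assumed):

```python
# Sketch of raw text completion with the base pretrained model.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B",
    torch_dtype="auto",
    device_map="auto",
)
result = generator("The transformer architecture works by", max_new_tokens=40)
print(result[0]["generated_text"])
```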
An instruction-tuned Large Language Model (LLM) is a model that has been fine-tuned or adapted for a specific task or domain using instruction-based data. Instruction-based data can include examples, guidelines, rules, or specific directions tailored to a particular use case or application.
Figure 4: Instruction Tuned Model (Source: Meta)
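For illustration, a single instruction-tuning record might look like the following. The field names follow a common Alpaca-style convention and are hypothetical; Meta has not published its fine-tuning data format.

```python
# Hypothetical instruction-tuning record (Alpaca-style field names).
example = {
    "instruction": "Classify the sentiment of the following review.",
    "input": "The battery lasts all day and the screen is gorgeous.",
    "output": "Positive",
}
```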