LLM Terms You Really Need to Know..!

LLM Lingo: Must-Know Terms (Created by: Aishwarya Naresh Reganti)

Part 1 : Baseline

  • Foundation Model: LLM designed to generate and understand human-like text across a wide range of use cases.
  • Transformer: A popular LLM architecture known for its attention mechanism and parallel processing abilities.
  • Prompting: Providing carefully crafted inputs to an LLM to generate desired outputs.
  • Context Length: Maximum number of input tokens an LLM can consider when generating an output.
  • Few-Shot Learning: Providing a few examples in the prompt to help an LLM perform a specific task.
  • Zero-Shot Learning: Providing only task instructions to the LLM, relying solely on its preexisting knowledge.
  • RAG: Retrieval-Augmented Generation. Appending retrieved information to the prompt to improve the LLM's response.
  • Knowledge Base (KB): Collection of documents from which relevant information is retrieved in RAG.
  • Vector Database: Stores vector representations of the KB, aiding the retrieval of relevant information in RAG.
  • Fine-Tuning: Adapting an LLM to a specific task or domain by further training it on task-specific data.
  • Instruction Tuning: Fine-tuning an LLM on instruction-response examples so that it follows specific guidelines/directives.
  • Hallucination: Tendency of LLMs to sometimes generate incorrect or non-factual information.
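
To make the RAG, Knowledge Base, and Vector Database entries concrete, here is a minimal sketch of the retrieve-then-prompt flow. Everything in it is an assumption for illustration: the three-sentence knowledge base, the word-count "embedding" (real systems use a learned embedding model), and the helper names `embed`, `retrieve`, and `build_rag_prompt`.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base; a real KB would hold many documents.
KNOWLEDGE_BASE = [
    "LoRA trains a small set of low-rank weights and keeps the base model frozen.",
    "Quantization stores model parameters at reduced numeric precision.",
    "RLHF uses human feedback as the reward signal for reinforcement learning.",
]

def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a learned model)."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    return Counter(w for w in words if w)

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# "Vector database": each document stored next to its vector.
VECTOR_DB = [(doc, embed(doc)) for doc in KNOWLEDGE_BASE]

def retrieve(query, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(VECTOR_DB, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_rag_prompt(question, k=1):
    """Append the retrieved context to the question before calling an LLM."""
    context = "\n".join(retrieve(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

prompt = build_rag_prompt("What does quantization do to model parameters?")
print(prompt)
```

The prompt built this way would then be sent to the LLM. Dropping the `Context:` part gives a plain zero-shot prompt, and adding a few worked question/answer pairs before the question would turn it into a few-shot prompt.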

Part 2 : Fine Tuning Edition

  • In-Context Learning: Integrating task examples into prompts, enabling LLMs to handle new tasks without fine-tuning.
  • SFT: Supervised Fine-Tuning. Updating a pre-trained LLM with labeled data to perform a specific task.
  • Contrastive Learning: Fine-tuning method that improves an LLM by teaching it to discern similarities and differences in data.
  • Transfer Learning: Applying pre-trained knowledge from large datasets to improve LLM performance on smaller, task-specific data.
  • Reward Modeling: Designing objectives to reward LLM outputs during the reinforcement learning process.
  • Reinforcement Learning: Training LLMs through trial and error, with rewards/penalties based on their generated outputs.
  • RLHF: Reinforcement Learning from Human Feedback. Human feedback is used as the reward/penalty for the LLM.
  • PEFT: Parameter-Efficient Fine-Tuning. Updates only a small number of an LLM's parameters, making it efficient in both compute and cost.
  • Quantization: Reducing the precision of LLM parameters to save computational resources, ideally with little loss in performance.
  • Pruning: Trimming surplus connections or parameters to make LLMs smaller and faster yet performant.
  • LoRA: Low-Rank Adaptation. A PEFT method that inserts a small set of new weights into the LLM and trains only those.
  • Freeze Tuning: Fine-tuning with most of the LLM's weights frozen, except for some layers, generally the task-specific layers.
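
As a rough illustration of the LoRA and PEFT entries, the sketch below adds a low-rank update to a frozen weight matrix and compares parameter counts. The layer sizes, the rank `r`, the scaling factor `alpha`, and the helper names are assumptions chosen for the example, and plain Python lists stand in for tensors.

```python
import random

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x). W stays frozen; only A and B would be trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# Illustrative sizes only (not from the post).
d_out, d_in, r, alpha = 8, 6, 2, 4
rng = random.Random(0)

W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen base weights
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable, r x d_in
B = [[0.0] * r for _ in range(d_out)]                               # trainable, d_out x r, zero init

x = [1.0] * d_in
y = lora_forward(W, A, B, x, alpha, r)

full_params = d_out * d_in          # parameters updated by full fine-tuning
lora_params = r * (d_in + d_out)    # parameters updated by this LoRA sketch
print(full_params, lora_params)
```

Because `B` starts at zero, the adapted layer initially behaves exactly like the frozen one, and training only moves `A` and `B`. At these toy sizes the saving is small, but the count `r * (d_in + d_out)` grows linearly with the layer width while `d_out * d_in` grows quadratically.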

Extending Lagrangian and Hamiltonian Neural Networks with Differentiable Contact Models : https://arxiv.org/abs/2102.06794

LLM Performance and Speed

🤔 What is Multi-Token Prediction Training?

  • 💡Multi-token prediction is a training approach for LLMs like GPT and Llama, which typically use a next-token prediction loss during training.
  • 💡Instead of predicting only the next token in a sequence, multi-token prediction involves training the language model to predict multiple future tokens simultaneously.
  • 💡Specifically, at each position in the training data, the model is tasked with predicting the next n tokens using n independent output heads, which operate on top of a shared model trunk.
  • 💡This method aims to improve sample efficiency by giving the model a training signal for several future tokens at each position, encouraging it to anticipate more than one token ahead.
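
The training objective described above can be sketched in a few lines. This is a toy under stated assumptions, not the paper's implementation: `toy_trunk` is a hypothetical stand-in for the shared transformer trunk (it just looks up an embedding of the last token), each head is a plain weight matrix, and the per-head cross-entropy terms are averaged here for readability.

```python
import math
import random

def softmax_cross_entropy(logits, target):
    """Cross-entropy of one target index under softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def multi_token_loss(tokens, trunk, heads):
    """Average cross-entropy where head i predicts the token i+1 steps ahead."""
    n = len(heads)
    total, count = 0.0, 0
    for t in range(len(tokens) - n):       # positions with n future tokens available
        z = trunk(tokens[: t + 1])          # shared latent representation of the context
        for i, head in enumerate(heads):    # n independent output heads
            logits = matvec(head, z)
            total += softmax_cross_entropy(logits, tokens[t + 1 + i])
            count += 1
    return total / count

# Hypothetical toy sizes and weights.
VOCAB, DIM, N_HEADS = 5, 4, 3
rng = random.Random(0)
EMBED = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB)]

def toy_trunk(context):
    """Stand-in for the shared trunk: embedding of the last context token."""
    return EMBED[context[-1]]

heads = [[[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB)]
         for _ in range(N_HEADS)]

tokens = [0, 1, 2, 3, 4, 0, 1, 2]
loss = multi_token_loss(tokens, toy_trunk, heads)
print(loss)
```

With a single head this reduces to the usual next-token loss, so next-token prediction is the special case n = 1.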

📖 Some more insights:

  • ⛳ The proposed architecture involves a shared transformer trunk for producing a latent representation of the context, followed by independent output heads for predicting future tokens. This allows for the computation of the cross-entropy loss for multi-token prediction.
  • ⛳ A challenge in training multi-token predictors is reducing GPU memory utilization. The paper proposes a memory-efficient implementation by carefully managing the forward and backward operations, significantly reducing peak GPU memory utilization.
  • ⛳ The authors note that multi-token prediction leads to higher sample efficiency, enabling models to solve more problems with the same computational budget. Models trained with multi-token prediction achieve faster inference times, especially with self-speculative decoding methods.
  • ⛳ Multi-token prediction promotes learning longer-term patterns, which is particularly beneficial for models trained with byte-level tokenization.
  • ⛳ Performance gains of this method are especially pronounced on generative benchmarks like coding, where models consistently outperform strong baselines by several percentage points. The 13B parameter models solve 12% more problems on HumanEval and 17% more on MBPP than comparable next-token models.
  • ⛳ Experiments on small algorithmic tasks demonstrate that multi-token prediction is favorable for the development of induction heads and algorithmic reasoning capabilities.
  • ⛳ As an additional benefit, models trained with 4-token prediction are up to 3× faster at inference, even with large batch sizes.
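
The inference speedup mentioned above comes from self-speculative decoding, which can be sketched as a draft-and-verify loop. The toy below is an assumption-laden illustration: `toy_next_token` and `toy_draft` are hypothetical stand-ins for the next-token head and the extra heads, and verification calls `next_token` one position at a time for clarity, whereas a real model would check all drafted positions in a single forward pass.

```python
def greedy_decode(prefix, next_token, max_new):
    """Baseline: one model call per generated token."""
    out = list(prefix)
    for _ in range(max_new):
        out.append(next_token(out))
    return out

def self_speculative_decode(prefix, next_token, draft_tokens, max_new):
    """Draft several tokens per round, keep the verified ones.

    Assumes draft_tokens always returns at least one token.
    Returns the generated sequence and the number of draft/verify rounds.
    """
    out = list(prefix)
    target_len = len(prefix) + max_new
    rounds = 0
    while len(out) < target_len:
        rounds += 1
        accepted = []
        for tok in draft_tokens(out):
            verified = next_token(out + accepted)  # what greedy decoding would emit here
            accepted.append(verified)              # always keep the verified token
            if tok != verified:                    # stop at the first wrong draft
                break
        out.extend(accepted)
    return out[:target_len], rounds

# Hypothetical toy "model": counts upward modulo 10.
def toy_next_token(context):
    return (context[-1] + 1) % 10

# Hypothetical draft heads: the first three guesses are right, the fourth is a fixed 0.
def toy_draft(context):
    last = context[-1]
    return [(last + 1) % 10, (last + 2) % 10, (last + 3) % 10, 0]

baseline = greedy_decode([0], toy_next_token, 12)
fast, rounds = self_speculative_decode([0], toy_next_token, toy_draft, 12)
print(fast == baseline, rounds)
```

The output matches greedy decoding token for token; the gain is that each round can accept several tokens, so fewer rounds are needed than tokens generated when the drafts are mostly correct.
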
This post is licensed under CC BY 4.0 by the author.