
Essential LLM terms you need to know!

LLM Lingo: Must-Know Terms (Created by: Aishwarya Naresh Reganti)

Part 1: Baseline

| Term | Description |
| --- | --- |
| Foundation Model | LLM designed to generate and understand human-like text across a wide range of use cases |
| Transformer | A popular LLM architecture known for its attention mechanism and parallel processing abilities |
| Prompting | Providing carefully crafted inputs to an LLM to generate desired outputs |
| Context Length | Maximum number of input words/tokens an LLM can consider when generating an output |
| Few-Shot Learning | Providing a few examples to an LLM to assist it in performing a specific task |
| Zero-Shot Learning | Providing only task instructions to the LLM, relying solely on its pre-existing knowledge |
| RAG | Retrieval-Augmented Generation: appending retrieved information to the prompt to improve the LLM's response |
| Knowledge Base (KB) | Collection of documents from which relevant information is retrieved in RAG |
| Vector Database | Stores vector representations of the KB, aiding the retrieval of relevant information in RAG |
| Fine-Tuning | Adapting an LLM to a specific task or domain by further training it on task-specific data |
| Instruction Tuning | Adjusting an LLM's behavior during fine-tuning by providing specific guidelines/directives |
| Hallucination | Tendency of LLMs to sometimes generate incorrect or non-factual information |
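
To make a few of these terms concrete, here is a minimal Python sketch contrasting zero-shot and few-shot prompts, plus a RAG-style prompt that appends retrieved KB passages. The `call_llm` function, the example reviews, and the retrieved snippets are all hypothetical placeholders, not part of any specific API.

```python
# Minimal sketch: zero-shot vs. few-shot prompting, plus a RAG-style prompt.
# `call_llm` is a hypothetical stand-in for whatever LLM client you use.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("replace with your LLM client call")

# Zero-shot: task instructions only, relying on pre-existing knowledge.
zero_shot = (
    "Classify the sentiment of this review as positive or negative:\n"
    "'Great battery life.'"
)

# Few-shot: a handful of worked examples prepended to the same task.
few_shot = (
    "Review: 'Terrible screen.' -> negative\n"
    "Review: 'Loved the camera!' -> positive\n"
    "Review: 'Great battery life.' -> "
)

# RAG: passages retrieved from a knowledge base (via a vector database in
# practice) are appended so the model can ground its answer.
retrieved_docs = ["Battery capacity is 5000 mAh.", "The screen is 6.1-inch OLED."]
rag_prompt = (
    "Answer using only the context below.\n"
    "Context:\n" + "\n".join(retrieved_docs) + "\n"
    "Question: What is the battery capacity?"
)
```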

Part 2: Fine-Tuning Edition

Curiosity: What insights can we draw from these fine-tuning techniques, and how do they connect to ongoing innovation in the field?

| Term | Description |
| --- | --- |
| In-Context Learning | Integrating task examples into prompts, enabling LLMs to handle new tasks without fine-tuning |
| SFT | Supervised Fine-Tuning: updating a pre-trained LLM with labeled data to perform a specific task |
| Contrastive Learning | Fine-tuning method that improves an LLM by teaching it to discern similarities and differences in data |
| Transfer Learning | Applying knowledge pre-trained on large datasets to improve LLM performance on smaller, task-specific data |
| Reward Modeling | Designing objectives that reward LLM outputs during the reinforcement learning process |
| Reinforcement Learning | Training LLMs through trial and error, with rewards/penalties based on their generated outputs |
| RLHF | Reinforcement Learning from Human Feedback: human feedback serves as the reward/penalty signal for the LLM |
| PEFT | Parameter-Efficient Fine-Tuning: updates only a small subset of an LLM's parameters, making it both compute- and cost-efficient |
| Quantization | Reducing the precision of LLM parameters to save computational resources without sacrificing performance |
| Pruning | Trimming surplus connections or parameters to make LLMs smaller and faster while remaining performant |
| LoRA | Low-Rank Adaptation: a PEFT method that inserts a small set of new low-rank weights into the LLM and trains only those |
| Freeze Tuning | Fine-tuning with most of the LLM's weights frozen, except for some layers, generally the task-specific layers |
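
As a concrete example of one row above, here is a minimal PyTorch sketch of the LoRA idea: the pretrained weight is frozen and only a small low-rank update (B·A, scaled by alpha/r) is trained. The class name, sizes, and hyperparameters are illustrative assumptions, not code from any particular library.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update."""
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)   # freeze pretrained weights
        self.base.bias.requires_grad_(False)
        # Low-rank factors: B starts at zero so training begins at the base model.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(768, 768)
out = layer(torch.randn(2, 768))  # only lora_A and lora_B receive gradients
```

Because B is initialized to zero, the adapted model starts out identical to the frozen base model, which helps training start from a sensible point.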

Paper link: Extending Lagrangian and Hamiltonian Neural Networks with Differentiable Contact Models: https://arxiv.org/abs/2102.06794

LLM Performance and Speed

🤔 What is Multi-Token Prediction Training?

  • 💡Multi-token prediction is a training approach for LLMs like GPT and Llama, which typically use a next-token prediction loss during training.
  • 💡Instead of predicting only the next token in a sequence, multi-token prediction involves training the language model to predict multiple future tokens simultaneously.
  • 💡Specifically, at each position in the training data, the model is tasked with predicting the next n tokens using n independent output heads, which operate on top of a shared model trunk (a minimal sketch of this setup follows this list).
  • 💡This method aims to improve sample efficiency by providing the model with more context and allowing it to anticipate multiple tokens ahead.
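
A minimal PyTorch sketch of the setup described above: a shared trunk representation feeds n independent linear heads, and head k is trained with cross-entropy against the token k steps ahead. This is an illustrative reconstruction under assumed names and shapes, not the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenHeads(nn.Module):
    """n independent output heads on top of a shared trunk representation."""
    def __init__(self, d_model: int, vocab_size: int, n_future: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(d_model, vocab_size) for _ in range(n_future)
        )

    def loss(self, trunk_out: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # trunk_out: (batch, seq_len, d_model); tokens: (batch, seq_len).
        # Assumes seq_len > n_future so every head has at least one target.
        total, seq_len = 0.0, tokens.size(1)
        for k, head in enumerate(self.heads, start=1):
            logits = head(trunk_out[:, : seq_len - k])   # predict token t+k
            targets = tokens[:, k:]                      # shifted by k positions
            total = total + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
            )
        return total
```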

📖 Some more insights:

  • ⛳ The proposed architecture involves a shared transformer trunk for producing a latent representation of the context, followed by independent output heads for predicting future tokens. This allows for the computation of the cross-entropy loss for multi-token prediction.
  • ⛳ A challenge in training multi-token predictors is their GPU memory utilization. The paper proposes a memory-efficient implementation that carefully sequences the forward and backward operations, significantly reducing peak GPU memory usage (a sketch of this idea follows this list).
  • ⛳ The authors note that multi-token prediction leads to higher sample efficiency, enabling models to solve more problems with the same computational budget. Models trained with multi-token prediction achieve faster inference times, especially with self-speculative decoding methods.
  • ⛳ Multi-token prediction promotes learning longer-term patterns, which is particularly beneficial for tasks like byte-level tokenization.
  • ⛳ Performance gains of this method are especially pronounced on generative benchmarks like coding, where models consistently outperform strong baselines by several percentage points. The 13B-parameter models solve 12% more problems on HumanEval and 17% more on MBPP than comparable next-token models.
  • ⛳ Experiments on small algorithmic tasks demonstrate that multi-token prediction is favorable for the development of induction heads and algorithmic reasoning capabilities.
  • ⛳ As an additional benefit, models trained with 4-token prediction are up to 3× faster at inference, even with large batch sizes.
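
A hedged sketch of the memory trick mentioned above: run each head's forward and backward sequentially against a detached copy of the trunk output, so only one head's (sequence × vocabulary) logit tensor is alive at a time, then backpropagate the accumulated gradient through the trunk in a single step. Names and shapes are assumptions; this captures the idea, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def memory_efficient_multitoken_loss(trunk_out, heads, tokens):
    """Backward through one head at a time so only a single head's logits are
    materialized at any moment; then backprop once through the shared trunk."""
    detached = trunk_out.detach().requires_grad_(True)
    total, seq_len = 0.0, tokens.size(1)
    for k, head in enumerate(heads, start=1):
        logits = head(detached[:, : seq_len - k])
        loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), tokens[:, k:].reshape(-1)
        )
        loss.backward()            # frees this head's logits and graph
        total += loss.item()
    # Propagate the accumulated gradient through the trunk in one backward pass.
    trunk_out.backward(detached.grad)
    return total
```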