
Essential LLM terms you need to know!

LLM Lingo: Must-Know Terms (Created by: Aishwarya Naresh Reganti)

Part 1: Baseline

| Term | Description |
| --- | --- |
| Foundation Model | LLM designed to generate and understand human-like text across a wide range of use cases |
| Transformer | A popular LLM architecture known for its attention mechanism and parallel processing abilities |
| Prompting | Providing carefully crafted inputs to an LLM to generate desired outputs |
| Context Length | Maximum number of input words/tokens an LLM can consider when generating an output |
| Few-Shot Learning | Providing a few examples to an LLM to assist it in performing a specific task |
| Zero-Shot Learning | Providing only task instructions to the LLM, relying solely on its pre-existing knowledge |
| RAG | Retrieval-Augmented Generation: appending retrieved information to the prompt to improve the LLM's response |
| Knowledge Base (KB) | Collection of documents from which relevant information is retrieved in RAG |
| Vector Database | Stores vector representations of the KB, aiding the retrieval of relevant information in RAG |
| Fine-Tuning | Adapting an LLM to a specific task or domain by further training it on task-specific data |
| Instruction Tuning | Adjusting an LLM's behavior during fine-tuning by providing specific guidelines/directives |
| Hallucination | Tendency of LLMs to sometimes generate incorrect or non-factual information |
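
To make a few of these terms concrete, here is a minimal Python sketch contrasting zero-shot and few-shot prompts, plus a RAG-style prompt that appends retrieved KB passages. The `call_llm` function, the example reviews, and the retrieved snippets are all hypothetical placeholders, not part of any specific API.

```python
# Minimal sketch: zero-shot vs. few-shot prompting, plus a RAG-style prompt.
# `call_llm` is a hypothetical stand-in for whatever LLM client you use.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("replace with your LLM client call")

# Zero-shot: task instructions only, relying on pre-existing knowledge.
zero_shot = (
    "Classify the sentiment of this review as positive or negative:\n"
    "'Great battery life.'"
)

# Few-shot: a handful of worked examples prepended to the same task.
few_shot = (
    "Review: 'Terrible screen.' -> negative\n"
    "Review: 'Loved the camera!' -> positive\n"
    "Review: 'Great battery life.' -> "
)

# RAG: passages retrieved from a knowledge base (via a vector database in
# practice) are appended so the model can ground its answer.
retrieved_docs = ["Battery capacity is 5000 mAh.", "The screen is 6.1-inch OLED."]
rag_prompt = (
    "Answer using only the context below.\n"
    "Context:\n" + "\n".join(retrieved_docs) + "\n"
    "Question: What is the battery capacity?"
)
```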

Part 2: Fine-Tuning Edition

Curiosity: What insights can we draw from these fine-tuning techniques, and how do they connect to ongoing innovation in the field?

| Term | Description |
| --- | --- |
| In-Context Learning | Integrating task examples into prompts, enabling LLMs to handle new tasks without fine-tuning |
| SFT | Supervised Fine-Tuning: updating a pre-trained LLM with labeled data to perform a specific task |
| Contrastive Learning | Fine-tuning method that improves an LLM by teaching it to discern similarities and differences in data |
| Transfer Learning | Applying knowledge pre-trained on large datasets to improve LLM performance on smaller, task-specific data |
| Reward Modeling | Designing objectives that reward LLM outputs during the reinforcement learning process |
| Reinforcement Learning | Training LLMs through trial and error, with rewards/penalties based on their generated outputs |
| RLHF | Reinforcement Learning from Human Feedback: human feedback serves as the reward/penalty signal for the LLM |
| PEFT | Parameter-Efficient Fine-Tuning: updates only a small subset of an LLM's parameters, making it both compute- and cost-efficient |
| Quantization | Reducing the precision of LLM parameters to save computational resources without sacrificing performance |
| Pruning | Trimming surplus connections or parameters to make LLMs smaller and faster while remaining performant |
| LoRA | Low-Rank Adaptation: a PEFT method that inserts a small set of new low-rank weights into the LLM and trains only those |
| Freeze Tuning | Fine-tuning with most of the LLM's weights frozen, except for some layers, generally the task-specific layers |
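
As a concrete example of one row above, here is a minimal PyTorch sketch of the LoRA idea: the pretrained weight is frozen and only a small low-rank update (B·A, scaled by alpha/r) is trained. The class name, sizes, and hyperparameters are illustrative assumptions, not code from any particular library.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update."""
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)   # freeze pretrained weights
        self.base.bias.requires_grad_(False)
        # Low-rank factors: B starts at zero so training begins at the base model.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(768, 768)
out = layer(torch.randn(2, 768))  # only lora_A and lora_B receive gradients
```

Because B is initialized to zero, the adapted model starts out identical to the frozen base model, which helps training start from a sensible point.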

Paper link: Extending Lagrangian and Hamiltonian Neural Networks with Differentiable Contact Models: https://arxiv.org/abs/2102.06794

LLM Performance and Speed

🤔 What is Multi-Token Prediction Training?

  • 💡Multi-token prediction is a training approach for LLMs like GPT and Llama, which typically use a next-token prediction loss during training.
  • 💡Instead of predicting only the next token in a sequence, multi-token prediction involves training the language model to predict multiple future tokens simultaneously.
  • 💡Specifically, at each position in the training data, the model is tasked with predicting the next n tokens using n independent output heads, which operate on top of a shared model trunk (a minimal sketch of this setup follows this list).
  • 💡This method aims to improve sample efficiency by providing the model with more context and allowing it to anticipate multiple tokens ahead.
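
A minimal PyTorch sketch of the setup described above: a shared trunk representation feeds n independent linear heads, and head k is trained with cross-entropy against the token k steps ahead. This is an illustrative reconstruction under assumed names and shapes, not the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenHeads(nn.Module):
    """n independent output heads on top of a shared trunk representation."""
    def __init__(self, d_model: int, vocab_size: int, n_future: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(d_model, vocab_size) for _ in range(n_future)
        )

    def loss(self, trunk_out: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # trunk_out: (batch, seq_len, d_model); tokens: (batch, seq_len).
        # Assumes seq_len > n_future so every head has at least one target.
        total, seq_len = 0.0, tokens.size(1)
        for k, head in enumerate(self.heads, start=1):
            logits = head(trunk_out[:, : seq_len - k])   # predict token t+k
            targets = tokens[:, k:]                      # shifted by k positions
            total = total + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
            )
        return total
```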

📖 Some more insights:

  • ⛳ The proposed architecture involves a shared transformer trunk for producing a latent representation of the context, followed by independent output heads for predicting future tokens. This allows for the computation of the cross-entropy loss for multi-token prediction.
  • ⛳ A challenge in training multi-token predictors is their GPU memory utilization. The paper proposes a memory-efficient implementation that carefully sequences the forward and backward operations, significantly reducing peak GPU memory usage (a sketch of this idea follows this list).
  • ⛳ The authors note that multi-token prediction leads to higher sample efficiency, enabling models to solve more problems with the same computational budget. Models trained with multi-token prediction achieve faster inference times, especially with self-speculative decoding methods.
  • ⛳ Multi-token prediction promotes learning longer-term patterns, which is particularly beneficial for tasks like byte-level tokenization.
  • ⛳ Performance gains of this method are especially pronounced on generative benchmarks like coding, where models consistently outperform strong baselines by several percentage points. The 13B-parameter models solve 12% more problems on HumanEval and 17% more on MBPP than comparable next-token models.
  • ⛳ Experiments on small algorithmic tasks demonstrate that multi-token prediction is favorable for the development of induction heads and algorithmic reasoning capabilities.
  • ⛳ As an additional benefit, models trained with 4-token prediction are up to 3× faster at inference, even with large batch sizes.
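
A hedged sketch of the memory trick mentioned above: run each head's forward and backward sequentially against a detached copy of the trunk output, so only one head's (sequence × vocabulary) logit tensor is alive at a time, then backpropagate the accumulated gradient through the trunk in a single step. Names and shapes are assumptions; this captures the idea, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def memory_efficient_multitoken_loss(trunk_out, heads, tokens):
    """Backward through one head at a time so only a single head's logits are
    materialized at any moment; then backprop once through the shared trunk."""
    detached = trunk_out.detach().requires_grad_(True)
    total, seq_len = 0.0, tokens.size(1)
    for k, head in enumerate(heads, start=1):
        logits = head(detached[:, : seq_len - k])
        loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), tokens[:, k:].reshape(-1)
        )
        loss.backward()            # frees this head's logits and graph
        total += loss.item()
    # Propagate the accumulated gradient through the trunk in one backward pass.
    trunk_out.backward(detached.grad)
    return total
```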