5 techniques to fine-tune LLMs, explained visually!

Fine-tuning large language models traditionally involved adjusting billions of parameters, demanding significant computational power and resources.

However, the development of several innovative methods has transformed this process.

Here's a snapshot of five cutting-edge techniques for fine-tuning LLMs, each explained visually for easy understanding.

LoRA:

  • Introduce two low-rank matrices, A and B, to work alongside the weight matrix W.
  • Adjust these matrices instead of the behemoth W, making updates manageable.
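
To make this concrete, here is a minimal PyTorch sketch of a LoRA-style linear layer. The width (768), rank (8), and alpha are illustrative placeholders, not values from the post.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Sketch of a LoRA layer: the big W is frozen; only the small A and B are trained."""
    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)  # frozen W
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)  # low-rank "down" matrix
        self.B = nn.Parameter(torch.zeros(out_features, rank))        # low-rank "up" matrix, zero-initialized
        self.scaling = alpha / rank

    def forward(self, x):
        # y = x W^T + scaling * (x A^T) B^T -- the learned update lives entirely in A and B
        return x @ self.weight.T + self.scaling * (x @ self.A.T) @ self.B.T

layer = LoRALinear(768, 768)   # illustrative width
x = torch.randn(4, 768)
print(layer(x).shape)          # torch.Size([4, 768])
```

With B initialized to zeros, the adapted layer starts out identical to the frozen original, which is the usual reason for that choice.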

LoRA-FA (Frozen-A):

  • Takes LoRA a step further by freezing matrix A.
  • Only matrix B is tweaked, reducing the activation memory needed.
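
A minimal sketch of the same layer with A frozen as well, using the same illustrative shapes. Because A receives no gradient, only the small rank-sized product x Aᵀ has to be cached for B's backward pass.

```python
import torch
import torch.nn as nn

class LoRAFALinear(nn.Module):
    """Sketch of LoRA-FA: W and A are frozen; only B is trained."""
    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)  # frozen W
        # A is randomly initialized and then frozen, so it needs no gradient of its own;
        # only the rank-sized activation x @ A.T is required to compute B's gradient.
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01, requires_grad=False)
        self.B = nn.Parameter(torch.zeros(out_features, rank))  # the only trainable matrix
        self.scaling = alpha / rank

    def forward(self, x):
        return x @ self.weight.T + self.scaling * (x @ self.A.T) @ self.B.T
```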

VeRA:

  • All about efficiency: matrices A and B are fixed and shared across all layers.
  • Focuses on tiny, trainable scaling vectors in each layer, making it super memory-friendly.
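
A rough sketch of that setup, assuming a single pair of frozen random matrices reused by every adapted layer and two small trainable scaling vectors per layer (all shapes and names here are illustrative):

```python
import torch
import torch.nn as nn

RANK, DIM = 8, 768                         # illustrative sizes
SHARED_A = torch.randn(RANK, DIM) * 0.01   # frozen, shared across all layers
SHARED_B = torch.randn(DIM, RANK) * 0.01   # frozen, shared across all layers

class VeRALinear(nn.Module):
    """Sketch of VeRA: only two small per-layer scaling vectors are trained."""
    def __init__(self, dim=DIM, rank=RANK):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(dim, dim), requires_grad=False)  # frozen W
        self.d = nn.Parameter(torch.ones(rank))   # trainable scaling along the rank dimension
        self.b = nn.Parameter(torch.zeros(dim))   # trainable scaling along the output dimension
        self.register_buffer("A", SHARED_A)       # shared frozen matrices, stored as buffers
        self.register_buffer("B", SHARED_B)

    def forward(self, x):
        h = (x @ self.A.T) * self.d        # (batch, rank), scaled per rank component
        delta = (h @ self.B.T) * self.b    # (batch, dim), scaled per output unit
        return x @ self.weight.T + delta
```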

Delta-LoRA:

  • A twist on LoRA: the difference (delta) between the product of matrices A and B at consecutive training steps is added to the main weight matrix W.
  • Offers a dynamic yet controlled approach to parameter updates.
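
A toy sketch of that update rule: after every optimizer step, the change in B @ A relative to the previous step is folded back into W. The training loop, data, and hyperparameters below are placeholders.

```python
import torch
import torch.nn as nn

class DeltaLoRALinear(nn.Module):
    """Sketch of Delta-LoRA: W is not trained directly but absorbs the change in B @ A."""
    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)  # updated manually
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        return x @ self.weight.T + self.scaling * (x @ self.A.T) @ self.B.T

    @torch.no_grad()
    def absorb_delta(self, prev_BA):
        """Add the step-to-step change of B @ A into W and return the new product."""
        new_BA = self.B @ self.A
        self.weight.add_(self.scaling * (new_BA - prev_BA))
        return new_BA

layer = DeltaLoRALinear(768, 768)
optimizer = torch.optim.AdamW([layer.A, layer.B], lr=1e-3)
prev_BA = (layer.B @ layer.A).detach()

for _ in range(3):                         # toy loop on random data
    x = torch.randn(4, 768)
    loss = layer(x).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    prev_BA = layer.absorb_delta(prev_BA)  # W += delta(B @ A) across steps
```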

LoRA+:

  • An optimized variant of LoRA where matrix B gets a higher learning rate. This tweak leads to faster and more effective learning.
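
In practice this amounts to putting A and B into separate optimizer parameter groups; here is a minimal sketch with a hypothetical learning-rate ratio of 16:

```python
import torch

# Two LoRA factors for one adapted layer (hypothetical shapes: rank 8, width 768).
A = torch.nn.Parameter(torch.randn(8, 768) * 0.01)
B = torch.nn.Parameter(torch.zeros(768, 8))

base_lr = 1e-4
lr_ratio = 16  # hypothetical ratio; in practice it is tuned per model

# LoRA+ boils down to giving B its own, larger learning rate via parameter groups.
optimizer = torch.optim.AdamW([
    {"params": [A], "lr": base_lr},             # A keeps the base learning rate
    {"params": [B], "lr": base_lr * lr_ratio},  # B learns faster
])
```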

Credits to Avi Chawla for the great visualization! 👍

[Visualization: 5 techniques to fine-tune LLMs]
