“The How of Parameter-Efficient Fine-Tuning with LoRA: Exploring the Inner Workings”

Part 1 delves into the concept and necessity of fine-tuning pre-trained large models for specialized tasks. It introduces the conventional method of fine-tuning, where only the top layers of the model are adjusted, and highlights its limitations, particularly in terms of computational and storage demands. To address these challenges, the article shifts focus to Parameter-Efficient Fine-Tuning (PEFT) methods, specifically the use of adapter modules, as proposed by Houlsby and colleagues. These adapters are small, inserted layers that allow for task-specific training without altering the entire model, significantly reducing computational and storage costs.

The article then explores the concept of a model’s intrinsic dimension, as discussed by Li et al. and Aghajanyan et al., whose results suggest that LLMs can be fine-tuned effectively by optimizing a surprisingly small number of parameters. This leads to Low-Rank Adaptation (LoRA), which hypothesizes that the weight updates needed for adaptation have a low intrinsic rank and can therefore be expressed as the product of two much smaller matrices. Part 1 closes with a discussion of the principles of low-rank matrix approximation and how LoRA applies them.
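To make the saving from a low-rank update concrete, here is a minimal sketch of the parameter arithmetic. It assumes a single weight matrix of shape d × k whose update is factored into B (d × r) and A (r × k); the function names and the sizes 4096 and 8 are hypothetical values chosen only for illustration.

```python
def full_update_params(d: int, k: int) -> int:
    """Number of parameters in a dense update matrix of shape (d, k)."""
    return d * k


def low_rank_update_params(d: int, k: int, r: int) -> int:
    """Number of parameters in the two factors B (d x r) and A (r x k)."""
    return r * (d + k)


# Hypothetical layer size and rank, chosen only for illustration.
d, k, r = 4096, 4096, 8

full = full_update_params(d, k)
low_rank = low_rank_update_params(d, k, r)

print(f"dense update:    {full:,} parameters")
print(f"low-rank update: {low_rank:,} parameters")
print(f"fraction trained: {low_rank / full:.2%}")
```

With these illustrative sizes the low-rank factors hold 65,536 values against 16,777,216 for the dense update, which is roughly 0.39% of the original count.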

Set the stage — Data, Model, Library, Pre-training

Hugging Face has developed the peft (Parameter-Efficient Fine-Tuning) library to adapt pre-trained language models to various downstream applications without fine-tuning all of the model’s parameters. The library supports multiple fine-tuning methods, one of which is LoRA (Low-Rank Adaptation), and it can be applied to various model types, not only transformers.