Is LLM fine-tuning the same as transfer learning?

When people talk about fine-tuning LLMs, it sounds a lot like glorified transfer learning. Is it the same thing, or is it a distinct technique? What are the differences?

1 answer

TL;DR: Fine-tuning is not the same as transfer learning, but it is often part of a transfer-learning workflow.

The goal of transfer learning is to "transfer" information from one model to another. Typically, this means reusing parts of an existing, well-performing model to create a new model (possibly for a completely different task).

Transfer learning is commonly applied to neural networks because they act as feature extractors: they learn to extract useful features from raw data automatically and require little manual feature engineering. This means we can take the first layers of an existing network and reuse the features that network has already learned. The only thing necessary to "transfer" this knowledge is to train a small network that maps these (hopefully) powerful features to the desired prediction.
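
As a minimal sketch of this pattern (assuming PyTorch and a torchvision ResNet as the pre-trained model; any framework works the same way):

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a pre-trained backbone (here: ResNet-18 trained on ImageNet).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pre-trained parameters: we only reuse the learned features.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the final layer with a small, trainable head that maps the
# extracted features to the new task (say, 10 classes).
backbone.fc = nn.Linear(backbone.fc.in_features, 10)

# Only the new head is trained; the feature extractor stays fixed.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
```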

Fine-tuning, on the other hand, is about performing a few additional training steps on an already (fully) trained model. This can be useful to "correct" the model when the input distribution changes or when the model turns out to have some sort of bias.

Fine-tuning is also commonly used in the context of transfer learning with neural networks. The key idea is that the reused feature-extraction part of the network is not optimal for the (potentially completely different) task at hand. Therefore, it can make sense to also update the feature extractor itself instead of only training the small prediction network. By training both networks after the transfer, the first layers are effectively fine-tuned for the current task. It is worth pointing out, however, that fine-tuning can also destroy some of the features the original network had learned (sometimes called catastrophic forgetting).
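
Continuing the sketch above (still PyTorch, still an illustrative assumption rather than a recipe), fine-tuning means unfreezing the backbone once the head works, usually with a much smaller learning rate so the existing features are adjusted gently rather than overwritten:

```python
# After the head has been trained, unfreeze the backbone for fine-tuning.
for param in backbone.parameters():
    param.requires_grad = True

# Separate learning rates: the pre-trained layers get a much smaller one,
# which reduces the risk of destroying the features they already encode.
optimizer = torch.optim.Adam([
    {"params": backbone.fc.parameters(), "lr": 1e-3},
    {"params": [p for n, p in backbone.named_parameters()
                if not n.startswith("fc.")], "lr": 1e-5},
])
```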

In the context of LLMs, fine-tuning is often actually transfer learning (if we stick to the explanations above). Typically, the first step is to take some (very large) pre-trained model that has been trained on an unsupervised task to extract powerful features. Then you train a model that uses these representations to solve your task, typically without touching the massive number of parameters in the pre-trained model. Thus far, this is transfer learning without fine-tuning, and it is the point where most people will probably stop, having a model that solves their problem reasonably well.

It is possible to also fine-tune the full language model, but that typically requires an unreasonable amount of resources for a modest improvement, or none at all. There are more advanced methods (e.g. LoRA and friends) that claim to make fine-tuning large LLMs feasible. However, these methods typically introduce additional parameters into the LLM to update the representations, while the original weights remain untouched. Therefore, it is debatable whether this is fine-tuning or just transfer learning. In any case, when using LLMs it is not possible to do fine-tuning without transfer learning, because these models have been trained in an unsupervised way and thus require additional layers to solve a concrete task.
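
To make the LoRA idea concrete, here is a deliberately simplified, hypothetical implementation (not the API of any actual LoRA library): the base weights are frozen, and only a low-rank correction is trained on top of them.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A linear layer whose base weights are frozen; only a low-rank
    update (the product A @ B) is trained on top of them."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for param in self.base.parameters():
            param.requires_grad = False  # original weights stay untouched

        # Low-rank factors: these few extra parameters are all we train.
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))

    def forward(self, x):
        # Frozen base output plus the learned low-rank correction.
        return self.base(x) + (x @ self.A) @ self.B
```

Because B starts at zero, the layer initially behaves exactly like the frozen original; training then learns only the small correction, which is what makes the approach cheap.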

This turned out rather long, but I hope it explains the differences sufficiently well.
