Is LLM fine-tuning the same as transfer learning?

When people talk about fine-tuning LLMs, it sounds a lot like glorified transfer learning. Is it the same thing, or is it a distinct technique? What are the differences?

1 answer

TL;DR: Fine-tuning is not the same as transfer learning, but it is often part of a transfer-learning workflow.

The goal of transfer learning is to "transfer" information from one model to another. Typically, this means reusing parts of an existing, well-performing model to create a new model (possibly for a completely different task).

Transfer learning is commonly applied to neural networks because they act as feature extractors: they learn to extract useful features from raw data automatically and require little manual feature engineering. This means we can take the first layers of an existing network and reuse the features that network has already learned. The only thing necessary to "transfer" this knowledge is to train a small network that maps these (hopefully) powerful features to the desired prediction.
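
As a minimal sketch of this pattern (assuming PyTorch and a torchvision ResNet as the pre-trained model; any framework works the same way):

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a pre-trained backbone (here: ResNet-18 trained on ImageNet).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pre-trained parameters: we only reuse the learned features.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the final layer with a small, trainable head that maps the
# extracted features to the new task (say, 10 classes).
backbone.fc = nn.Linear(backbone.fc.in_features, 10)

# Only the new head is trained; the feature extractor stays fixed.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
```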

Fine-tuning, on the other hand, is about performing a few additional training steps on an already (fully) trained model. This can be useful to "correct" the model when the input distribution changes or when the model turns out to have some sort of bias.

Fine-tuning is also commonly used in the context of transfer learning with neural networks. The key idea is that the reused feature-extraction part of the network is not optimal for the (potentially completely different) task at hand. Therefore, it can make sense to also update the feature extractor itself instead of only training the small prediction network. By training both networks after the transfer, the first layers are effectively fine-tuned for the current task. It is worth pointing out, however, that fine-tuning can also destroy some of the features the original network had learned (sometimes called catastrophic forgetting).
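
Continuing the sketch above (still PyTorch, still an illustrative assumption rather than a recipe), fine-tuning means unfreezing the backbone once the head works, usually with a much smaller learning rate so the existing features are adjusted gently rather than overwritten:

```python
# After the head has been trained, unfreeze the backbone for fine-tuning.
for param in backbone.parameters():
    param.requires_grad = True

# Separate learning rates: the pre-trained layers get a much smaller one,
# which reduces the risk of destroying the features they already encode.
optimizer = torch.optim.Adam([
    {"params": backbone.fc.parameters(), "lr": 1e-3},
    {"params": [p for n, p in backbone.named_parameters()
                if not n.startswith("fc.")], "lr": 1e-5},
])
```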

In the context of LLMs, fine-tuning is often actually transfer learning (if we stick to the explanations above). Typically, the first step is to take some (very large) pre-trained model that has been trained on an unsupervised task to extract powerful features. Then you train a model that uses these representations to solve your task, typically without touching the massive number of parameters in the pre-trained model. Thus far, this is transfer learning without fine-tuning, and it is the point where most people will probably stop, having a model that solves their problem reasonably well.

It is possible to also fine-tune the full language model, but that typically requires an unreasonable amount of resources for a modest improvement, or none at all. There are more advanced methods (e.g. LoRA and friends) that claim to make fine-tuning large LLMs feasible. However, these methods typically introduce additional parameters into the LLM to update the representations, while the original weights remain untouched. Therefore, it is debatable whether this is fine-tuning or just transfer learning. In any case, when using LLMs it is not possible to do fine-tuning without transfer learning, because these models have been trained in an unsupervised way and thus require additional layers to solve a concrete task.
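
To make the LoRA idea concrete, here is a deliberately simplified, hypothetical implementation (not the API of any actual LoRA library): the base weights are frozen, and only a low-rank correction is trained on top of them.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A linear layer whose base weights are frozen; only a low-rank
    update (the product A @ B) is trained on top of them."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for param in self.base.parameters():
            param.requires_grad = False  # original weights stay untouched

        # Low-rank factors: these few extra parameters are all we train.
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))

    def forward(self, x):
        # Frozen base output plus the learned low-rank correction.
        return self.base(x) + (x @ self.A) @ self.B
```

Because B starts at zero, the layer initially behaves exactly like the frozen original; training then learns only the small correction, which is what makes the approach cheap.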

This turned out rather long, but I hope it explains the differences sufficiently well.
