How does fine-tuning work?
LLMs are trained on a lot of text. At its heart, an LLM is a glorified autocomplete engine. You give it text, and it adds more text that matches the pattern.
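To make the "autocomplete" view concrete, here is a toy sketch of the generation loop: the model repeatedly predicts the next token from the text so far. A hand-written bigram table stands in for the neural network here; real LLMs condition on the whole context, not just the previous token.

```python
# Toy "autocomplete": repeatedly pick the next token and append it.
# BIGRAMS is a stand-in for a learned model's next-token prediction.
BIGRAMS = {
    "the": "cat",
    "cat": "sat",
    "sat": "on",
    "on": "the",
}

def complete(prompt_tokens, steps):
    """Extend the prompt by up to `steps` tokens, one prediction at a time."""
    tokens = list(prompt_tokens)
    for _ in range(steps):
        nxt = BIGRAMS.get(tokens[-1])  # "predict" the next token
        if nxt is None:
            break
        tokens.append(nxt)
    return tokens

print(complete(["the"], 4))  # ['the', 'cat', 'sat', 'on', 'the']
```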
You need to "direct" the autocomplete so that it's useful. For example, given a question, you don't want the model to simply keep rephrasing the question or expressing confusion in different ways; you want it to complete with an answer. I understand this part is accomplished by fine-tuning: the model is trained on a set of question-answer (QA) pairs, which is a much smaller dataset than the original training corpus. Fine-tuning also covers alignment, where the model is taught to give useful and "good" answers.
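To illustrate my current mental model of the QA-pair step, here is a sketch of how supervised fine-tuning data is commonly prepared: each QA pair becomes one token sequence, with a loss mask so the model is only trained to predict the answer tokens, not the prompt. The tokenizer and prompt template below are illustrative stand-ins, not any specific model's real vocabulary or chat format.

```python
def toy_tokenize(text):
    """Stand-in tokenizer: one token per whitespace-separated word."""
    return text.split()

def build_sft_example(question, answer):
    """Return (tokens, loss_mask) for one QA pair.

    loss_mask[i] is 1 where training should penalize the model for
    mispredicting tokens[i] (the answer), and 0 over the prompt portion,
    so the model learns to produce answers rather than echo questions.
    """
    prompt_tokens = toy_tokenize("Question: " + question + " Answer:")
    answer_tokens = toy_tokenize(answer) + ["<eos>"]  # end-of-sequence marker
    tokens = prompt_tokens + answer_tokens
    loss_mask = [0] * len(prompt_tokens) + [1] * len(answer_tokens)
    return tokens, loss_mask

tokens, mask = build_sft_example("What is 2+2?", "4")
print(list(zip(tokens, mask)))
```

Training would then minimize cross-entropy only at positions where the mask is 1.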
How exactly does fine-tuning work, at a high level? I am asking only for an overview of common/popular techniques used by models like GPT, Claude, Bard, Llama, and their open-source relatives. I don't need a comprehensive list of every single technique; I understand it is a complex topic with a lot of variation. However, I would like enough technical detail that I can follow up on individual subtopics in my future research.
If there is a good, accessible summary available in the literature (like a review paper), feel free to mention it! However, I would expect an answer that at least summarizes or paraphrases it, rather than a link-only answer.