How does fine-tuning work?
LLMs are trained on a lot of text. At its heart, an LLM is a glorified autocomplete engine. You give it text, and it adds more text that matches the pattern.
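To make the "autocomplete" view concrete, here is a toy sketch of the generation loop: the model repeatedly predicts the next token from the text so far. A hand-written bigram table stands in for the neural network here; real LLMs condition on the whole context, not just the previous token.

```python
# Toy "autocomplete": repeatedly pick the next token and append it.
# BIGRAMS is a stand-in for a learned model's next-token prediction.
BIGRAMS = {
    "the": "cat",
    "cat": "sat",
    "sat": "on",
    "on": "the",
}

def complete(prompt_tokens, steps):
    """Extend the prompt by up to `steps` tokens, one prediction at a time."""
    tokens = list(prompt_tokens)
    for _ in range(steps):
        nxt = BIGRAMS.get(tokens[-1])  # "predict" the next token
        if nxt is None:
            break
        tokens.append(nxt)
    return tokens

print(complete(["the"], 4))  # ['the', 'cat', 'sat', 'on', 'the']
```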
You need to "direct" the autocomplete so that it's useful. For example, given a question, you don't want the model to simply keep rephrasing the question or expressing confusion in different ways; you want it to complete with an answer. I understand this part is accomplished by fine-tuning: the model is trained on a set of question-answer (QA) pairs, which is a much smaller dataset than the original training corpus. Fine-tuning also covers alignment, where the model is taught to give useful and "good" answers.
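To illustrate my current mental model of the QA-pair step, here is a sketch of how supervised fine-tuning data is commonly prepared: each QA pair becomes one token sequence, with a loss mask so the model is only trained to predict the answer tokens, not the prompt. The tokenizer and prompt template below are illustrative stand-ins, not any specific model's real vocabulary or chat format.

```python
def toy_tokenize(text):
    """Stand-in tokenizer: one token per whitespace-separated word."""
    return text.split()

def build_sft_example(question, answer):
    """Return (tokens, loss_mask) for one QA pair.

    loss_mask[i] is 1 where training should penalize the model for
    mispredicting tokens[i] (the answer), and 0 over the prompt portion,
    so the model learns to produce answers rather than echo questions.
    """
    prompt_tokens = toy_tokenize("Question: " + question + " Answer:")
    answer_tokens = toy_tokenize(answer) + ["<eos>"]  # end-of-sequence marker
    tokens = prompt_tokens + answer_tokens
    loss_mask = [0] * len(prompt_tokens) + [1] * len(answer_tokens)
    return tokens, loss_mask

tokens, mask = build_sft_example("What is 2+2?", "4")
print(list(zip(tokens, mask)))
```

Training would then minimize cross-entropy only at positions where the mask is 1.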
How exactly does fine-tuning work, at a high level? I am asking only for an overview of common/popular techniques used by models like GPT, Claude, Bard, Llama, and their open-source relatives. I don't need a comprehensive list of every single technique; I understand it is a complex topic with a lot of variation. However, I would like enough technical detail that I can follow up on individual subtopics in my future research.
If there is a good, accessible summary available in the literature (like a review paper), feel free to mention it! However, I would expect an answer that at least summarizes or paraphrases it, rather than a link-only answer.