Incubator Q&A


How does fine tuning work?

Score: +2 / −0

LLMs are trained on a lot of text. At its heart, an LLM is a glorified autocomplete engine. You give it text, and it adds more text that matches the pattern.

You need to "direct" the autocomplete so that it's useful. For example, when given a question, you don't want it to simply keep rephrasing the question or expressing its confusion in different ways; you want it to complete with an answer. I understand this part is accomplished by fine tuning: the model is trained on a set of QA pairs, which is a much smaller set than the training corpus. Fine tuning also handles alignment, where the model is taught to give useful and "good" answers.
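To make the "directing the autocomplete" idea concrete: in supervised fine-tuning (SFT), the model still does ordinary next-token prediction over the concatenated question-plus-answer text, but the loss is computed only on the answer tokens, so the model learns to produce answers rather than to continue questions. The sketch below is a minimal, hypothetical illustration of that label masking, not code from any particular model; the `-100` value mirrors the ignore-index convention used by PyTorch's `CrossEntropyLoss` (and, by extension, Hugging Face trainers), and the token ids are made up.

```python
# Sketch of SFT label masking: the model is trained on "prompt + answer"
# sequences, but prompt positions are masked out of the loss so only the
# answer tokens contribute gradient.

IGNORE_INDEX = -100  # positions with this label are skipped by the loss

def build_sft_example(prompt_ids, answer_ids):
    """Concatenate prompt and answer token ids; mask the prompt in the labels."""
    input_ids = prompt_ids + answer_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + answer_ids
    return input_ids, labels

# Hypothetical token ids standing in for a tokenized QA pair.
prompt = [101, 2054, 2003]   # question tokens
answer = [2009, 2573, 102]   # answer tokens

inputs, labels = build_sft_example(prompt, answer)
print(inputs)   # [101, 2054, 2003, 2009, 2573, 102]
print(labels)   # [-100, -100, -100, 2009, 2573, 102]
```

Alignment training (RLHF, DPO, and similar) builds on a model fine-tuned this way, using preference data rather than single reference answers.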

How exactly does fine tuning work, at a high level? I am asking only for an overview of common/popular techniques used by models like GPT, Claude, Bard, Llama, and their open-source relatives. I don't need a comprehensive list of every single technique - I understand it is a complex topic with a lot of variation. However, I would like enough technical detail that I can follow up on individual subtopics in my future research.

If there is a good, accessible summary available in the literature (like a review paper), feel free to mention it! However, I would expect an answer that at least summarizes or paraphrases it, rather than a link-only answer.

0 comment threads

0 answers