
What types of learning should I know about?

In machine learning there seem to be various types of learning. I commonly hear about

  • supervised learning,
  • unsupervised learning, and
  • reinforcement learning.

However, it is not always clear to me how they differ or which paradigm is best suited to a particular task. I can also imagine that these paradigms do not cover all possible tasks, as this list and the series of topics on that page seem to suggest. Therefore, my question: What are the differences between these paradigms, and what other paradigms should I know about?

1 answer

This question sounds basic, so I'll try to answer with a layperson in mind.

"Learning" is a bit misleading, because programs do not currently learn the way humans do. They do not have a mind or ability to reason, they don't "think". In essence, all machine learning today, including advanced AI like ChatGPT, is dumb habit, more akin to a mussel learning to close in response to some noxious stimuli than a college student learning a complex theory in class.

The learning happens because the model embeds some formula for crunching the input (which is currently always converted to numeric form, i.e. to a list of numbers) and spitting out some result. This result can be a simple 1/0 (binary classification), a new list of numbers which can be converted to an image (Dall-E), or even a URL or ID pointing to some resource online or in a database (retrieval, content recommendation), but currently it is always numeric (a list of numbers).

"Learning" is the technical term for tweaking the contents of the formula. For example, with linear regression, there is a set equation but the constants are tweaked. With a tree model, there is a decision tree that gets rearranged. With a neural network, there are many simple formulas (embedded in virtual "neurons"), and how exactly they are combined (their "weights") is what gets tweaked.

You can learn blindly, by just randomly tweaking the formula until it works on your training data. But usually it is much more efficient to have some heuristic so the model tweaks itself in a slightly more goal-oriented manner. Professionals usually don't call it a "formula" but rather "weights", though "weights" can also refer to the structure of the formula and not merely the values.
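
For contrast, blind tweaking could look like the following deliberately naive sketch (made-up data, random search instead of any heuristic):

    import random

    def predict(w, b, x):
        return w * x + b

    # (input, correct output) pairs to fit
    data = [(1.0, 5.0), (2.0, 9.0), (3.0, 13.0)]

    def total_error(w, b):
        return sum((predict(w, b, x) - y) ** 2 for x, y in data)

    # "Blind" learning: try random weights, keep the best pair seen so far
    best_w, best_b = 0.0, 0.0
    for _ in range(1000):
        w, b = random.uniform(-10, 10), random.uniform(-10, 10)
        if total_error(w, b) < total_error(best_w, best_b):
            best_w, best_b = w, b
    print(best_w, best_b)     # drifts toward w = 4, b = 1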

In supervised learning, you simply show the model a bunch of paired inputs and correct outputs, and make it figure out the weights that would have produced those outputs. There are many ways to actually update the weights. Usually you don't want the weights to perfectly match your sample data ("overfitting"); rather, you want to allow some freedom so the model can better deal with new data later that was not present in your sample. Incidentally, if the sample data is too complex and it's impossible to find a decent set of weights, that's underfitting. Note that for this to work, you need a potentially very large set of pre-calculated input-output pairs. Often you have to pay people to create this by hand, or pay data vendors to license the dataset to you.
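
A minimal supervised run, assuming scikit-learn is available (the library and the tiny made-up dataset are my additions, not part of the answer):

    from sklearn.linear_model import LogisticRegression

    # Paired inputs and correct outputs -- the "supervision"
    X = [[1.0], [2.0], [3.0], [4.0]]   # inputs, each already a list of numbers
    y = [0, 0, 1, 1]                   # the known correct output for each input

    model = LogisticRegression()
    model.fit(X, y)                    # "learning": the weights get tweaked
    print(model.predict([[3.5]]))      # apply the tweaked formula to new data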

In unsupervised learning, there are no precalculated outputs. You can see how it's a bit harder than supervised learning: how will you figure out the weights if you have no sample output? People have come up with some clever tricks, though. The classic is clustering, which groups similar data together. Technically, there is an aspect of supervision in clustering: how many groups, how similar is similar enough, etc. are basically weights. But these are usually not "learned" with machine learning; rather, they are selected in ad hoc ways by the programmer. You don't cluster by showing the model a bunch of correct clusters; instead, you implement the clustering logic directly (even if you have to tweak the parameters a bit). That's what makes it unsupervised. Not all unsupervised learning is clustering.
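
A clustering sketch in the same vein, again assuming scikit-learn; note that n_clusters is chosen by the programmer, not learned:

    from sklearn.cluster import KMeans

    # Inputs only -- no correct outputs are provided anywhere
    X = [[1.0, 1.1], [0.9, 1.0], [8.0, 8.2], [7.9, 8.1]]

    model = KMeans(n_clusters=2, n_init=10)   # "2 groups" is an ad hoc choice
    labels = model.fit_predict(X)             # which group each input landed in
    print(labels)                             # e.g. [0 0 1 1]; the ids are arbitrary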

Reinforcement learning is probably the most similar to biological learning. You show the model some inputs and, based on its response, you "reward" or "punish" it. This doesn't mean you kick and pat the computer; you simply give it a score, which the model uses to update its weights based on whatever algorithm was chosen by the inventor of that type of model. Notably, the reward or punishment is not necessarily based on whether it gave the "correct output"; it can rest on something vaguer, such as whether you liked the output or not. I'm not an expert on reinforcement learning, so I'll stop here.
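
A toy reward-driven loop in plain Python (a bandit-style sketch; the reward function is made up and stands in for your "reward or punish" score):

    import random

    # Two possible actions; the "weights" are just an estimated value per action
    values = [0.0, 0.0]
    counts = [0, 0]

    def reward(action):
        # Stand-in for the environment's feedback (action 1 is secretly better)
        return random.gauss(1.0 if action == 1 else 0.2, 0.1)

    for _ in range(500):
        # Mostly pick the action that currently looks best, sometimes explore
        if random.random() < 0.1:
            action = random.randrange(2)
        else:
            action = values.index(max(values))
        r = reward(action)                                        # the score
        counts[action] += 1
        values[action] += (r - values[action]) / counts[action]   # update

    print(values)   # the better action ends up with the higher estimate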

Not mentioned above is transfer learning, together with the related idea of incremental training. Many supervised models are programmed so that they're trained on one large batch of data; they don't consume inputs one at a time. Let's say you have a model that predicts whether an email would be considered good news or bad news by the reader. Every week, you get feedback from users about whether the model was right or wrong, and you want to use this to refine the training. But it's expensive to retrain the model on a giant (and ever-growing) batch of emails every week, and a lot of those emails are the same every week, so it's wasted effort to "relearn" them. Isn't there a better way? With some ML algorithms, there is. If the model is programmed the right way, instead of retraining on the whole batch each time, you feed it only the new data, and it updates the weights while respecting what was learned from the previous training runs. This is called incremental training. When you train first on data X, then incrementally on data Y, the model may not be the same, or as good, as if you had trained on data X+Y.
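
A sketch of incremental training, assuming scikit-learn's SGDClassifier, which supports this via partial_fit (the weekly batches are invented for illustration):

    from sklearn.linear_model import SGDClassifier

    # Hypothetical weekly batches of (email features, good/bad-news label)
    X_week1, y_week1 = [[0.1], [0.9], [0.2]], [0, 1, 0]
    X_week2, y_week2 = [[0.8], [0.3]], [1, 0]

    model = SGDClassifier()
    # The first call must declare all labels that will ever appear
    model.partial_fit(X_week1, y_week1, classes=[0, 1])
    # A week later: feed only the new data; earlier learning is kept
    model.partial_fit(X_week2, y_week2)
    print(model.predict([[0.7]]))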

Another important use of transfer learning: say Google publishes a model that can recognize human faces in photos. It took them millions of dollars and billions of pre-labeled photos to do it. That's great! But now you want to detect cats in photos. Do you have to go to all that trouble too? Not necessarily. A lot of the work of recognizing objects in pictures is not particular to the type of object: things like finding outlines, segmenting, and adjusting for light are universal (though perhaps not if you want to use a face recognition model to find molecules in electron microscope captures!). If the model Google used supports transfer learning, it may be possible to do a much smaller additional training run, with only a few cat photos, to teach it to "do the same thing, except with cats instead of faces".
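
A transfer-learning sketch, assuming PyTorch and a recent torchvision are available (my choice of framework; the answer doesn't name one):

    import torch.nn as nn
    from torchvision import models

    # Load a network pre-trained on a large generic image dataset
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

    # Freeze the pre-trained layers: the "universal" image skills stay as-is
    for param in model.parameters():
        param.requires_grad = False

    # Replace only the final layer with a fresh one for the new task
    # ("cat" vs "not cat"); only this small layer now needs training
    model.fc = nn.Linear(model.fc.in_features, 2)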

With LLMs, there is also fine-tuning. The bulk of the model is about learning to parse sentences and produce new ones. However, there is a tiny extra bit where you also try to teach the model not to curse, to answer in the correct language, and not to make things up too much. That extra bit is called fine-tuning. I don't know much about fine-tuning, but I suspect it can be done via reinforcement, transfer learning, incremental training, or some combination of these.
