What is unsupervised learning?
In unsupervised learning, the training data does not have labels. But how can a model be trained without labels?
1 answer
Unsupervised learning covers a variety of different tasks. Depending on the task at hand, different techniques can be used. However, there are a few common paradigms that are used to extract information from unlabelled data.
A simple yet effective method is to create labels from the input data itself. This is also known as self-supervised learning. Typically, a modified (e.g. noisy or masked) version of the data serves as the input and the original data serves as the label. The model then has to learn a reconstruction function, which forces it to capture a lot of useful information about the data at hand. Similarly, autoregressive models like GPT are trained to predict one or more of the next elements of a given sub-sequence.
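To make the idea of creating labels from the data more concrete, here is a minimal sketch of how next-element prediction turns an unlabelled sequence into (input, label) pairs. The function name `next_element_pairs`, the `context_size` parameter and the toy token list are made up for illustration; real autoregressive models build their training pairs in a more elaborate and efficient way.

```python
import numpy as np


def next_element_pairs(sequence, context_size):
    """Turn an unlabelled sequence into (input, label) pairs.

    Each input is a window of `context_size` consecutive elements and
    the label is the element that directly follows that window.
    """
    inputs, labels = [], []
    for start in range(len(sequence) - context_size):
        inputs.append(sequence[start:start + context_size])
        labels.append(sequence[start + context_size])
    return np.array(inputs), np.array(labels)


# Hypothetical toy data: no labels are provided, only the sequence itself.
tokens = [3, 1, 4, 1, 5, 9]
x, y = next_element_pairs(tokens, context_size=3)

for window, target in zip(x, y):
    print(window, "->", target)
```

Once such pairs exist, any ordinary supervised training procedure can be applied to them, even though nobody ever labelled the data by hand.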
It is also possible to learn something about the complex distribution of the data by mapping (unlabelled) samples to some simple distribution and/or vice versa. For example, kernel density estimators represent the data distribution as a mixture of simple kernels, one centred on each sample. Many modern generative models (for images and audio) learn to map samples from a normal distribution to samples from the data distribution (as captured by the model).
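As a rough illustration of the kernel density estimator mentioned above, the following sketch builds a one-dimensional density estimate from unlabelled samples using only NumPy. The Gaussian kernel, the bandwidth of 0.3 and the synthetic data are assumptions chosen for the example; the name `gaussian_kde_1d` is hypothetical.

```python
import numpy as np


def gaussian_kde_1d(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate.

    The estimate is an equally weighted mixture of Gaussians, with one
    component centred on every sample and a shared width `bandwidth`.
    """
    samples = np.asarray(samples, dtype=float)

    def density(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        # Standardised distance of every query point to every sample.
        z = (x[:, None] - samples[None, :]) / bandwidth
        kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
        # Averaging over the samples gives the mixture density.
        return kernels.mean(axis=1)

    return density


# Synthetic unlabelled data (an assumption for this example).
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=500)

density = gaussian_kde_1d(data, bandwidth=0.3)
grid = np.linspace(-4.0, 8.0, 1201)
values = density(grid)

# A valid density should integrate to approximately one.
total_mass = np.sum(values) * (grid[1] - grid[0])
print(total_mass)
```

No labels are involved at any point: the samples alone determine where the estimated density is high or low.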
Finally, it is often possible to directly formulate an optimisation objective for the kind of information you wish to extract. The key challenge here is to find an objective that can be efficiently optimised. For example, clustering methods typically try to find an assignment that minimises the distances between samples in the same cluster. Another example is principal components, which are the orthogonal unit vectors along which the variance of the projected data is maximal.
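Both examples of explicit objectives can be sketched in a few lines of NumPy. The code below is a minimal illustration, not a reference implementation: it uses Lloyd's algorithm (the standard k-means heuristic) as one particular clustering method, and computes principal components from an eigendecomposition of the covariance matrix. The function names, the iteration count and the synthetic data are assumptions made for this example.

```python
import numpy as np


def k_means(data, n_clusters, n_iters=50, seed=0):
    """Lloyd's algorithm: alternate between assigning and averaging."""
    rng = np.random.default_rng(seed)
    # Initialise the centres with randomly chosen data points.
    centres = data[rng.choice(len(data), size=n_clusters, replace=False)]
    for _ in range(n_iters):
        # Assign every sample to its nearest centre.
        dists = np.linalg.norm(data[:, None, :] - centres[None, :, :], axis=2)
        assignment = dists.argmin(axis=1)
        # Move every centre to the mean of its assigned samples.
        for k in range(n_clusters):
            members = data[assignment == k]
            if len(members) > 0:
                centres[k] = members.mean(axis=0)
    # Final assignment that is consistent with the returned centres.
    dists = np.linalg.norm(data[:, None, :] - centres[None, :, :], axis=2)
    return dists.argmin(axis=1), centres


def principal_components(data, n_components):
    """Return the top directions of variance and their variances."""
    centred = data - data.mean(axis=0)
    covariance = np.cov(centred, rowvar=False)
    # eigh returns orthonormal eigenvectors of a symmetric matrix.
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1][:n_components]
    return eigenvectors[:, order].T, eigenvalues[order]


rng = np.random.default_rng(0)

# Two well-separated blobs of unlabelled points (synthetic data).
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=10.0, scale=0.5, size=(50, 2))
labels, centres = k_means(np.vstack([blob_a, blob_b]), n_clusters=2)

# Points scattered along the diagonal, with a little noise.
t = rng.normal(size=200)
points = np.outer(t, [1.0, 1.0]) + 0.05 * rng.normal(size=(200, 2))
components, variances = principal_components(points, n_components=2)
print(components[0], variances)
```

In both cases the "learning" consists of nothing more than optimising an objective over the unlabelled data: within-cluster distances for k-means, and projected variance for the principal components.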
I tried to break things down a bit for the sake of clarity, but in the end, learning typically comes down to optimisation. Therefore, if you can set up an objective and have a method to find (approximate) solutions for your objective, you will be able to learn something, even from unlabelled data.