What is the technical term for converting a sound recording to a phoneme vector?
Many natural language processing models begin by taking text and converting it to a vector where each element is a number representing some semantic entity (I would say each number is a word, but a token is not necessarily a word). This is sometimes called tokenization.
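To make concrete what I mean by a token vector, here is a minimal sketch. It assumes a toy whitespace tokenizer; real models usually use subword schemes, and the names `build_vocab` and `encode` are made up for illustration.

```python
def build_vocab(texts):
    """Assign each distinct lowercase token an integer ID in order of first appearance."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def encode(text, vocab, unk_id=-1):
    """Map a string to a list of token IDs; unseen tokens get unk_id."""
    return [vocab.get(token, unk_id) for token in text.lower().split()]


vocab = build_vocab(["the cat sat on the mat"])
ids = encode("the mat sat", vocab)
print(ids)  # [0, 4, 2]
```

What I am after is the analogous output for audio, with phonemic or phonetic units in place of text tokens.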
Suppose you have a raw sound recording of someone speaking a known language (say you know a priori that the person is speaking correct English) and you want to extract a vector where each element is a number representing a phonemic (or perhaps "phonetic" is the better term) token. What would this process be called?
I know that there are many voice recognition models, but these usually go the whole way of converting sound to text, losing most non-semantic properties (accents, inflection) in the process. I would like to know the name of the process up until the filtering out of the non-semantics.
1 answer
I am no specialist in audio processing, but I believe the general task of converting raw audio to discrete symbols is called transcription. However, this term does not distinguish whether the symbols are phones, phonemes or text.
I strongly suspect that there is no more specific terminology (comparable to tokenisation in NLP), for the following reasons:
- tokenisation is a more general term in the context of computer science. E.g. compilers require lexical tokenisers to build syntax trees.
- most audio ML models use spectrograms as inputs and therefore bypass the need for concepts like phones or phonemes.
- phonemes might not be the best way of modelling spoken language (e.g. due to their non-uniqueness).
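To illustrate the spectrogram point, here is a rough sketch of how raw audio becomes a time-by-frequency grid of magnitudes with no phone or phoneme step in between. It is only an assumption-laden toy: a naive DFT in pure Python, a Hann window, and arbitrary frame sizes, with a synthetic 1000 Hz tone standing in for speech. Real pipelines typically use an FFT library and often mel scaling.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram: one row per frame, one column per frequency bin."""
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        # Naive DFT, keeping only the non-negative frequency bins.
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc))
        rows.append(row)
    return rows


sample_rate = 8000  # Hz, assumed for this toy signal
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(512)]
spec = spectrogram(tone)

# The strongest bin of the first frame should sit at the tone's frequency.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
print(len(spec), len(spec[0]), peak_hz)
```

A model fed with `spec` works directly on these continuous magnitudes, so nothing in the pipeline needs to name a discrete phonetic unit.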
However, I have never had the opportunity to tackle speech recognition problems and someone with better insights might have a different answer.