Post History
Answer
#1: Initial revision
I am no specialist in audio processing, but I believe the general task of converting raw audio to discrete symbols is called _transcription_. However, this term does not distinguish whether these symbols are phones, phonemes or text.

I strongly suspect that there is no more specific terminology (as in NLP) for the following reasons:

- Tokenisation is a more general term in the context of computer science. E.g. compilers require [lexical tokenisers](https://en.wikipedia.org/wiki/Lexical_analysis) to build syntax trees.
- Most audio ML models use [spectrograms](https://en.wikipedia.org/wiki/Spectrogram) as inputs and therefore bypass the need for concepts like phones or phonemes.
- Phonemes might not be the best way of modelling spoken language (e.g. due to their [non-uniqueness](https://en.wikipedia.org/wiki/Phoneme#The_non-uniqueness_of_phonemic_solutions)).

However, I have never had the opportunity to tackle speech recognition problems, and someone with better insights might have a different answer.
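
To illustrate the compiler sense of tokenisation mentioned above, here is a minimal sketch of a regex-based lexical tokeniser in Python. The token classes (`NUMBER`, `IDENT`, `OP`) and the toy arithmetic language are my own assumptions for illustration and are not taken from any particular compiler.

```python
import re

# Hypothetical token classes for a toy arithmetic language (illustrative only).
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),   # integers and simple decimals
    ("IDENT", r"[A-Za-z_]\w*"),     # variable names
    ("OP", r"[+\-*/=()]"),          # single-character operators and parentheses
    ("SKIP", r"\s+"),               # whitespace, dropped from the output
    ("MISMATCH", r"."),             # any other character is an error
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenise(source):
    """Split a source string into (kind, text) pairs, dropping whitespace."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise ValueError(f"unexpected character {match.group()!r}")
        tokens.append((kind, match.group()))
    return tokens


print(tokenise("x = 3.5 * (y + 2)"))
```

The point is only that "tokenisation" here means splitting a character stream into discrete, typed symbols, which is why the term is not specific to NLP.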