I am no specialist in audio processing, but I believe the general task of converting raw audio to discrete symbols is called _transcription_. However, this term does not distinguish whether those symbols are phones, phonemes or text. I strongly suspect there is no more specific term (comparable to _tokenisation_ in NLP), for the following reasons:

- Tokenisation is a more general term in computer science. For example, compilers require [lexical tokenisers](https://en.wikipedia.org/wiki/Lexical_analysis) to build syntax trees.
- Most audio ML models use [spectrograms](https://en.wikipedia.org/wiki/Spectrogram) as inputs and therefore bypass the need for concepts like phones or phonemes.
- Phonemes might not be the best way of modelling spoken language (e.g. due to their [non-uniqueness](https://en.wikipedia.org/wiki/Phoneme#The_non-uniqueness_of_phonemic_solutions)).

That said, I have never had the opportunity to tackle speech recognition problems, so someone with more insight might have a different answer.
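To illustrate the point about spectrograms: the model input is a time-frequency grid computed directly from the waveform, with no discrete symbols (phones, phonemes or text) involved. Below is a minimal sketch of a magnitude spectrogram via a short-time Fourier transform, assuming NumPy is available. The function name `magnitude_spectrogram`, the 512-sample frame, the 128-sample hop and the Hann window are illustrative choices of mine, not a standard from any particular model. Real speech pipelines typically go further, for example by applying mel scaling and a log transform.

```python
import numpy as np


def magnitude_spectrogram(signal: np.ndarray, frame_len: int = 512, hop: int = 128) -> np.ndarray:
    """Return |STFT| of a 1-D signal, shape (n_frames, frame_len // 2 + 1).

    Hypothetical helper for illustration: frame_len, hop and the Hann
    window are assumed values, not taken from any specific ASR system.
    """
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice overlapping frames and taper each one with the window.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Real FFT per frame; keep only the magnitude (phase is discarded).
    return np.abs(np.fft.rfft(frames, axis=1))


if __name__ == "__main__":
    sr = 16_000                         # assumed sample rate in Hz
    t = np.arange(sr) / sr              # one second of timestamps
    tone = np.sin(2 * np.pi * 440.0 * t)
    spec = magnitude_spectrogram(tone)
    print(spec.shape)                   # (122, 257)
    # Frequency of the strongest bin, averaged over time (bin width = sr / frame_len).
    print(spec.mean(axis=0).argmax() * sr / 512)
```

The resulting array of frames by frequency bins is what a model consumes in place of a symbol sequence, which is why phone or phoneme segmentation is never needed explicitly.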