What is the technical term for converting a sound recording to a phoneme vector?


I am no specialist in audio processing, but I believe the general task of converting raw audio to discrete symbols is called transcription. However, this term does not distinguish whether the symbols are phones, phonemes or text.

I strongly suspect that there is no more specific terminology (as there is in NLP) for the following reasons:

tokenisation is a more general term in the context of computer science. E.g. compilers require lexical tokenisers to build syntax trees.
most audio ML models use spectrograms as inputs and therefore bypass the need for concepts like phones or phonemes.
phonemes might not be the best way of modelling spoken language (e.g. due to their non-uniqueness).
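
To make the spectrogram point concrete, here is a minimal sketch of turning raw audio into a spectrogram with plain NumPy. The function name `spectrogram`, the frame length, the hop size and the synthetic 440 Hz test tone are all assumptions chosen for illustration; real speech models typically rely on a dedicated audio library and often use mel-scaled variants.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform.

    frame_len and hop are illustrative defaults, not values from any
    particular speech model.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One row per time frame, one column per frequency bin.
    return np.abs(np.fft.rfft(frames, axis=1))

# Synthetic input (assumption): one second of a 440 Hz tone at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

spec = spectrogram(tone)
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 256  # centre frequency of the strongest bin
```

The result is a 2-D array of time frames by frequency bins, which a model can consume directly without any intermediate phone or phoneme symbols.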

However, I have never had the opportunity to tackle speech recognition problems and someone with better insights might have a different answer.

posted over 1 year ago

CC BY-SA 4.0

mr Tsjolder‭
