Post History

What is the technical term for converting a sound recording to a phoneme vector?

posted 6mo ago by mr Tsjolder‭

Answer
#1: Initial revision by mr Tsjolder · 2023-11-09T08:47:52Z (6 months ago)
I am no specialist in audio processing, but I believe the general task of converting raw audio to discrete symbols is called _transcription_.
However, this does not distinguish whether these symbols are phones, phonemes or text.

I strongly suspect that there is no more specific terminology (of the kind NLP has) for the following reasons:
 - tokenisation is a more general term in the context of computer science. E.g. compilers require [lexical tokenisers](https://en.wikipedia.org/wiki/Lexical_analysis) to build syntax trees.
 - most audio ML models use [spectrograms](https://en.wikipedia.org/wiki/Spectrogram) as inputs and therefore bypass the need for concepts like phones or phonemes.
 - phonemes might not be the best way of modelling spoken language (e.g. due to their [non-uniqueness](https://en.wikipedia.org/wiki/Phoneme#The_non-uniqueness_of_phonemic_solutions)).
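
To make the spectrogram point concrete, here is a minimal sketch of how raw samples can be turned into a spectrogram without any notion of phones or phonemes. It is only an illustration under my own assumptions: the function name `spectrogram`, the frame size, the hop length and the 1 kHz test tone are all made up for this example, and the naive DFT stands in for the FFT a real library would use.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return one list of magnitudes (bins 0..frame_size//2) per frame."""
    # Hann window: reduces leakage between neighbouring frequency bins.
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
        for n in range(frame_size)
    ]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        # Naive O(N^2) discrete Fourier transform, kept for readability;
        # a real pipeline would call an FFT routine instead.
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            total = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames


# Hypothetical test signal: a 1 kHz sine sampled at 8 kHz.
SAMPLE_RATE = 8000
signal = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(512)]

spec = spectrogram(signal)
# Strongest frequency bin in the first frame, converted back to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / 64
```

With these assumed settings the tone falls exactly on bin 8, so `peak_hz` comes out as 1000.0. The result is a time-by-frequency grid of magnitudes, which is the kind of input a model can consume directly instead of a sequence of discrete symbols.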

However, I have never had the opportunity to tackle speech recognition problems and someone with better insights might have a different answer.