Saturday, March 2, 2013

Image Inning: Real-time speech recognition

My friends were watching videos of StarCraft 2 matches, which include play-by-play commentary, much like professional or college sports broadcasts.  The commentary included reading the in-game chat between the two players.  One player said, "Sorry, my English is bad," to which the other replied, "No, my Korean is bad.  Your English is pro."  This made me imagine speech recognition that compensates for accents, and real-time speech recognition in general. 

The accent recognition would be based on a database of all human languages, categorized using the International Phonetic Alphabet, which defines the common elements of human speech.  The user would first program the software by selecting their native language.  Then they would describe their linguistic upbringing: the region they were raised in, where they learned their native language, the language of the people who raised them, what languages besides their native one they know, and when and how they learned them (from birth at home, during their teen years, in high school, etc.).  The software would sort through a matrix of speech and language traits to synthesize the user's most probable speech pattern.  The user would then be given several words to repeat in their native language (twice each), followed by several words in the other languages they identified themselves as speaking.  Matching the actual speech pattern against the predicted one would give the software greater accuracy and a strong model of how the user would speak a different language. 
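The "matrix of speech and language traits" could be sketched roughly as follows. This is a hypothetical illustration: the `SUBSTITUTIONS` table, the `SpeakerProfile` fields, and the experience weighting are all invented for the sketch, not real phonological data.

```python
# Hypothetical sketch: map a user's linguistic background to the phoneme
# substitutions we might expect in their speech. All data is illustrative.

from dataclasses import dataclass, field

# Toy table: IPA phonemes that speakers of one language often substitute
# when speaking another (invented entries, for illustration only).
SUBSTITUTIONS = {
    ("Korean", "English"): {"f": "p", "v": "b", "ɹ": "l"},
    ("Japanese", "English"): {"l": "ɾ", "ɹ": "ɾ", "θ": "s"},
}

@dataclass
class SpeakerProfile:
    native_language: str
    # language -> years of experience, taken from the questionnaire
    other_languages: dict = field(default_factory=dict)

def predict_substitutions(profile: SpeakerProfile, target: str):
    """Return expected substitutions plus a confidence weight.

    More years of experience with the target language means we expect
    fewer accent substitutions, so the weight decays toward zero.
    """
    table = SUBSTITUTIONS.get((profile.native_language, target), {})
    years = profile.other_languages.get(target, 0)
    weight = max(0.0, 1.0 - years / 10)
    return dict(table), weight

profile = SpeakerProfile("Korean", {"English": 3})
subs, weight = predict_substitutions(profile, "English")
print(subs, weight)  # expected substitutions, weighted down by experience
```

A real system would of course use measured phonological data and far more profile dimensions (region, age of acquisition, and so on); the point here is just that a questionnaire can be turned into a predicted speech pattern before the user ever records a word.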

The user could then select whichever language they intended to speak, and the software would compensate for their accent, producing accurate text regardless of their ability to speak that language.  To improve accuracy further, the user could be asked to speak at different speeds, since pronunciation changes with the rate at which we speak.  Using grammar correction like that found in common word-processor software, this technology could also compensate for grammatical errors made by both foreign and native speakers.  This could be adjusted for strictness to allow for more slang, and a slang database could easily be included in the software.  International grammar correction could be built in as well, based on a database of each language's grammatical rules; the differences between those rules would give insight into the mistakes a foreign speaker is likely to make, and this too could be adjusted for strictness. 
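The accent-compensation step above could work something like this at recognition time: take the phoneme string a basic recognizer heard, generate alternative readings with the predicted substitutions undone, and look each reading up in a lexicon. The `ACCENT_MAP` and the toy lexicon below are invented for illustration.

```python
# Hypothetical sketch of accent compensation: undo predicted phoneme
# substitutions and match the results against a lexicon. All data is toy.

ACCENT_MAP = {"p": "f", "b": "v", "l": "ɹ"}  # heard phoneme -> likely intended

LEXICON = {"fɹiː": "free", "pliː": "plea"}  # toy IPA -> English lexicon

def candidates(heard: str):
    """All readings of the heard string, with substitutions optionally undone."""
    readings = [""]
    for ph in heard:
        step = []
        for prefix in readings:
            step.append(prefix + ph)                  # phoneme taken at face value
            if ph in ACCENT_MAP:
                step.append(prefix + ACCENT_MAP[ph])  # accent-corrected reading
        readings = step
    return readings

def recognize(heard: str):
    """Return every lexicon word consistent with some reading of the input."""
    return [LEXICON[c] for c in candidates(heard) if c in LEXICON]

print(recognize("pliː"))  # both the literal "plea" and the corrected "free"
```

A real recognizer would then rank these candidates with an acoustic model and a language model rather than returning all of them, and the strictness dial described above would decide how aggressively corrected readings outrank literal ones.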

I imagine this being available on smartphones and tablets, maximizing the average civilian's efficiency when using software and conducting international business.  I can't imagine this technology being that far away ;)