Sunday, March 31, 2013

Hashing

  I ran across an issue of some importance several years ago that's gotten little acknowledgement: basically, how well a lexicon uses the distinctions among its phonemic inventory. It's a step away from phonotactics.
  Essentially, it makes little sense to distinguish sounds if they don't contribute to the distinctiveness of words. It's slightly contentious, but some languages test this more strongly than others in terms of allophony (some languages' phonemes are very flexible, as in Pirahã). There are also extra ways to distinguish words, e.g. syntax and prosody, so the level of redundancy probably varies as well.
  The point, though, is that if the words in your language have a shape like CVCV..., and you have syllables A, B, C, and D, and a lexicon of a few words using them, e.g. AB and CD, then you can do several things, because those words aren't utilizing the inventory very well. Maximally, if these are the only two words in the language, you only need two syllables in your inventory for this lexicon, e.g. AA and CC (or any other pair of syllables), and you could even condense the words to one syllable each, e.g. A and B. Anyway, this process is known as hashing, and it was solved for optimality in 1992.
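To make the idea concrete, here's a brute-force sketch in Python (not the optimal 1992 algorithm, just an illustration). The function name `compress_lexicon` and the search bounds are my own; it searches for the smallest syllable inventory and shortest word shape that still give every word in the lexicon a distinct form:

```python
from itertools import product

def compress_lexicon(words):
    """Relabel a lexicon so every word keeps a distinct form,
    using the smallest syllable inventory and the shortest word
    shape that can still tell all the words apart.
    Brute-force illustration only: tries word lengths 1..4 and
    inventory sizes 1..26 until there are enough distinct forms."""
    target = len(set(words))
    for length in range(1, 5):            # candidate word lengths
        for k in range(1, 27):            # candidate inventory sizes
            syllables = [chr(ord('A') + i) for i in range(k)]
            forms = [''.join(p) for p in product(syllables, repeat=length)]
            if len(forms) >= target:
                # Enough distinct forms: assign one per word.
                return dict(zip(sorted(set(words)), forms)), k
    raise ValueError("lexicon too large for these search bounds")

mapping, inventory_size = compress_lexicon(["AB", "CD"])
# For the two-word lexicon {AB, CD}, this finds that a 2-syllable
# inventory with one-syllable words suffices: AB -> A, CD -> B.
```

This matches the example in the text: two words drawn from a four-syllable inventory collapse to one syllable each over a two-syllable inventory.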
  So I'm not a programmer or anything, but it's cool. It lets you know how important words are to the sound of the language, e.g. testing the high-frequency words and sounds of the language, testing polysemy, etc. It would be good for designing a shorthand, e.g. adapting PLOVER to a language; it's how a stenotype for Japanese can use only 10 keys, basically a home row.
