Thursday, January 27, 2011

Ok, thoughts for a moment on the way that spelling is taught and used in America. The pure and infinite fact is simply that we make spelling up: the spelling system is ugly and horrible, nobody likes it, and it's basically logographic. The secondary thing is that kids need to learn things by rote, day after day, until they just get it.
I'd ignore this phonics bullshit. Kids can figure things out themselves. Patterns you can point out, but think of it like learning any set of patterns in a foreign language: you have to acknowledge that there are analogical sets they'll be using, so just focus on those. Get out clusters of orthographic-phonetic patterns, get out clusters of derivational patterns, and have them learn them.
nasal/nose
papal/pope
mice/mouse, lice/louse
Their heir shares his cares with the bears and pears. Rhyming books.
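A toy sketch of how those analogical clusters might be pulled out automatically. The rime-grouping heuristic and word list here are mine, not real phonology; it just groups words by their final letters, which conveniently also shows the logographic problem (by spelling, "hear" lands next to "bear"):

```python
from collections import defaultdict

def cluster_by_rime(words, n=3):
    """Group words by their final n letters as a crude stand-in for
    orthographic-phonetic rime clusters. Spelling-only, so it will
    happily cluster non-rhymes like 'hear' with 'bear' - which is
    itself a demonstration of how logographic English spelling is."""
    clusters = defaultdict(list)
    for w in words:
        clusters[w[-n:]].append(w)
    # keep only sets with at least two members - those are the
    # analogical sets worth drilling
    return {k: v for k, v in clusters.items() if len(v) > 1}

print(cluster_by_rime(["bear", "pear", "care", "share", "hear", "heir"]))
# -> {'ear': ['bear', 'pear', 'hear'], 'are': ['care', 'share']}
```

A real version would cluster on pronunciation (rimes from a pronouncing dictionary) crossed with spelling, so that the bear/pear set and the hear/fear set come out separately.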

Rhyming books. In essence, we know what to do, and it really is not about some linguist showing up and telling the children's-book publishers how to work. Rather, the complicated part happens in the motion from book to book. The harder task is how to choose which book: appropriateness vs. fun vs. educational value. A given book can be assigned a code based on rough estimates of a given child's lexicon, roughly modeled from the difficulty of the book and the word counts of each book's utterances. These can then predict which books will be easy-yet-exciting and which won't. Keep apace. Model that alongside enjoyment factors, with choice, with description: this one is about mice; this one is an adventure; this one talks a lot, or describes a lot. The language part we can model. If we have constructional parsers we can model that too. A given term has x difficulty; a given construction has x difficulty. Drop a constructional lexicon into it: how easy would this be for reader X? Assume that reading is everything. Factor the difficulty weightings based on the student's reaction. 43 difficulty was too hard. 30 too easy. 38 then? Ender's Game, how about. Etc., etc.
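The difficulty-code idea can be sketched very roughly. The percent-unknown metric and the interval-halving step below are my stand-ins, not a worked-out model (the real thing would also score constructions, not just terms, and weight in the enjoyment factors):

```python
def book_difficulty(book_words, known_lexicon):
    """Crude stand-in for a book's difficulty code: the percentage of
    tokens that fall outside the reader's estimated lexicon."""
    unknown = sum(1 for w in book_words if w not in known_lexicon)
    return 100 * unknown / len(book_words)

def next_target(too_easy, too_hard):
    """43 was too hard, 30 too easy: probe the middle of the band,
    then keep narrowing as the student reacts to each pick."""
    return (too_easy + too_hard) // 2

print(next_target(30, 43))  # -> 36
```

So the loop is: estimate the lexicon, score the shelf, hand over a book near the current target difficulty, watch the reaction, and tighten the band.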
How then is this part of a larger educational frame? The prime question is a two-part one: how much data can one glean about a person's knowledge from what they've read, and how much from other sources, i.e. testing. Now the most insane possible methodology for this would be to run heavy parses on a bunch of data, calculate rough measures of difficulty for a given section, and follow the motion of the reader's eyes through a webcam to track their reading. This would give you the kind of seamless updating of lexical knowledge - what terms they have to look back over, what constructions they have to look back over, etc. - which would give a really heavy model of how well a person knows the language. The data sparsity involved in estimating a lexicon, then, could be handled through averages over kNN models: the average knowledge of word i, given interaction with word i, across the set of l most similar readers. Given 500 readers, the 10 nearest will be equally likely to know requisite and equally unlikely to know peruse, etc.
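That kNN smoothing step can be sketched directly. The data shapes and the overlap-based similarity below are made up for illustration; the point is just averaging the known/unknown flag for word i over the l most similar readers who have data for it:

```python
def knows_estimate(word, reader, readers_data, similarity, l=10):
    """Estimate whether `reader` knows `word` by averaging the
    known-flag over the l readers most similar to `reader` who have
    been observed interacting with that word (kNN smoothing over
    sparse per-reader lexicon observations)."""
    neighbors = sorted(
        (r for r in readers_data if r != reader and word in readers_data[r]),
        key=lambda r: similarity(reader, r),
        reverse=True,
    )[:l]
    if not neighbors:
        return None  # nobody similar has touched this word yet
    return sum(readers_data[r][word] for r in neighbors) / len(neighbors)

# toy data: 1 = reader demonstrably knows the word, 0 = demonstrably doesn't
data = {
    "a": {"requisite": 1, "peruse": 0},
    "b": {"requisite": 1, "peruse": 0},
    "c": {"requisite": 0},
}
# crude similarity: overlap in observed vocabulary
overlap = lambda r1, r2: len(set(data[r1]) & set(data[r2]))
print(knows_estimate("requisite", "a", data, overlap, l=2))  # -> 0.5
```

With 500 readers the similarity function would come from the full reading histories rather than vocabulary overlap, but the estimate itself stays this simple.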