Is Language Just Statistics

January 6, 2009

coglanglab's picture

Many years ago, I attended a talk in which a researcher (in retrospect, probably a graduate student) was talking about some work she was doing on modeling learning. She mentioned that a colleague was very proud of a model he had put together in which he had a model world populated by model creatures which learned to avoid predators and find food.

She reported that he said, "Look, they are able to learn this without *any* input from the programmer. It's all nurture, not nature." She argued with him at length, pointing out that he had programmed into his model creatures the very structures that allowed them to learn. Change any of those parameters, and they ceased to learn.

There are a number of researchers in the field of language who, impressed by the success of statistical-learning models, argue that much or all of language learning can be accomplished by simply noticing statistical patterns in language. For instance, there is a class of words in English that tend to follow the word "the." A traditional grammarian might call these "nouns," but on the statistical view the label itself becomes unnecessary: the pattern of co-occurrence does the work.
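To make the "words that follow the" idea concrete, here is a minimal sketch in Python. The corpus and the function name `words_after` are invented for illustration and do not come from any particular model in the literature. Note that the sketch already embodies a choice made by the programmer: it only looks at the word immediately to the left.

```python
from collections import Counter

# Toy corpus, invented purely for illustration.
corpus = [
    "the dog chased the cat",
    "the cat saw a bird",
    "a dog ate the food",
]

def words_after(target, sentences):
    """Count which words immediately follow `target` in the sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        # Pair each token with the token to its right.
        for left, right in zip(tokens, tokens[1:]):
            if left == target:
                counts[right] += 1
    return counts

after_the = words_after("the", corpus)
print(sorted(after_the))  # the candidate "class" of words seen after "the"
```

With a real corpus this class would also pick up adjectives and other words, which is part of why the simple version of the story needs refinement.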

There are many variants of this approach, some more successful than others. Some are also more careful in their claims than others (one paper, I recall, stated strongly that the described model did away with not only grammatical rules, but words themselves).

While I am impressed by much of the work that has come out of this approach, I don't think it can ever do away with complex (possibly innate) structure. The anecdote above is an argument by analogy. Here is a great extended quote from Language Learnability and Language Development, Steven Pinker's original, 1984 foray into book writing:

As I argued in Pinker (1979), in most distributional learning procedures there are vast numbers of properties that a learner could record, and since the child is looking for correlations among these properties, he or she faces a combinatorial explosion of possibilities. For example, he or she could record of a given word that it occurs in the first (or second, or third, or nth) position in a sentence, that it is to the left (or right) of word X or word Y or ..., or that it is to the left of the word sequence WXYZ, or that it occurs in the same sentence with word X (or words X, Y, Z, or some subset of them), and so on. Adding semantic and inflectional information to the space of possibilities only makes the explosion more explosive. To be sure, the inappropriate properties will correlate with no others and hence will eventually be ignored, leaving only the appropriate grammatical properties, but only after astronomical amounts of memory space, computation, or both.

In any case, most of these properties should be eliminated by an astute learner as being inappropriate to learning a human language in the first place. For example, there is no linguistic phenomenon in any language that is contingent upon a word's occupying the third serial position in a sentence, so why bother testing for one? Testing for correlations among irrelevant properties is not only wasteful but potentially dangerous, since many spurious correlations will arise in local samples of the input. For example, the child could hear the sentences John eats meat, John eats slowly, and the meat is good and then conclude that the slowly is good is a possible English sentence.
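The "slowly is good" worry in the passage above can be reproduced with a deliberately naive learner. The sketch below is not from Pinker's book; it assumes a hypothetical learner that treats any two words as interchangeable if they have been heard after the same word. The function names are made up for this example.

```python
from collections import defaultdict

# The three sentences from the quoted example.
sentences = [
    "John eats meat",
    "John eats slowly",
    "the meat is good",
]

def classes_by_left_neighbor(sents):
    """Group words by the word that appears immediately to their left."""
    classes = defaultdict(set)
    for s in sents:
        tokens = s.split()
        for left, right in zip(tokens, tokens[1:]):
            classes[left].add(right)
    return classes

def generalize(sents):
    """Build unheard sentences by swapping words that share a left neighbor."""
    classes = classes_by_left_neighbor(sents)
    heard = set(sents)
    novel = set()
    for s in sents:
        tokens = s.split()
        for i, word in enumerate(tokens):
            for members in classes.values():
                if word in members:
                    # Substitute every other member of the same class.
                    for other in members - {word}:
                        candidate = " ".join(tokens[:i] + [other] + tokens[i + 1:])
                        if candidate not in heard:
                            novel.add(candidate)
    return novel

novel = generalize(sentences)
print(novel)
```

Because "meat" and "slowly" were both heard after "eats," this learner puts them in one class and then licenses "the slowly is good." A less naive learner would need some way to know that this particular regularity is not worth trusting.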

Ultimately, a pure-statistics model still has to decide what regularities to keep track of and what to ignore, and that requires at least some innate structure. It probably also requires fairly complex grammatical structures, whether learned or innate.
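For a rough sense of scale, the following sketch counts how many properties a learner could record about a single word under some toy assumptions of my own: serial position, being immediately left or right of each vocabulary word, and co-occurring in a sentence with any set of up to `max_subset` other words. The function name and the particular inventory of properties are hypothetical, chosen only to mirror the kinds of properties listed in the quote.

```python
from math import comb

def candidate_property_count(vocab_size, max_len, max_subset=3):
    """Count recordable properties of one word under toy assumptions."""
    # "Occurs in position n" for n = 1..max_len.
    positions = max_len
    # "Is immediately to the left of X" or "to the right of X" for each word X.
    neighbors = 2 * vocab_size
    # "Occurs in the same sentence as" some set of 1..max_subset other words.
    cooccurrence = sum(
        comb(vocab_size - 1, k) for k in range(1, max_subset + 1)
    )
    return positions + neighbors + cooccurrence

small = candidate_property_count(vocab_size=10, max_len=5)
large = candidate_property_count(vocab_size=1000, max_len=5)
print(small, large)
```

Even this restricted inventory grows roughly with the cube of the vocabulary size, and it leaves out word sequences, semantics, and inflection entirely. Something has to tell the learner which of these properties to bother with.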

Comments

Slowly is good

January 7, 2009 by Anonymous, 43 weeks 3 days ago

Concerning the Pinker quote above: it should be pointed out that "slowly is good" IS a possible English sentence (in fact, it would make a decent slogan for a speed reduction campaign).

Combinatorial Explosion

January 6, 2009 by Anonymous, 43 weeks 3 days ago

The problem that I see with Pinker's "Combinatorial Explosion" is that he is presupposing a computational method by which possibilities are being tested -- i.e. in some serial assessment: "the child is looking for correlations".

However, the "learning procedure" is not only distributional, but the processing is as well. Every sentence an infant hears is being processed by literally the same wetware, which acts as both processor and storage. The child does not need to *look* for correlations as though pulling examples from some internal database and comparing them one by one. Incoming utterances are processed; the act of processing alters the processor. Over time, correlations reinforce each other and fall out of the system by virtue of the process.

I see language learning as a process, something much more akin to a genetic algorithm, another distributed computational method. The "Traveling Salesman" problem becomes computationally intractable when approached serially because of combinatorial explosion; GAs solve it quite easily.

Pinker suffers from an overly 'digital CPU' concept of the brain (and language). Here's a 'great' Pinker quote:

    “On the order of 40 million base pairs differ between chimpanzees and humans, and we see no reason to doubt that universal grammar would fit into these 10 megabytes with lots of room left over, especially if provisions for the elementary operations of a symbol-manipulation architecture are specified in the remaining 99% of the genome.” (Pinker and Bloom, 1990: 726)

Even in 1990, it was asinine to suggest any such quantification is possible, and it betrays much about how Pinker believes DNA and brains function and how information in them is stored and processed. In reality, brains and genomes do not store digital information, even though superficial similarities to it can be observed in the firing of a single neuron or in a single allele.

That being said, I don't disagree that there has to be innate structure. That's equally apparent; otherwise chimpanzees would be telling us about their days, their aspirations and their disappointments. However, that innateness is likely to be much more high level and structural than Pinker believes; and more likely derived from perception/action and cognitive salience than from specifying features and rules.

Innateness

January 6, 2009 by Anonymous, 43 weeks 4 days ago

The question of innateness in computer programs is arguably moot; all programs consist of predetermined structure. No matter what the program's purpose or algorithmic method, it must have innate qualities; otherwise it simply wouldn't be a program. On the whole, I believe it is a mistake to constrain oneself by working within the assumed dichotomy of nature vs. nurture, as it so consistently produces senseless results and empty debates.


