Is Language Just Statistics?

Many years ago, I attended a talk in which a researcher (in retrospect, probably a graduate student) described some work she was doing on modeling learning. She mentioned that a colleague was very proud of a model he had put together: a model world populated by model creatures that learned to avoid predators and find food.

She reported that he said, “Look, they are able to learn this without *any* input from the programmer. It’s all nurture, not nature.” She argued with him at length, pointing out that he had programmed into his model creatures the structures that allowed them to learn. Change any of those parameters, and they ceased to learn.

There are a number of researchers in the field of language who, impressed by the success of statistical-learning models, argue that much or all of language learning can be accomplished simply by noticing statistical patterns in language. For instance, there is a class of words in English that tend to follow the word “the.” A traditional grammarian might call these “nouns,” but on a purely statistical account the label is unnecessary: the class is defined by the distributional pattern itself.
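
To make the idea concrete, here is a minimal sketch of that kind of distributional learner. The corpus is a toy one I invented, and plain bigram counting stands in for any published model; the point is only that a “noun-like” class falls out of tallying what follows “the”:

```python
from collections import Counter

# A tiny invented corpus; a real model would use vastly more text.
corpus = (
    "the dog chased the cat . "
    "the cat sat on the mat . "
    "a dog ate the food quickly ."
).split()

# Tally which words immediately follow "the".
followers = Counter(
    nxt for word, nxt in zip(corpus, corpus[1:]) if word == "the"
)

# The frequent followers form a purely distributional class that
# happens to look a lot like what a grammarian would call "nouns".
print(followers.most_common())
# [('cat', 2), ('dog', 1), ('mat', 1), ('food', 1)]
```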

There are many variants of this approach, some more successful than others. Some are more careful in their claims than others (one paper, I recall, went so far as to state that the described model did away not only with grammatical rules but with words themselves).

While I am impressed by much of the work that has come out of this approach, I don’t think it can ever do away with complex (possibly innate) structure. The anecdote above is an argument by analogy. Here is a great extended quote from *Language Learnability and Language Development*, Steven Pinker’s original 1984 foray into book writing:

As I argued in Pinker (1979), in most distributional learning procedures there are vast numbers of properties that a learner could record, and since the child is looking for correlations among these properties, he or she faces a combinatorial explosion of possibilities. For example, he or she could record of a given word that it occurs in the first (or second, or third, or nth) position in a sentence, that it is to the left (or right) of word X or word Y or …, or that it is to the left of the word sequence WXYZ, or that it occurs in the same sentence with word X (or words X, Y, Z, or some subset of them), and so on. Adding semantic and inflectional information to the space of possibilities only makes the explosion more explosive. To be sure, the inappropriate properties will correlate with no others and hence will eventually be ignored, leaving only the appropriate grammatical properties, but only after astronomical amounts of memory space, computation, or both.

In any case, most of these properties should be eliminated by an astute learner as being inappropriate to learning a human language in the first place. For example, there is no linguistic phenomenon in any language that is contingent upon a word’s occupying the third serial position in a sentence, so why bother testing for one? Testing for correlations among irrelevant properties is not only wasteful but potentially dangerous, since many spurious correlations will arise in local samples of the input. For example, the child could hear the sentences *John eats meat*, *John eats slowly*, and *the meat is good* and then conclude that *the slowly is good* is a possible English sentence.
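
Pinker’s spurious-correlation worry is easy to reproduce. The sketch below is deliberately naive and uses only his three example sentences: it treats any two words that share an immediate left-hand neighbor as interchangeable, and duly “derives” *the slowly is good*.

```python
from collections import defaultdict

# Pinker's three example sentences.
sentences = [
    "John eats meat",
    "John eats slowly",
    "the meat is good",
]

# Naively treat any words that share an immediate left-hand neighbor
# as members of the same distributional class.
same_slot = defaultdict(set)
for sentence in sentences:
    words = sentence.split()
    for left, word in zip(words, words[1:]):
        same_slot[left].add(word)

# "meat" and "slowly" both follow "eats", so they get lumped together.
interchangeable = same_slot["eats"]  # {'meat', 'slowly'}

# Substituting within that class in an attested frame yields the
# spurious "sentence" Pinker warns about.
for word in sorted(interchangeable):
    print("the", word, "is good")
# the meat is good
# the slowly is good   <- not a possible English sentence
```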

Ultimately, a pure-statistics model still has to decide what regularities to keep track of and what to ignore, and that requires at least some innate structure. It probably also requires fairly complex grammatical structures, whether learned or innate.

