The evolution of the exponent of Zipf's law in language ontogeny

PLoS One. 2013;8(3):e53227. doi: 10.1371/journal.pone.0053227. Epub 2013 Mar 13.

Abstract

It is well known that word frequencies arrange themselves according to Zipf's law. However, little is known about the dependency between the parameters of the law and the complexity of a communication system. Many models of the evolution of language assume that the exponent of the law remains constant as the complexity of a communication system increases. Using longitudinal studies of child language, we analysed the word rank distribution for the speech of children and adults participating in conversations. The adults typically included family members (e.g., parents) or the investigators conducting the research. Our analysis of the evolution of Zipf's law yields two main unexpected results. First, in children the exponent of the law tends to decrease over time, while this tendency is weaker in adults, suggesting that this is not a mere mirror effect of adult speech. Second, although the exponent of the law is more stable in adults, their exponents fall below 1, which is the typical value of the exponent assumed in both children and adults. Our analysis also shows a tendency of the mean length of utterances (MLU), a simple estimate of syntactic complexity, to increase as the exponent decreases. The parallel evolution of the exponent and a simple indicator of syntactic complexity (MLU) supports the hypothesis that the exponent of Zipf's law and linguistic complexity are inter-related. The assumption that Zipf's law for word ranks is a power-law with a constant exponent of one in both adults and children needs to be revised.
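The two quantities compared in the abstract, the exponent of Zipf's law and the MLU, can be illustrated with a minimal Python sketch. The abstract does not say how the exponent was estimated, so the least-squares fit of log frequency against log rank used here is an assumption (a common but crude estimator), as is counting utterance length in whitespace-separated words rather than morphemes. The function names `zipf_exponent` and `mean_length_of_utterance` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def zipf_exponent(tokens):
    """Estimate the exponent alpha in f(r) ~ r**(-alpha).

    Assumption: ordinary least squares on log(frequency) vs. log(rank).
    This is an illustrative estimator, not necessarily the paper's method.
    """
    # Frequencies sorted from most to least frequent; rank 1 is the top word.
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct word types")

    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Slope of the regression line; Zipf's exponent is its negation.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


def mean_length_of_utterance(utterances):
    """Mean utterance length in words (assumption: words, not morphemes)."""
    lengths = [len(u.split()) for u in utterances if u.strip()]
    if not lengths:
        raise ValueError("need at least one non-empty utterance")
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    # Toy transcript, invented purely for illustration.
    utterances = ["more juice", "doggie", "I want that ball", "more ball"]
    tokens = [word.lower() for u in utterances for word in u.split()]
    print("exponent:", zipf_exponent(tokens))
    print("MLU (words):", mean_length_of_utterance(utterances))
```

Applied separately to the child and adult turns of each recording session, a sketch like this would yield one (exponent, MLU) pair per speaker group and session, which is the kind of longitudinal series the abstract describes.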

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Age Factors
  • Child
  • Child, Preschool
  • Communication
  • Female
  • Humans
  • Infant
  • Language*
  • Linguistics
  • Male
  • Models, Theoretical*
  • Speech*

Grants and funding

This work was supported by grant 'Iniciacio i reincorporacio a la recerca' from the Universitat Politecnica de Catalunya (http://www.upc.cat) and the grant 'Biological and Social Data Mining: Algorithms, Theory, and Implementations' (TIN2011-27479-C04-03) from the Spanish Ministry of Science and Innovation (http://www.micinn.es/) (JB and RFC). This work was supported by the Northern Norwegian Regional Health Authority, Helse Nord RHF (BE). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.