ChatGPT Models Reveal Fractal Nature of Human Language

Edited by: Vera Mo

Over six decades, computational linguists have explored a variety of methods for modeling language, and they have recently found promising answers in Large Language Models (LLMs) such as ChatGPT.

The earliest approaches relied on Noam Chomsky's formal grammars and on rigid semantic frameworks, which struggled to capture the fluid nature of meaning. The 1990s brought statistical models based on n-grams, which describe language through the probabilities of word co-occurrence: "io vedo" [I see], for instance, is far more frequent than "io casa" [I house]. These models automated linguistic analysis, but they reduced meaning to mere word proximity.

The advent of LLMs, built on transformer networks, marked a revolution. An LLM learns by predicting the next word in a sentence, a task repeated over enormous amounts of web text, and the same mechanism lets it generate coherent continuations of a prompt.

LLMs have also made it possible to analyze word frequencies statistically at a very large scale, revealing the fractal nature of language. Like fractals, language is self-similar across scales: properties such as coherence appear in words, in sentences, and in entire texts, while long-range correlations tie together semantically related words or paragraphs that lie far apart.

LLMs succeed because they generalize local information and model tacit knowledge, the kind of knowledge, as defined by Michael Polanyi, that is acquired through experience rather than stated explicitly. Linguists now recognize human language as a chaotic, complex phenomenon, with LLMs serving as tools to study its intricacies.
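To make the n-gram idea above concrete, here is a minimal sketch of a bigram model in Python. The toy corpus and the maximum-likelihood estimate are illustrative assumptions for this sketch only; real n-gram models were estimated from millions of sentences.

```python
from collections import Counter

# Tiny illustrative corpus (an assumption for this sketch); real n-gram
# models are built from far larger collections of text.
corpus = [
    "io vedo la casa",
    "io vedo il mare",
    "io vedo te",
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def bigram_probability(w1, w2):
    """Maximum-likelihood estimate of P(w2 | w1): count(w1 w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

print(bigram_probability("io", "vedo"))  # high: "io vedo" appears often
print(bigram_probability("io", "casa"))  # zero here: "io casa" never appears
```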
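The next-word prediction loop described above can be tried with an openly available model. The sketch below uses the Hugging Face transformers library and GPT-2 as a stand-in, since ChatGPT's own models are not downloadable; the prompt is an arbitrary example.

```python
# A minimal sketch of next-word prediction / text continuation with a small
# openly available model (GPT-2) via the Hugging Face `transformers` library.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Language is a fractal object because", max_new_tokens=15)
print(result[0]["generated_text"])
```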
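Finally, the long-range correlations mentioned above can be probed with a simple statistic, such as the autocorrelation of word lengths at increasing lags. The file name and the choice of word length as the numeric signal are assumptions made only for this sketch, not the method used in the studies the article refers to.

```python
import numpy as np

def autocorrelation(series, lag):
    """Correlation between the series and a copy of itself shifted by `lag`."""
    x = np.asarray(series, dtype=float) - np.mean(series)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Turn a long text into a numeric sequence, here the length of each word.
# "corpus.txt" is a placeholder for any sufficiently long plain-text file.
with open("corpus.txt", encoding="utf-8") as f:
    lengths = [len(word) for word in f.read().split()]

for lag in (1, 10, 100, 1000):
    print(lag, round(autocorrelation(lengths, lag), 4))

# In natural language these correlations typically decay slowly (roughly as a
# power law) rather than vanishing after a few words, one signature of the
# self-similar, fractal-like structure discussed in the article.
```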
