dorsal/arxiv
A quantitative analysis of concepts and semantic structure in written language: Long range correlations in dynamics of texts
| Authors | E. Alvarez-Lacalle, B. Dorow, J. -P. Eckmann, E. Moses |
|---|---|
| Categories | physics.soc-ph, cond-mat.stat-mech |
| ArXiv ID | physics/0510276 |
| URL | https://arxiv.org/abs/physics/0510276 |
Abstract
Understanding texts requires memory: the reader has to keep in mind enough words to create meaning. This calls for a relation between the memory of the reader and the structure of the text. To investigate this interaction, we first identify a connectivity matrix defined by co-occurrence of words in the text. A vector space of words characterizing the text is spanned by the principal directions of this matrix. It is useful to think of these weighted combinations of words as representing "concepts". As the reader follows the text, the set of words in her window of attention follows a dynamical motion among these concepts. We observe long range power law correlations in this trajectory. By explicitly constructing surrogate hierarchical texts, we demonstrate that the power law originates from structural organization of texts into subunits such as chapters and paragraphs.
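The pipeline the abstract describes (co-occurrence matrix, principal directions, trajectory of an attention window, correlations along that trajectory) can be illustrated with a rough pure-Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the co-occurrence span, the window length, the toy text, the use of only the single leading direction, and the shifted power iteration used to find it. The paper's own normalization and choice of directions may differ.

```python
from collections import Counter
import math


def cooccurrence_matrix(words, vocab, span=5):
    """Symmetric counts of vocabulary words occurring within `span` tokens (assumed span)."""
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    m = [[0.0] * n for _ in range(n)]
    for pos, w in enumerate(words):
        if w not in index:
            continue
        for other in words[pos + 1 : pos + 1 + span]:
            if other in index and other != w:
                i, j = index[w], index[other]
                m[i][j] += 1.0
                m[j][i] += 1.0
    return m


def leading_direction(m, iterations=200):
    """Power iteration on (M + I); the identity shift avoids oscillation
    when M has eigenvalues of equal magnitude and opposite sign."""
    n = len(m)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [v[i] + sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def trajectory(words, vocab, direction, window=10):
    """Project each sliding attention window onto one direction ("concept")."""
    index = {w: i for i, w in enumerate(vocab)}
    weights = [direction[index[w]] if w in index else 0.0 for w in words]
    return [sum(weights[t : t + window]) for t in range(len(words) - window + 1)]


def autocorrelation(series, lag):
    """Normalized autocorrelation of a scalar series at a given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    if var == 0.0 or lag >= n:
        return 0.0
    cov = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    ) / (n - lag)
    return cov / var


# Toy text, far too short to show any power law; it only exercises the steps.
text = (
    "the cat sat on the mat while the dog slept near the cat "
    "and the dog chased the cat around the mat"
).split()
vocab = [w for w, _ in Counter(text).most_common(5)]
m = cooccurrence_matrix(text, vocab)
v = leading_direction(m)
path = trajectory(text, vocab, v)
print([round(autocorrelation(path, lag), 3) for lag in (1, 2, 4)])
```

On a real book-length text, one would compute the autocorrelation over a wide range of lags and inspect it on log-log axes to look for the power-law decay the abstract reports. This sketch does not attempt that fit, nor the surrogate hierarchical texts.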
{
"annotation_id": "d6e3e70a-b08a-43b9-ae80-d56ff7138888",
"date_created": "2026-03-02T18:01:04.040000Z",
"date_modified": "2026-03-02T18:01:04.040000Z",
"file_hash": "22b2382aa4afa793dd9019794e23c4f5b063db2a8ac972ce66f8690ce0449c30",
"private": false,
"record": {
"abstract": "Understanding texts requires memory: the reader has to keep in mind enough\nwords to create meaning. This calls for a relation between the memory of the\nreader and the structure of the text. To investigate this interaction, we first\nidentify a connectivity matrix defined by co-occurrence of words in the text. A\nvector space of words characterizing the text is spanned by the principal\ndirections of this matrix. It is useful to think of these weighted combinations\nof words as representing ``concepts\u0027\u0027. As the reader follows the text, the set\nof words in her window of attention follows a dynamical motion among these\nconcepts. We observe long range power law correlations in this trajectory. By\nexplicitly constructing surrogate hierarchical texts, we demonstrate that the\npower law originates from structural organization of texts into subunits such\nas chapters and paragraphs.",
"arxiv_id": "physics/0510276",
"authors": [
"E. Alvarez-Lacalle",
"B. Dorow",
"J. -P. Eckmann",
"E. Moses"
],
"categories": [
"physics.soc-ph",
"cond-mat.stat-mech"
],
"title": "A quantitative analysis of concepts and semantic structure in written language: Long range correlations in dynamics of texts",
"url": "https://arxiv.org/abs/physics/0510276"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "31bad076-f97a-46e4-80ee-f9dfbec2df72",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}