dorsal/arxiv
A mathematical theory of citing
| Authors | M. V. Simkin, V. P. Roychowdhury |
|---|---|
| Categories | physics.soc-ph, cond-mat.dis-nn, math.PR |
| ArXiv ID | physics/0504094 |
| URL | https://arxiv.org/abs/physics/0504094 |
| Journal | Journal of the American Society for Information Science and Technology, 58(11):1661–1673, 2007 |
Abstract
Recently we proposed a model in which, when a scientist writes a manuscript, he picks up several random papers, cites them and also copies a fraction of their references (cond-mat/0305150). The model was stimulated by our discovery that a majority of scientific citations are copied from the lists of references used in other papers (cond-mat/0212043). It accounted quantitatively for several properties of the empirically observed distribution of citations. However, important features, such as the power-law distribution of citations to papers published during the same year and the fact that the average rate of citing decreases as a paper ages, were not accounted for by that model. Here we propose a modified model: when a scientist writes a manuscript, he picks up several random recent papers, cites them and also copies some of their references. The difference from the original model is the word recent. We solve the model using methods of the theory of branching processes, and find that it can explain the aforementioned features of the citation distribution, which our original model couldn't account for. The model can also explain "sleeping beauties in science", i.e., papers that are little cited for a decade or so, and later "awake" and get a lot of citations. Although much can be understood from purely random models, we find that to obtain good quantitative agreement with empirical citation data one must introduce a Darwinian fitness parameter for the papers.
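The copying rule in the abstract can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the authors' procedure: it takes "recent" to mean "published in the preceding year", copies each reference of a cited paper independently with a fixed probability, and leaves out the fitness parameter entirely. The function name `simulate_citing` and all parameter names and defaults are hypothetical.

```python
import random
from collections import Counter


def simulate_citing(n_years=30, papers_per_year=200, n_recent=3,
                    copy_prob=0.1, seed=0):
    """Toy simulation of the 'cite recent papers and copy references' rule.

    Assumptions (not taken from the source): "recent" means the preceding
    year, and each reference of a cited paper is copied independently with
    probability `copy_prob`.
    """
    rng = random.Random(seed)
    references = {}        # paper id -> sorted list of cited paper ids
    year_of = {}           # paper id -> publication year
    citations = Counter()  # paper id -> number of citations received
    previous_year = []     # ids of papers published in the preceding year
    next_id = 0

    for year in range(n_years):
        this_year = []
        for _ in range(papers_per_year):
            pid = next_id
            next_id += 1
            cited = set()
            if previous_year:
                # Pick several random recent papers and cite them.
                k = min(n_recent, len(previous_year))
                for recent in rng.sample(previous_year, k):
                    cited.add(recent)
                    # Copy some of the references of each cited paper.
                    for ref in references[recent]:
                        if rng.random() < copy_prob:
                            cited.add(ref)
            references[pid] = sorted(cited)
            year_of[pid] = year
            for c in cited:
                citations[c] += 1
            this_year.append(pid)
        previous_year = this_year

    return references, year_of, citations


if __name__ == "__main__":
    refs, year_of, cites = simulate_citing()
    # Show the five most cited papers among those published in year 0.
    first_year = [p for p, y in year_of.items() if y == 0]
    top = sorted(first_year, key=lambda p: cites[p], reverse=True)[:5]
    for p in top:
        print(p, cites[p])
```

In this sketch an old paper can gain citations only through copying, because direct citations go to the preceding year alone. That is the property the abstract attributes to the word "recent". Comparing the output with empirical data would require the fitness parameter, which is omitted here.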
{
"annotation_id": "c207c7df-4046-4feb-adfe-a56e670d024f",
"date_created": "2026-03-02T18:00:56.314000Z",
"date_modified": "2026-03-02T18:00:56.314000Z",
"file_hash": "b839fbfb4fc54ea817ce1bc9dec635bdf8374a7d523e2091303541a57ad5c445",
"private": false,
"record": {
"abstract": "Recently we proposed a model in which when a scientist writes a manuscript,\nhe picks up several random papers, cites them and also copies a fraction of\ntheir references (cond-mat/0305150). The model was stimulated by our discovery\nthat a majority of scientific citations are copied from the lists of references\nused in other papers (cond-mat/0212043). It accounted quantitatively for\nseveral properties of empirically observed distribution of citations. However,\nimportant features, such as power-law distribution of citations to papers\npublished during the same year and the fact that the average rate of citing\ndecreases with aging of a paper, were not accounted for by that model. Here we\npropose a modified model: when a scientist writes a manuscript, he picks up\nseveral random recent papers, cites them and also copies some of their\nreferences. The difference with the original model is the word recent. We solve\nthe model using methods of the theory of branching processes, and find that it\ncan explain the aforementioned features of citation distribution, which our\noriginal model couldn\u0027t account for. The model can also explain \"sleeping\nbeauties in science\", i.e., papers that are little cited for a decade or so,\nand later \"awake\" and get a lot of citations. Although much can be understood\nfrom purely random models, we find that to obtain a good quantitative agreement\nwith empirical citation data one must introduce Darwinian fitness parameter for\nthe papers.",
"arxiv_id": "physics/0504094",
"authors": [
"M. V. Simkin",
"V. P. Roychowdhury"
],
"categories": [
"physics.soc-ph",
"cond-mat.dis-nn",
"math.PR"
],
"journal_ref": "Journal of the American Society for Information Science and\n Technology, 58(11):1661--1673, 2007",
"title": "A mathematical theory of citing",
"url": "https://arxiv.org/abs/physics/0504094"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "9b51a388-92a0-411e-b926-15fb5637db19",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}