dorsal/arxiv
Sifting data in the real world
| Authors | Martin M. Block |
|---|---|
| Categories | physics.data-an, astro-ph, hep-ex, hep-ph |
| ArXiv ID | physics/0506010 |
| URL | https://arxiv.org/abs/physics/0506010 |
| DOI | 10.1016/j.nima.2005.10.019 |
| Journal | Nucl.Instrum.Meth. A556 (2006) 308-324 |
Abstract
In the real world, experimental data are rarely, if ever, distributed as a normal (Gaussian) distribution. As an example, a large set of data--such as the cross sections for particle scattering as a function of energy contained in the archives of the Particle Data Group--is a compendium of all published data, and hence, unscreened. Inspection of similar data sets quickly shows that, for many reasons, these data sets have many outliers--points well beyond what is expected from a normal distribution--thus ruling out the use of conventional $\chi^2$ techniques. This note suggests an adaptive algorithm that allows a phenomenologist to apply to the data sample a sieve whose mesh is coarse enough to let the background fall through, but fine enough to retain the preponderance of the signal, thus sifting the data. A prescription is given for finding a robust estimate of the best-fit model parameters in the presence of a noisy background, together with a robust estimate of the model parameter errors, as well as a determination of the goodness-of-fit of the data to the theoretical hypothesis. Extensive computer simulations are carried out to test the algorithm for both its accuracy and stability under varying background conditions.
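The sieve idea in the abstract (robust fit, discard outliers, conventional fit on what remains) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the abstract: the straight-line model, the Lorentzian-type loss minimized by iteratively reweighted least squares, the constant `gamma = 0.18`, and the cut `cut = 6.0` on each point's contribution to $\chi^2$. The sketch also omits any correction of $\chi^2$ or of the parameter errors for the truncation of the sample.

```python
def fit_line(xs, ys, sigmas, extra_w=None):
    """Weighted least-squares fit of y = a + b*x. Returns (a, b)."""
    if extra_w is None:
        extra_w = [1.0] * len(xs)
    w = [e / s ** 2 for e, s in zip(extra_w, sigmas)]
    sw = sum(w)
    sx = sum(wi * x for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * x * x for wi, x in zip(w, xs))
    sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    det = sw * sxx - sx * sx
    return (sxx * sy - sx * sxy) / det, (sw * sxy - sx * sy) / det


def delta_chi2(xs, ys, sigmas, a, b):
    """Per-point contribution to chi^2 for the line y = a + b*x."""
    return [((y - (a + b * x)) / s) ** 2 for x, y, s in zip(xs, ys, sigmas)]


def robust_fit(xs, ys, sigmas, gamma=0.18, n_iter=50):
    """Minimize sum(ln(1 + gamma * delta_chi2_i)) by iterative reweighting.

    gamma is an illustrative choice (assumption), not a value from the abstract.
    """
    a, b = fit_line(xs, ys, sigmas)  # start from the ordinary fit
    for _ in range(n_iter):
        d = delta_chi2(xs, ys, sigmas, a, b)
        weights = [1.0 / (1.0 + gamma * di) for di in d]  # down-weight outliers
        a, b = fit_line(xs, ys, sigmas, weights)
    return a, b


def sift(xs, ys, sigmas, cut=6.0):
    """Robust fit, drop points with delta_chi2 > cut, then refit conventionally.

    cut is an illustrative threshold (assumption).
    """
    a0, b0 = robust_fit(xs, ys, sigmas)
    d = delta_chi2(xs, ys, sigmas, a0, b0)
    keep = [i for i, di in enumerate(d) if di <= cut]
    kx = [xs[i] for i in keep]
    ky = [ys[i] for i in keep]
    ks = [sigmas[i] for i in keep]
    a, b = fit_line(kx, ky, ks)
    chi2 = sum(delta_chi2(kx, ky, ks, a, b))
    return keep, (a, b), chi2


# Synthetic example: the line y = 1 + 2x with small alternating noise,
# plus three gross outliers injected by hand.
xs = [float(i) for i in range(20)]
sigmas = [1.0] * 20
ys = [1.0 + 2.0 * x + 0.5 * (-1) ** i for i, x in enumerate(xs)]
ys[3] += 15.0
ys[11] -= 20.0
ys[17] += 12.0

kept, params, chi2 = sift(xs, ys, sigmas)
print("kept indices:", kept)
print("intercept, slope:", params)
print("chi2 of sifted sample:", chi2)
```

In this toy case the three injected outliers fall through the sieve, and the refit on the remaining points recovers a slope close to 2. How tight the cut should be, and how the resulting $\chi^2$ and errors should be rescaled, are exactly the questions the paper addresses and are not settled by this sketch.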
{
"annotation_id": "b178042f-caca-43f4-92c4-9bab2f78061a",
"date_created": "2026-03-02T18:01:00.856000Z",
"date_modified": "2026-03-02T18:01:00.856000Z",
"file_hash": "53a2f46cb90224e665ab71f4234cbcf3a42ed4662813de58df8f8fd772c04481",
"private": false,
"record": {
"abstract": "In the real world, experimental data are rarely, if ever, distributed as a\nnormal (Gaussian) distribution. As an example, a large set of data--such as the\ncross sections for particle scattering as a function of energy contained in the\narchives of the Particle Data Group--is a compendium of all published data, and\nhence, unscreened. Inspection of similar data sets quickly shows that, for many\nreasons, these data sets have many outliers--points well beyond what is\nexpected from a normal distribution--thus ruling out the use of conventional\n$\\chi^2$ techniques. This note suggests an adaptive algorithm that allows a\nphenomenologist to apply to the data sample a sieve whose mesh is coarse enough\nto let the background fall through, but fine enough to retain the preponderance\nof the signal, thus sifting the data. A prescription is given for finding a\nrobust estimate of the best-fit model parameters in the presence of a noisy\nbackground, together with a robust estimate of the model parameter errors, as\nwell as a determination of the goodness-of-fit of the data to the theoretical\nhypothesis. Extensive computer simulations are carried out to test the\nalgorithm for both its accuracy and stability under varying background\nconditions.",
"arxiv_id": "physics/0506010",
"authors": [
"Martin M. Block"
],
"categories": [
"physics.data-an",
"astro-ph",
"hep-ex",
"hep-ph"
],
"doi": "10.1016/j.nima.2005.10.019",
"journal_ref": "Nucl.Instrum.Meth. A556 (2006) 308-324",
"title": "Sifting data in the real world",
"url": "https://arxiv.org/abs/physics/0506010"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "6bc96a57-8665-44b9-a833-ee2b0ca4dd97",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}