dorsal/arxiv
Effective Sample Size: Quick Estimation of the Effect of Related Samples in Genetic Case-Control Association Analyses
| Authors | Yaning Yang, Elaine F. Remmers, Chukwuma B. Ogunwole, Daniel L. Kastner, Peter K. Gregersen, Wentian Li |
|---|---|
| Categories | q-bio.QM |
| ArXiv ID | q-bio/0611093 |
| URL | https://arxiv.org/abs/q-bio/0611093 |
| DOI | 10.1016/j.compbiolchem.2010.12.006 |
| Journal | Computational Biology and Chemistry, 35(1):40-49 (2011) |
| License | http://arxiv.org/licenses/nonexclusive-distrib/1.0/ |
Abstract
Affected relatives are essential for pedigree linkage analysis; however, they violate the independent-sample assumption in case-control association studies. To avoid correlation between samples, a common practice is to take only one affected sample per pedigree in association analysis. Although several methods exist for handling correlated samples, they are still not widely used, in part because they are not easily implemented and in part because they are not widely known. We advocate the effective sample size method as a simple and accessible approach for case-control association analysis with correlated samples. This method modifies the chi-square test statistic, the p-value, and the 95% confidence interval of the odds ratio by replacing the apparent allele or genotype counts with the effective ones in the standard formulas, without the need for specialized computer programs. We present a simple formula for calculating the effective sample size for many types of relative pairs and relative sets. For allele frequency estimation, the effective sample size method captures the variance inflation exactly. For genotype frequency, simulations show that the effective sample size provides a satisfactory approximation. A gene previously identified as a type 1 diabetes susceptibility locus, the interferon-induced helicase gene (IFIH1), is shown to be significantly associated with rheumatoid arthritis when the effective sample size method is applied. This significant association is not established if only one affected sib per pedigree is used in the association analysis. The relationships between the effective sample size method and other methods (the generalized estimation equation, variance of eigenvalues for correlation matrices, and genomic controls) are discussed.
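The abstract describes the method only in words, so the following Python sketch is one possible reading of it, not the authors' formula or code. It assumes the textbook variance-inflation result that a set of n samples with pairwise correlation matrix R carries the information of n² / Σᵢⱼ Rᵢⱼ independent samples, and it assumes an allele-count correlation of 0.5 between full sibs. The function names (`effective_size`, `scale_counts`, `chi_square_2x2`) and all counts are hypothetical and chosen purely for illustration.

```python
import math

def effective_size(corr):
    """Effective number of independent samples for one related set.

    Assumption (not taken from the paper): n_eff = n^2 / sum of all
    entries of the pairwise correlation matrix, diagonal equal to 1.
    """
    n = len(corr)
    total = sum(sum(row) for row in corr)
    return n * n / total

def scale_counts(counts, apparent_n, effective_n):
    """Shrink apparent counts by the ratio effective_n / apparent_n."""
    factor = effective_n / apparent_n
    return [c * factor for c in counts]

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) and p-value
    for the 2x2 table [[a, b], [c, d]]; counts may be non-integer."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Assumed allele-count correlation of 0.5 between two full sibs.
sib_pair = [[1.0, 0.5],
            [0.5, 1.0]]
n_eff_pair = effective_size(sib_pair)  # 4/3 effective samples per pair

# Made-up allele counts (risk allele, other allele), NOT from the paper.
case_counts = [240, 160]     # cases drawn entirely from affected sib pairs
control_counts = [200, 200]  # unrelated controls, left unscaled

eff_cases = scale_counts(case_counts, apparent_n=2, effective_n=n_eff_pair)
chi2_naive, p_naive = chi_square_2x2(*case_counts, *control_counts)
chi2_eff, p_eff = chi_square_2x2(*eff_cases, *control_counts)
print(f"naive:     chi2={chi2_naive:.3f}, p={p_naive:.4f}")
print(f"effective: chi2={chi2_eff:.3f}, p={p_eff:.4f}")
```

Under these assumptions, the effective counts yield a smaller chi-square statistic than the naive analysis that treats both sibs as independent, while still using more information than keeping only one sib per pedigree. Real data with mixed pedigree structures would need a per-family correction rather than the single scaling factor used here.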
{
"annotation_id": "f240799c-05d5-42e7-8abf-7dc370957ec9",
"date_created": "2026-03-02T18:01:34.736000Z",
"date_modified": "2026-03-02T18:01:34.736000Z",
"file_hash": "7c575cd23886c47e3b4f6af7965a05ebf6f485bebb97b540d4efd5d07e861509",
"private": false,
"record": {
"abstract": "Affected relatives are essential for pedigree linkage analysis, however, they\ncause a violation of the independent sample assumption in case-control\nassociation studies. To avoid the correlation between samples, a common\npractice is to take only one affected sample per pedigree in association\nanalysis. Although several methods exist in handling correlated samples, they\nare still not widely used in part because these are not easily implemented, or\nbecause they are not widely known. We advocate the effective sample size method\nas a simple and accessible approach for case-control association analysis with\ncorrelated samples. This method modifies the chi-square test statistic,\np-value, and 95% confidence interval of the odds-ratio by replacing the\napparent number of allele or genotype counts with the effective ones in the\nstandard formula, without the need for specialized computer programs. We\npresent a simple formula for calculating effective sample size for many types\nof relative pairs and relative sets. For allele frequency estimation, the\neffective sample size method captures the variance inflation exactly. For\ngenotype frequency, simulations showed that effective sample size provides a\nsatisfactory approximation. A gene which is previously identified as a type 1\ndiabetes susceptibility locus, the interferon-induced helicase gene (IFIH1), is\nshown to be significantly associated with rheumatoid arthritis when the\neffective sample size method is applied. This significant association is not\nestablished if only one affected sib per pedigree were used in the association\nanalysis. Relationship between the effective sample size method and other\nmethods -- the generalized estimation equation, variance of eigenvalues for\ncorrelation matrices, and genomic controls -- are discussed.",
"arxiv_id": "q-bio/0611093",
"authors": [
"Yaning Yang",
"Elaine F. Remmers",
"Chukwuma B. Ogunwole",
"Daniel L. Kastner",
"Peter K. Gregersen",
"Wentian Li"
],
"categories": [
"q-bio.QM"
],
"doi": "10.1016/j.compbiolchem.2010.12.006",
"journal_ref": "Computational Biology and Chemistry, 35(1):40-49 (2011)",
"license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
"title": "Effective Sample Size: Quick Estimation of the Effect of Related Samples in Genetic Case-Control Association Analyses",
"url": "https://arxiv.org/abs/q-bio/0611093"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "b24b69e9-c576-4f1e-9020-06db43154513",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}