dorsal/arxiv
Finding Sequence Features in Tissue-specific Sequences
| Authors | Arvind Rao, Alfred O. Hero III, David J. States, James Douglas Engel |
|---|---|
| Categories | q-bio.GN |
| ArXiv ID | q-bio/0702022 |
| URL | https://arxiv.org/abs/q-bio/0702022 |
Abstract
The discovery of motifs underlying gene expression is a challenging problem. Some of these motifs are binding sites for known transcription factors, but sequence inspection often provides valuable clues and can even lead to the discovery of novel motifs whose function in gene expression is uncharacterized. Given the complexity of tissue-specific gene expression, several motifs are putatively responsible for expression in a given cell type. This has important implications for understanding fundamental biological processes such as development and disease progression. In this work, we present an approach to the principled selection of motifs (not necessarily transcription factor binding sites) and examine its application to several questions in current bioinformatics research.

There are two main contributions of this work. First, we introduce a new metric for variable selection during classification; second, we investigate the problem of finding specific sequence motifs that underlie tissue-specific gene expression. In conjunction with an SVM classifier, we find these motifs and discover several novel motifs that have not yet been attributed any particular functional role (e.g., as transcription factor binding sites, TFBSs). We hypothesize that the discovery of these motifs would enable large-scale investigation of the tissue-specific regulatory potential of any conserved sequence element identified in genome-wide studies.

Finally, we propose that this framework can be used not only to aid the discovery of discriminatory motifs but also to examine the role of any motif of choice in the co-regulation or co-expression of gene groups.
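The abstract names two ingredients, motifs used as sequence features and a variable-selection score used alongside an SVM, but does not spell out the new metric. The following is a minimal sketch under stated assumptions, not the authors' method: candidate motifs are assumed to be all DNA k-mers, each motif is scored by the mutual information between its presence in a sequence and the tissue label (a standard stand-in, explicitly not the paper's metric), and the helper names (`kmer_counts`, `presence_mutual_information`, `rank_motifs`, `feature_vector`) and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2

# Hypothetical helpers for illustration only; not the authors' code or metric.
DNA = set("ACGT")


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping windows that contain non-ACGT symbols (e.g. N)."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= DNA:
            counts[word] += 1
    return counts


def presence_mutual_information(pos_seqs: list[str], neg_seqs: list[str], motif: str) -> float:
    """Mutual information (bits) between 'motif occurs in the sequence' and the class label.

    Assumption: a stand-in selection score; the abstract does not define the paper's metric.
    """
    table = {}  # (label, present) -> number of sequences
    for label, seqs in (("pos", pos_seqs), ("neg", neg_seqs)):
        present = sum(1 for s in seqs if motif in s.upper())
        table[(label, True)] = present
        table[(label, False)] = len(seqs) - present
    n = sum(table.values())
    mi = 0.0
    for (label, present), count in table.items():
        if count == 0:
            continue  # 0 * log(0) is taken as 0
        p_joint = count / n
        p_label = (table[(label, True)] + table[(label, False)]) / n
        p_present = (table[("pos", present)] + table[("neg", present)]) / n
        mi += p_joint * log2(p_joint / (p_label * p_present))
    return mi


def rank_motifs(pos_seqs, neg_seqs, k=4, top=5):
    """Score every DNA k-mer as a candidate motif and return the top (score, motif) pairs."""
    candidates = ("".join(p) for p in product("ACGT", repeat=k))
    scored = [(presence_mutual_information(pos_seqs, neg_seqs, m), m) for m in candidates]
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest score first, ties broken alphabetically
    return scored[:top]


def feature_vector(seq: str, motifs: list[str]) -> list[int]:
    """Occurrence counts of equal-length selected motifs, e.g. as classifier input."""
    counts = kmer_counts(seq, len(motifs[0]))
    return [counts[m] for m in motifs]


if __name__ == "__main__":
    # Toy sequences (hypothetical): "tissue-specific" set vs. background set.
    tissue = ["ACGTGATAAGC", "TTGATAACCA", "GGCGATAAT"]
    background = ["ACGTCCCGTA", "TTTTCCAGGC", "GCGCGCATAT"]
    top = rank_motifs(tissue, background, k=4, top=2)
    print(top)  # [(1.0, 'ATAA'), (1.0, 'GATA')]
    print([feature_vector(s, [m for _, m in top]) for s in tissue + background])
```

The count vectors returned by `feature_vector` for the top-ranked motifs are the kind of input an SVM classifier could then be trained on; that step is omitted to keep the sketch dependency-free. Mutual information was chosen only because it is simple and label-aware; any other per-motif score could be swapped into `rank_motifs`.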
{
"annotation_id": "951a318a-ce14-4253-80b1-e5a21de2b945",
"date_created": "2026-03-02T18:01:35.720000Z",
"date_modified": "2026-03-02T18:01:35.720000Z",
"file_hash": "8de7a6aee1c897fd692102cf301f0872dc76e8c6cf42ae9eab34b64deb04fd59",
"private": false,
"record": {
"abstract": "The discovery of motifs underlying gene expression is a challenging one. Some\nof these motifs are known transcription factors, but sequence inspection often\nprovides valuable clues, even discovery of novel motifs with uncharacterized\nfunction in gene expression. Coupled with the complexity underlying\ntissue-specific gene expression, there are several motifs that are putatively\nresponsible for expression in a certain cell type. This has important\nimplications in understanding fundamental biological processes, such as\ndevelopment and disease progression. In this work, we present an approach to\nthe principled selection of motifs (not necessarily transcription factor sites)\nand examine its application to several questions in current bioinformatics\nresearch.\n There are two main contributions of this work: Firstly, we introduce a new\nmetric for variable selection during classification, and secondly, we\ninvestigate a problem of finding specific sequence motifs that underlie tissue\nspecific gene expression. In conjunction with the SVM classifier we find these\nmotifs and discover several novel motifs which have not yet been attributed\nwith any particular functional role (eg: TFBS binding motifs). We hypothesize\nthat the discovery of these motifs would enable the large-scale investigation\nfor the tissue specific regulatory potential of any conserved sequence element\nidentified from genome-wide studies.\n Finally, we propose the utility of this developed framework to not only aid\ndiscovery of discriminatory motifs, but also to examine the role of any motif\nof choice in co-regulation or co-expression of gene groups.",
"arxiv_id": "q-bio/0702022",
"authors": [
"Arvind Rao",
"Alfred O. Hero III",
"David J. States",
"James Douglas Engel"
],
"categories": [
"q-bio.GN"
],
"title": "Finding Sequence Features in Tissue-specific Sequences",
"url": "https://arxiv.org/abs/q-bio/0702022"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "0239783f-678c-4c94-98ae-fb6c83cd7a3d",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}