Annotation: dorsal/arxiv

Title	Relational patterns of gene expression via nonmetric multidimensional scaling analysis
Authors	Y. -h. Taguchi, Y. Oono
Categories	q-bio.GN q-bio.CB
ArXiv ID	q-bio/0407037
URL	https://arxiv.org/abs/q-bio/0407037
DOI	10.1093/bioinformatics/bti067

Abstract

Motivation: Microarray experiments produce large-scale data sets that require extensive mining and refining to extract useful information. We demonstrate the usefulness of the (nonmetric) multidimensional scaling (MDS) method for analyzing large numbers of genes. Applying MDS to microarray data is certainly not new, but existing studies have all analyzed small numbers (< 100) of points. We have been developing an efficient, novel algorithm for nonmetric multidimensional scaling (nMDS) of very large data sets, intended as a maximally unsupervised data-mining device. We wish to demonstrate its usefulness in bioinformatics; in this paper, we use it to unravel relational patterns among genes from time-series data.

Results: The Pearson correlation coefficient with its sign flipped is used to measure the dissimilarity of gene activities in the transcriptional response of cell-cycle-synchronized human fibroblasts to serum [Iyer et al., Science 283, 83 (1999)]. These dissimilarity data have been analyzed with our nMDS algorithm to produce an almost circular relational pattern of the genes. In this example, the obtained pattern expresses a temporal order in the data: the temporal expression pattern of the genes rotates along this circular arrangement and is related to the cell cycle. For the data analyzed in this paper, we observe the following. If an appropriate preprocessing procedure is applied to the original data set, linear methods such as principal component analysis (PCA) can achieve reasonable results, but without preprocessing, linear methods such as PCA cannot produce a useful picture. Furthermore, even with appropriate preprocessing, the outcomes of linear procedures are not as clear-cut as those of nMDS without preprocessing.
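
The abstract rests on two ingredients: a dissimilarity defined as the sign-flipped Pearson correlation (delta_ij = -r_ij), and nonmetric MDS, which uses only the rank order of those dissimilarities. The Python sketch below illustrates both under stated assumptions. It computes delta_ij and evaluates Kruskal's stress-1 (the classical textbook criterion for nonmetric MDS) using pool-adjacent-violators monotone regression. It is not the authors' nMDS algorithm, which the abstract does not describe, and it contains no optimizer. The sinusoidal "genes", their phases, and all function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles.

    Undefined (ZeroDivisionError) if either profile is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def dissimilarity(x, y):
    """Sign-flipped Pearson correlation: -1 for identical shape, +1 for mirrored."""
    return -pearson(x, y)


def monotone_regression(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []  # each block is [sum of pooled values, count]
    for v in values:
        blocks.append([v, 1])
        # Pool while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def kruskal_stress(points, delta):
    """Kruskal stress-1 of a configuration against a dissimilarity matrix.

    Only the rank order of `delta` matters: configuration distances are
    compared with disparities from monotone regression on that order.
    """
    n = len(points)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            pairs.append((delta[i][j], math.dist(points[i], points[j])))
    # Sort by dissimilarity; exact ties are ordered by distance
    # (Kruskal's "primary approach" to ties).
    pairs.sort()
    dist = [d for _, d in pairs]
    disparity = monotone_regression(dist)
    num = sum((d - h) ** 2 for d, h in zip(dist, disparity))
    den = sum(d ** 2 for d in dist)
    return math.sqrt(num / den)


# Hypothetical "genes": one sinusoidal period sampled at T points, shifted by a phase.
T = 12
phases = [0.0, 0.9, 2.1, 3.0, 4.4, 5.5]
profiles = [[math.cos(2 * math.pi * t / T + p) for t in range(T)] for p in phases]
delta = [[dissimilarity(a, b) for b in profiles] for a in profiles]

# Points on a unit circle in phase order versus the same points scrambled.
on_circle = [(math.cos(p), math.sin(p)) for p in phases]
scrambled = [on_circle[k] for k in (0, 3, 1, 4, 2, 5)]

print(kruskal_stress(on_circle, delta))  # essentially zero
print(kruskal_stress(scrambled, delta))  # clearly positive
```

Placing the hypothetical genes on a circle in phase order gives essentially zero stress. Both the chord length between two points and -cos of their phase difference increase monotonically with that difference (taken in [0, pi]), so the distance ranks match the dissimilarity ranks exactly. Scrambling the positions breaks that agreement and raises the stress. An nMDS optimizer searches for a configuration that minimizes a rank-based criterion of this kind. In this toy case, a circle ordered by phase is an exact solution, which loosely mirrors the circular pattern reported in the abstract.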

{ "annotation_id": "1f81c81d-ced7-4b29-8e8a-c1e27fa72af3", "date_created": "2026-03-02T18:01:31.984000Z", "date_modified": "2026-03-02T18:01:31.984000Z", "file_hash": "0ff2dfa287711386d74d6ed1397903e61ede96e4927fb4537d1965f27db1a11b", "private": false, "record": { "abstract": "Motivation:Microarray experiments result in large scale data sets that\nrequire extensive mining and refining to extract useful information. We\ndemonstrate the usefulness of (nonmetric) multidimensional scaling (MDS) method\nin analyzing a large number of genes. Applying MDS to the microarray data is\ncertainly not new, but the existing works are all on small numbers\n (\u003c 100) of points to be analyzed. We have been developing an efficient novel\nalgorithm for nonmetric multidimensional scaling (nMDS) analysis for very large\ndata sets as a maximally unsupervised data mining device. We wish to\ndemonstrate its usefulness in the context of bioinformatics (unraveling\nrelational patterns among genes from time series data in this paper).\n Results: The Pearson correlation coefficient with its sign flipped is used to\nmeasure the dissimilarity of the gene activities in transcriptional response of\ncell-cycle-synchronized human fibroblasts to serum [Iyer {\\it et al}., Science\n{\\bf 283}, 83 (1999)]. These dissimilarity data have been analyzed with our\nnMDS algorithm to produce an almost circular relational pattern of the genes.\nThe obtained pattern expresses a temporal order in the data in this example;\nthe temporal expression pattern of the genes rotates along this circular\narrangement and is related to the cell cycle. For the data we analyze in this\npaper we observe the following. If an appropriate preparation procedure is\napplied to the original data set, linear methods such as the principal\ncomponent analysis (PCA) could achieve reasonable results, but without data\npreprocessing linear methods such as PCA cannot achieve a useful picture.\nFurthermore, even with an appropriate data preprocessing, the outcomes of\nlinear procedures are not as clearcut as those by nMDS without preprocessing.", "arxiv_id": "q-bio/0407037", "authors": [ "Y. -h. Taguchi", "Y. Oono" ], "categories": [ "q-bio.GN", "q-bio.CB" ], "doi": "10.1093/bioinformatics/bti067", "title": "Relational patterns of gene expression via nonmetric multidimensional scaling analysis", "url": "https://arxiv.org/abs/q-bio/0407037" }, "schema_id": "dorsal/arxiv", "source": { "execution_id": "643d6de3-e33b-4fc7-82eb-478d358b8074", "id": "arXiv Dataset IDs", "type": "Model", "variant": "snapshot-2026-03-01", "version": "0.1.0" }, "user_id": 1000002 }