DN (Asim S. Siddiqui, 2006) Sequence biases in large scale gene expression profiling data @NAR #20060902
CP Nucleic Acids Res. 2006 Jul 13;34(12):e83.
TI Sequence biases in large scale gene expression profiling data.
AU Siddiqui AS, Delaney AD, Schnerch A, Griffith OL, Jones SJ, Marra MA.
IN Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada.
AB We present the results of a simple, statistical assay that measures the G+C content sensitivity bias of gene expression experiments without the requirement of a duplicate experiment. We analyse five gene expression profiling methods: Affymetrix GeneChip, Long Serial Analysis of Gene Expression (LongSAGE), LongSAGELite, ‘Classic’ Massively Parallel Signature Sequencing (MPSS) and ‘Signature’ MPSS. We demonstrate the methods have systematic and random errors leading to a different G+C content sensitivity. The relationship between this experimental error and the G+C content of the probe set or tag that identifies each gene influences whether the gene is detected and, if detected, the level of gene expression measured. LongSAGE has the least bias, while Signature MPSS shows a strong bias to G+C rich tags and Affymetrix data show different bias depending on the data processing method (MAS 5.0, RMA or GC-RMA). The bias in the Affymetrix data primarily impacts genes expressed at lower levels. Despite the larger sampling of the MPSS library, SAGE identifies significantly more genes (60% more RefSeq genes in a single comparison).
PM PMID: 16840527 [PubMed – indexed for MEDLINE]
CA Marco A. Marra, Email: [email protected]; Genome Sciences Centre, Suite 100, 570 West 7th Avenue, Vancouver BC, Canada V5Z 4S6. Tel: 604 877 6082; Fax: 604 877 6085.
1. Comparison of Expression Measurement between 1 hybridization-based technology and 4 sequencing-based methods [G+C Content]
a. 1 hybridization-based technology: Affymetrix GeneChip [HGU 133]
b. 4 sequencing-based technologies: LongSAGE, LongSAGELite, 'Classic' MPSS, 'Signature' MPSS
2. The number of genes detected using different approaches cannot be explained by sampling depth alone. [Probe Numbers and Tag Numbers]
3. Each method exhibits a different sensitivity to the G+C content of the transcript regions it detects.
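Note on point 3: a minimal sketch of how G+C content per tag could be related to measured counts. This is not the authors' statistical assay; it only bins tags by G+C fraction and averages their counts per bin. The function names, the bin count and the example tags and counts are all hypothetical, chosen for illustration.

```python
from collections import defaultdict


def gc_fraction(seq):
    """Return the fraction of G or C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def mean_count_by_gc_bin(tag_counts, n_bins=4):
    """Group tags into equal-width G+C bins and average the counts per bin.

    tag_counts maps a tag sequence to its observed count.
    Returns {bin_index: mean_count}; empty bins are omitted.
    """
    bins = defaultdict(list)
    for tag, count in tag_counts.items():
        # min() keeps a tag with 100% G+C in the top bin instead of overflowing
        idx = min(int(gc_fraction(tag) * n_bins), n_bins - 1)
        bins[idx].append(count)
    return {idx: sum(counts) / len(counts) for idx, counts in sorted(bins.items())}


# Hypothetical tags and counts (illustration only, not data from the paper)
example = {"AAAA": 10, "AAAT": 20, "ATGC": 25, "GGCC": 40}
for idx, mean in mean_count_by_gc_bin(example).items():
    print(f"G+C bin {idx}: mean count {mean:.1f}")
```

If a method had no G+C sensitivity, the per-bin means would be expected to be roughly flat across bins for a large enough tag set; a trend toward the G+C rich bins would be in line with the kind of bias the abstract reports for Signature MPSS.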
1. RefSeq NM Genes : Pruitt,K.D., Tatusova,T. and Maglott,D.R. (2003) NCBI Reference Sequence project: update and current status. Nucleic Acids Res.,
SP Sentence Patterns from Paper: