Design and Analysis of DNA Microarray Investigations
Index
GeneCluster software, 143
Gene expression, 159
  biology of, 157–163
Gene expression datasets, 165–168
Gene level quality control, 44–47
Gene-level summaries, 36
Genes
  low variance, 46–47, 76
  nondifferentially expressed, 46
Gene shaving method, 146
GeneSOM package, 143
Global array normalization, 56, 61–62
Global background correction, 33
Global tests
  class comparison, 86–88
  clustering, 148–150
Golub’s weighted vote method, 101–102
Graphical displays, 125–131
Gridding, 30
Hazard, 91
Heatmap, see Color image plots
Hedenfalk breast cancer data, 168
  analysis of, 182–184
Hierarchical clustering, 131–138
Hierarchical model, 85–86
Histogram segmentation, 32
Hotelling’s T²-test, 87
Housekeeping genes, 53–55, 89
Hybridization intensity, relative, 17
Hypothesis testing, 11, 67
Image analysis, 29–38
  for Affymetrix GeneChip arrays, 35–38
  for cDNA microarrays, 30–34
  spots flagged at, 40–41
Image display, 30
Image file, 6, 29
  visual inspection of, 40
Image generation, 29–30
Image output file, 34
Informative genes, 101
Intensity-based array normalization, 57–59, 62–64
Introns, 161
Jonckheere test, 73
Kaplan-Meier survival curves, 119
k-means clustering, 133, 138–141
Kruskal–Wallis test, 72
Labeling, reverse, 21–23
Labeling methods, 6–7
Label intensity, measuring, 5–6
Learning rate, 142
Leave-out-one cross-validation, 110–111
Linear array normalization, 56, 61–62
Linear discriminant analysis, 98–101
Linkage methods, 132
Local background estimation, 34
Location-based array normalization, 59–61
Loess normalization, 57–59
Loop design, 20–21, 92–93
Low variance genes, 46–47, 76
Luo prostate data, 166
  analysis of, 68–71, 76
Mahalanobis distance, 123
Manhattan distance, 123
M-A plots, 57
Mean-pixel intensity, 32
Median centering, 124–125
Median pixel intensity, 32
Misclassification error rate, 26, 108–114
Missing data, 69–71, 97
Mixed model, 94
Mixture model, 145
Model-based clustering, 145
Morphological opening, 33
mRNA transcripts, 5, 163
Multidimensional scaling, 125–131
  nonmetric, 131
Multiple comparisons problem, 75
Multivariate Gaussian probability density, 100
Multivariate permutation methods, 26, 77–80
  stepwise, 80–81
Multivariate regression models, 91
Nearest neighbor classification, 103–104
Noise, 39
Nondifferentially expressed genes, 46
Nonmetric multidimensional scaling, 131
Nonreference designs
  balanced block design, 19–20
  loop design, 20–21
  models for, 92–94
Normalization
  array, see Array normalization
  quantile, 63
Normalization factor, 55, 62
Normalized signal log value, 62
Null hypothesis, 67
Null models for global test of clustering, 148–149
Number of samples, 23–26
Oligonucleotides, 5
Overfitting, 96, 108
Paired-specimen data, 73–75
Paired t-statistic, 73
Pairing samples, 17–21
Parametric tests, 68
Partial least squares analysis, 97–98
Partitional clustering methods, 131, 138
Patch, 31
Pathway analysis, 13
Pearson correlation, 123
Permutation F-test, 72–73
Permutation paired t-test, 74
Permutation tests, 68–71
  multivariate, 26, 77–80
Permutation t-test, 69–70
Perou breast data, 166–167
  analysis of, 74–81, 83–84, 178–182
Photomultiplier tube, 6
Photomultiplier tube (PMT) detector, 29–30
Pixel intensity, median, 32
Pixels, 6
  saturated, 30, 48
Plaid model of Lazzeroni and Owen, 146
PMT (photomultiplier tube) detector, 29–30
Poly-A tail, 161
Pooled-variance formula, 67
Pooling of samples, 16–17
Post hoc comparisons, 72
Prediction accuracy, 26–27, 108–113
Principal components, 126
Principal components analysis, 125–128
Printed DNA microarrays, 7–9
Printers, robotic, 7
Probe-level quality control, 40–44
Probe pairs, 9
Probes, 5
Profile plot, 143
Prognostic index, 118–119
Prognostic prediction, 95, 118–119
Proportion of variance explained, 126
Proportional hazards regression model, 91
Proteins, 157–158
p-values, 49, 67
  adjusted, 80
  two-sided, 67
Quadratic discriminant analysis, 100
Quality control, 39–52
  array-level, 47–48
  for GeneChip arrays, 48–50
  gene level, 44–47
  probe-level, 40–44
Quantile normalization, 63
Randomized variance model, 86
Rank-based multidimensional scaling methods, 131
Red-green-blue (RGB) image, 30
Reference design, 17–19
References, 185–194
Reference sample, 19
Regional background correction, 33
Regression model analysis, 90–91
Relative hybridization intensity, 17
Replicates, number needed, 23–27, 66
Replication, level of, 15
Reproducibility
  of DNA microarrays, 16
  of individual clusters, assessing, 152–155
Resubstitution estimate
  bias of, 108–109
  calculated, 109
Reverse labeling, 21–23
RGB (red-green-blue) image, 30
Ribosomes, 162
RNA molecules, 158–163