Software Notes:
This program identifies differentially expressed genes in time series experiments. For each gene, it computes the area of the region bounded by the two expression profiles being compared and calls the gene differentially expressed if this area exceeds a threshold derived from a model of the experimental error. This is a preliminary version of the software: only the comparison between two time series (e.g. control vs. treated) has been implemented, and comparison of a time series against its baseline is not yet available.
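The following minimal sketch makes the area criterion concrete by computing the area between two expression profiles for a single gene. It assumes the profiles are joined by straight lines between sampling times; the interpolation used by the actual script is not stated here. area_between_profiles is a hypothetical helper, not a function provided by the script. When the profiles cross inside an interval, the area is split into two triangles instead of being computed as a single trapezoid of |y1 - y2|.

```r
# Hypothetical helper, not part of the distributed script: area between two
# profiles joined by straight lines (assumed piecewise-linear interpolation).
area_between_profiles <- function(y1, y2, t = seq_along(y1)) {
  stopifnot(length(y1) == length(y2), length(t) == length(y1))
  d <- y1 - y2              # signed difference at each sampling time
  h <- diff(t)              # width of each sampling interval
  d0 <- d[-length(d)]       # difference at the left end of each interval
  d1 <- d[-1]               # difference at the right end of each interval
  same <- d0 * d1 >= 0      # FALSE where the profiles cross inside the interval
  a <- numeric(length(h))
  # No crossing: trapezoid with parallel sides |d0| and |d1|.
  a[same] <- h[same] * (abs(d0[same]) + abs(d1[same])) / 2
  # Crossing: two triangles meeting where the difference is zero.
  s <- abs(d0[!same]) + abs(d1[!same])
  a[!same] <- h[!same] * (d0[!same]^2 + d1[!same]^2) / (2 * s)
  sum(a)
}
```

For example, area_between_profiles(c(0, 2, 2), c(0, 0, 0), t = c(0, 1, 2)) returns 3. Leaving t at its default treats the samples as equally spaced, analogous to sampling.grid = NULL.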
To use the program, source the R script from the R workspace (R can be downloaded from http://www.r-project.org/).
Usage
SEL.TS.AREA(replicates,data1,data2,sampling.grid=NULL,takelog=FALSE,binsize=0.1,B=10000,NAcontrol=1)
Arguments
- replicates: a two-column matrix of replicated measurements. For example, if arrays C(t1bis), C(t2bis) are experimental replicates of C(t1), C(t2), the first column of “replicates” must contain the data from C(t1), C(t2) concatenated into a single vector, and the second column must contain the data from C(t1bis), C(t2bis) concatenated into a single vector in the same order.
- data1, data2: the data to be compared. data1 and data2 are matrices with genes in rows and samples (time points) in columns; data2 contains the samples to be compared with those in data1.
- sampling.grid: the time points at which the data are sampled. Its length must equal the number of columns of data1 and data2. If the time samples are equally spaced, sampling.grid can be set to NULL (default).
- takelog: logical; TRUE if the log should be taken, FALSE (default) if the data are already log-transformed.
- binsize: width of the intensity intervals used for error standardization (default 0.1).
- B: number of time series generated by resampling to calculate the null hypothesis distribution (default 10000).
- NAcontrol: minimum number of available (non-NA) samples that a gene time series must have for the gene to be considered for selection (default 1).
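The sketch below shows one way to assemble the arguments in R on toy data. All object names and values (the simulated data1 and data2, the replicate arrays, the unequal sampling.grid) are hypothetical. The NAcontrol filter at the end is an assumption about how that threshold is applied, and it runs on a copy because the inputs themselves should not contain NA.

```r
# Hypothetical toy inputs: 5 genes x 4 time points on the log scale.
set.seed(1)
n_genes <- 5
n_times <- 4
data1 <- matrix(rnorm(n_genes * n_times, mean = 8), nrow = n_genes)  # e.g. control
data2 <- matrix(rnorm(n_genes * n_times, mean = 8), nrow = n_genes)  # e.g. treated

# Replicated arrays: C(t1), C(t2) and their replicates C(t1bis), C(t2bis),
# one value per gene (the replicates are simulated by adding small noise).
C_t1 <- data1[, 1]
C_t2 <- data1[, 2]
C_t1bis <- C_t1 + rnorm(n_genes, sd = 0.1)
C_t2bis <- C_t2 + rnorm(n_genes, sd = 0.1)

# Column 1: C(t1), C(t2) stacked; column 2: C(t1bis), C(t2bis) in the same order.
replicates <- cbind(c(C_t1, C_t2), c(C_t1bis, C_t2bis))

# Unequally spaced sampling times, one per column of data1 and data2.
sampling.grid <- c(0, 1, 3, 6)
stopifnot(length(sampling.grid) == ncol(data1), ncol(data2) == ncol(data1))

# Assumed NAcontrol rule, shown on a copy with two missing values in gene 2:
# keep genes with at least NAcontrol non-NA samples in both series.
NAcontrol <- 3
raw1 <- data1
raw1[2, 1:2] <- NA
keep <- rowSums(!is.na(raw1)) >= NAcontrol & rowSums(!is.na(data2)) >= NAcontrol
```

Here keep is TRUE for every gene except gene 2, which has only two non-NA samples in raw1. After sourcing the script, these objects could be passed as SEL.TS.AREA(replicates, data1, data2, sampling.grid = sampling.grid).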
Details
replicates, data1 and data2 must contain already pre-processed data (e.g. normalized, filtered and without NA values).
The algorithm works iteratively, asking the user:
1. whether to use a constant or an intensity-dependent model of the error (a plot of the variance of the difference between the two columns of replicates vs. average intensity is shown to help the decision);
2. to choose the best model among those fitted to the null hypothesis distribution, based on goodness of fit. Parameter precision and plots are provided to help the choice. It is also possible to use quantiles instead of distribution models.
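The two steps above can be sketched as follows, under loudly stated assumptions. Toy replicate pairs are simulated on the log scale; the intensity-dependent error model is taken to be the standard deviation of the replicate difference within each intensity bin of width binsize; profiles are joined linearly; and the null distribution is built by resampling standardized replicate differences. The sketch uses the empirical quantile option rather than a fitted distribution model, and the 95% level is a hypothetical choice. None of the names below come from the script.

```r
# Hypothetical sketch: names, toy data and the 95% level are assumptions.
set.seed(42)
binsize <- 0.1
B <- 1000                                  # the script's default is B = 10000
t <- c(0, 1, 3, 6)                         # hypothetical sampling grid

# Toy replicate pairs on the log scale, noisier at low intensity.
n <- 2000
a_true <- runif(n, min = 6, max = 12)
noise_sd <- 0.05 + 0.3 * exp(-(a_true - 6))
r1 <- a_true + rnorm(n, sd = noise_sd)
r2 <- a_true + rnorm(n, sd = noise_sd)

# Step 1: replicate difference vs. average intensity. plot(avg, d) would show
# the spread shrinking with intensity, favouring an intensity-dependent model.
d <- r1 - r2
avg <- (r1 + r2) / 2
bin <- floor(avg / binsize)                # intensity bins of width binsize
bin_sd <- ave(d, bin, FUN = sd)            # per-bin SD, NA for single-gene bins
d_std <- d / bin_sd                        # standardized differences
d_std <- d_std[is.finite(d_std)]

# Area between a difference profile and zero, with linear interpolation.
area_from_diff <- function(dp, t) {
  h <- diff(t)
  d0 <- dp[-length(dp)]
  d1 <- dp[-1]
  sum(ifelse(d0 * d1 >= 0,
             h * (abs(d0) + abs(d1)) / 2,                    # no crossing: trapezoid
             h * (d0^2 + d1^2) / (2 * (abs(d0) + abs(d1)))))  # crossing: two triangles
}

# Step 2: null distribution of the area from B resampled difference profiles,
# summarized by an empirical quantile instead of a fitted model.
null_area <- replicate(B, area_from_diff(sample(d_std, length(t), replace = TRUE), t))
threshold <- quantile(null_area, 0.95, names = FALSE)
```

In this sketch, a gene whose standardized difference profile (data2 minus data1, divided by the bin SD at the gene's intensity) gives an area above threshold would be called differentially expressed; on the toy grid, a constant standardized difference of 5 gives an area of 30.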
Value
No value is returned. The results are saved to the file “Result.txt” in the working directory.
Citation
If you use the software in your research, please cite the original paper:
Di Camillo B, Toffolo G, Nair SK, Greenlund LJ, Cobelli C. Significance analysis of microarray transcript levels in time series experiments. BMC Bioinformatics. 2007 Mar; 8(Suppl 1):S10