pure oil perfume: Cufflink Sets

Friday, 27 July 2012

Some users were reporting a high FAIL rate on gene and transcript quantification. According to a battery of tests using real and simulated data, this has been resolved. The root cause was that, in conditions with substantial overdispersion across replicates, the FPKM variance-covariance matrices produced by the Cuffdiff model were not always positive-definite. Cuffdiff was detecting this and marking those genes as having unreliable confidence intervals. Prior to 2.0.0, the model contained a heuristic approximation of the covariances between assigned fragment counts (which are necessary for calculating the variance of each gene's expression level), and this approximation was producing poorly conditioned matrices. We have replaced the heuristic approximation with a direct sampling approach, in effect "simulating" the assignment of fragments to each isoform many times for each gene. By simulating fragment generation and assignment to each transcript, we reconstruct variance-covariance matrices for assigned fragment counts that are always properly conditioned.
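To make the sampling idea concrete, here is a minimal sketch in Python. It is not Cuffdiff's code: it assumes a plain multinomial assignment of a fixed number of fragments to three isoforms, whereas the real model also accounts for overdispersion across replicates. The function names, the isoform probabilities, and the fragment and draw counts are all hypothetical and chosen only for illustration.

```python
import random

def simulate_assignments(probs, n_fragments, n_draws, seed=0):
    """Repeatedly assign n_fragments fragments to isoforms.

    Assumption for illustration: each fragment picks an isoform independently
    with the given probabilities (a plain multinomial draw).
    """
    rng = random.Random(seed)
    isoforms = list(range(len(probs)))
    draws = []
    for _ in range(n_draws):
        counts = [0] * len(probs)
        for i in rng.choices(isoforms, weights=probs, k=n_fragments):
            counts[i] += 1
        draws.append(counts)
    return draws

def sample_covariance(draws):
    """Unbiased sample variance-covariance matrix of the simulated counts."""
    n = len(draws)
    k = len(draws[0])
    means = [sum(d[j] for d in draws) / n for j in range(k)]
    cov = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(a, k):
            s = sum((d[a] - means[a]) * (d[b] - means[b]) for d in draws)
            cov[a][b] = cov[b][a] = s / (n - 1)
    return cov

def quadratic_form(cov, v):
    """Compute v^T C v, which is non-negative for a valid covariance matrix."""
    k = len(v)
    return sum(v[a] * cov[a][b] * v[b] for a in range(k) for b in range(k))

# Hypothetical gene with three isoforms.
probs = [0.5, 0.3, 0.2]
draws = simulate_assignments(probs, n_fragments=200, n_draws=500)
cov = sample_covariance(draws)

for row in cov:
    print(["%8.2f" % x for x in row])
```

Because the matrix is computed directly from simulated count vectors, it is symmetric and positive semi-definite by construction, which is the property a hand-built approximation of the covariances can lose. In this simplified sketch every draw assigns exactly the same number of fragments, so the matrix is singular along the all-ones direction; that is a consequence of the fixed-total assumption made here, not a statement about the Cuffdiff model.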