Random forests (RF) have been increasingly used in applications such as genome-wide association and microarray studies, where predictor correlation is frequently observed. This is especially true in statistical genetics, microarray analysis and the broad and rapidly expanding area of -omics studies. Random forest and related methods such as conditional inference forest (CIF) are both tree-building methods that have been found increasingly successful in bioinformatics applications. Recent works on permutation-based variable importance measures (VIMs) used in RF have come to apparently contradictory conclusions. We present an extended simulation study to synthesize results.

When predictor correlation was present and predictors were associated with the outcome (H_A), the unconditional RF VIM attributed a higher share of importance to correlated predictors, while under the null hypothesis that no predictors are associated with the outcome (H_0) the unconditional RF VIM was unbiased. Conditional VIMs showed a decrease in VIM values for correlated predictors versus the unconditional VIMs under H_A and were unbiased under H_0. Scaled VIMs were clearly biased under both H_A and H_0.

Unconditional unscaled VIMs are a computationally tractable choice for large datasets and are unbiased under the null hypothesis. Whether the observed increased VIMs for correlated predictors should be considered a "bias" (because they do not directly reflect the coefficients in the generating model) or a beneficial attribute of these VIMs depends on the application. For example, in genetic association studies, where correlation between markers may help to localize the functionally relevant variant, the increased importance of correlated predictors may be an advantage. On the other hand, we show examples where this increased importance may result in spurious signals.
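The difference between the unscaled and scaled permutation VIMs mentioned above can be sketched in a few lines. This is a minimal illustration, not the study's code. It assumes the common definition in which the unscaled VIM is the mean per-tree increase in out-of-bag error after permuting a predictor, and the scaled VIM divides that mean by its standard error across trees. The function name `permutation_vims` and the error values are hypothetical, made up for illustration.

```python
import math
import statistics


def permutation_vims(err_before, err_after):
    """Return (unscaled, scaled) permutation VIM for one predictor.

    err_before[t]: out-of-bag error of tree t on the original data.
    err_after[t]:  out-of-bag error of tree t after permuting the predictor.
    Assumed definition: scaled VIM = mean increase / standard error.
    """
    if len(err_before) != len(err_after) or len(err_before) < 2:
        raise ValueError("need matching per-tree errors for at least two trees")
    # Per-tree increase in error caused by breaking the predictor's
    # link to the outcome.
    diffs = [after - before for before, after in zip(err_before, err_after)]
    unscaled = statistics.mean(diffs)
    # Standard error of the mean increase across trees.
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    scaled = unscaled / std_err if std_err > 0 else float("nan")
    return unscaled, scaled


# Hypothetical per-tree out-of-bag error rates (illustration only).
before = [0.30, 0.28, 0.32, 0.31]
after = [0.36, 0.30, 0.40, 0.34]
unscaled, scaled = permutation_vims(before, after)
print(f"unscaled VIM: {unscaled:.4f}, scaled VIM: {scaled:.2f}")
```

Under this assumed definition the unscaled value is a plain average of error increases, while the scaled value is a z-score-like quantity whose size also depends on the between-tree variability and on the number of trees.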