
UPDATED | Social Sciences Brown Bag Seminar

Wednesday, June 3, 2015
12:00pm to 1:00pm
Baxter B125
Fuzzy Forests: Variable selection among correlated predictors when p >> n
Christina Ramirez, Visiting Associate in Economics, Division of the Humanities and Social Sciences, Caltech

Fuzzy Forests is a new machine learning algorithm for ranking the importance of features in high-dimensional classification and regression problems where the predictors are highly correlated and p >> n. Fuzzy Forests draws on Weighted Gene Co-Expression Network Analysis (WGCNA) to group highly correlated features into modules, and the resulting modules are relatively independent of one another. Recursive feature elimination with Random Forests is then used to sieve the variables until the user is left with the k variables that are most important for predicting the outcome. Simulations and real-world examples show excellent performance of Fuzzy Forests, with the added bonus of slightly better prediction than Random Forests. Applications in HIV immunology identify important variables for predicting elite control of HIV. These variables, selected in silico, have been shown to have a biological basis for elite control and are being validated in vivo in follow-up cohorts.
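The two-stage idea in the abstract (group correlated features into modules, screen within each module, then select k features from the pooled survivors) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Fuzzy Forests implementation: module detection here is a simple correlation-threshold grouping (connected components of features with |r| >= threshold) swapped in for WGCNA, and the absolute Pearson correlation with the outcome is swapped in for random forest variable importance. All function names and the `threshold`, `keep_fraction`, and `drop_fraction` parameters are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def correlation_modules(columns, threshold=0.8):
    """Hypothetical stand-in for WGCNA: connected components of the graph
    that links two features whenever |r| >= threshold."""
    parent = list(range(len(columns)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                parent[find(i)] = find(j)

    modules = {}
    for i in range(len(columns)):
        modules.setdefault(find(i), []).append(i)
    return list(modules.values())


def importance(columns, y, features):
    """Hypothetical stand-in for random forest importance: |r| with the outcome.
    Because this score is univariate, rescoring after each drop leaves the
    ranking unchanged; random forest importances would shift as features go."""
    return {f: abs(pearson(columns[f], y)) for f in features}


def recursive_elimination(columns, y, features, keep, drop_fraction=0.25):
    """Rescore the remaining features and drop the least important ones
    until `keep` remain (returns all features if there are already <= keep)."""
    features = list(features)
    while len(features) > keep:
        scores = importance(columns, y, features)
        n_drop = min(max(1, int(len(features) * drop_fraction)), len(features) - keep)
        features.sort(key=lambda f: scores[f])  # least important first
        features = features[n_drop:]
    return features


def fuzzy_forest_sketch(columns, y, k, threshold=0.8, keep_fraction=0.5):
    """Screen within each module, then select k features from the pooled survivors."""
    survivors = []
    for module in correlation_modules(columns, threshold):
        keep = max(1, int(len(module) * keep_fraction))
        survivors.extend(recursive_elimination(columns, y, module, keep))
    return sorted(recursive_elimination(columns, y, survivors, k))


if __name__ == "__main__":
    random.seed(0)
    n = 50
    z1 = [random.gauss(0, 1) for _ in range(n)]  # latent factor that drives y
    z2 = [random.gauss(0, 1) for _ in range(n)]  # latent factor unrelated to y
    # Six features: three noisy copies of z1, three noisy copies of z2.
    columns = [[v + 0.1 * random.gauss(0, 1) for v in z] for z in (z1, z1, z1, z2, z2, z2)]
    y = [v + 0.1 * random.gauss(0, 1) for v in z1]
    print(correlation_modules(columns))
    print(fuzzy_forest_sketch(columns, y, k=1))
```

Screening within modules first means each correlated group sends only a few representatives to the final round, so one large block of near-duplicate features cannot crowd out signal from the other, relatively independent modules.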

For more information, please contact Sheryl Cobb by phone at 626-395-4220 or by email at [email protected].