When we our betters see bearing our WOEs,
We scarcely think our miseries our foes.
From King Lear
Of course, this post is not about Shakespeare, or King Lear for that matter. It is about visualization and variable screening using Weight of Evidence (WOE) and Information Value (IV). As I described in a post back in August, these techniques provide a simple yet powerful way to do exploratory data analysis for binary classifiers, despite dating back to the 1950s. More specifically, WOE and IV analysis enable one to:
- Consider each variable’s independent contribution to the outcome.
- Detect linear and non-linear relationships.
- Rank variables in terms of “univariate” predictive strength.
- Visualize the relationship between each predictive variable and the binary outcome.
- Seamlessly compare the strength of continuous and categorical variables without creating dummy variables.
- Seamlessly handle missing values without imputation.
- Assess the predictive power of missing values.
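The standard definitions behind these properties are below. They apply to a predictor X grouped into bins B_1, …, B_k and a binary outcome Y. The levels of a categorical variable serve as bins directly, which is why no dummy variables are needed, and missing values can simply form a bin of their own.

```latex
\mathrm{WOE}_i = \ln \frac{P(X \in B_i \mid Y = 1)}{P(X \in B_i \mid Y = 0)}

\mathrm{IV} = \sum_{i=1}^{k} \bigl( P(X \in B_i \mid Y = 1) - P(X \in B_i \mid Y = 0) \bigr) \cdot \mathrm{WOE}_i
```

Each IV term is non-negative because the difference and the log ratio always share the same sign. This makes IV a usable univariate ranking score. For example, take a bin that holds 20% of the events but only 10% of the non-events. Its WOE is ln 2 ≈ 0.693, and it contributes (0.20 − 0.10) × 0.693 ≈ 0.069 to IV. Some references put Y = 0 in the numerator instead, which flips the sign of every WOE but leaves IV unchanged.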
R Package on CRAN
In the August post, I used an R package called “Information”, which was still under development at the time. The Information package is designed to perform WOE and IV analysis for binary classification models as well as uplift models. To maximize performance, aggregations are done with data.table, and the creation of WOE vectors can be distributed across multiple cores.
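The Information package is written in R, and the code below is not its API. It is a minimal Python sketch of the two steps the paragraph mentions, and it assumes the variable is categorical or already binned. The first step aggregates event and non-event counts per bin into a WOE table and an IV score. The second step creates a WOE vector by replacing each raw value with the WOE of its bin. The function names `woe_iv` and `woe_vector` are hypothetical, as is the 0.5 count adjustment used to avoid log(0) in empty cells.

```python
import math
from collections import Counter


def woe_iv(values, outcomes, adj=0.5):
    """Weight of Evidence per bin and Information Value for one variable.

    values:   categorical or pre-binned values; None is kept as its own bin,
              so missing values need no imputation.
    outcomes: binary labels (1 = event, 0 = non-event).
    adj:      hypothetical smoothing count added to every bin so that an
              empty cell does not produce log(0); pass 0 to disable.
    """
    # Aggregate event / non-event counts per bin (Counter returns 0 for absent bins)
    events = Counter(v for v, y in zip(values, outcomes) if y == 1)
    non_events = Counter(v for v, y in zip(values, outcomes) if y == 0)
    bins = list(dict.fromkeys(values))  # distinct bins in first-seen order
    k = len(bins)
    total_events = sum(events.values()) + adj * k
    total_non_events = sum(non_events.values()) + adj * k

    woe, iv = {}, 0.0
    for b in bins:
        p1 = (events[b] + adj) / total_events          # P(X in bin | Y = 1)
        p0 = (non_events[b] + adj) / total_non_events  # P(X in bin | Y = 0)
        woe[b] = math.log(p1 / p0)
        iv += (p1 - p0) * woe[b]  # each term is >= 0
    return woe, iv


def woe_vector(values, woe):
    """Replace each raw value with the WOE of its bin."""
    return [woe[v] for v in values]


# Usage on a toy variable that includes a missing-value bin
values = ["a", "a", "a", "b", "b", "b", None, None]
outcomes = [1, 1, 0, 1, 0, 0, 1, 0]
woe, iv = woe_iv(values, outcomes, adj=0)  # no empty cells, so no smoothing needed
print(woe)           # {'a': 0.693..., 'b': -0.693..., None: 0.0}
print(round(iv, 4))  # 0.3466
print(woe_vector(values, woe))
```

Each variable is scored independently of the others. As a result, this per-variable work parallelizes naturally across cores.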
The package is now on CRAN, and it has received bug fixes as well as a series of cosmetic changes. Check out the vignette, which contains a high-level description of the underlying theory along with examples, and the PDF documentation.