Ensemble models implement the idea that a combination of many simple models can predict as well as, or even better than, a single complicated model. The advantage of this principle is that models become simpler to build, since less manual effort is required to analyze the intricacies of the data set. With sufficient training data, the ensemble should automatically account for such intricacies and arrive at a better aggregate result.
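To make the principle concrete, here is a minimal sketch in R that combines several shallow decision trees by majority vote. It assumes the rpart package (distributed with standard R installations) and the built-in iris data set; the ensemble size, tree depth, and variable names are illustrative choices, not part of any Zementis package.

```r
library(rpart)  # recommended package that ships with standard R distributions

set.seed(42)
data(iris)
n_models <- 25  # hypothetical ensemble size, chosen only for illustration

# Train each simple model (a shallow tree) on a bootstrap resample of the data
models <- lapply(seq_len(n_models), function(i) {
  idx <- sample(nrow(iris), replace = TRUE)
  rpart(Species ~ ., data = iris[idx, ],
        control = rpart.control(maxdepth = 2))
})

# Collect the class predicted by each model: one row per observation,
# one column per model
votes <- sapply(models, function(m) {
  as.character(predict(m, iris, type = "class"))
})

# Aggregate the individual predictions by majority vote
ensemble_pred <- apply(votes, 1, function(v) names(which.max(table(v))))

# Accuracy on the training data (optimistic; use held-out data in practice)
ensemble_accuracy <- mean(ensemble_pred == iris$Species)
print(ensemble_accuracy)
```

No individual tree is tuned by hand here; the aggregate prediction comes purely from resampling and voting, which is the low-effort property described above.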
Given the ease of training ensemble models combined with improved predictive accuracy, ensembles are rapidly gaining in popularity and have been applied in many industries. To benefit from more precise predictions, however, we must be able to apply such models across a variety of IT systems despite their higher computational complexity. As a result, portability of ensemble models becomes paramount for their operational application, e.g., on Big Data or in real-time applications.
In the context of the R data mining environment, the Predictive Model Markup Language (PMML) industry standard holds the key to model portability. It allows us to decouple the data mining process from the operational execution by building the models in R, then exporting them in the PMML standard format and, finally, deploying and executing the models in any target IT environment, using a PMML-compliant scoring engine optimized for scalability and performance. In support of the PMML industry standard, Zementis maintains a set of open source PMML packages for R covering various standard algorithms as well as pre-processing and data manipulation.
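The build-then-export step can be sketched with the open source pmml package. This is a minimal example under stated assumptions: the randomForest, pmml, and XML packages are installed, and the iris data set, the small tree count, and the output file name are placeholders chosen for illustration. Deployment to a scoring engine is not shown.

```r
library(randomForest)
library(pmml)
library(XML)

set.seed(1)

# Step 1: build the model in R (a deliberately small forest to keep the file small)
model <- randomForest(Species ~ ., data = iris, ntree = 10)

# Step 2: convert the fitted model to a PMML document using the
# open source pmml package
pmml_doc <- pmml(model)

# Step 3: write the PMML document to disk so that a PMML-compliant
# scoring engine can load it; the file name is a placeholder
out_file <- file.path(tempdir(), "iris_rf.pmml")
saveXML(pmml_doc, file = out_file)
```

The resulting file is plain XML, so it can be handed to the target environment without any dependency on R.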
To complement the popular open source PMML packages for R, Zementis now introduces several commercial R packages for PMML export which focus on optimized support for complex ensemble models. These new PMML packages are designed to minimize the time and memory required to handle large models. They either offer higher-performance alternatives to the open source PMML package or add support for other popular ensemble algorithms.
The following list covers all proprietary Zementis R PMML export packages available and details their feature sets, grouped by the R package each one applies to:
- applicable to the R ada package
  - supports the Stochastic Boosted Tree algorithm
- applicable to the R C50 package
  - supports the C5.0 algorithm
  - creates single and boosted trees as well as boosted rulesets
  - currently supports only binary categorical models with numeric inputs
- applicable to the R gbm package
  - supports the Gradient Boosted Machine algorithm
  - supports Bernoulli (boolean), Poisson (discrete) and multinomial (categorical) distributions
- applicable to the R randomForest package
  - supports the random forest algorithm
  - an alternative to the open source "pmml.randomForest" function of the pmml package, optimized to reduce the time and memory required so that large models are handled better
- applicable to the R randomForestSRC package (rfsrc function)
  - supports the Random Survival Forest algorithm