
Announcing Support for PMML 4.2

PMML 4.2 is out! The DMG (Data Mining Group) has been working on this new version of PMML for over two years, and we can truly say it is the best PMML ever. If you haven't seen the press release for the new version, see the posting below:
http://www.kdnuggets.com/2014/02/data-mining-group-pmml-v42-predictive-modeling-standard.html



What changed?


PMML is a very mature language, so there aren't really any dramatic changes at this point. One noteworthy change: older versions of PMML called the target field of a predictive model "predicted". This was confusing, since a predicted field is usually the result of scoring or executing a model: the score, so to speak. PMML 4.2 clears things up. The target field is now simply "target". It is a small change, but a big step towards making it clear that the Output element is where the predicted outputs should be defined.
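
As a rough illustration of the rename, the sketch below uses Python's standard xml.etree.ElementTree to rewrite usageType="predicted" to "target" on MiningField elements. The tiny MiningSchema fragment, the field names, and the helper modernize_usage_type are made up for this example, and the fragment omits the PMML namespace that a real file would declare.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free MiningSchema fragment written in the older style.
OLD_SCHEMA = """
<MiningSchema>
  <MiningField name="age" usageType="active"/>
  <MiningField name="income" usageType="active"/>
  <MiningField name="churn" usageType="predicted"/>
</MiningSchema>
"""

def modernize_usage_type(schema_xml: str) -> str:
    """Return the schema with usageType="predicted" rewritten to "target"."""
    root = ET.fromstring(schema_xml)
    for field in root.iter("MiningField"):
        if field.get("usageType") == "predicted":
            field.set("usageType", "target")
    return ET.tostring(root, encoding="unicode")

new_schema = modernize_usage_type(OLD_SCHEMA)
print(new_schema)
```

Only the target field is touched; the active input fields pass through unchanged.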

 

Continuous Inputs for Naive Bayes Models


This is a great new enhancement to the NaiveBayes model element. We wrote an entire paper about this new feature and presented it at the KDD 2013 PMML Workshop. If you use Naive Bayes models, you should definitely take a look at our article.

 

 

And now you can benefit from having our proposed changes in PMML itself! This is really remarkable, and we are all already benefiting from it. The Zementis Py2PMML (Python to PMML) Converter uses the proposed changes to convert Gaussian Naive Bayes models from scikit-learn to PMML.
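
To make the idea concrete, here is a minimal sketch of how a continuous input can contribute to a Naive Bayes score when each class stores a mean and a variance for that input. It is plain Python, not the Zementis converter and not the PMML scoring procedure verbatim; the class labels, priors, and statistics are invented for illustration.

```python
import math

def gaussian_pdf(x: float, mean: float, variance: float) -> float:
    """Density of a normal distribution with the given mean and variance."""
    exponent = -((x - mean) ** 2) / (2.0 * variance)
    return math.exp(exponent) / math.sqrt(2.0 * math.pi * variance)

# Hypothetical class priors and per-class statistics for one continuous input.
PRIORS = {"churn": 0.3, "stay": 0.7}
AGE_STATS = {"churn": (30.0, 25.0), "stay": (45.0, 100.0)}  # (mean, variance)

def class_probabilities(age: float) -> dict:
    """Combine class priors with Gaussian likelihoods, then normalize."""
    scores = {
        label: PRIORS[label] * gaussian_pdf(age, *AGE_STATS[label])
        for label in PRIORS
    }
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}

probs = class_probabilities(31.0)
print(probs)
```

With several inputs, each one would contribute its own likelihood factor to the product before normalization.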

 

 

Complex Point Allocation for Scorecards


The Scorecard model element was introduced to PMML in version 4.1. It was a good element then, but it is even better in PMML 4.2. We added a way to compute complex values for the points allocated to an attribute (under a certain characteristic) through the use of expressions. That means you can use input or derived values to derive the actual value for the points. Very cool!
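
Here is a minimal sketch of the idea in Python, not PMML syntax: each attribute carries either a fixed number of points or an expression evaluated against the input record. The "income" characteristic, its thresholds, and the point formula are invented for illustration.

```python
from typing import Callable, List, Tuple, Union

Record = dict
Points = Union[float, Callable[[Record], float]]
Attribute = Tuple[Callable[[Record], bool], Points]

# Hypothetical "income" characteristic as (predicate, points) pairs.
# The first attribute awards fixed points; the second computes them from the input.
INCOME_ATTRIBUTES: List[Attribute] = [
    (lambda r: r["income"] < 1000, 10.0),
    (lambda r: r["income"] >= 1000, lambda r: 10.0 + 0.01 * (r["income"] - 1000)),
]

def partial_score(record: Record, attributes: List[Attribute]) -> float:
    """Return the points of the first attribute whose predicate matches."""
    for predicate, points in attributes:
        if predicate(record):
            return points(record) if callable(points) else points
    return 0.0  # assumption: no matching attribute contributes no points

print(partial_score({"income": 500}, INCOME_ATTRIBUTES))
print(partial_score({"income": 3000}, INCOME_ATTRIBUTES))
```

The overall score would then be the sum of the partial scores across all characteristics.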

 

Andy Flint (FICO) and Alex Guazzelli (Zementis) wrote a paper about the Scorecard element for the KDD 2011 PMML Workshop. If you haven't seen it yet, it will get you started on using PMML to represent scorecards and reason codes.

 

 

Revised Output Element


The Output element was completely revised and is much simpler to use. With PMML 4.2, you have direct access to all the model outputs and all post-processing through the attribute "feature".

 

The attribute segmentId also allows users to output particular fields from segments in a multiple model scenario. 
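
For illustration, the fragment below shows what such an Output element might look like, and uses Python's standard library to list which feature each field exposes and where it comes from. The field names and the segment id "2" are made up, and the fragment omits the PMML namespace for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical Output fragment; field names and segment id are illustrative only.
OUTPUT_XML = """
<Output>
  <OutputField name="predicted_class" feature="predictedValue"/>
  <OutputField name="prob_churn" feature="probability" value="churn"/>
  <OutputField name="segment2_score" feature="predictedValue" segmentId="2"/>
</Output>
"""

def describe_outputs(output_xml: str) -> list:
    """Return (name, feature, segmentId) for every OutputField."""
    root = ET.fromstring(output_xml)
    return [
        (field.get("name"), field.get("feature"), field.get("segmentId"))
        for field in root.iter("OutputField")
    ]

for name, feature, segment in describe_outputs(OUTPUT_XML):
    where = f"segment {segment}" if segment else "the top-level model"
    print(f"{name}: {feature} from {where}")
```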

 

The newly revised output element spells flexibility. It allows you to get what you need out of your predictive solutions.

 

For a complete list of all the changes in PMML 4.2 (small and large), see:

 

 

What is new? Text Mining!


PMML 4.2 introduces the use of regular expressions to PMML. This is solely so that users can process text more efficiently. The most straightforward additions are simple: three new built-in functions for concatenating, replacing and matching strings using regular expressions.
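
The snippet below is a Python analogy for these three operations, using the standard re module. It is not PMML syntax, and PMML's regular-expression dialect may differ from Python's in the details; the helper names and sample strings are chosen for this example.

```python
import re

def concat(*parts) -> str:
    """Join values into one string, converting each to text first."""
    return "".join(str(part) for part in parts)

def replace(text: str, pattern: str, replacement: str) -> str:
    """Replace every match of a regular expression in text."""
    return re.sub(pattern, replacement, text)

def matches(text: str, pattern: str) -> bool:
    """Report whether the pattern matches anywhere in text (an assumption here)."""
    return re.search(pattern, text) is not None

print(concat("ID-", 42))
print(replace("BRA/USA/GER", "/", "-"))
print(matches("The ball is round", r"b.ll"))
```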

 

The more elaborate addition is a brand new transformation element in PMML that extracts term frequencies from text. The ideas for this element were presented at the KDD 2013 PMML Workshop by Benjamin De Boe, Misha Bouzinier, and Dirk Van Hyfte (InterSystems). Their paper is a great resource for finding out the details behind the ideas that led to the new text mining element in PMML.
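
As a rough sketch of what extracting a term frequency involves, the Python function below counts occurrences of a (possibly multi-word) term in a text after splitting on non-word characters. It is a simplification for illustration, not the definition of the PMML element; the tokenization rule, the function name, and the sample text are assumptions.

```python
import re

def term_frequency(text: str, term: str, case_sensitive: bool = False) -> int:
    """Count how many times a (possibly multi-word) term occurs in text."""
    if not case_sensitive:
        text, term = text.lower(), term.lower()
    # Assumption: words are separated by runs of non-word characters.
    tokens = [t for t in re.split(r"\W+", text) if t]
    term_tokens = [t for t in re.split(r"\W+", term) if t]
    n = len(term_tokens)
    if n == 0:
        return 0
    return sum(
        1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == term_tokens
    )

review = "The user interface is great. I love the User Interface!"
print(term_frequency(review, "user interface"))
print(term_frequency(review, "User Interface", case_sensitive=True))
```

The resulting count can then feed a model as an ordinary numeric input.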

 

 

Obviously, the changes described above are also new, but it was nice to break the news into two pieces. For the grand finale, though, nothing beats taking a look at PMML 4.2 itself.

 

 

 

ADAPA and PMML 4.2

 

Zementis has been offering its ADAPA scoring engine as a service on the Amazon Cloud for a few years now. With ADAPA on the Amazon Cloud, companies all over the world benefit from fast deployment and execution of predictive analytics via Web-services and PMML, the Predictive Model Markup Language. You can even launch your own ADAPA instance in the cloud through the AWS Marketplace with a single click.



ADAPA and its sister product, the Universal PMML Plug-in (UPPI), are PMML-based scoring engines. That is, they can consume predictive models built in any data mining tool as long as the model is represented in PMML. PMML is supported by most commercial and open-source data mining tools, including FICO, IBM SPSS, KNIME, RapidMiner, R, SAS, and SAP. With PMML, one can simply move a predictive model from the scientist's desktop where it was built to the IT operational environment with no need for custom code.

Zementis was the first company to announce compatibility with PMML 4.2, the latest version of the PMML standard. PMML 4.2 introduces extensive text mining capabilities into the standard and now Zementis is bringing these exciting new PMML features to its AWS customers.

It is really super simple to deploy and score your models using ADAPA. And now, with PMML 4.2 support on the Amazon Cloud, predictive analytics as a service has just become amazingly powerful.

Visit the Zementis website for details
