Model verification consists of successfully uploading your model into ADAPA, followed by the score-matching test. The sections below cover errors and warnings during the upload process as well as model validation.
1) Model Upload - Errors
Before accepting a PMML file, ADAPA verifies that it is valid PMML. During this phase, you may get errors and warnings. All errors must be resolved before ADAPA will upload a model. The figure below shows the kind of feedback you get through the ADAPA Console whenever the PMML file contains invalid XML.
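If you want to catch invalid XML before uploading, a quick local well-formedness check is easy to script. The sketch below is only an illustration using Python's standard library; it does not replace ADAPA's full PMML schema validation:

```python
import xml.etree.ElementTree as ET

def is_well_formed_xml(xml_text):
    """Return True if the text parses as well-formed XML.

    This mirrors only the first stage of ADAPA's checks; validating
    against the PMML schema itself is a separate, stricter step.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

Running this over your exported file before uploading can save a round trip to the server when a tag was left unclosed or mis-nested.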
If your model is valid XML but not a valid model according to the PMML schema, ADAPA will evaluate the severity of the problem and output errors or warnings. If you get errors, ADAPA will not upload your file. See figure below.
To find out what errors ADAPA encountered in your PMML file, click the "Details" button and open the annotated file in an XML editor. Errors are displayed as comments embedded in your model file. Use the information you get back from ADAPA to correct your PMML file and try again.
If you get warnings instead, see "Model Upload - Warnings" below.
Note that besides validating your PMML file syntactically and semantically, ADAPA can also convert older versions of PMML to the latest version, 4.2. It will also automatically correct known issues with PMML code from different vendors.
If you use a PMML element not currently supported by ADAPA, feel free to let us know. You can find our contact information on the contacts page of the Zementis website.
2) Model Upload - Warnings
Warnings are generated whenever ADAPA finds inconsistencies in the PMML file that do not affect scoring. Even so, you should review all warnings, even if your model validates fine and you get a perfect score match (see "Model Verification Test" below). If your model is uploaded with warnings, ADAPA will let you know as shown in the figure below.
To see the warnings, click the "Details" button, or click the yellow flag icon positioned next to the model name in the table of available models in the ADAPA Console. A yellow flag means the model was uploaded with warnings; a green flag means it was uploaded without warnings or errors; a red flag indicates that the model has an error and cannot be used for scoring. The figure below shows three models in the table of available models, two of which have warnings, as indicated by the yellow flags next to their names.
Once you click on the yellow flag, you will get the annotated PMML model file back. Opening this file in your XML editor, you will see the model you just uploaded with all detected warnings embedded in it as comments. ADAPA also generates a comment at the top of the file summarizing its findings. For example, if three warnings are detected, it will read as follows:
(Comment generated by ADAPA) There are at least 3 warnings in this PMML document.
Detailed information can be found as comments embedded in the appropriate locations within this document.
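Rather than scrolling through a large annotated file, you can collect every embedded comment programmatically. The sketch below assumes only that ADAPA's annotations appear as standard XML comments, as shown above:

```python
import re

def extract_comments(pmml_text):
    """Return the text of every XML comment in the document.

    ADAPA embeds its warnings and its summary note as XML comments,
    so this gathers them all in one list for review.
    """
    return [c.strip() for c in
            re.findall(r'<!--(.*?)-->', pmml_text, flags=re.DOTALL)]
```

The first entry will typically be the summary comment generated at the top of the file, followed by the individual warnings in document order.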
3) Model Verification Test (Score Matching)
Given that you built your model outside of ADAPA, you want to make sure that both ADAPA and your development environment produce exactly the same results.
ADAPA provides an integrated testing process to make sure your model was uploaded and works as expected. It accepts a test file containing anywhere from one to thousands of records, each with all the necessary input variables and the expected result, for score matching (click HERE to find out how to format your data file).
Score matching works by supplying ADAPA with the expected results for a number of input records. ADAPA then automatically compares each expected value with its own computed value. If both values match for all records (and given enough validation records), you can be confident that ADAPA has uploaded the model correctly. Once that is established, there is no longer any need to supply ADAPA with expected results, since all you really want from then on is the computed results.
Score matching can easily be done through the ADAPA Console. After processing the file, ADAPA returns statistics on the total number of matched and unmatched records, percentages, etc. (see figure below). If any records fail the matching test, a complete list of the failed records is displayed. You can then examine the computed information for each record to see where expected and computed values differ and thus pinpoint the source of the problem (see below).
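The comparison described above can be sketched in a few lines. This is a minimal illustration of the idea, not ADAPA's actual implementation; the tolerance parameter is an assumption for handling floating-point scores:

```python
def score_match(expected, computed, tolerance=1e-6):
    """Compare expected vs. computed scores record by record.

    Returns (matched_count, unmatched_count, failed_indices) so the
    failed records can be inspected individually, much as the ADAPA
    Console lists them after processing a test file.
    """
    failed = [i for i, (e, c) in enumerate(zip(expected, computed))
              if abs(e - c) > tolerance]
    return len(expected) - len(failed), len(failed), failed
```

The indices of failed records are the starting point for the kind of per-record investigation described below.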
PMML also offers a verification element for similar testing purposes: with it, verification records become part of the PMML file itself. The PMML "ModelVerification" element has been supported by ADAPA since release 3.0, giving ADAPA users more than one way to test their models.
The verification test may fail because:
- The model ADAPA loaded and executed is different from the model you built in your development environment. This may indicate a problem with ADAPA itself, or one of the issues described below.
- The PMML file you exported from your model development environment does not fully represent the model, or is semantically problematic.
In both cases, you can follow ADAPA's computations by clicking on the file icon next to the row ID of the record you want to inspect in the ADAPA Console (shown in the figure above). Clicking the file icon for a particular record downloads a text file containing a log of all the computations ADAPA performed for that record. This can be very helpful in determining why ADAPA generated the value(s) it did.
Also, the problem may lie with your validation data file itself. You may, for example, have built your model in IBM SPSS Statistics, exported it as a PMML file, and uploaded it into ADAPA. So far so good, but what about the data? If you also saved your data in SPSS, make sure the expected value or prediction was saved under the correct name. SPSS usually calls this value "PRE_1"; you will need to rename this variable to the name of the predicted variable defined in the PMML file. Also, if your data contains the original target used to build the model, rename it to something different from the predicted variable. The predicted variable should now hold the predicted result you obtained from SPSS (or whichever model development environment you used to score the data in the first place).
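The renaming step above can be scripted if your validation data is exported as CSV. The sketch below assumes a CSV export; the target name "predictedScore" is purely hypothetical, and you should substitute the predicted field name actually declared in your PMML file:

```python
import csv

def rename_column(in_path, out_path, old_name, new_name):
    """Rename one header column (e.g. SPSS's "PRE_1") in a CSV data file
    so it matches the predicted field name defined in the PMML model.
    All data rows are copied through unchanged."""
    with open(in_path, newline='') as src, \
         open(out_path, 'w', newline='') as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)
        writer.writerow([new_name if h == old_name else h for h in header])
        writer.writerows(reader)
```

For example, `rename_column('scored.csv', 'upload.csv', 'PRE_1', 'predictedScore')` would prepare an SPSS-scored file for upload, assuming "predictedScore" is the name your PMML model predicts.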