How to score data using the ADAPA Add-in for Microsoft Office Excel?

First things first: if you haven't already, go to the Zementis website and fill out the form to sign up for the free ADAPA Add-in for Microsoft Office Excel 2007 and 2010. Once you do that, you will receive an e-mail from Zementis with a link to download the add-in and instructions on how to use a demo ADAPA instance to score test data. You can also use the add-in to score your own data using your private instance of ADAPA on the Cloud (Amazon EC2 or IBM SmartCloud).

If you have already received the e-mail and installed the add-in in Microsoft Office Excel (2007 or 2010), you should see the "Predict with ADAPA" menu (as displayed on the left).

The ADAPA Add-in for Microsoft Office Excel is yet another way Zementis makes the process of model execution and data scoring a walk in the park. It bridges the gap between the desktop and the Cloud by combining analysis, data manipulation, and graphics of Microsoft Office Excel with the power of ADAPA.

This post describes how to score the data available in the sample Microsoft Office Excel file you were provided when you signed up for the add-in. The file contains three data sets that can be scored against models already pre-loaded into the ADAPA demo instance. It is divided into four worksheets. The first worksheet, "DataInfo", contains information about each of the data sets as well as the models used to score them. Each of the other worksheets contains a different data set. These are: IrisData, AuditData, and LoanData.

To score any of the test data sets, open the file in Microsoft Office Excel and select the worksheet containing the data you would like to score, say IrisData. Then select "Predict with ADAPA" from the top menu. When you do that, you will see two groups of buttons.

The first group "Score Your Data" contains all you need for scoring. The second group "Information" contains two buttons: 1) "Support Blog" links to our support blog; and 2) "About Zementis" links to the Zementis website.

Now, you just need to follow the 4 simple steps below for effective data scoring on your desktop.

Setup Connection

Go ahead and click on "Setup Connection" (the first button in the first group). This action will bring up a dialog box asking you to enter information about your ADAPA instance.

If you check "Use Demo", the URL field will automatically be filled with the ADAPA demo instance URL. You still need to enter the "User Email" and "Password", which are included in the e-mail you received from Zementis. The ADAPA demo instance comes with three predictive models: IrisMLRModel, AuditSVMModel, and LoanNNModel, which can be used to score the three test data sets: IrisData, AuditData, and LoanData, respectively.

Test Connection

It is now time to test the connection to make sure the setup is correct. Click on the "Test" button. You should see a dialog box with a message saying that the connection was set up successfully and that 3 models were found.

With the Iris data set in front of you (note that it contains neither the "class" value we want to predict nor the class probabilities), select the data you want to score (selecting the appropriate columns will do) and click on "Apply Model", the second button in the first group.

Map Input Columns

Once you do that, the "Apply Model" dialog box is shown. Since we are scoring the Iris data set, make sure to select the IrisMLRModel from the "Model" drop-down menu. The ADAPA Add-in will automatically pair "Table Columns" with "Model Input Fields" based on their names. You can now review (or select the most appropriate) mappings between table columns and expected model inputs.
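To make the idea of name-based pairing concrete, here is a minimal Python sketch. It is not the add-in's actual code: the exact matching rules used by the ADAPA Add-in are not described in this post, so the sketch simply assumes that names are compared after lower-casing and dropping spaces and punctuation. The column and field names are hypothetical and chosen only for illustration.

```python
def normalize(name):
    """Lower-case a name and keep only letters and digits for loose comparison."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def pair_columns(table_columns, model_inputs):
    """Return {model_input: matching table column, or None if no match}.

    Assumption: two names match when their normalized forms are equal.
    """
    lookup = {}
    for col in table_columns:
        # Keep the first column seen for each normalized name.
        lookup.setdefault(normalize(col), col)
    return {field: lookup.get(normalize(field)) for field in model_inputs}


# Hypothetical table headers and model input fields (not taken from the sample file).
columns = ["Sepal Length", "Sepal Width", "Petal Length", "Width of Petal"]
inputs = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

mapping = pair_columns(columns, inputs)
# Inputs left without a column are the ones you would map by hand in the dialog.
unmapped = [field for field, col in mapping.items() if col is None]
```

In this sketch, "petal_width" ends up unmapped because no header normalizes to the same name, which corresponds to the case where you would pick the appropriate mapping yourself in the dialog box.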

Data Scoring

Once you are satisfied with the mappings, click on "Score" ... et voilà ... after processing, ADAPA creates extra columns containing the output fields and appends them to the original data. For the Iris data set, these are the class of Iris plant with the highest probability as well as the probability associated with each of the three possible classes (setosa, virginica, and versicolor).
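The relationship between the appended columns can be sketched in a few lines of Python. This only illustrates how the predicted class relates to the class probabilities; it does not call ADAPA. The output column names ("class", "probability_...") and the sample values are assumptions made up for the example.

```python
def append_outputs(row, probabilities):
    """Return a copy of row with the predicted class and per-class probabilities added.

    Assumption: the predicted class is the label with the highest probability.
    """
    predicted = max(probabilities, key=probabilities.get)
    scored = dict(row)  # leave the original input row untouched
    scored["class"] = predicted
    for label, p in probabilities.items():
        scored["probability_" + label] = p
    return scored


# Hypothetical input row and class probabilities, for illustration only.
row = {"sepal_length": 6.1, "sepal_width": 2.8, "petal_length": 4.7, "petal_width": 1.2}
probs = {"setosa": 0.02, "versicolor": 0.90, "virginica": 0.08}

scored_row = append_outputs(row, probs)
```

Here the scored row keeps the four input values and gains four extra entries: the predicted class ("versicolor", the label with the largest probability) and one probability per class.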

Feel free to score the other two datasets in the sample file ... and then to try your own data and models. Make sure your data is formatted as a table before attempting to score it.

