Big Data is usually defined in terms of Volume, Variety, and Velocity (the so-called 3 Vs). Volume implies breadth and depth, while variety is simply the nature of the beast: on-line transactions, tweets, text, video, sound, ... Velocity, on the other hand, implies that data is being produced amazingly fast (according to IBM, 90% of the data that exists today was generated in the last 2 years), but also that it gets old pretty fast. In fact, some varieties of data tend to age more quickly than others.
To be able to tackle Big Data, systems and platforms need to be robust, scalable, and agile.
It is in this context that IntelliFest 2012 came to be. The conference theme this year was "Intelligence in the Cloud", exploring the use of applied AI in cloud computing, mobile apps, Big Data, and many other application areas. Among the many amazing speakers at IntelliFest were Stephen Grossberg from Boston University, Rajat Monga from Google, Carlos Serrano-Morales from Sparkling Logic, Paul Vincent from TIBCO, and Alex Guazzelli from Zementis.
Dr. Alex Guazzelli's talk on Big Data, Predictive Analytics, and PMML is now available for on-demand viewing on YouTube. The abstract follows below, together with several resources including the presentation slides and files used in the live demo.
Predictive analytics has been used for many years to learn patterns from historical data in order to predict the future. Well-known techniques include neural networks, decision trees, and regression models. Although these techniques have been applied to a myriad of problems, the advent of big data, cost-efficient processing power, and open standards has propelled predictive analytics to new heights.
Big data involves large amounts of structured and unstructured data that are captured from people (e.g., on-line transactions, tweets, ...) as well as sensors (e.g., GPS signals in mobile devices). With big data, companies can now start to assemble a 360-degree view of their customers and processes. Luckily, powerful and cost-efficient computing platforms such as the cloud and Hadoop are here to address the processing requirements imposed by the combination of big data and predictive analytics.
But creating predictive solutions is just part of the equation. Once built, they need to be transitioned to the operational environment where they are actually put to use. In the agile world we live in today, the Predictive Model Markup Language (PMML) delivers the necessary representational power for solutions to be quickly and easily exchanged between systems, allowing for predictions to move at the speed of business.
This talk will give an overview of the colliding worlds of big data and predictive analytics. It will do that by delving into the technologies and tools available in the market today that allow us to truly benefit from the barrage of data we are gathering at an ever-increasing pace.
- Download the presentation slides
- Download the KNIME workflow used to generate a sample neural network for predicting churn
- Download the PMML file created during the demo
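
To give a feel for why PMML makes models easy to move between systems, here is a minimal sketch in Python. It is not the neural network from the demo: the embedded document is a hypothetical, hand-written regression model with made-up field names and coefficients, and the helper functions `load_regression` and `score` are illustrative names, not part of any PMML library. The point is only that a PMML file is plain XML, so a consuming system can read the model parameters without knowing which tool produced them.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical PMML document (made-up fields and coefficients).
# This is NOT the file produced during the demo.
PMML_TEXT = """
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Illustrative churn score"/>
  <DataDictionary numberOfFields="3">
    <DataField name="monthly_calls" optype="continuous" dataType="double"/>
    <DataField name="support_tickets" optype="continuous" dataType="double"/>
    <DataField name="churn_score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="churn_sketch" functionName="regression"
                   normalizationMethod="logit">
    <MiningSchema>
      <MiningField name="monthly_calls"/>
      <MiningField name="support_tickets"/>
      <MiningField name="churn_score" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="-1.0">
      <NumericPredictor name="monthly_calls" coefficient="-0.02"/>
      <NumericPredictor name="support_tickets" coefficient="0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

# Namespace used by the PMML 4.1 document above.
NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def load_regression(pmml_text):
    """Extract intercept, coefficients, and normalization from the model."""
    root = ET.fromstring(pmml_text)
    model = root.find("pmml:RegressionModel", NS)
    table = model.find("pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("pmml:NumericPredictor", NS)
    }
    normalization = model.get("normalizationMethod", "none")
    return intercept, coefficients, normalization


def score(pmml_text, record):
    """Score one record; only 'none' and 'logit' are handled in this sketch."""
    intercept, coefficients, normalization = load_regression(pmml_text)
    linear = intercept + sum(c * record[name] for name, c in coefficients.items())
    if normalization == "logit":
        return 1.0 / (1.0 + math.exp(-linear))
    if normalization == "none":
        return linear
    raise ValueError("unsupported normalizationMethod: " + normalization)


if __name__ == "__main__":
    print(score(PMML_TEXT, {"monthly_calls": 50.0, "support_tickets": 4.0}))
```

A production scoring engine would also honor the data dictionary, missing-value handling, and the other model types PMML defines; this sketch skips all of that on purpose.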