Machine learning is so pervasive that we can often assume its presence in big data applications without having to specifically call it out. About a year ago, I blogged about the "hardcore" big data use cases -- in other words, the applications that deliver the best results at "extreme scales." By the latter, I was referring to any combination of petabyte data volumes, real-time data velocities, and/or multistructured data varieties.
When compiling the list of applications in that article, I deliberately avoided listing "machine learning analytics." The reason: machine learning is a tool used in many, if not most, of those analytic use cases, but it isn't a use case in itself -- that is, it isn't a specific application domain in its own right. For the same reason, I didn't list schema design, metadata management, or data integration as big data use cases. Like machine learning, all of these contribute in varying degrees to realizing value from most big data analytic applications.
Machine learning's contribution to big data application ROI is twofold: boosting data scientist productivity and uncovering hidden patterns that even the best data scientists may have overlooked. These value points derive from machine learning's core function: enabling analytic algorithms to learn from fresh feeds of data without constant human intervention and without explicit programming. The approach allows data scientists to train a model on an example data set, then leverage algorithms that automatically generalize and learn both from that example and from fresh data feeds.
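To make that train-then-keep-learning pattern concrete, here is a minimal sketch in plain Python. It is illustrative only -- a tiny online perceptron of my own devising, not any particular library's API: the model is fit on an initial example set, then continues to update itself as fresh records arrive, with no retraining from scratch and no reprogramming.

```python
# Minimal online perceptron: learns a linear decision rule from labeled
# examples, then keeps updating itself as fresh records arrive.

def predict(weights, bias, features):
    """Classify a feature vector as +1 or -1 with the current model."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score >= 0 else -1

def update(weights, bias, features, label, lr=0.1):
    """Nudge the model toward the correct answer when it errs."""
    if predict(weights, bias, features) != label:
        weights = [w + lr * label * x for w, x in zip(weights, features)]
        bias = bias + lr * label
    return weights, bias

# 1. Train on an initial example set (the "training" phase).
training_set = [([2.0, 1.0], 1), ([1.5, 2.0], 1),
                ([-1.0, -0.5], -1), ([-2.0, -1.5], -1)]
weights, bias = [0.0, 0.0], 0.0
for _ in range(10):                      # a few passes over the examples
    for features, label in training_set:
        weights, bias = update(weights, bias, features, label)

# 2. Generalize and keep learning from a fresh feed -- the same update
#    rule runs continuously, without human intervention.
fresh_feed = [([1.8, 0.9], 1), ([-1.2, -0.8], -1)]
for features, label in fresh_feed:
    weights, bias = update(weights, bias, features, label)

print(predict(weights, bias, [2.2, 1.1]))   # a point like the positives
```

The point of the sketch is the shape of the workflow, not the algorithm: the same `update` step serves both the initial training pass and the ongoing feed, which is what lets the model improve as more data flows through it.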
In many ways, machine learning can be the ROI capstone of your big data initiative. Your investment in machine learning can help deepen whatever business case you've made for big data in the enterprise. That's because machine-learning algorithms grow even more effective as your data scales in volume, velocity, and variety. As such, it's another example of, per my discussion in this recent article, how big data's bigness can be its core driver of value.
As Mark van Rijmenam says in this recent article on machine learning: "The more data is processed, the better the algorithm will become." Many of the machine-learning applications that he discusses -- ranging from speech and facial recognition to clickstream processing, search-engine optimization, and recommendation engines -- might be described as "sense-making analytics" (which, now that I think of it, I should have included in my list of hardcore big data applications).
Sense-making analytics involves continuous monitoring of feeds whose semantic patterns, context, and importance must be inferred from the stream. In support of automated sense-making, machine-learning algorithms must often handle feeds of daunting complexity, such as feeds that incorporate implicit semantic hierarchies among their constituent objects, or environments where an overall sense must be gleaned in real time by correlating multiple distinct streams. The streams may carry objects of many kinds: data, video, images, speech, faces, gestures, geospatial coordinates, and browser clicks. And the sense to be auto-extracted from these streams, via machine learning, may be any blend of cognitive, affective, sensory, and volitional features, per my discussion in a recent blog post.
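The cross-stream correlation piece can be sketched very simply. The following is a hypothetical illustration (the event names, window size, and function are my own, not drawn from any product): events from two independent feeds are matched by timestamp, and a combined inference is emitted whenever both streams report within the same time window.

```python
# Illustrative stream correlation: events from two distinct feeds are
# paired by timestamp, and a combined "sense" event is produced whenever
# both streams report within the same window.

WINDOW = 2.0  # seconds: how close two events must be to correlate

def correlate(stream_a, stream_b, window=WINDOW):
    """Pair each (timestamp, object) in stream_a with stream_b events
    that fall inside the correlation window."""
    matches = []
    for ts_a, obj_a in stream_a:
        for ts_b, obj_b in stream_b:
            if abs(ts_a - ts_b) <= window:
                matches.append((ts_a, obj_a, obj_b))
    return matches

# Two distinct feeds: browser clicks and recognized speech fragments.
clicks = [(0.5, "click:product-page"), (7.0, "click:checkout")]
speech = [(1.2, "speech:'tell me more'"), (7.8, "speech:'buy it'")]

for ts, click, utterance in correlate(clicks, speech):
    print(ts, click, utterance)
```

In a real deployment the matching would run incrementally over unbounded feeds and the "combined sense" would itself feed a learned model, but the core move is the same: meaning emerges only when the separate streams are lined up in time.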