"Data Lake" is a proprietary term. "We have built a series of big data platforms that enable clients to inject any type of data and to secure access to individual elements of data inside the platform. We call that architecture the data lake," says Peter Guerra, Principal, Booz Allen Hamilton. Yet these methods are not exclusive to Booz Allen Hamilton.
"I have read what's available about it," says Dr. Stefan Deutscher, Principal, IT Practice, Boston Consulting Group, speaking of the data lake; "I don't see what's new. To me, it seems like re-vetting available security concepts with a name that is more appealing." Still, the approach is gaining exposure under that name.
In fact, enterprises are showing enough interest that vendors are slapping the moniker on competing solutions. Such is the case with the Capgemini/Pivotal collaboration on the "business data lake," where the vendors are using the name to highlight the differences between the offerings.
This enterprise curiosity stems from real big data ills that need equally genuine cures. Enterprises from government agencies to large concerns and on down use big data inside public multitenant cloud environments. All the risks of multitenancy apply in these scenarios, including the vulnerabilities that come with the weaker security of another tenant, potential access by users of an adjacent tenant, PII/PHI exposure, and other regulatory non-compliance. Data lakes could protect big data from all the perils of the public cloud.
But while defense agencies need the protection data lakes offer for each individual data element, the typical enterprise does not. Nor can most enterprises afford the performance hit that comes with using data lakes in this way. That's why some vendors are using data lakes to protect big data as a whole rather than piece by piece, which avoids the performance lag of the element-level approach. Enterprises in the market for solutions to the security challenges that come with the public cloud should consider one or both data lake approaches.
Securing data elements
"The overarching concept is the ability to pull in different types of data, tag that data, and enable users and administrators to secure the individual data elements within the data lake," says Guerra. Rather than deidentifying PII/PHI and providing data privacy on the whole, this data lake approach determines what pieces of data are sensitive and what pieces are not and works from there.
"We like to bring all the data into the data lake in its rawest format," says Guerra; "we don't do any extraction or transformation of data ahead of time." Instead, this approach tags each data element with a set of metadata tags and attributes that describes the data and how the IAM systems that access it should handle it.
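To make the tagging idea concrete, here is a minimal Python sketch of element-level tagging and access filtering. It is an illustration only, not Booz Allen Hamilton's implementation: the names TaggedElement, ingest, visible_fields, and TAG_RULES, the tag labels "pii" and "phi", and the sample record are all hypothetical, and a real deployment would delegate the access decision to its IAM system rather than a simple set comparison.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaggedElement:
    """One raw data element plus the metadata tags that travel with it."""
    value: object  # stored exactly as ingested; no extraction or transformation
    tags: frozenset = field(default_factory=frozenset)


def ingest(record: dict, tag_rules: dict) -> dict:
    """Wrap every field of a raw record in a TaggedElement.

    tag_rules maps a field name to the tags it should carry. Fields
    without a rule are stored untagged (an assumption of this sketch:
    untagged means non-sensitive).
    """
    return {
        name: TaggedElement(value, frozenset(tag_rules.get(name, ())))
        for name, value in record.items()
    }


def visible_fields(tagged_record: dict, clearances: set) -> dict:
    """Return only the elements whose tags are all covered by the caller's clearances."""
    return {
        name: elem.value
        for name, elem in tagged_record.items()
        if elem.tags <= clearances  # subset test: every tag must be cleared
    }


# Hypothetical tagging policy and record, for illustration only.
TAG_RULES = {"name": {"pii"}, "ssn": {"pii"}, "diagnosis": {"phi"}}
raw = {"name": "A. Patient", "ssn": "000-00-0000", "diagnosis": "flu", "zip": "20001"}

lake_record = ingest(raw, TAG_RULES)
analyst_view = visible_fields(lake_record, clearances=set())
clinician_view = visible_fields(lake_record, clearances={"pii", "phi"})
```

In this sketch the raw values are never altered. The same stored record yields a different view for each caller depending on which tags that caller is cleared for, which is the element-level control Guerra describes. Checking tags on every element at read time also hints at where the performance cost of this approach comes from.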