According to Guerra, the IAM system enforces the security of individual data elements using rules written in XACML (Extensible Access Control Markup Language). An administrator, or another system, writes the rules in the IAM system, which enforces them when a user authenticates. The IAM system then passes the user's security authorizations to the big data architecture. "The big data architecture then matches the individual security authorizations with the XACML rules and returns only the appropriate data," says Guerra.
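A minimal sketch of that matching step follows. It is not XACML itself: real XACML policies are XML documents evaluated by a policy decision point. Here each rule is a small Python object with a resource target (tags the data element must carry), a subject target (attributes the user must hold), and an effect. Rules are combined with a deny-overrides strategy, and anything no rule permits is denied by default. The attribute names, tags, and sample data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Simplified XACML-style rule (hypothetical structure, not the XACML schema)."""
    resource: dict   # tags a data element must carry for the rule to apply
    subject: dict    # attributes a user must hold for the rule to apply
    effect: str      # "Permit" or "Deny"


@dataclass
class DataElement:
    value: str
    tags: dict = field(default_factory=dict)  # security tags attached to this element


def applies(attrs: dict, required: dict) -> bool:
    # An empty requirement matches everything; otherwise every key must match exactly.
    return all(attrs.get(k) == v for k, v in required.items())


def decide(rules, user_attrs: dict, element_tags: dict) -> str:
    """Deny-overrides combining with a default deny when no rule applies."""
    applicable = [r for r in rules
                  if applies(element_tags, r.resource) and applies(user_attrs, r.subject)]
    if any(r.effect == "Deny" for r in applicable):
        return "Deny"
    if any(r.effect == "Permit" for r in applicable):
        return "Permit"
    return "Deny"


def filter_elements(rules, user_attrs: dict, elements):
    # Return only the elements this user's authorizations permit.
    return [e for e in elements if decide(rules, user_attrs, e.tags) == "Permit"]


RULES = [
    Rule(resource={"classification": "public"}, subject={}, effect="Permit"),
    Rule(resource={"classification": "financial"}, subject={"department": "finance"}, effect="Permit"),
    Rule(resource={"classification": "financial"}, subject={"status": "contractor"}, effect="Deny"),
]
ELEMENTS = [
    DataElement("press release", {"classification": "public"}),
    DataElement("Q3 ledger", {"classification": "financial"}),
    DataElement("untagged blob", {}),
]
analyst = {"department": "finance", "status": "employee"}
contractor = {"department": "finance", "status": "contractor"}

print([e.value for e in filter_elements(RULES, analyst, ELEMENTS)])     # ['press release', 'Q3 ledger']
print([e.value for e in filter_elements(RULES, contractor, ELEMENTS)])  # ['press release']
```

Because the default is deny, an element with no tags is never returned, which is one reason this design depends on every element being tagged.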
Pros and cons
Data lakes still require role-based access, policies, and policy enforcement. "You use PKI to ensure the person is who they say they are and to bind their attributes to the platform that stores the individual data attributes to ensure that security is complete," says Guerra. Policies and policy enforcement limit or permit access based on each element's metadata tags and attributes, and a brokering component mediates every data access request to enforce the security policy.
"It's very difficult to implement those systems and attribute enforcements throughout the data lake platform stack," says Guerra. Still, he says he has worked closely with clients to define those policies.
With this kind of system, an attacker would have to break through both the perimeter security around the data lake and the security protecting the individual data elements to retrieve anything. The system uses PKI to cryptographically sign the security tags on data elements and to enforce them. "You can't change them nor can you break them. An attacker would have to break each tag in order to gain access to all data elements," says Guerra.
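The tamper-evidence property can be illustrated with a keyed signature over each element's tags. The article describes PKI signatures; this sketch substitutes HMAC-SHA256 from Python's standard library, which proves integrity to anyone holding the shared key but, unlike a PKI signature, cannot be verified by third parties with a public key. The key, the `sign_tags`/`verify_tags` names, and the sample record are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical demo key; a real deployment would keep keys in an HSM or key management service.
SECRET_KEY = b"demo-key-do-not-use"


def _canonical(element_id: str, payload: bytes, tags: dict) -> bytes:
    # Bind the tags to one specific element: its id, a digest of its contents, and the tags.
    body = {
        "id": element_id,
        "digest": hashlib.sha256(payload).hexdigest(),
        "tags": tags,
    }
    # sort_keys and fixed separators make the serialization deterministic.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def sign_tags(element_id: str, payload: bytes, tags: dict) -> str:
    return hmac.new(SECRET_KEY, _canonical(element_id, payload, tags), hashlib.sha256).hexdigest()


def verify_tags(element_id: str, payload: bytes, tags: dict, signature: str) -> bool:
    expected = sign_tags(element_id, payload, tags)
    return hmac.compare_digest(expected, signature)  # constant-time comparison


record = b"account=4411, balance=1200"
tags = {"classification": "financial"}
sig = sign_tags("txn-001", record, tags)

print(verify_tags("txn-001", record, tags, sig))                          # True
print(verify_tags("txn-001", record, {"classification": "public"}, sig))  # False: tag altered
print(verify_tags("txn-002", record, tags, sig))                          # False: signature reused on another element
```

Because each signature covers a single element's id and content digest, a valid signature cannot be copied onto a different element, so each tag has to be defeated separately.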
However, this approach requires an IAM system with attribute-based access control (ABAC). A number of vendors offer ABAC products, but system scalability and performance remain concerns with ABAC systems, according to NIST Special Publication 800-162, "Guide to Attribute Based Access Control (ABAC) Definition and Considerations" (January 2014).
But ABAC IAM systems in an unstructured data lake work differently from existing structured systems and legacy security solutions, says Jerry Irvine, CIO of Prescient Solutions and a member of the National Cyber Security Task Force. "Access and authorization controls within the data lake are distributed across multiple categories of service and systems," says Irvine. Because those controls are spread out rather than concentrated in one place, the IAM system is less likely to become a single point of failure or a load and performance bottleneck.
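One way to picture Irvine's point is to give each category of service its own decision point, so that no single component evaluates every request. The sketch below is hypothetical: the category names, the per-category rules (keyed on a single `role` attribute for brevity), and the routing function are illustrative assumptions, and an in-process dictionary stands in for what would be separate services in practice.

```python
from dataclasses import dataclass


@dataclass
class DecisionPoint:
    """Evaluates only the rules for one service category (hypothetical design)."""
    category: str
    allowed_roles: dict   # action -> set of role values allowed to perform it
    evaluations: int = 0  # how many requests this decision point has handled

    def is_allowed(self, role: str, action: str) -> bool:
        self.evaluations += 1
        return role in self.allowed_roles.get(action, set())


# Each service category owns an independent decision point.
DECISION_POINTS = {
    "storage": DecisionPoint("storage", {"write": {"ingest"}, "read": {"analyst", "ingest"}}),
    "query": DecisionPoint("query", {"run": {"analyst"}}),
    "analytics": DecisionPoint("analytics", {"train": {"data_scientist"}}),
}


def authorize(category: str, role: str, action: str) -> bool:
    point = DECISION_POINTS.get(category)
    if point is None:
        return False  # unknown categories are denied rather than routed to a central service
    return point.is_allowed(role, action)


print(authorize("query", "analyst", "run"))        # True
print(authorize("analytics", "analyst", "train"))  # False
print({name: p.evaluations for name, p in DECISION_POINTS.items()})
# {'storage': 0, 'query': 1, 'analytics': 1}
```

The request counters show the load landing on separate components; if one decision point failed, only requests in its category would be affected.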
How data lakes identify and tag data from legacy platforms is another concern. "Most applications don't provide sufficient meta-information about data they generate," says Dr. Deutscher. This can make it difficult for data lakes to know how to tag data elements with attributes.
"We've handled that a couple of ways," says Guerra. One method is to query legacy systems and tag the results with attributes. Another is to classify a legacy system as a whole; for example, only a small subset of people may be authorized to read an older financial transaction system. "We integrate the output from that legacy system and pull it into the data lake," says Guerra. The data becomes part of the lake but keeps its access restrictions, so only the appropriate people can read it.
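The two approaches Guerra describes can be sketched side by side: tagging each row returned by a query against a legacy system, or stamping one system-wide classification on everything a legacy system exports. The function names, the per-row classification rule (based on a hypothetical `account_type` column), and the labels and reader groups are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def tag_query_results(rows: Iterable[Row], classify: Callable[[Row], str]) -> List[Row]:
    """Approach 1: query the legacy system, then tag each returned row individually."""
    return [{**row, "_tags": {"classification": classify(row)}} for row in rows]


def tag_whole_system(rows: Iterable[Row], system: str, classification: str,
                     readers: List[str]) -> List[Row]:
    """Approach 2: classify the legacy system as a whole; every row inherits the same tags."""
    return [
        {**row, "_tags": {"classification": classification,
                          "source": system,
                          "readers": list(readers)}}  # fresh copy per row
        for row in rows
    ]


legacy_rows = [
    {"txn": 1, "account_type": "payroll", "amount": 5000},
    {"txn": 2, "account_type": "retail", "amount": 40},
]

per_row = tag_query_results(
    legacy_rows,
    lambda r: "confidential" if r["account_type"] == "payroll" else "internal",
)
whole = tag_whole_system(legacy_rows, "legacy-finance", "restricted", ["finance-auditors"])

print([r["_tags"]["classification"] for r in per_row])  # ['confidential', 'internal']
print({r["_tags"]["classification"] for r in whole})    # {'restricted'}
```

Per-row tagging gives finer-grained control but requires understanding the data; whole-system classification is coarser but still works when the legacy application supplies little metadata, which is the gap Dr. Deutscher describes.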