The amount of big data governments and companies grapple with is only going to get bigger. According to a document from the Cloud Security Alliance (CSA), the volume of big data is going to double every two years; the report projects that there will be 40,000 exabytes of generated data in 2020.
CSA is an industry group that promotes best practices in cloud computing. Members include Amazon Web Services, Microsoft, and Red Hat.
CSA on Aug. 26 published a list of the 100 Best Practices in Big Data Security and Privacy to help big data service providers strengthen their infrastructures as the amount of data rapidly grows. The list offers 10 solutions for each of the 10 biggest challenges in big data security and privacy.
The following is a condensed list, presenting one recommendation for each challenge:
1. Challenge: Maintaining Secure Computations in Distributed Programming Frameworks
Solution: De-identify data so that a subject’s records cannot be linked to external data. This will protect the subject’s privacy. Personally identifiable information (PII) should be masked or removed from the data set. Data managers should also be aware of “quasi-identifiers,” such as someone’s ZIP code, birth date, and gender, which can single out a person when combined.
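As a rough illustration of the de-identification step, the Python sketch below drops direct identifiers and coarsens two quasi-identifiers. The field names, the sample record, and the choice to keep a three-digit ZIP prefix and a birth year are assumptions made for the example, not details from the CSA report.

```python
# Field names below are hypothetical; adapt them to the actual data set.
DIRECT_IDENTIFIERS = {"name", "email", "ssn"}


def deidentify(record):
    """Return a copy with direct identifiers removed and quasi-identifiers coarsened."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "zip" in out:
        out["zip"] = out["zip"][:3] + "**"  # keep only the three-digit prefix
    if "birth_date" in out:
        # Assumes an ISO "YYYY-MM-DD" string; keep only the year.
        out["birth_year"] = out.pop("birth_date")[:4]
    return out


record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "zip": "02139",
    "birth_date": "1984-07-12",
    "gender": "F",
    "diagnosis": "asthma",
}
clean = deidentify(record)
print(clean)
```

Coarsening alone does not guarantee anonymity; it only makes each record less distinctive.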
2. Challenge: Securing Nonrelational Data Stores
Solution: Nonrelational data stores such as NoSQL databases usually have few built-in security features. Use fuzzing for security testing to expose vulnerabilities in NoSQL databases: deliberately supply invalid, unusual, or random inputs and observe how the system handles them. One technique is dumb fuzzing, which uses purely random input to test for vulnerabilities. The Open Web Application Security Project (OWASP) provides guidelines on fuzzing.
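A minimal sketch of dumb fuzzing in Python follows. The `parse_query` function is a hypothetical stand-in for a data store's input handler, with a planted weakness so the fuzzer has something to find. It is not the API of any real NoSQL product.

```python
import random


class QueryError(Exception):
    """The clean rejection a caller is expected to handle."""


def parse_query(raw: bytes) -> dict:
    """Hypothetical input handler under test."""
    text = raw.decode("utf-8")  # planted weakness: bad bytes raise an unhandled error
    key, _, value = text.partition("=")
    if not key:
        raise QueryError("empty key")
    return {key: value}


def dumb_fuzz(target, runs=500, max_len=32, seed=0):
    """Feed random byte strings to target and record unexpected exceptions."""
    rng = random.Random(seed)  # fixed seed so a failing input can be reproduced
    findings = []
    for _ in range(runs):
        length = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(length))
        try:
            target(data)
        except QueryError:
            pass  # clean rejection, not a finding
        except Exception as exc:
            findings.append((type(exc).__name__, data))
    return findings


findings = dumb_fuzz(parse_query)
print(len(findings), "inputs raised unexpected exceptions")
```

Each finding pairs the exception type with the input that triggered it, so a tester can replay the case against the real system.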
3. Challenge: Securing Data Storage and Transaction Logs
Solution: Apply reliable proof of retrievability (POR) or provable data possession (PDP) methods to verify that data uploaded to the cloud is available and intact. Giuseppe Ateniese, chair of Computer Science at Stevens Institute of Technology, introduced a model that allows a user who has stored data on an untrusted server to verify that the server possesses the original data without having to retrieve it.
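The idea of auditing remote storage can be sketched with a much simpler spot check than a real PDP scheme. In the Python example below, the data owner keeps a secret key, tags each block with an HMAC before upload, and later challenges a few block indices. This is a simplified stand-in and not Ateniese's construction: it fetches the sampled blocks, whereas PDP avoids retrieving the data. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets

BLOCK_SIZE = 4  # unrealistically small, for illustration only


def _tag(key, index, block):
    # Binding the index into the tag stops the server answering with a different block.
    return hmac.new(key, index.to_bytes(8, "big") + block, hashlib.sha256).digest()


def make_tags(key, data):
    """Split data into blocks and compute one tag per block before upload."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    tags = [_tag(key, i, b) for i, b in enumerate(blocks)]
    return blocks, tags


def audit(key, blocks, tags, indices):
    """Check that every challenged block still matches its tag."""
    return all(hmac.compare_digest(_tag(key, i, blocks[i]), tags[i]) for i in indices)


key = secrets.token_bytes(32)  # stays with the data owner
blocks, tags = make_tags(key, b"quarterly sales figures")
challenge = [secrets.randbelow(len(blocks)) for _ in range(3)]
print(audit(key, blocks, tags, challenge))  # intact data passes the audit
```

Sampling a few blocks keeps each audit cheap, at the cost of only detecting corruption with some probability per challenge.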
4. Challenge: Validating and Filtering Endpoint Input from Personal Devices
Solution: Filter malicious data with a central collection system in order to avoid extra computation on endpoint devices. Create a model that represents “normal” input behavior, and use this model to gauge deviant behavior and detect malicious input.
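One simple way to model “normal” input, sketched below in Python, is to summarize trusted baseline readings by their mean and standard deviation and flag values that fall too far from the mean. The sample readings and the three-standard-deviation threshold are assumptions for illustration. The report does not prescribe a particular model.

```python
from statistics import mean, stdev


def fit(baseline):
    """Summarize trusted readings as (mean, standard deviation)."""
    return mean(baseline), stdev(baseline)


def is_anomalous(value, model, threshold=3.0):
    """Flag a reading more than `threshold` standard deviations from the mean."""
    mu, sigma = model
    return abs(value - mu) > threshold * sigma


# Hypothetical sensor readings gathered by the central collector.
baseline = [20.1, 19.8, 20.4, 20.0, 19.9, 20.2]
model = fit(baseline)

for reading in (20.3, 35.0):
    print(reading, "anomalous" if is_anomalous(reading, model) else "ok")
```

Because the check runs at the collector, endpoint devices only need to send their readings.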
5. Challenge: Monitoring Security and Compliance in Real Time
Solution: Use big data analytics to establish trusted connections to a cluster so that only authorized connections occur. Monitoring tools such as a security information and event management (SIEM) solution can be used to find anomalous connections.
6. Challenge: Preserving Privacy while Conducting Analytics
Solution: Use homomorphic encryption, which allows specific types of computations to be carried out on ciphertext, so the data never has to be exposed as plaintext during analysis. The results will also be encrypted; when they are decrypted, they will match the results of the same operations performed on the plaintext.
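The property can be seen in a toy Python example using unpadded (“textbook”) RSA, which is homomorphic for multiplication only. This is an illustration of the idea, not a scheme to deploy: the primes are tiny, unpadded RSA is insecure, and practical analytics would use a purpose-built homomorphic scheme.

```python
# Toy parameters: far too small to be secure, chosen so the arithmetic is easy to follow.
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent; needs Python 3.8+


def encrypt(m):
    return pow(m, e, n)


def decrypt(c):
    return pow(c, d, n)


# Multiply two ciphertexts without ever decrypting the inputs.
c_product = (encrypt(6) * encrypt(7)) % n

print(decrypt(c_product))  # 42, the product of the two plaintexts
```

The party doing the multiplication sees only ciphertexts, and only the key holder can read the result.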
7. Challenge: Migrating to Cryptographic Technologies
Solution: Find a middle ground between anonymity and authentication through a group signature, a cryptographic scheme in which individuals can sign their data but can be identified only as members of a group. Only trusted third parties can identify the individual.
8. Challenge: Ensuring Granular Access Control
Solution: Use standard single sign-on (SSO) mechanisms to reduce the administrative work involved in sustaining a large user base. SSOs shift the burden of user authentication from administrators to publicly available systems.
9. Challenge: Establishing Granular Audits
Solution: Separate big data and audit data to enforce separation of duties. Audit data records what has happened within the big data infrastructure, and it should be kept apart from the “regular” big data. A different network segment or cloud should be set up to host the audit system infrastructure.
10. Challenge: Securing Data Provenance
Solution: Allow only authorized users to obtain certain data through fine-grained access control. Data will be available only to users who possess a set of certain weighted attributes.
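A plain policy check can illustrate the attribute idea. The Python sketch below assumes that “weighted” means each attribute carries a numeric level and that a policy sets a minimum level per attribute. That reading, along with the attribute names, is an assumption. The sketch also does not use cryptography, whereas attribute-based encryption schemes enforce such policies with keys.

```python
def can_access(user_attrs, policy):
    """True if the user holds every required attribute at or above its minimum weight."""
    return all(user_attrs.get(name, 0) >= minimum for name, minimum in policy.items())


# Hypothetical policy guarding a provenance record.
policy = {"clearance": 3, "finance_dept": 1}

analyst = {"clearance": 4, "finance_dept": 1}
contractor = {"clearance": 4}  # lacks the department attribute

print(can_access(analyst, policy))
print(can_access(contractor, policy))
```

A missing attribute is treated as weight zero, so access is denied unless every requirement is met.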