Corporate Compliance Insights

CIOs: 5 Big Data Operational Changes to Make Now

by David Dingwall
September 14, 2017
in Featured, Internal Audit

Preparing Your Organization for GDPR Compliance

The threat of a €20 million (US$24 million) fine is enough to make any organization sit up and listen to what changes it must make to comply with new European Union laws on data protection. But, in preparing for the General Data Protection Regulation (GDPR), are U.S. companies focusing too much on the "data" in their big data clusters? David Dingwall, of Fox Technologies, believes so. He says that getting these clusters through GDPR compliance depends on some fundamental technical setup. Getting the "plumbing" wrong can undermine all that expensive compliance process review work and cause your organization to fail audit reviews.

The beauty of building extra-large Linux clusters is that it's easy. Hadoop, OpenStack, hypervisor and HPC installers let you build on commodity hardware and handle node failure reasonably simply. However, the prospect of GDPR fines reaching €20 million (US$24 million) or more does make you focus on how auditors are going to review your organization's storage and manipulation of people-related data.

Most of the GDPR review articles you may have read in the last 12 months reinforce that privacy and encryption of people data are hugely important, and that multiple layers of encryption for data at rest and in transit through your infrastructure are appropriate. However, with new big data infrastructures, crucial areas of audit concern include being clear about how the software manipulates, aggregates, anonymizes or de-anonymizes (soon to be illegal in the U.K.) people data.
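To make the anonymize/de-anonymize distinction concrete, a common technique is keyed pseudonymization, sketched below with Python's standard `hmac` module. The technique, function names and key are illustrative assumptions, not something Hadoop or HPC tooling does for you. Because anyone holding the key can re-link tokens to people, GDPR still treats such data as personal data (Recital 26), so an auditor will want to know who can reach the key as well as the data.

```python
import hashlib
import hmac
from typing import Iterable, Optional


def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace a direct identifier with a keyed HMAC-SHA256 token (64 hex chars).

    The same identifier and key always give the same token, so records can
    still be joined and aggregated without exposing the raw identifier.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def relink(token: str, candidates: Iterable[str], key: bytes) -> Optional[str]:
    """Show why key custody matters: with the key, a token can be matched back
    to a person simply by recomputing tokens for known identifiers."""
    for candidate in candidates:
        if hmac.compare_digest(pseudonymize(candidate, key), token):
            return candidate
    return None
```

Without the key, the token alone does not reveal the identifier; with it, de-anonymization is a simple loop, which is why the key belongs under the same access controls and audit trail as the cluster itself.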

There are some key lessons to learn from the financial services marketplace, which has been using Linux-based HPC and blade clusters for data modelling and forecasting for the last 15 years, especially in the operational planning and setup that make ongoing audit cycles easier to complete.

Big Data Cluster Fundamentals: The Large Sausage Machine Without Real People

There is a temptation to build a new data-processing cluster on a standalone network to constrain data movement, with supplemental admin access through a second interface on the corporate LAN. Once loaded, however, a data work package for Hadoop and HPC clusters, like an Oracle database in the past, tends to execute all of its data-transforming tasks across the cluster under a single account (e.g., "hadoop"), not the submitting user's ID.

Audit needs to prove not just how personal data is stored, but also how it is manipulated. That includes understanding who on your staff can create, change or log in to these application-specific accounts or, worse, the operating system root account.
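Part of that evidence can be gathered mechanically. The sketch below assumes standard seven-field `/etc/passwd` lines and a hypothetical set of service account names such as `hadoop`; it flags root-equivalent (UID 0) entries other than `root` and service accounts that still have an interactive login shell. It does not inspect sudoers, PAM or SSH keys, which also control who can become these accounts.

```python
from typing import Iterable, List, Set

# Shells that deny interactive logins on common Linux distributions.
NON_LOGIN_SHELLS = ("/usr/sbin/nologin", "/sbin/nologin", "/bin/false", "/usr/bin/false")


def account_findings(passwd_lines: Iterable[str], service_accounts: Set[str]) -> List[str]:
    """Flag root-equivalent accounts and service accounts that allow interactive login.

    Each line uses the /etc/passwd layout: name:password:UID:GID:GECOS:home:shell
    """
    findings = []
    for raw in passwd_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 7:
            findings.append(f"malformed entry: {line!r}")
            continue
        name, _pw, uid, _gid, _gecos, _home, shell = fields
        # Any UID 0 account has full root privileges, whatever its name.
        if int(uid) == 0 and name != "root":
            findings.append(f"{name}: UID 0 (root-equivalent)")
        # A service account with a real shell can be logged into directly.
        if name in service_accounts and shell not in NON_LOGIN_SHELLS:
            findings.append(f"{name}: service account has login shell {shell}")
    return findings
```

On a node, something like `with open("/etc/passwd") as f: print(account_findings(f, {"hadoop"}))` would list the findings for that host.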

#1: Too Many Setup Options, Not Enough Certified (People) Installers

Big data and HPC cluster software tools come with specific setup and deployment models that suggest standard installation templates. According to the 451 Group, less than 20 percent of Hadoop licences purchased worldwide so far have moved into live production. Sadly, the typical cluster installation models from commercial vendors like Cloudera, SAS and Hortonworks do not reflect the compliance regimes you are going to need in 2018. Frankly, unless members of your staff have worked for one of the internet giants like Google or Yahoo!, their admin life cycle experience is very limited, and we are all learning on the job.

#2: Ensure Your Administrators are Real People

For later traceability, ensure your organization has a consistent Linux user ID (UID) and group ID (GID) strategy. Your cluster software's unique application user and group IDs need to fit into that scheme across the organization's whole infrastructure, not just within the cluster. Likewise, each staff member's ID needs to be unique across your business, not just within the cluster. Current best practice is to issue multifactor authentication (MFA) challenges when staff log in and as they move from node to node in your infrastructure, to prove they are real people rather than someone using a stolen account and password pair. This is essential to implement early.
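One way to check that a UID scheme really is consistent is to compare every node's account table. The sketch below assumes you have already collected a hypothetical inventory mapping each hostname to `{username: UID}` (for example from `getent passwd` run on each node). It reports usernames that map to different UIDs on different hosts, and UIDs shared by different usernames; either breaks the correlation of one person or service across logs.

```python
from collections import defaultdict
from typing import Dict, List, Set


def uid_conflicts(inventory: Dict[str, Dict[str, int]]) -> List[str]:
    """Report identity clashes across hosts.

    `inventory` maps hostname -> {username: UID}.
    """
    uids_by_name: Dict[str, Set[int]] = defaultdict(set)
    names_by_uid: Dict[int, Set[str]] = defaultdict(set)
    for accounts in inventory.values():
        for name, uid in accounts.items():
            uids_by_name[name].add(uid)
            names_by_uid[uid].add(name)

    problems = []
    # Same name, different UIDs: file ownership and audit records disagree between hosts.
    for name, uids in sorted(uids_by_name.items()):
        if len(uids) > 1:
            problems.append(f"user {name!r} has different UIDs across hosts: {sorted(uids)}")
    # Same UID, different names: one numeric identity maps to two people or services.
    for uid, names in sorted(names_by_uid.items()):
        if len(names) > 1:
            problems.append(f"UID {uid} is shared by different users: {sorted(names)}")
    return problems
```

An empty result is the goal; any entry is an identity that auditors will not be able to follow cleanly from node to node.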

#3: Visibility in Your Organization's SIEM and the Need to Track Correlated Events

Clusters can generate a flood of log messages. For example, the Hortonworks distribution of Hadoop can generate hundreds or thousands of "su hadoop" messages in a few minutes. Security information and event management (SIEM) platforms, open source or not, are a fantastic way to make sense of correlated events. Consider this sequence:

  • David logged into the corporate network from home via a VPN using MFA
  • David SSHed into the production jumpstart server
  • David SSHed into cluster node 47, then SUed to root
  • David changed the UID of the hadoop account from 10011 to 13011
  • The cluster ran 138 jobs via su as the hadoop account on node 47 until 18:00

An operating system, application or cluster manager's log viewer may only show you slices of this picture. Sending all logs at all levels to your enterprise SIEM is safer and more complete and, frankly, makes reporting another team's responsibility.
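As a minimal sketch of the application side of that forwarding, assuming a syslog collector in front of your SIEM (the host and port are hypothetical), Python's standard `logging.handlers.SysLogHandler` can ship every record at every level. Operating system logs would normally be forwarded by the host's own syslog daemon (e.g., rsyslog) instead, and plain UDP can silently drop messages, so a production setup would more likely use TCP or TLS transport.

```python
import logging
import logging.handlers


def siem_logger(collector_host: str, collector_port: int = 514) -> logging.Logger:
    """Return a logger that forwards records at every level to a syslog collector over UDP.

    collector_host and collector_port are assumptions: point them at whatever
    syslog listener feeds your SIEM.
    """
    logger = logging.getLogger(f"cluster-audit.{collector_host}.{collector_port}")
    logger.setLevel(logging.DEBUG)  # send everything; let the SIEM do the filtering
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        handler = logging.handlers.SysLogHandler(address=(collector_host, collector_port))
        handler.setFormatter(logging.Formatter("cluster-audit: %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

A call such as `siem_logger("siem-collector.example.internal").info("su hadoop on node47")` then reaches the collector with a syslog priority prefix, where it can be correlated with VPN, SSH and directory events from other teams' systems.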

Ensuring your admin staff have unique account names and account IDs makes correlated events simple to track across the network, operating system and software layers. Auditors and your business data owners actually prefer this hands-off model, in which someone other than your Linux admin team proves what happened.
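The correlation in David's example can be sketched as follows. It assumes events from the VPN, SSH, su and cluster layers have already been normalized into one record type carrying two identities: the person who originally authenticated (on Linux, the audit subsystem's login UID, or auid, plays this role because it is preserved across su) and the account the action actually ran under. The field names, hostnames and helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    time: str                  # UTC ISO 8601, so string order matches time order
    host: str
    login_user: Optional[str]  # person who authenticated (e.g. audit auid); None if unset
    acting_as: str             # account the action actually ran under
    action: str


def trail_for(person: str, events: List[Event]) -> List[str]:
    """One person's actions across all layers, oldest first, noting account switches."""
    trail = []
    for e in sorted(events, key=lambda ev: ev.time):
        if e.login_user != person:
            continue
        switched = "" if e.acting_as == person else f" (as {e.acting_as})"
        trail.append(f"{e.time} {e.host}: {e.action}{switched}")
    return trail


def unattributed(events: List[Event]) -> List[Event]:
    """Actions with no authenticated person behind them, e.g. jobs run by a service account."""
    return [e for e in events if e.login_user is None]
```

`unattributed` surfaces exactly the "user-less" service-account activity, which is why the admin actions that preceded it, such as the UID change, need a named, unique person attached.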

#4: Give Auditors the Right Tools to Do Their Jobs – Your Admin Staff Are Too Busy Running the Business!

A key sign that your Linux admin team is overwhelmed is a team member spending more than four days per audit cycle helping auditors. In that case, something is broken or not obvious. The ideal is one to two days at most.

Keep in mind that searching a SIEM for "what actually happened?" events, rather than interrupting the operation of your big data cluster, is going to be essential. Unlike data warehouses from, say, 10 years ago, big data clusters trade point-in-time history for 10x or 100x data-processing performance improvements, so it is often impossible to get a snapshot from your cluster of what your customer data looked like 45 days ago.
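A sketch of that kind of after-the-fact query, assuming the SIEM's normalized events can be queried through SQL. An in-memory SQLite table stands in for the SIEM here, and the schema and function names are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List, Optional, Tuple

SCHEMA = "CREATE TABLE events (ts TEXT, host TEXT, login_user TEXT, acting_as TEXT, action TEXT)"


def _utc(ts: datetime) -> str:
    # Normalize timezone-aware datetimes to UTC ISO 8601 at one-second precision,
    # so that string comparison in SQL matches chronological order.
    return ts.astimezone(timezone.utc).isoformat(timespec="seconds")


def record(conn: sqlite3.Connection, ts: datetime, host: str,
           login_user: Optional[str], acting_as: str, action: str) -> None:
    """Store one normalized event."""
    conn.execute("INSERT INTO events VALUES (?, ?, ?, ?, ?)",
                 (_utc(ts), host, login_user, acting_as, action))


def events_touching(conn: sqlite3.Connection, account: str,
                    start: datetime, end: datetime) -> List[Tuple]:
    """Events in [start, end) where `account` was the person or the effective account."""
    cur = conn.execute(
        "SELECT ts, host, login_user, acting_as, action FROM events "
        "WHERE ts >= ? AND ts < ? AND (login_user = ? OR acting_as = ?) ORDER BY ts",
        (_utc(start), _utc(end), account, account),
    )
    return cur.fetchall()
```

To see what the hadoop account did 45 days ago without touching the cluster, an auditor would run something like `events_touching(conn, "hadoop", now - timedelta(days=45), now - timedelta(days=44))` with `now = datetime.now(timezone.utc)`.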

Thankfully, most open source and commercial SIEM systems have interactive reporting capabilities, and there are robust third-party reporting tool vendors, often specialising in specific market sectors. Training auditors on these reporting tools can take one to two days, a significant audit cycle cost saving compared with training them in the full operation of your cluster, which can take weeks (and, back to point #1, that assumes the audit team has technical headcount with the appropriate admin and life cycle experience).

#5: Certify Your Organization, Not Just the Big Data Cluster

Whilst working on your organization's 2018 operational plan for your big data clusters and GDPR, think carefully about how auditors will work their way through their checklists. One more international compliance regime with large potential fines can be quite distracting. With maximum fines of up to €10 million or 2 percent of your company's total worldwide annual turnover for some infringements, and up to €20 million or 4 percent for the most serious (whichever is higher in each case), your scope needs to be a whole-organization approach.
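The arithmetic behind those caps is simple. Under GDPR Article 83, the lower tier (Art. 83(4)) is up to €10 million or 2 percent of total worldwide annual turnover of the preceding financial year, and the upper tier (Art. 83(5)) is up to €20 million or 4 percent, whichever is higher. The sketch below computes only that statutory maximum for an undertaking; actual fines are set by supervisory authorities using the factors in Article 83(2). The function name is hypothetical.

```python
def max_gdpr_fine_eur(worldwide_annual_turnover_eur: int, upper_tier: bool = True) -> int:
    """Statutory maximum fine under GDPR Article 83, in euros.

    Upper tier (Art. 83(5)): EUR 20M or 4% of total worldwide annual turnover.
    Lower tier (Art. 83(4)): EUR 10M or 2%.
    In both tiers the higher of the two amounts applies.
    """
    fixed_cap, percent = (20_000_000, 4) if upper_tier else (10_000_000, 2)
    return max(fixed_cap, worldwide_annual_turnover_eur * percent // 100)
```

For a company with €1 billion in turnover, for example, the upper-tier cap is €40 million rather than €20 million, which is why the exposure scales with the whole organization, not just the cluster.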

Focusing only on data privacy is going to be a problem; specifically, "user-less" big data software solutions are vulnerable to small teams of administration staff who can easily subvert the cluster's technical platforms. Luckily, international banks have been dealing with exactly these assurance issues on UNIX and Linux platforms for three decades, and on data forecasting clusters very similar to today's big data systems for the last 15 years, and they have been passing quarterly audit cycles with relative ease.

As Pablo Picasso once said, "Good artists copy, great artists steal." There are far more UNIX and Linux staff with banking operations life cycle experience available on the market than there are big data cluster specialists. To get your organization's big data clusters through GDPR audit, I suggest you "steal" one or two of these heads to supplement your data science and cluster admin geeks.


David Dingwall

David Dingwall is Vice President of Product Marketing at Fox Technologies, a global security company that helps organizations centralize Linux and UNIX access management across hybrid IT environments. Enterprises worldwide rely on Fox’s security solutions to enforce granular security controls, simplify compliance and increase overall IT department efficiencies.
