Corporate Compliance Insights

Leveraging AI to Proactively Detect, Track and Minimize Data Loss Threats

Deep Dark Web (DDW) Monitoring with Artificial Intelligence

by Anju Chopra and Heather Williams
May 29, 2019
in Data Privacy, Featured

Kroll’s Anju Chopra and Heather Williams and Cognistx’s Eric Nyberg discuss how organizations are using AI to prevent data breaches and mitigate their severity.

with co-author Dr. Eric Nyberg

Regulators, consumers and investors/stakeholders are increasingly unwilling to accept the prevailing “not if, but when” defeatist attitude toward data breaches. For example, the commission overseeing implementation of the European Union’s General Data Protection Regulation (GDPR) unequivocally states, “As an organisation it is vital to implement appropriate technical and organisational measures to avoid possible data breaches.”

It is therefore not a rhetorical question to ask: what if organizations could predict, with a high degree of confidence, where and when their data might be compromised?

One of the key places to look for answers is the deep dark web (DDW), with its known havens for cybercriminals and other bad actors. However, the DDW is a huge environment: not only does it hold decades of historical data, but it also continues to grow at a staggering pace across a multitude of protocols, forums and sources.

Extracting relevant information from the DDW’s millions of files and petabytes[1] of data is a herculean and complex challenge, and it is just one part of the threat intelligence equation. Today, researchers are exploring exciting new frontiers in artificial intelligence and machine learning to determine which data is meaningful and to build useful models that advance cybersecurity efforts.

Kroll has identified and indexed DDW data for more than 14 years, creating a comprehensive data resource that is continually refreshed and curated. In analyzing this wealth of data, researchers hypothesized that by understanding the patterns and contexts leading up to data breaches, threat intelligence experts could better predict where and when future breaches might occur, with the goal of preventing them from escalating into major events.

This article describes a methodology that addresses the challenge for organizations in three phases:

  1. Reduce and optimize the massive search space of the DDW to better direct analytical focus
  2. Find needles in this massive haystack (i.e., information that is pertinent to the organization)
  3. Discern patterns across the data that can serve as an early warning system

Reduce and Optimize the Search Space with AI

The process begins by creating a list of key terms, IP addresses, domain names and other identifiers that are unique to the organization (Kroll calls this the Dynamic Signature Profile, or DSP). Given the sheer size of the DDW, narrowing the search space is essential to separate vital signals from the noise.
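To make the DSP idea concrete, the sketch below scans a file's text for an organization's key terms, its domains (including subdomains) and its IP ranges. It is a minimal, hypothetical Python sketch: the SignatureProfile class, its fields and the regular expressions are assumptions for illustration, not Kroll's implementation, and real DDW text would need far more normalization (obfuscated spellings, IPv6, fuzzy matching).

```python
import ipaddress
import re
from dataclasses import dataclass, field

# Hypothetical patterns; real DDW text would need much more robust parsing.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b")


@dataclass
class SignatureProfile:
    """Hypothetical stand-in for an organization's Dynamic Signature Profile."""
    key_terms: set[str] = field(default_factory=set)
    domains: set[str] = field(default_factory=set)
    networks: list[ipaddress.IPv4Network] = field(default_factory=list)


def match_profile(text: str, profile: SignatureProfile) -> dict[str, set[str]]:
    """Return the profile entries found in `text`, grouped by kind."""
    lowered = text.lower()

    # Whole-word, case-insensitive match for each key term (multi-word terms allowed).
    terms = {
        term for term in profile.key_terms
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)
    }

    # A domain hit counts if the found name is the profile domain or a subdomain of it.
    domains = set()
    for found in DOMAIN_RE.findall(lowered):
        for domain in profile.domains:
            if found == domain.lower() or found.endswith("." + domain.lower()):
                domains.add(domain)

    # Keep only well-formed IPv4 addresses that fall inside a profile network.
    ips = set()
    for candidate in IPV4_RE.findall(text):
        try:
            address = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # e.g. "999.1.1.1" looks like an IP but is invalid
        if any(address in network for network in profile.networks):
            ips.add(candidate)

    return {"terms": terms, "domains": domains, "ips": ips}
```

For example, a profile with the hypothetical key term "examplecorp", the domain "examplecorp.com" and the documentation range 203.0.113.0/24 would flag a leaked line mentioning "vpn.examplecorp.com" and "203.0.113.7". Files with no hits are dropped before any further analysis.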

AI is then used to remove files that contain DSP terms but do not represent a risk, such as large PDF books, media interviews, marketing collateral and speaking engagements. This is accomplished by training several supervised machine learning models to distinguish between pertinent and irrelevant files. The firm’s current models achieve 99.97 percent accuracy on this task; they were built by combining an initial dataset of several thousand files with insights from highly experienced threat intelligence analysts.
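The article does not say which model types are used. As a stand-in, the sketch below trains a multinomial naive Bayes classifier with add-one smoothing to label files as "pertinent" or "irrelevant" and measures accuracy on held-out files. The toy training texts and the class and function names are invented for illustration; the 99.97 percent figure above comes from the firm's own models and is not reproduced by this sketch.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


class NaiveBayesFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayesFilter":
        self.doc_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.doc_counts}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {label: sum(c.values()) for label, c in self.word_counts.items()}
        return self

    def log_scores(self, doc: str) -> dict[str, float]:
        """Unnormalized log posterior for each label."""
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for label, count in self.doc_counts.items():
            score = math.log(count / n_docs)  # log prior
            for tok in tokenize(doc):
                if tok in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((self.word_counts[label][tok] + 1) / (self.totals[label] + v))
            scores[label] = score
        return scores

    def predict(self, doc: str) -> str:
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


def accuracy(model: NaiveBayesFilter, docs: list[str], labels: list[str]) -> float:
    """Fraction of documents whose predicted label matches the true label."""
    return sum(model.predict(d) == y for d, y in zip(docs, labels)) / len(labels)


# Toy, invented training data: leak-like text vs. public-facing material.
TRAIN = [
    ("combo list email password dump examplecorp", "pertinent"),
    ("leaked credentials examplecorp vpn password", "pertinent"),
    ("examplecorp database dump ssn records", "pertinent"),
    ("examplecorp annual report press release", "irrelevant"),
    ("examplecorp ceo interview conference keynote", "irrelevant"),
    ("examplecorp marketing brochure product launch", "irrelevant"),
]
```

Every training file mentions the company name, so that term carries almost no signal; the classifier separates the classes on the surrounding vocabulary, which is the kind of distinction the filtering step relies on.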

Finding Organization-Specific “Haystack Needles” in the DDW

After reducing the search space, the next step is to use cognitive clustering to find patterns across the reduced, but still massive, data store.

Clustering approaches supplemented by interactive human judgment are fundamental to the firm’s process. Over time, files have been organized, based on salient terms and metadata, into coherent groups that reflect various aspects of exposure to potential data loss.
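One simple way to form such groups, shown here only as a sketch, is single-pass "leader" clustering. Each file becomes a bag of words plus metadata pseudo-terms (for example "protocol=tor"), and it joins the most similar existing cluster by cosine similarity if that similarity clears a threshold; otherwise it seeds a new cluster. The threshold, the metadata keys and the function names are hypothetical, and the article's "cognitive clustering" is not necessarily this algorithm.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9]+")


def vectorize(text: str, meta: dict[str, str]) -> Counter:
    """Bag of words plus metadata pseudo-terms such as 'protocol=tor'."""
    vec = Counter(TOKEN_RE.findall(text.lower()))
    for key, value in meta.items():
        vec[f"{key}={value}".lower()] += 1
    return vec


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(x * x for x in a.values())) * math.sqrt(sum(x * x for x in b.values()))
    return dot / norm if norm else 0.0


def leader_cluster(files: dict[str, tuple[str, dict[str, str]]], threshold: float = 0.5) -> list[list[str]]:
    """Single pass: each file joins the most similar cluster centroid if the
    cosine similarity is at least `threshold`, otherwise it starts a new cluster."""
    centroids: list[Counter] = []
    members: list[list[str]] = []
    for name, (text, meta) in files.items():
        vec = vectorize(text, meta)
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            centroids.append(Counter(vec))
            members.append([name])
        else:
            centroids[best].update(vec)  # summed counts point in the mean direction
            members[best].append(name)
    return members
```

Analysts would then review the resulting groups, which is where the interactive human judgment described above comes in; a lower threshold yields fewer, broader clusters.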

In this way, several productive patterns have been uncovered that analysts can routinely monitor using predictive analytics tools. This combination of human and machine analysis has surfaced several valuable exposure indicators across different industries. In one case, for example, large numbers of an airline’s travel itineraries were found on the DDW, indicating a specific vulnerability for that organization.

Building on this analysis, similar techniques are applied to the smaller set of files pertinent to a specific organization, enabling the discovery of risk indicators specific to that organization in the DDW datastore. As clusters emerge for a given organization, the exposure assessment model is applied to help determine the organization’s level of risk on the DDW.
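The article does not describe how the exposure assessment model scores risk. Purely as a hypothetical illustration, the sketch below combines each cluster's content sensitivity, its size (log-scaled, so one very large cluster does not swamp everything else) and how recently it was seen (exponential decay with a 90-day half-life) into a single score, then buckets that score into low, medium or high. The category names, weights, half-life and cut-offs are all invented.

```python
import math
from dataclasses import dataclass

# Hypothetical sensitivity weights per content category (not from the article).
SENSITIVITY = {"credentials": 3.0, "pii": 3.0, "itineraries": 2.0, "marketing": 0.1}


@dataclass
class ClusterSummary:
    category: str
    file_count: int
    days_since_last_seen: int


def exposure_score(clusters: list[ClusterSummary], half_life_days: float = 90.0) -> float:
    """Sum of sensitivity x log-scaled size x recency decay over all clusters."""
    score = 0.0
    for c in clusters:
        weight = SENSITIVITY.get(c.category, 1.0)  # unknown categories get a neutral weight
        size = math.log1p(c.file_count)  # diminishing returns for very large clusters
        recency = 0.5 ** (c.days_since_last_seen / half_life_days)  # halves every half_life_days
        score += weight * size * recency
    return score


def risk_level(score: float, medium: float = 5.0, high: float = 15.0) -> str:
    """Bucket a score using hypothetical cut-offs."""
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"
```

With these made-up settings, a fresh cluster of 1,000 credential files scores about 20.7 ("high"), and the same cluster last seen 90 days ago scores about 10.4 ("medium").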

The methodology for this phase includes developing organization-specific exposure indicators that account for contextual idiosyncrasies. These indicators are built from signals mined from the text, which analysts use to create a “salient term matrix.” Analysts also examine file formats and review occurrence characteristics, such as the timing, protocols and locations of organization-pertinent files.
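A salient term matrix could take many forms. One plausible reading, sketched below under that assumption, is a per-file ranking of terms by TF-IDF weight: terms that appear in every organization-pertinent file (such as the company name) score zero, and distinctive terms rise to the top. A second helper tallies the occurrence characteristics mentioned above (month, protocol, location) for a set of sightings; the record layout and function names are hypothetical.

```python
import math
import re
from collections import Counter
from datetime import date

TOKEN_RE = re.compile(r"[a-z0-9]+")


def salient_term_matrix(docs: dict[str, str], top_k: int = 3) -> dict[str, list[tuple[str, float]]]:
    """For each file, return its top_k terms ranked by TF-IDF weight."""
    tokenized = {name: TOKEN_RE.findall(text.lower()) for name, text in docs.items()}
    # Document frequency: in how many files each term appears at least once.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    n_docs = len(docs)
    matrix = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        weights = {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        # Highest weight first; ties broken alphabetically for stable output.
        matrix[name] = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    return matrix


def occurrence_profile(records: list[tuple[date, str, str]]) -> dict[str, Counter]:
    """Tally sightings by month, protocol and location (hypothetical record layout)."""
    return {
        "month": Counter(seen.strftime("%Y-%m") for seen, _, _ in records),
        "protocol": Counter(protocol for _, protocol, _ in records),
        "location": Counter(location for _, _, location in records),
    }
```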

An active learning loop with human analysts continues to refine and expand these signals. This loop also helps optimize the number of clusters for human review, which enhances the analysts’ ability to find evidence pertinent to potential data loss events for an organization.
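Active learning is commonly implemented with uncertainty sampling: the items the model is least sure about are routed to human reviewers first, and their labels are fed back into training. The sketch below assumes some model has already produced a probability that each file (or cluster) is pertinent; select_for_review and active_learning_round are hypothetical names, and ask_analyst stands in for the human reviewer.

```python
import heapq
from typing import Callable


def select_for_review(probs: dict[str, float], k: int) -> list[str]:
    """Uncertainty sampling: the k items whose P(pertinent) is closest to 0.5."""
    return heapq.nsmallest(k, probs, key=lambda name: (abs(probs[name] - 0.5), name))


def active_learning_round(
    probs: dict[str, float],
    k: int,
    ask_analyst: Callable[[str], bool],
    labeled: dict[str, bool],
) -> list[str]:
    """Query the analyst on the k most uncertain unlabeled items, record the
    answers in `labeled` and return the items reviewed in this round."""
    unlabeled = {name: p for name, p in probs.items() if name not in labeled}
    batch = select_for_review(unlabeled, k)
    for name in batch:
        labeled[name] = ask_analyst(name)
    return batch
```

Retraining the classifier on the growing labeled set between rounds closes the loop, and capping k is one way to keep the number of items sent for human review manageable.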

Predicting Breaches Based on Client Files Seen on the DDW

Once organization-specific exposure indicators and clusters have been determined, the next phase involves a timeline analysis of historical pre-breach patterns, with the eventual goal of predicting potential breaches using AI. The basic premise of the model is outlined below.

Researchers have found that clustering analysis shows a significant increase in the number of an organization’s files on the DDW after a data loss event. They also observed that bad actors accumulate files and, once they have gathered sufficient data, exploit the information for potentially nefarious activities. For example, the chart below shows how clusters emerged for a given organization over a 10-year period (source of data: Kroll’s DDW datastore). The third data loss event in the past five years ultimately gave bad actors enough exposure to act on the data.
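The accumulation premise can be expressed as a simple timeline check: bucket an organization's file sightings by month, keep a running total and report the first month in which that total reaches a threshold. The sketch below does exactly that; the function names are hypothetical, and the threshold is an assumed parameter, since the article does not say how "sufficient data" is measured.

```python
from collections import Counter
from datetime import date
from itertools import accumulate
from typing import Optional


def monthly_counts(sightings: list[date]) -> list[tuple[str, int]]:
    """Number of organization-pertinent file sightings per month, oldest first."""
    return sorted(Counter(d.strftime("%Y-%m") for d in sightings).items())


def first_threshold_crossing(sightings: list[date], threshold: int) -> Optional[str]:
    """First month in which the cumulative sighting count reaches `threshold`."""
    months = monthly_counts(sightings)
    running_totals = accumulate(count for _, count in months)
    for (month, _), total in zip(months, running_totals):
        if total >= threshold:
            return month
    return None
```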

Models trained in this way can detect a surge in activity for an organization, which, combined with active monitoring by human analysts, helps threat intelligence specialists detect potential breaches. The goal is for human team members, supported by automated systems, to alert organizations as these clusters emerge. In Kroll’s case, this enables us to work with clients to proactively detect data loss before it becomes a larger incident and to help them understand where within their organization the data exposure is coming from.
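Surge detection of this kind is often done by comparing each period against a trailing baseline. The sketch below flags a period whose count exceeds the mean of the preceding window by more than z standard deviations. The function name, window, z and the floor on the standard deviation are illustrative assumptions rather than parameters from the article, and flagged periods would go to a human analyst rather than straight to a client alert.

```python
import statistics


def detect_surges(counts: list[int], window: int = 6, z: float = 3.0) -> list[int]:
    """Return indices of periods whose count exceeds the mean of the preceding
    `window` periods by more than `z` standard deviations."""
    surges = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mean = statistics.mean(history)
        sd = statistics.pstdev(history)
        # Floor the standard deviation at 1 so a flat history doesn't flag tiny bumps.
        if counts[i] > mean + z * max(sd, 1.0):
            surges.append(i)
    return surges
```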

Conclusion

The integration of investigative expertise with next-generation AI and machine learning can help organizations better understand and address their data’s exposure on the DDW. As researchers continue to refine and grow models and methodologies, they will be able to help organizations proactively detect data loss and prevent those data losses from escalating into major events that can harm their operations, finances and reputations for years to come.


[1] One petabyte = 2^50 bytes, or 1,024 terabytes (roughly a million gigabytes).


Tags: Artificial Intelligence/A.I., data breach, GDPR, machine learning

Anju Chopra and Heather Williams

Anju Chopra is Senior Vice President, Cyber Technologies, Identity Theft & Breach Notification at Kroll. In a career spanning over 20 years, Anju has been a leader in delivering innovative, often ground-breaking advances in complex technology systems, cybersecurity, artificial intelligence and enterprise architecture. She has particular expertise in developing cybersecurity and identity theft remediation products that integrate artificial intelligence technology. Anju’s strong business acumen and entrepreneurial vision have resulted in strategic solutions that have transformed client services and the internal operations that support them.
Heather Williams is Vice President, Product Management, Identity Theft & Breach Notification at Kroll. Heather has been with Kroll for more than 12 years and a driving force for product innovation for nearly a decade. Her expertise in the fields of identity theft, breach response and cybersecurity has led to the development of enhanced solutions for these complex issues. She has been instrumental in developing dark web monitoring solutions, as well as cyber investigative resources that integrate artificial intelligence technology to better serve clients and their customers.
Dr. Eric Nyberg is Co-founder and Chief Data Scientist at Cognistx. Eric is a tenured Professor at Carnegie Mellon’s School of Computer Science and has worked since the 1980s on a broad range of AI applications: automatic language understanding, translation and generation; advanced information retrieval and ranking; and automatic question-answering systems. As a member of the original Watson development team, he helped IBM develop a generalized, scalable architecture for multi-strategy question-answering systems, as well as specific techniques. In 2005, he co-founded Cognistx, an applied AI company building multi-strategy AI systems for clients across the U.S.

© 2019 Corporate Compliance Insights

No Result
View All Result
  • Home
  • About
  • Articles
  • Vendor News
  • Podcasts
  • Videos
  • Whitepapers
  • eBooks
  • Events
  • Jobs
  • Subscribe

© 2019 Corporate Compliance Insights