Corporate Compliance Insights

Leveraging AI to Proactively Detect, Track and Minimize Data Loss Threats

Deep Dark Web (DDW) Monitoring with Artificial Intelligence

by Anju Chopra and Heather Williams
May 29, 2019
in Data Privacy, Featured
[Illustration: iceberg with a portion underwater, representing the deep/dark web]

Kroll’s Anju Chopra and Heather Williams and Cognistx’s Eric Nyberg discuss how organizations are using AI to prevent and mitigate the severity of data breaches.

with co-author Dr. Eric Nyberg

Regulators, consumers, investors and other stakeholders are increasingly unwilling to accept the prevailing “not if, but when” defeatist attitude toward data breaches. For example, the commission set up to oversee implementation of the European Union’s General Data Protection Regulation (GDPR) unequivocally states, “As an organisation it is vital to implement appropriate technical and organisational measures to avoid possible data breaches.”

So it is not a rhetorical question to ask: What if organizations could predict, with a high degree of confidence, where and when their data might be compromised?

One of the key places to look for answers is the deep dark web (DDW), with its known havens for cybercriminals and other bad actors. However, the DDW is a huge environment: It holds decades of historical data, and it continues to grow at a staggering pace across a multitude of protocols, forums and sources.

Extracting relevant information from the DDW’s millions of files and petabytes[1] of data is a herculean challenge, but it is just one part of the threat intelligence equation. Today, researchers are exploring new frontiers in artificial intelligence and machine learning to determine which data is meaningful and to create useful models that advance cybersecurity efforts.

Having identified and indexed DDW data for more than 14 years, Kroll has created a comprehensive data resource that is continually refreshed and curated. In analyzing this wealth of data, researchers hypothesized that by understanding the patterns and contexts leading up to data breaches, threat intelligence experts could better predict where and when future breaches might occur with the goal of preventing these from escalating into major events.

This article describes a methodology that addresses the challenge for organizations in three phases:

  1. Reduce and optimize the massive search space of the DDW to better direct analytical focus
  2. Find needles in this massive haystack (i.e., information that is pertinent to the organization)
  3. Discern patterns across the data that can serve as an early warning system

Reduce and Optimize the Search Space with AI

The process begins by creating a list of key terms, IP addresses, domain names, etc., that are unique to the organization (Kroll calls this the Dynamic Signature Profile or DSP). Given the near-overwhelming vastness of the DDW, optimizing the search space is required to separate vital signals from the noise.
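As a rough illustration of how a signature profile might be applied to file contents, the following minimal sketch compiles a handful of terms, domains and IP addresses into one pattern and reports which indicators appear in a piece of text. The `SignatureProfile` class, its field names and the sample indicators are hypothetical; the article does not describe how the DSP is actually stored or matched.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class SignatureProfile:
    """Hypothetical container for organization-specific indicators."""
    terms: list[str] = field(default_factory=list)
    domains: list[str] = field(default_factory=list)
    ip_addresses: list[str] = field(default_factory=list)

    def to_pattern(self) -> re.Pattern:
        indicators = self.terms + self.domains + self.ip_addresses
        if not indicators:
            raise ValueError("profile contains no indicators")
        # Escape each indicator so dots in domains and IPs match literally;
        # try longer indicators first so they win over shorter prefixes.
        ordered = sorted(indicators, key=len, reverse=True)
        alternation = "|".join(re.escape(item) for item in ordered)
        # The lookarounds reject matches embedded in a longer token, e.g.
        # "203.0.113.7" inside "203.0.113.70".
        return re.compile(rf"(?<![\w-])(?:{alternation})(?![\w-])", re.IGNORECASE)


def matched_indicators(text: str, pattern: re.Pattern) -> set[str]:
    """Return the distinct indicators (lowercased) found in the text."""
    return {match.group(0).lower() for match in pattern.finditer(text)}


# Illustrative values only (reserved example domain and documentation IP range).
profile = SignatureProfile(
    terms=["Acme Rewards"],
    domains=["acme.example"],
    ip_addresses=["203.0.113.7"],
)
pattern = profile.to_pattern()
sample = "dump: jdoe@acme.example logged in from 203.0.113.70, see notacme.example"
print(matched_indicators(sample, pattern))  # {'acme.example'}
```

Files with no matched indicators can be set aside immediately; files with matches still need the relevance filtering described next, because a match alone does not indicate risk.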

AI is used to remove files that contain DSP terms but do not represent a risk, such as large PDF books, media interviews, marketing collateral and speaking-engagement materials. This is accomplished by training several supervised machine learning models to distinguish between pertinent and irrelevant files. The firm’s current trained models achieve 99.97 percent accuracy on this task by combining an initial dataset of several thousand files with insights from highly experienced threat intelligence analysts.
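The article does not say which model types are used, so the sketch below substitutes a deliberately simple one: a multinomial naive Bayes text classifier written with the standard library only. The class name, labels and the four training snippets are invented for illustration; a real filter would be trained on thousands of analyst-labeled files.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word-like tokens."""
    return re.findall(r"[a-z0-9@._-]+", text.lower())


class NaiveBayesFilter:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Invented training examples; labels would come from human analysts.
training = [
    ("username password dump acme.example accounts leaked", "pertinent"),
    ("ssn dob card number fullz acme.example customers", "pertinent"),
    ("chapter one introduction to marketing strategy ebook", "irrelevant"),
    ("keynote speaker interview press release conference agenda", "irrelevant"),
]
model = NaiveBayesFilter().fit(*zip(*training))

print(model.predict("leaked password list for acme.example"))     # pertinent
print(model.predict("press interview with the keynote speaker"))  # irrelevant
```

A toy model like this will not approach the accuracy quoted above; the point is only the shape of the workflow, in which analyst-labeled examples train a model that then screens files containing DSP terms.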

Finding Organization-Specific “Haystack Needles” in the DDW

After reducing the search space, the next step is to use cognitive clustering to find patterns across the reduced but still massive data store.

Clustering approaches supplemented by interactive human judgment are fundamental to the firm’s process. Over time, files based on salient terms and metadata have been organized into coherent groups that reflect various aspects of exposure to potential data losses.

In this way, several productive patterns have been uncovered that analysts can routinely monitor using predictive analytics tools. This combination of human and machine analysis has helped find several valuable exposure indicators across different industries. For example, in one case, large numbers of travel itineraries for an airline were found in the DDW, which indicated a specific vulnerability for that organization.
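To make the idea of grouping files by salient terms concrete, here is a minimal sketch that uses single-pass "leader" clustering with Jaccard similarity. This is a stand-in technique chosen for brevity, not the firm’s clustering method, and the file IDs, term sets and 0.5 threshold are all made up.

```python
def jaccard(a, b):
    """Share of terms two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def leader_cluster(files, threshold=0.5):
    """Assign each file to the first cluster whose leader is similar enough.

    `files` maps a file ID to its set of salient terms. The first file placed
    in a cluster acts as that cluster's leader.
    """
    clusters = []  # list of (leader_terms, [file_ids])
    for file_id, terms in files.items():
        for leader_terms, members in clusters:
            if jaccard(terms, leader_terms) >= threshold:
                members.append(file_id)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append((terms, [file_id]))
    return [members for _, members in clusters]


# Hypothetical salient terms extracted from five files.
files = {
    "f1": {"itinerary", "pnr", "flight", "passenger"},
    "f2": {"itinerary", "pnr", "flight", "seat"},
    "f3": {"password", "email", "login"},
    "f4": {"password", "email", "hash"},
    "f5": {"invoice", "vendor"},
}
print(leader_cluster(files))  # [['f1', 'f2'], ['f3', 'f4'], ['f5']]
```

In this toy data the itinerary-like files group together, which is the kind of cluster an analyst would then inspect and label. The result depends on file order and on the threshold, which is one reason human review of the clusters matters.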

Building on the above analysis, similar techniques are applied to the relatively smaller set of files that are pertinent to a specific organization, enabling the discovery of risk indicators specific to it in the DDW datastore. As clusters emerge for a given organization, the exposure assessment model is applied to help determine the organization’s level of risk on the DDW.

The methodology used for this phase includes developing organization-specific exposure indicators that take into account context idiosyncrasies. These indicators are built on the basis of signals mined from the text, which analysts use to create a “salient term matrix.” Analysts also examine the format and review occurrence characteristics, such as the timing, protocols and locations of organization-pertinent files.
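The article does not define the “salient term matrix” in detail. One plausible reading, sketched below purely as an assumption, is a table with one row per file and one column per salient term, where each cell counts how often the term occurs. The function name, the term list and the sample files are hypothetical.

```python
import re
from collections import Counter


def salient_term_matrix(documents, salient_terms):
    """Build a file-by-term count matrix.

    `documents` maps a file ID to its text. Each returned row lists the
    occurrence count of every salient term, in the order given.
    """
    matrix = {}
    for file_id, text in documents.items():
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        matrix[file_id] = [counts[term] for term in salient_terms]
    return matrix


# Hypothetical analyst-chosen terms and file contents.
terms = ["password", "itinerary", "ssn"]
documents = {
    "a.txt": "Password reset; password: hunter2",
    "b.csv": "itinerary, PNR, SSN",
}
matrix = salient_term_matrix(documents, terms)
for file_id, row in matrix.items():
    print(file_id, row)
# a.txt [2, 0, 0]
# b.csv [0, 1, 1]
```

Rows like these could be combined with occurrence characteristics such as timing, protocol and location before clustering, although how those signals are weighted is not stated in the article.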

An active learning loop with human analysts continues to refine and expand these signals. This loop also helps optimize the number of clusters for human review, which enhances the analysts’ ability to find evidence pertinent to potential data loss events for an organization.

Predicting Breaches Based on Client Files Being Seen on the DDW

Once organization-specific exposure indicators and clusters have been determined, the next phase involves a timeline analysis to study historical pre-breach patterns with the goal of eventually predicting potential breaches utilizing AI. The basic premise of the model is outlined below.

Researchers have found that clustering analysis shows a significant increase in the number of files on the DDW for an organization after a data loss event. They also observed that bad actors accumulate files and, once a sufficient threshold of data is reached, exploit the information for potentially nefarious activities. For example, the chart below shows how clusters emerged for a given organization over a 10-year period (source of data: Kroll’s DDW datastore). The third data loss event in the past five years ultimately gave bad actors enough exposure to act on the data.

Models trained in this way can detect a surge in activity for an organization, which, when combined with active monitoring by human analysts, helps threat intelligence specialists detect potential breaches. The goal is that, by leveraging automated systems, human team members can alert organizations as these clusters emerge. In Kroll’s case, it enables us to work with clients to proactively detect data loss before it becomes a larger incident and to help them understand where the data exposure is coming from within their organization.
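The surge detection described above can be approximated with a very simple baseline rule. The sketch below flags any period whose file count exceeds the trailing mean plus a multiple of the trailing standard deviation. This is an illustrative heuristic rather than the trained model the article refers to, and the window size, multiplier, floor and monthly counts are all assumed values.

```python
from statistics import mean, pstdev


def flag_surges(counts, window=6, k=3.0, min_floor=5):
    """Return indexes of periods whose count exceeds the trailing baseline.

    A period is flagged when its count is greater than
    max(mean + k * std dev of the previous `window` periods, min_floor).
    """
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        threshold = max(mean(history) + k * pstdev(history), min_floor)
        if counts[i] > threshold:
            flagged.append(i)
    return flagged


# Hypothetical monthly counts of organization-pertinent files seen on the DDW.
monthly_counts = [2, 3, 1, 2, 4, 3, 2, 40, 38, 5, 3, 2]
print(flag_surges(monthly_counts))  # [7]
```

Only the first surge month is flagged here because the spike itself inflates the baseline for the following month. That limitation is one reason a rule like this would serve as a prompt for analyst review rather than as a standalone alert.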

Conclusion

The integration of investigative expertise with next-generation AI and machine learning can help organizations better understand and address their data’s exposure on the DDW. As researchers continue to refine and grow models and methodologies, they will be able to help organizations proactively detect data loss and prevent those data losses from escalating into major events that can harm their operations, finances and reputations for years to come.


[1] One petabyte = 2^50 bytes, i.e., 1,024 terabytes or roughly a million gigabytes


Tags: Artificial Intelligence (AI), Data Breach, GDPR, Machine Learning

Anju Chopra and Heather Williams


Anju Chopra is Senior Vice President, Cyber Technologies, Identity Theft & Breach Notification at Kroll. In a career spanning over 20 years, Anju has been a leader in delivering innovative, often ground-breaking advances in complex technology systems, cybersecurity, artificial intelligence and enterprise architecture. She has particular expertise in developing cybersecurity and identity theft remediation products that integrate artificial intelligence technology. Anju’s strong business acumen and entrepreneurial vision have resulted in strategic solutions that have transformed client services and the internal operations that support them.
Heather Williams is Vice President, Product Management, Identity Theft & Breach Notification at Kroll. Heather has been with Kroll for over 12 years and has been a driving force for product innovation for nearly a decade. Her expertise in the fields of identity theft, breach response and cybersecurity has led to the development of enhanced solutions for these complex issues. She has been instrumental in developing dark web monitoring solutions, as well as cyber investigative resources that integrate artificial intelligence technology to better serve clients and their customers.
Dr. Eric Nyberg is Co-founder and Chief Data Scientist at Cognistx. Eric is a tenured Professor at Carnegie Mellon’s School of Computer Science and has worked on a broad range of AI applications — automatic language understanding, translation, and generation; advanced information retrieval and ranking; and automatic question-answering systems — since the 1980s. As a member of the original Watson development team, he helped IBM develop a generalized, scalable architecture for multi-strategy question answering systems, as well as specific techniques. In 2005, he co-founded Cognistx, an applied AI company, building multi-strategy AI systems for clients across the U.S.
