No Result
View All Result
SUBSCRIBE | NO FEES, NO PAYWALLS
MANAGE MY SUBSCRIPTION
NEWSLETTER
Corporate Compliance Insights
  • Home
  • About
    • About CCI
    • CCI Magazine
    • Writing for CCI
    • Career Connection
    • NEW: CCI Press – Book Publishing
    • Advertise With Us
  • Explore Topics
    • See All Articles
    • Compliance
    • Ethics
    • Risk
    • FCPA
    • Governance
    • Fraud
    • Internal Audit
    • HR Compliance
    • Cybersecurity
    • Data Privacy
    • Financial Services
    • Well-Being at Work
    • Leadership and Career
    • Opinion
  • Vendor News
  • Library
    • Download Whitepapers & Reports
    • Download eBooks
    • New: Living Your Best Compliance Life by Mary Shirley
    • New: Ethics and Compliance for Humans by Adam Balfour
    • 2021: Raise Your Game, Not Your Voice by Lentini-Walker & Tschida
    • CCI Press & Compliance Bookshelf
  • Podcasts
    • Great Women in Compliance
    • Unless: The Podcast (Hemma Lomax)
  • Research
  • Webinars
  • Events
  • Subscribe
Jump to a Section
  • At the Office
    • Ethics
    • HR Compliance
    • Leadership & Career
    • Well-Being at Work
  • Compliance & Risk
    • Compliance
    • FCPA
    • Fraud
    • Risk
  • Finserv & Audit
    • Financial Services
    • Internal Audit
  • Governance
    • ESG
    • Getting Governance Right
  • Infosec
    • Cybersecurity
    • Data Privacy
  • Opinion
    • Adam Balfour
    • Jim DeLoach
    • Mary Shirley
    • Yan Tougas
No Result
View All Result
Corporate Compliance Insights
Home Compliance

Optimizing Document Review in Compliance Investigations, Part 2

by Thomas Gricks
August 6, 2018
in Compliance, Featured
hand highlighting portions of a document

Using Advanced Analytics and Continuous Active Learning to “Prove a Negative”

This is the second article in a two-part series that focuses on document review techniques for managing compliance in internal and regulatory investigations. Part 1 provided several steps for implementing an effective document review directed at achieving the objectives of a compliance investigation. This installment outlines an approach that can be used to demonstrate that there are no responsive documents to an equivalent statistical certainty – essentially proving a negative.

What Does it Mean to “Prove a Negative?”

The objective of a compliance investigation is most often to quickly locate the critical documents that will establish a cohesive fact pattern and provide the materials needed to conduct effective personnel interviews. In that situation, the documents are merely a means to an end.

Occasionally, however, the documents become an end unto themselves. For example, governmental agencies often use civil investigative demands (CIDs) to investigate allegations of potential statutory liability. In that context, the documents themselves become the object of the investigation. While those documents may well have downstream utility, the emphasis of the document review in responding to the CID is purely on locating any responsive documents.

There may well be situations where there simply are no responsive documents to be found. In practice, that typically means reviewing every single document in a given collection, just to come up empty handed. With modern electronically stored information (ESI) collections that total in the hundreds of thousands, or even millions, of documents, a linear review of that magnitude can be prohibitively expensive and time-consuming.

Alternatively, it is possible to leverage advanced analytics, continuous active learning and statistics to review only a fraction of an ESI collection, yet demonstrate that there are so few responsive documents potentially in the collection that a full-blown review would be entirely unreasonable. That is what is meant by proving a negative – undertaking an aggressive effort to locate responsive documents, finding none and using statistics to demonstrate the virtual absence of responsive documents.

Why Continuous Active Learning?

There are three principal technology-assisted review (TAR) protocols that can be used to enhance a document review: simple passive learning, simple active learning and continuous active learning. Because of the way these different protocols train the underlying TAR algorithms, only continuous active learning (CAL) protocols are effective in proving a negative.

As discussed in greater detail below, the objective in proving a negative is actually to make every possible effort to find responsive documents. So the technology-assisted review protocol should advance that objective.

The only TAR protocol that effectively seeks out responsive documents throughout the review process is CAL. A simple passive protocol trains by passing random documents to the reviewer. And a simple active protocol trains by a process known as uncertainty sampling, which provides the “gray” documents to the reviewer (i.e., the documents that are right at the border between documents that look to be responsive and those that look to be non-responsive).

By comparison, CAL primarily uses a process known as relevance feedback to pass training documents to the reviewer. Relevance feedback uses everything that is known about the documents coded to that point in time to select training documents that are most likely to be responsive.

Using a CAL protocol leverages the TAR algorithm. Every document reviewed in the process is a document that the algorithm sees as most likely to be responsive. That approach advances the objective of finding responsive documents far more efficiently than an approach that relies on random or gray documents and, therefore, CAL is critical to proving a negative.

Use Statistics to Scope the Review

The first step in proving a negative is to establish the statistical parameters that will set the margins of error for the review and, in turn, the number of documents that may have to be reviewed in the process. The expectation is that no responsive documents will ever be found, regardless of how many documents are reviewed. With that assumption, statistics will control the relationship between the number of documents reviewed and the margin of error – in other words, the number of responsive documents that might exist in the collection.

There is no hard-and-fast rule for setting the statistical boundaries. Rather, the decision depends on the relationship between the value of finding any responsive documents and the cost of obtaining these documents. In essence, the decision depends on some measure of proportionality and is likely going to be negotiated with the requesting party.

As an example, consider a collection of 500,000 documents that is not expected to contain a single responsive document. Using a binomial statistical calculator (such as the one at statpages.info/confint.html), the margins of error can be evaluated for samples of 1 percent, 2 percent, 5 percent and 10 percent of the collection to establish a range of alternatives:

With a range of alternatives, the relative cost and benefit of various sample sizes can be evaluated, and the number of documents to be reviewed can be negotiated and set accordingly.

Use Analytics to Initiate a Review for Documents That Are “Close” to Responsive

The latent objective in proving a negative is actually to make every conceivable effort to locate the precise documents that are not expected to exist in the collection. That means truly exploiting every available analytical approach to locating responsive documents while keeping in mind that the TAR tool will eventually do the heavy lifting.

And, since no approach is likely to locate responsive documents (as none are expected to exist in the collection), the review should focus on finding documents that are contextually close to being responsive. These “close” documents will eventually serve as the best available training examples for the CAL review.

Start the process by using keyword searches that are carefully crafted to locate any responsive documents that might exist in the collection. And be sure to solicit any reasonable keyword searches from the requesting party. Doing so will not only enhance the potential for finding truly responsive documents, but also alleviate any concern on the part of the requesting party that the scope of the review might be too narrow. If a search returns too many documents, review a reasonable random sample across the entire hit population to establish a statistical absence of responsive documents.

Then, use advanced analytics to diligently explore specific components of the collection that are most likely to contain responsive documents. For example, keyword searches can be refined to focus on the documents held by specific key custodians. Communication analytics can be used to identify email exchange patterns that may be pertinent to the investigation. There may be certain file types (e.g., Microsoft Excel files or PowerPoint presentations) that are more likely to be responsive. Even associated metadata, like the original file path for a document, can be explored in a meticulous effort to find responsive documents.

This review should continue until all reasonable searches have been exhausted and between 20 percent and 30 percent of the total anticipated review effort has been completed. Doing so will initially establish the absence of responsive documents and provide a reasonable starting point for training the CAL algorithm. And these efforts should be recorded, should it be necessary to explain and justify the process down the line.

Use CAL in an Effort to Surface Any Truly Responsive Documents

Once the analytics review is complete, use continuous active learning to complete the remainder of the review. The CAL algorithm will efficiently analyze the entire collection to locate any documents that are contextually similar to “close” documents located during the analytics review and will continuously learn from every coding decision made along the way.

Optimize the CAL training regime from the analytics review with one or more synthetic seeds. Draft an electronic document that reflects the specific content of a document that would be considered responsive if it existed within the collection. Import the document into the collection, being careful to include some designation (such as a unique Bates identifier) that makes it easy to identify, and mark the synthetic seed as responsive. This will provide the continuous active learning algorithm with a very clear example of the precise language that makes a document responsive.

As with the keyword search process, a synthetic seed may be solicited from the requesting party as well. Doing so will ensure that the CAL algorithm will recognize, and elevate for review, documents that are contextually similar to specifically what the requesting party is seeking.

Make sure that some fraction of the documents reviewed during the CAL process are contextually diverse from the responsive synthetic seeds and the “close” documents identified in the analytics review. As discussed in the first article in this series, modern TAR tools include functionality directed at locating contextually diverse documents. This functionality is critical in proving a negative, as it ensures a thorough exploration of the entire collection.

Presumably, the CAL review will not locate any responsive documents, since they are not expected to exist within the collection. As with the analytics review, however, documents that are close to being responsive should be coded as positive in order to continuously surface any contextually similar documents and maximize the potential for finding truly responsive documents.

Use the Review and Statistics to “Prove a Negative”

Assuming no responsive documents have been located during the review, the underlying statistics can be used to essentially prove a negative. Obviously, without reviewing the entire collection, there is no way to be certain that it contains no positive documents. What can be said, however, is that there are a very limited number of responsive documents that might exist in the collection. From the above example, a review of 25,000 documents using this process would mean that there are likely no more than 100 responsive documents in the entire collection.

And, although that analysis is not based on a purely random statistical sample, this review process actually comprises a much more thorough effort to find positive documents. By using analytics and continuous active learning and by including contextually diverse documents in the CAL review, this process optimizes the likelihood of finding a responsive document in the collection, if one exists. Since no responsive documents have been found in the review, the likelihood that a responsive document exists elsewhere in the collection is, for all practical purposes, even less than if the review had been random.

Altogether, this process is a reasonable way to demonstrate the absence of responsive documents in a collection without having to review the entire collection and to do so in a way that is even more stringent than a random review.


Tags: Data AnalyticsInternal Investigation
Previous Post

Data Here, Data There, Data Everywhere

Next Post

Why Cryptocurrency is So Difficult for the IRS to Regulate

Thomas Gricks

Thomas Gricks

Thomas Gricks, Esq., is Managing Director, Professional Services at Catalyst. A prominent e-discovery lawyer and one of the nation's leading authorities on the use of TAR in litigation, Tom advises corporations and law firms on best practices for applying TAR technology to reduce the time and cost of discovery. He has more than 25 years of experience as a trial lawyer and in-house counsel, most recently with the law firm Schnader Harrison Segal & Lewis, where he was a partner and chair of the e-Discovery Practice Group.

Related Posts

check engine light

What Gets Measured Gets Managed, but What Actually Matters in Compliance?

by Keshonda Walker
May 16, 2025

Looking beyond standard measurements to identify the quiet signals that help compliance teams address issues before they become crises

hidden value abstract

CCO Insights: How to Articulate the True Value of Your Compliance Program

by Kenneth Koch and Phillip Ostwalt
May 14, 2025

Benefits of robust programs aren’t always obvious, but buy-in remains critical

workplace investigation hand holding magnifying glass

First, Do No Harm: The Hidden Mental Health Cost of Ethics & Compliance Investigations

by Lisa Beth Lentini Walker
January 27, 2025

Cultivate well-being throughout an investigation process

binoculars peeking through screen

Practical Tips & Considerations for External Investigations

by Maria Walker
January 13, 2025

Employee participation is critical to investigation success; here's how to ensure meaningful engagement

Next Post
increasing value of cryptocurrencies

Why Cryptocurrency is So Difficult for the IRS to Regulate

No Result
View All Result

Privacy Policy | AI Policy

Founded in 2010, CCI is the web’s premier global independent news source for compliance, ethics, risk and information security. 

Got a news tip? Get in touch. Want a weekly round-up in your inbox? Sign up for free. No subscription fees, no paywalls. 

Follow Us

Browse Topics:

  • CCI Press
  • Compliance
  • Compliance Podcasts
  • Cybersecurity
  • Data Privacy
  • eBooks Published by CCI
  • Ethics
  • FCPA
  • Featured
  • Financial Services
  • Fraud
  • Governance
  • GRC Vendor News
  • HR Compliance
  • Internal Audit
  • Leadership and Career
  • On Demand Webinars
  • Opinion
  • Research
  • Resource Library
  • Risk
  • Uncategorized
  • Videos
  • Webinars
  • Well-Being
  • Whitepapers

© 2025 Corporate Compliance Insights

Welcome to CCI. This site uses cookies. Please click OK to accept. Privacy Policy
Cookie settingsACCEPT
Manage consent

Privacy Overview

This website uses cookies to improve your experience while you navigate through the website. Out of these, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may affect your browsing experience.
Necessary
Always Enabled
Necessary cookies are absolutely essential for the website to function properly. These cookies ensure basic functionalities and security features of the website, anonymously.
CookieDurationDescription
cookielawinfo-checbox-analytics11 monthsThis cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checbox-functional11 monthsThe cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checbox-others11 monthsThis cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-necessary11 monthsThis cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-performance11 monthsThis cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy11 monthsThe cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.
Functional
Functional cookies help to perform certain functionalities like sharing the content of the website on social media platforms, collect feedbacks, and other third-party features.
Performance
Performance cookies are used to understand and analyze the key performance indexes of the website which helps in delivering a better user experience for the visitors.
Analytics
Analytical cookies are used to understand how visitors interact with the website. These cookies help provide information on metrics the number of visitors, bounce rate, traffic source, etc.
Advertisement
Advertisement cookies are used to provide visitors with relevant ads and marketing campaigns. These cookies track visitors across websites and collect information to provide customized ads.
Others
Other uncategorized cookies are those that are being analyzed and have not been classified into a category as yet.
SAVE & ACCEPT
No Result
View All Result
  • Home
  • About
    • About CCI
    • CCI Magazine
    • Writing for CCI
    • Career Connection
    • NEW: CCI Press – Book Publishing
    • Advertise With Us
  • Explore Topics
    • See All Articles
    • Compliance
    • Ethics
    • Risk
    • FCPA
    • Governance
    • Fraud
    • Internal Audit
    • HR Compliance
    • Cybersecurity
    • Data Privacy
    • Financial Services
    • Well-Being at Work
    • Leadership and Career
    • Opinion
  • Vendor News
  • Library
    • Download Whitepapers & Reports
    • Download eBooks
    • New: Living Your Best Compliance Life by Mary Shirley
    • New: Ethics and Compliance for Humans by Adam Balfour
    • 2021: Raise Your Game, Not Your Voice by Lentini-Walker & Tschida
    • CCI Press & Compliance Bookshelf
  • Podcasts
    • Great Women in Compliance
    • Unless: The Podcast (Hemma Lomax)
  • Research
  • Webinars
  • Events
  • Subscribe

© 2025 Corporate Compliance Insights