When financial institutions face SEC orders requiring assessments of their electronic communications compliance programs, the traditional manual approach involves analyzing hundreds of documents across five core areas, from policies to surveillance systems. StoneTurn experts Jonny Frank, Nathan Gibson, Michael Costa and Kashif Sheikh explain how large language models can process these massive datasets to identify responsive documents, flag gaps and generate information requests, showing that AI can serve as a robust baseline for compliance reviews while freeing professionals to focus on insight-rich analysis.
Text messages and group chats are where filters go to die and, for investigators, where misconduct often reveals itself. Mobile device messaging lures us to drop our guard and engage in discussions without considering that our words will be scrutinized later. These off-the-cuff exchanges function like planted surveillance — except the subjects are bugging themselves.
So, is it any surprise that the SEC Enforcement Division and the DOJ have placed a premium on obtaining electronic communications? Recognizing the immense investigative value of electronic communications, the SEC has conducted a four-year sweep, resulting in nearly $1.8 billion in penalties and administrative orders against over 50 major financial institutions. The DOJ added electronic communications to its “Evaluation of Corporate Compliance Programs” guidance (ECCP) in 2024, and in 2025, it incorporated controls over electronic communications into its corporate enforcement and voluntary self-disclosure policy (CEP). These enforcement actions require companies to conduct comprehensive assessments of their electronic communications compliance programs — reviews that traditionally involve manually analyzing hundreds of documents spanning millions of words.
Our experimental results reveal that large language models (LLMs) can accelerate electronic communications reviews, delivering the same quality as manual processes in half the time.
Electronic communications compliance review challenges
Beginning in December 2021 and continuing through January 2025, the SEC issued orders against over 55 banks, broker-dealers, credit ratings agencies, investment management firms and private equity firms relating to the preservation of employees’ electronic communications, such as texts and emails, on their personal devices. The orders were nearly identical, requiring the respondents to pay hefty fines and assess their electronic communications compliance programs.
The SEC orders required assessments and testing of electronic communications policies, as well as formal and informal training programs, technology for capturing and preserving electronic communications, surveillance to detect potential violations, investigation of allegations, remediation of compliance violations and consistency in penalties handed out across business lines and seniority levels. While the SEC orders detailed what the institutions needed to assess, they did not provide criteria on which examiners were to base their evaluations.
These mandated reviews begin with constructing a comprehensive set of assessment criteria, drawing from internal guidance, government expectations (e.g., the ECCP) and professional frameworks (e.g., the COSO Internal Control Integrated Framework). These criteria span five core areas to review: (1) supervisory, compliance and other policies and procedures; (2) training and certification; (3) technological solutions; (4) surveillance; and (5) noncompliance and disciplinary frameworks. Manual reviews define base criteria and sub-criteria, applying both design and operational effectiveness testing to benchmark each firm’s practices against regulatory standards.
Manual reviews assess design effectiveness by examining policies, procedures, and controls to determine whether they adequately mitigate risk, assuming they operate as intended. Design testing is followed by testing operating effectiveness, i.e., whether policies, processes and controls operate as designed by persons with adequate authority and competency. Testing operating effectiveness includes interviews, process walk-throughs, focus groups, sample testing and inputting test messages.
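To make that structure concrete, the sketch below shows one way the assessment criteria, with their design- and operating-effectiveness tests, might be captured before a review begins. The field names, example requirement and example tests are illustrative assumptions, not a prescribed format.

```python
from dataclasses import dataclass, field

@dataclass
class Criterion:
    area: str                 # one of the five core review areas
    requirement: str          # benchmark drawn from internal guidance, the ECCP, COSO, etc.
    design_tests: list[str] = field(default_factory=list)     # design-effectiveness checks
    operating_tests: list[str] = field(default_factory=list)  # operating-effectiveness checks

CORE_AREAS = [
    "Supervisory, compliance and other policies and procedures",
    "Training and certification",
    "Technological solutions",
    "Surveillance",
    "Noncompliance and disciplinary frameworks",
]

criteria = [
    Criterion(
        area=CORE_AREAS[0],
        requirement="Policies prohibit business communications on unapproved channels.",
        design_tests=["Policy covers texts, chats and personal devices"],
        operating_tests=["Interview supervisors", "Send test messages on an off-channel app"],
    ),
    # ... one Criterion per requirement and sub-criterion
]
```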
Leveraging large language models
Data scientists and risk and controls experts can leverage LLMs to conduct the same review in substantially less time, demonstrating how companies can effectively employ AI. AI is particularly helpful for repeatable assessments against the same criteria. Here, for example, we focus on electronic communications compliance programs, but companies and counsel can similarly leverage LLMs to assess and test compliance programs against the DOJ ECCP and “timely and appropriate remediation” against the DOJ CEP.
The AI-assisted process differs from the use of popular chatbots primarily through its direct use of an LLM and supplementary software to conduct the analysis. The process examines each assessment criterion against the financial institutions’ provided documents, as well as any interviews, walk-throughs, focus groups and other additional information-gathering exercises that our team may have conducted.
Documents that the LLM processes can include PDFs or Word files, spreadsheets, PowerPoint presentations, emails, screenshots and other images, audio recordings, web pages and videos. These can easily span hundreds of files and the equivalent of well over 2 million words, making the evaluation of each assessment criterion like finding the proverbial needle in a haystack.
The LLM can analyze massive data sets to find the needle. Software built around the LLM methodically guides the model through each assessment criterion, examining the document corpus for evidence that the company complies (or does not) with that particular requirement.
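A simplified sketch of that orchestration layer follows. The helper names (retrieve_candidates, call_llm) are hypothetical stand-ins for whatever search index and LLM provider a review team actually uses, and the prompt wording is illustrative only.

```python
from dataclasses import dataclass

@dataclass
class Document:
    doc_id: str
    text: str

def retrieve_candidates(criterion: str, corpus: list[Document], top_k: int = 20) -> list[Document]:
    """Narrow the multimillion-word corpus to the documents most likely to be
    responsive (simple keyword overlap here; an embedding search in practice)
    before the LLM reads them in full."""
    terms = criterion.lower().split()
    ranked = sorted(corpus, key=lambda d: sum(t in d.text.lower() for t in terms), reverse=True)
    return ranked[:top_k]

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to whichever LLM provider the team uses."""
    raise NotImplementedError("Plug in your LLM provider's API here.")

def assess_criterion(criterion: str, corpus: list[Document]) -> str:
    """Ask the model to assess one criterion against the retrieved documents."""
    candidates = retrieve_candidates(criterion, corpus)
    context = "\n\n".join(f"[{d.doc_id}]\n{d.text}" for d in candidates)
    prompt = (
        "You are assessing an electronic communications compliance program.\n"
        f"Assessment criterion: {criterion}\n\n"
        "Using only the documents below, state whether the firm meets the criterion, "
        "cite responsive documents by ID with supporting quotations, and list any gaps.\n\n"
        + context
    )
    return call_llm(prompt)
```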
For each requirement, the LLM identifies:
- All responsive documents, including supporting citations and quotations, speakers and a detailed explanation as to how and why the document(s) meet the criteria.
- Gaps across the corpus of documents that need to be addressed to meet the criteria, whether the company is somewhat compliant or entirely noncompliant.
- Information and document requests that can be submitted to the financial institution to fill in the gaps.
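One way to capture those three outputs in a consistent, comparable form is a simple structured schema like the hedged sketch below; the field names are assumptions for illustration, not any particular tool's format.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ResponsiveDocument:
    doc_id: str
    quotation: str            # supporting citation or quotation
    speaker: Optional[str]    # the speaker, where one is identifiable (e.g., in a chat excerpt)
    rationale: str            # how and why the document meets the criterion

@dataclass
class CriterionFinding:
    criterion: str
    compliance: str           # e.g., "compliant", "partially compliant", "noncompliant"
    responsive_documents: list[ResponsiveDocument] = field(default_factory=list)
    gaps: list[str] = field(default_factory=list)                  # what is missing across the corpus
    information_requests: list[str] = field(default_factory=list)  # follow-up requests to the institution
```

Prompting the LLM to return output matching a shape like this makes its findings straightforward to compare against the manual review, criterion by criterion.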
In our experiment, the gaps and inconsistencies the LLM flagged mirrored those identified in the manual review, and using AI cut the review time by 50%. The close alignment of the findings demonstrates AI’s potential to serve as a robust baseline for compliance reviews, freeing professionals to apply years of judgment to the more insight-rich analysis.
This dual-track approach highlights the efficiency and reliability of integrating AI into compliance gap assessments. LLM outputs provide a strong foundation for in-depth human inquiry and validation, ultimately enhancing the quality and consistency of our recommendations while significantly reducing the time required for initial document review.
So what?
Recent advancements in LLMs and natural language processing have transformed the landscape of compliance reviews. Integration of these technologies has delivered measurable benefits across several dimensions when coupled with human experts:
Automated policy and procedure review
LLMs can rapidly compare relevant content from extensive policy documentation against assessment criteria, citing responsive controls or flagging control gaps. This automation accelerates review cycles and reduces human error.
AI-generated gap analysis summaries
LLMs can be prompted to generate preliminary gap summaries for analyst validation, expediting insight generation and improving clarity.
Interview preparation and analysis
Internal documentation can be input to LLMs to generate targeted interview questions. After the interview, LLMs can summarize notes and themes for comprehensive analysis.
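As a minimal illustration, a prompt-building helper along the lines below could turn a policy excerpt into targeted interview questions; the wording and the five-question count are assumptions, and the call to the LLM provider itself is omitted.

```python
def build_interview_prompt(policy_excerpt: str, interviewee_role: str) -> str:
    """Build a prompt that turns a policy excerpt into targeted interview questions."""
    return (
        f"You are preparing to interview a {interviewee_role} as part of an electronic "
        "communications compliance assessment.\n"
        "Based on the policy excerpt below, draft five targeted questions that test whether "
        "the policy operates in practice as it is written.\n\n"
        f"Policy excerpt:\n{policy_excerpt}"
    )
```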
Operating effectiveness
LLMs can help the review team weigh interview notes and control testing results to assess operating effectiveness, just as they help evaluate design effectiveness against policies and procedures.
Enhanced lexicon-based electronic communications surveillance
AI can enhance (and possibly replace) lexicon-based electronic communications surveillance systems and reduce the many hundreds, if not thousands, of hours wasted on investigating false positives.
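A hedged sketch of one such enhancement, triaging lexicon hits with an LLM before an analyst reviews them, follows; the categories and prompt wording are illustrative assumptions, not a production surveillance design.

```python
def triage_prompt(message: str, lexicon_term: str) -> str:
    """Build a prompt asking the LLM to separate genuine escalations from noise."""
    return (
        "A surveillance system flagged the electronic communication below because it "
        f"contains the lexicon term '{lexicon_term}'.\n"
        "Classify it as ESCALATE (plausible off-channel, market-abuse or other compliance "
        "concern) or LIKELY FALSE POSITIVE, and give a one-sentence reason.\n\n"
        f"Message: {message}"
    )
```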
Improved reporting
LLMs can help prepare initial drafts of regulator reports by integrating structured findings and generating consistent, actionable narratives, enabling professionals to focus on analysis and recommendations.
Quality control
AI can verify that conclusions drawn about an institution’s compliance or noncompliance with assessment criteria are based on accurate information, i.e., the provided set of documents and other materials.
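A simple, deterministic check of this kind can run before any finding is reported, for example confirming that every quotation the model attributes to a document actually appears in that document. The helper below is purely illustrative.

```python
def quotation_is_grounded(quotation: str, document_text: str) -> bool:
    """Return True only if the cited quotation appears verbatim (case-insensitive,
    whitespace-trimmed) in the source document."""
    return quotation.strip().lower() in document_text.lower()

# Example: a quotation the LLM cites is accepted only if it is found in the source text.
assert quotation_is_grounded("business on WhatsApp", "Employees discussed business on WhatsApp daily.")
```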
Conclusion
As AI technology continues to evolve, so too will its potential to streamline compliance reviews and gap assessments. Used wisely, AI enables compliance teams to move faster without compromising quality. It’s not just about saving time; it’s about sharpening insight. By blending human expertise with AI tools, organizations can deliver stronger results and build programs that truly meet regulatory expectations, providing data-driven, actionable recommendations. The combination of advanced analytics and professional expertise empowers organizations to address risk, close compliance gaps and maintain a competitive edge in an increasingly complex regulatory environment.