A common view holds that predictive coding software is either not viable or not worth the expense and effort for discovery in cases involving tens of thousands, rather than millions, of documents. But predictive coding software can yield substantial dividends in time, expense, and accuracy even in such comparatively small cases, with a return on investment (ROI) often exceeding 500%. It also allows small and mid-sized firms to handle cases involving discovery so massive in scope that it would otherwise be unmanageable. A handful of case studies illustrates these points.
To begin, a brief description of predictive coding for the uninitiated. Predictive coding uses computer software to categorize (that is, apply tags to) documents as relevant, not relevant, hot, related to a particular issue, or even privileged or not. It typically involves some variation of four basic steps. First, one or a handful of attorneys review a “training set” of documents, typically designated by the vendor, from which the software “learns” how to categorize documents; the software then assesses its own accuracy by comparing its categorizations with those of the attorney reviewers. Second, the reviewing attorneys review additional training documents and revisit potential errors the software flagged in the first round of review, and the software again assesses its own accuracy. If accuracy improves materially with the new batch of documents, this step is repeated until additional training sets yield no, or only negligible, improvement. Third, the software designates which documents belong in which categories and, potentially, which documents it cannot categorize reliably and that therefore require further attorney attention. Fourth, attorneys review the software’s categorizations (typically a subset, such as all or a sample of the documents categorized as relevant, potentially privileged documents, documents from priority custodians, and the like) and, if applicable, produce the documents.
Are the overhead (in attorney training) and the expense (typically a vendor service) worthwhile even for smaller firms and cases? Consider the following four case studies (full disclosure: I participated in each), involving small- to mid-size firms with big matters and, in one case, a big firm with a smaller matter.
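For readers who want a concrete picture, the stopping rule in step two and the sorting in step three of the four-step process can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation: `training_rounds_needed` and `triage` are hypothetical names, the 0.01 "negligible gain" threshold and the 0.8/0.2 relevance cut-offs are illustrative choices, and the sample values are made up. Commercial tools use their own, often proprietary, scoring and stopping criteria.

```python
def training_rounds_needed(accuracy_by_round, min_gain=0.01):
    """Step two: count training rounds until accuracy stops improving.

    accuracy_by_round[i] is the software's self-assessed accuracy after
    training round i (hypothetical values). Training stops after the first
    round whose gain over the previous round is below min_gain, an assumed
    cut-off for a "negligible" improvement.
    """
    if not accuracy_by_round:
        raise ValueError("at least one training round is required")
    for i in range(1, len(accuracy_by_round)):
        if accuracy_by_round[i] - accuracy_by_round[i - 1] < min_gain:
            return i + 1  # round i was reviewed but added nothing material
    return len(accuracy_by_round)  # still improving when the data ran out


def triage(scores, relevant_at=0.8, not_relevant_at=0.2):
    """Step three: sort documents by a hypothetical relevance score in [0, 1].

    Scores between the two assumed cut-offs are ones the software cannot
    categorize reliably, so those documents are routed to an attorney.
    """
    buckets = {"relevant": [], "not_relevant": [], "needs_attorney": []}
    for doc_id, score in scores.items():
        if score >= relevant_at:
            buckets["relevant"].append(doc_id)
        elif score <= not_relevant_at:
            buckets["not_relevant"].append(doc_id)
        else:
            buckets["needs_attorney"].append(doc_id)
    return buckets


# Illustrative inputs only.
print(training_rounds_needed([0.70, 0.80, 0.85, 0.855, 0.86]))  # 4
print(triage({"DOC-001": 0.95, "DOC-002": 0.10, "DOC-003": 0.50}))
```

Requiring a minimum gain, rather than waiting for accuracy to stop rising entirely, mirrors the "no, or negligible, improvement" test; a larger `min_gain` ends training sooner at the risk of leaving some accuracy unrealized.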
Case No. 1: Mid-size firm reviews, culls trial exhibits from 70 million documents in 5 weeks.
A mid-sized firm, in order to overcome a recalcitrant defendant, offered to process over 70 million of the defendant’s raw, unfiltered emails. The firm astutely deployed keyword searches to narrow the number of files to some 900,000, then trained predictive-coding software to winnow those down to 90,000 documents for attorney review. Having trained its sights on the most relevant subset of documents with technology, rather than with the brute force of an army of contract attorneys, the firm needed only four or five high-level attorneys to find the thirty or so exhibits that made a difference in its subsequent nine-figure victory at trial. The firm completed this work within approximately five weeks of receiving the data, at a fraction of the many months and millions of dollars that a traditional review would have entailed.
Case No. 2: Partner at small firm reviews 320 training documents, eliminates 85% extraneous matter, completes production from 300,000 client emails for less than $20,000.
A partner at a fifteen-attorney firm spent a week reviewing about 320 vendor-prescribed training documents. This sufficed to enable the predictive-coding software to isolate for attorney review 90% of the relevant documents in a mere 14% of the original corpus of about 300,000 client emails at an expense of less than $20,000. Through this highly accurate, automated, near-instantaneous computer categorization of over 250,000 documents, the firm focused its time and the client’s limited resources on superb lawyering rather than document review en route to settlement.
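The volumes in Case No. 2 can be checked with simple arithmetic. The sketch below uses the approximate values stated for that matter (about 300,000 emails, 14% routed to attorney review); `split_corpus` is a hypothetical helper, and whole-number percentages keep the results exact.

```python
def split_corpus(corpus_size, review_pct):
    """Split a corpus into the attorney-review set and the set-aside remainder.

    review_pct is a whole-number percentage, so integer math stays exact.
    """
    review_set = corpus_size * review_pct // 100
    return review_set, corpus_size - review_set


review_set, set_aside = split_corpus(300_000, 14)
print(f"{review_set:,} routed to attorney review")        # 42,000 routed to attorney review
print(f"{set_aside:,} categorized by the software alone")  # 258,000 categorized by the software alone
```

The 258,000 documents set aside match the "over 250,000" figure above. Because the software isolated 90% of the relevant documents, roughly one in ten relevant documents would fall in that remainder.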
Case No. 3: Thirty-attorney firm predictively codes 300,000 documents, saves $100,000.
See James R. Hietala, Jr., Linguistic Key Words in E-Discovery, 37 Am. J. Trial Advoc. 603 (Spring 2014).
Case No. 4: AmLaw 10 firm reduces cost by 83%, errors by 80%, and review burden by 93% in response to RFP involving 25,000 documents.
An AmLaw 10 firm charged with responding to an RFP confronted a collection of 25,000 documents. An associate reviewed 250 training documents, enabling the predictive-coding software to categorize the 25,000 documents with greater accuracy than contract attorneys typically achieve. The associate then reviewed 1,500 of the auto-categorized documents, partly as a double-check and partly to learn the contents of the documents being produced. The entire production review was completed in a little over a week. Compared with a typical contract-attorney review, the firm realized approximately an 83% reduction in total cost, a 91% reduction in total time, and an 80% reduction in responsive documents erroneously omitted, as depicted in the charts at the end of this article.
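The 93% "review burden" figure in the Case No. 4 heading is consistent with one simple reading, assuming the metric compares the documents an attorney actually read (250 training documents plus 1,500 checked) against a linear review of all 25,000. That interpretation is an assumption, since the article does not define the metric; the cost, time, and error figures come from the comparison charts and cannot be recomputed from the text. `review_burden_reduction_pct` is a hypothetical helper.

```python
TOTAL_DOCS = 25_000
TRAINING_DOCS = 250
SPOT_CHECKED_DOCS = 1_500


def review_burden_reduction_pct(total, *reviewed_batches):
    """Percentage of a full linear review avoided (assumed definition).

    Sums every batch an attorney read and compares it with reading all
    `total` documents; the result is rounded to a whole percentage.
    """
    reviewed = sum(reviewed_batches)
    return 100 - round(100 * reviewed / total)


# 1,750 of 25,000 documents is exactly 7%, so 93% of a full review is avoided.
print(review_burden_reduction_pct(TOTAL_DOCS, TRAINING_DOCS, SPOT_CHECKED_DOCS))  # 93
```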
As the foregoing matters show, predictive coding suits more than massive discovery, and more than massive firms can use it. Predictive coding software is worth considering for small- to mid-size firms that wish to handle large discovery matters but are unfamiliar with, or unable to cope with, traditional methods of review (numerous contract attorneys consuming copious amounts of space, time, money, computer terminals, and supervision). It can also bear fruit for firms of any size handling cases with as few as tens of thousands of documents.