Dagger points you to hot, relevant, privileged, or issue-specific documents using supervised and unsupervised machine-learning techniques, which we update continuously to stay at the forefront of artificial-intelligence technology. These techniques include natural language processing; continuous active learning; dynamic tuning of parameters, algorithm selection, and feature representations; and cross-validation. Together they aim to maximize the value of each training document, improve accuracy assessment, and minimize the burden on human reviewers.
Dagger’s predictive-analytics outputs can be overlaid onto your existing review process or computer-assisted review tool (e.g., Relativity, Concordance), or you can use ours.
Our legal data analytics tools and outputs include:
- Document Scoring
- Automated Document Categorization
- Privilege Detection
- Key Document Detection
- Concept Extraction and Clustering
- Near-duplicate Grouping and Deduping
- Plaintiffs Firms
Document Scoring
After the client reviews a critical mass of documents (typically 500-1000), Dagger scores every document for each tag, such as relevance, hotness, privilege, or a specific issue (e.g., an RFP specification). The score for a given tag falls on one of two scales:
- a scale of zero through one, with the score representing the probability that the tag applies to that document, or
- a user-defined scale, commonly one through five, representing gradations of relevance to the tag.
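To make the relationship between the two scales concrete, here is a minimal sketch of one way a probability could be mapped onto a one-through-five scale. The even-width binning and the function name `probability_to_scale` are assumptions made for illustration; how Dagger actually derives its graded scores is not described here.

```python
def probability_to_scale(prob: float, levels: int = 5) -> int:
    """Map a probability in [0, 1] onto an integer scale 1..levels.

    Assumption: equal-width bins. A real system might use
    calibrated or user-chosen cut points instead.
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must be between 0 and 1")
    # int() truncates, so 0.0 lands in bin 1; min() keeps 1.0 in the top bin.
    return min(levels, int(prob * levels) + 1)


if __name__ == "__main__":
    for p in (0.0, 0.5, 0.95, 1.0):
        print(p, "->", probability_to_scale(p))
```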
Dagger provides a comprehensive accuracy report including graphs, tables, learning curves, and the trade-offs between accuracy and the number of additional documents reviewed. The client can improve measured accuracy by reviewing the documents that Dagger identifies as most useful for training the model. Dagger selects these documents using continuous active learning, random and stratified sampling, and other techniques.
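As a rough illustration of how training documents might be chosen, the sketch below implements two common selection heuristics from the active-learning literature: picking the documents whose scores are closest to 0.5 (uncertainty sampling) and picking the highest-scoring documents (relevance feedback). The function name, the strategy names, and the 0.5 midpoint are assumptions for the example; it is not a description of Dagger's own selection logic.

```python
def select_for_review(scores: dict[str, float], batch_size: int,
                      strategy: str = "uncertainty") -> list[str]:
    """Return the ids of the next documents to send to human reviewers.

    scores maps a document id to the model's probability for one tag.
    """
    if strategy == "uncertainty":
        # Documents the model is least sure about: score nearest 0.5.
        ranked = sorted(scores, key=lambda doc_id: abs(scores[doc_id] - 0.5))
    elif strategy == "top":
        # Documents the model considers most likely to carry the tag.
        ranked = sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return ranked[:batch_size]


if __name__ == "__main__":
    demo = {"a": 0.9, "b": 0.52, "c": 0.1, "d": 0.45}
    print(select_for_review(demo, 2))          # least certain documents
    print(select_for_review(demo, 2, "top"))   # highest-scoring documents
```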
Eventually, the model will meet the client’s accuracy demands or accuracy will cease to improve, and the review effort can shift from training the model to reviewing the documents associated with each tag.
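The two stopping conditions described above (the accuracy target is met, or accuracy stops improving) can be sketched as a simple check over the accuracy measured after each training round. The parameter names `target`, `min_gain`, and `patience` and their default values are hypothetical; the appropriate thresholds depend on the matter and the client's requirements.

```python
def training_can_stop(accuracy_history: list[float], target: float,
                      min_gain: float = 0.005, patience: int = 3) -> bool:
    """Decide whether review can shift from training to tagged documents.

    accuracy_history holds one accuracy measurement per training round,
    oldest first.
    """
    if not accuracy_history:
        return False
    # Condition 1: the model meets the client's accuracy demand.
    if accuracy_history[-1] >= target:
        return True
    # Not enough rounds yet to judge whether accuracy has plateaued.
    if len(accuracy_history) <= patience:
        return False
    # Condition 2: accuracy gained less than min_gain over the last
    # `patience` rounds, so further training is unlikely to help.
    recent_gain = accuracy_history[-1] - accuracy_history[-1 - patience]
    return recent_gain < min_gain


if __name__ == "__main__":
    print(training_can_stop([0.60, 0.70, 0.80, 0.86], target=0.85))
    print(training_can_stop([0.60, 0.70, 0.701, 0.702, 0.703], target=0.85))
    print(training_can_stop([0.50, 0.60, 0.70, 0.80], target=0.90))
```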
Automated Document Categorization—Making Your Production
When responding to a subpoena, request for production, or other document demand, Dagger categorizes documents as relevant or non-relevant. The great bulk of documents are typically non-relevant. Depending on the nature of the document set and client needs, there may be a third, residual group of documents: those that the software cannot categorize to the requisite accuracy. Armed with these three groups of documents, the client can disregard or lightly sample the non-relevant group, produce the relevant group with or without prior screening, and review the uncategorized residuum in its entirety. This methodology enables the client to minimize the population of documents requiring expensive and time-consuming human review.
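The three-group split can be pictured as two cut-offs applied to each document's relevance score. The sketch below assumes probability scores and uses hypothetical thresholds of 0.2 and 0.8; in practice the cut-offs would be chosen to satisfy the accuracy the matter requires.

```python
from collections import defaultdict


def categorize(score: float, low: float = 0.2, high: float = 0.8) -> str:
    """Assign one relevance score to one of three review groups."""
    if score >= high:
        return "relevant"
    if score <= low:
        return "non-relevant"
    # Scores between the cut-offs are too uncertain to categorize.
    return "uncategorized"


def split_for_production(scores: dict[str, float],
                         low: float = 0.2, high: float = 0.8) -> dict[str, list[str]]:
    """Group document ids by category for the production workflow."""
    groups: dict[str, list[str]] = defaultdict(list)
    for doc_id, score in scores.items():
        groups[categorize(score, low, high)].append(doc_id)
    return dict(groups)


if __name__ == "__main__":
    demo = {"doc1": 0.95, "doc2": 0.05, "doc3": 0.50, "doc4": 0.10}
    print(split_for_production(demo))
```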
Privilege Detection
As with other tags, Dagger can analyze and flag potentially privileged documents. Dagger recommends that its privilege scores and categorizations be treated as a supplement to, rather than a substitute for, attorney review. The number of such documents requiring review can be minimized with Dagger’s document scoring and automated categorization solutions. In addition, Dagger seeks to launch a universal privilege-detection service that would accumulate a database of anonymized, encoded privileged documents of sufficient size and breadth to enable privilege detection across all matters, rather than case by case. The privileged documents provided would be stored in an irreversible, undetectable, irretrievable format. Contact us to participate and receive discounted privilege-detection services in the future.
Key Document Detection and “Find More Like This”
For each hot or key document identified by the client, Dagger provides a list of the most conceptually similar documents, using an analysis more sophisticated than simple textual similarity. Dagger also identifies the documents most conceptually similar to those already coded with very-low-prevalence tags. This is frequently the richest source of additional documents for such tags, which usually include the hot tag and are often the most important.
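A "find more like this" query can be sketched as a nearest-neighbor search over document vectors. The example below assumes each document has already been turned into a numeric concept vector and ranks candidates by cosine similarity; how Dagger builds its representations and measures conceptual similarity is not specified here, so the vectors and function names are purely illustrative.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def more_like_this(key_id: str, vectors: dict[str, list[float]],
                   k: int = 3) -> list[str]:
    """Return the k document ids most similar to the key document."""
    query = vectors[key_id]
    candidates = [doc_id for doc_id in vectors if doc_id != key_id]
    candidates.sort(key=lambda doc_id: cosine(query, vectors[doc_id]),
                    reverse=True)
    return candidates[:k]


if __name__ == "__main__":
    # Hypothetical three-dimensional concept vectors.
    demo = {
        "hot": [1.0, 0.0, 1.0],
        "a": [1.0, 0.0, 0.9],
        "b": [0.0, 1.0, 0.0],
        "c": [0.5, 0.5, 0.5],
    }
    print(more_like_this("hot", demo, k=2))
```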
Concept Extraction and Clustering
Dagger extracts and provides a list of the key and defining concepts present in the dataset, facilitating expeditious early case analysis. In addition, for each document, Dagger identifies the key concepts present therein, and enables aggregation or clustering based on such concepts.
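Once each document carries a set of key concepts, aggregation by concept amounts to building an index from concept to documents. The sketch below assumes the concepts have already been extracted (the extraction step itself is not shown), and the sample concept labels are invented for the example.

```python
from collections import defaultdict


def group_by_concept(doc_concepts: dict[str, set[str]]) -> dict[str, list[str]]:
    """Build a concept -> sorted document ids index."""
    index: dict[str, list[str]] = defaultdict(list)
    for doc_id, concepts in doc_concepts.items():
        for concept in concepts:
            index[concept].append(doc_id)
    return {concept: sorted(ids) for concept, ids in index.items()}


def top_concepts(doc_concepts: dict[str, set[str]], n: int = 5) -> list[str]:
    """List the n concepts that appear in the most documents."""
    groups = group_by_concept(doc_concepts)
    # Sort by document count (descending), then alphabetically for stability.
    return sorted(groups, key=lambda c: (-len(groups[c]), c))[:n]


if __name__ == "__main__":
    demo = {
        "d1": {"pricing", "merger"},
        "d2": {"merger"},
        "d3": {"pricing", "antitrust"},
    }
    print(group_by_concept(demo)["merger"])
    print(top_concepts(demo, n=2))
```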
Near-duplicate Grouping and Deduping
Dagger identifies duplicate and near-duplicate documents. These determinations facilitate sampling and bulk coding of substantially similar document groups, such as recurring spreadsheets that differ in temporal and numerical content but not in substance. Near-deduping also facilitates quality-control checks on coding discrepancies between textually similar documents.
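One common way to detect near-duplicates is to compare sets of overlapping word sequences (shingles) using Jaccard similarity. The sketch below follows that approach and replaces runs of digits with a placeholder so that documents differing only in dates and figures group together, as in the recurring-spreadsheet example. The shingle size, the 0.8 threshold, and the greedy grouping are assumptions chosen for illustration, not a description of Dagger's method.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of word shingles, with digit runs masked as '#'."""
    tokens = re.sub(r"\d+", "#", text.lower()).split()
    if len(tokens) < size:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_near_duplicates(docs: dict[str, str],
                          threshold: float = 0.8) -> list[list[str]]:
    """Greedily group documents whose shingle sets overlap enough.

    Each document is compared with the first member of every existing
    group and joins the first group that meets the threshold.
    """
    signatures = {doc_id: shingles(text) for doc_id, text in docs.items()}
    groups: list[list[str]] = []
    for doc_id in docs:
        for group in groups:
            if jaccard(signatures[doc_id], signatures[group[0]]) >= threshold:
                group.append(doc_id)
                break
        else:
            groups.append([doc_id])
    return groups


if __name__ == "__main__":
    demo = {
        "q1": "Quarterly sales report for 2019 total 1500 units",
        "q2": "Quarterly sales report for 2020 total 1720 units",
        "memo": "Please call me about the merger tomorrow",
    }
    print(group_near_duplicates(demo))
```

Groups produced this way can then be sampled or bulk coded, and any group whose members carry conflicting tags can be flagged for a quality-control check.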
Plaintiffs Firms
Plaintiffs face unique challenges that Dagger’s predictive coding and analytics tools are well suited to meet: budgetary constraints; a deluge of documents (often duplicative or nearly duplicative) bearing little or no relevance; unpredictable, rolling, last-minute, and incomplete adversarial document productions; hidden smoking guns; and limited technological and manpower resources. Dagger has helped a number of plaintiffs firms overcome such hurdles with predictive coding, conceptual clustering, “more like this” analysis, data processing, and more. References available upon request.