Dagger exploits elastic computing resources to expand capacity rapidly and seamlessly at times of peak demand, yielding near-immediate results anywhere in the world.
We offer processing in numerous geographic locations to accommodate privacy and regulatory considerations.
Dagger can pre-apply user-defined keyword, wildcard, proximity, and combined searches to avoid over-collection and database clutter.
All outputs are accompanied by comprehensive graphical and tabular reports.
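As a rough illustration of how a combined wildcard and proximity search can cull documents before collection, the following minimal sketch tests whether two wildcard terms occur within a given number of words of each other. The function names, the tokenizer, and the use of shell-style wildcards are assumptions made for this example and do not describe Dagger's actual search syntax.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def positions(tokens, pattern):
    """Return the indexes of tokens matching a shell-style wildcard pattern."""
    pattern = pattern.lower()
    return [i for i, tok in enumerate(tokens) if fnmatchcase(tok, pattern)]


def within(text, term_a, term_b, distance):
    """True if term_a and term_b match different tokens at most `distance` words apart."""
    tokens = tokenize(text)
    hits_a = positions(tokens, term_a)
    hits_b = positions(tokens, term_b)
    return any(i != j and abs(i - j) <= distance for i in hits_a for j in hits_b)


sample = "The merger agreement was terminated by the board"
keep = within(sample, "merg*", "terminat*", 3)
```

A document would be collected only when `keep` is true; anything else never reaches the review database.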
Our processing services and outputs include:
- Text and Metadata Extraction
- Remote Email Retrieval
- Personally Identifiable Information (PII) Detection
- Date Identification
- Optical Character Recognition (OCR)
- Named Entity Extraction
- Language Translation and Detection
- Audio Transcription
- Image Recognition (Detect Who Or What It Is)
Text and Metadata Extraction
Dagger extracts text and metadata from all standard word processor, presentation, spreadsheet, email, portable-document, and image files (e.g., Word, Excel, Outlook .pst and .ost, PDF, GIF, JPG, TIF).
Dagger outputs data in text files and the user’s choice of load-file format, including Concordance load file, CSV, pipe-and-caret, and user-defined formats.
Load-file field options include SEC standard, DOJ standard, and user-defined.
The user may specify Bates-number length, start point, prefix, and suffix.
Output is delivered to the user’s choice of kdrive.online folder, FTP site, Amazon S3 bucket, Dropbox folder, Google Drive folder, or physical media.
To accommodate regulatory and practical considerations, deliveries are available to multiple geographic locations worldwide via kdrive.online, FTP, and AWS S3.
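The Bates-numbering options mentioned above (length, start point, prefix, and suffix) and a pipe-and-caret load-file row can be sketched as follows. This is only an illustration: the function names and the field names `BegBates` and `Custodian` are hypothetical, and the layout assumes that "pipe-and-caret" means a pipe as field delimiter with carets as text qualifiers.

```python
def bates_numbers(count, start=1, length=7, prefix="", suffix=""):
    """Yield `count` consecutive Bates numbers, zero-padded to `length` digits."""
    for n in range(start, start + count):
        yield f"{prefix}{n:0{length}d}{suffix}"


def pipe_caret_row(values):
    """Join fields with '|' and wrap each in '^' (assumed pipe-and-caret layout)."""
    return "|".join(f"^{value}^" for value in values)


# Three documents numbered from 98, six digits wide, with an "ABC-" prefix.
numbers = list(bates_numbers(3, start=98, length=6, prefix="ABC-"))

# Hypothetical two-field load file: one header row plus one row per document.
header = pipe_caret_row(["BegBates", "Custodian"])
rows = [pipe_caret_row([number, "Smith"]) for number in numbers]
```

Zero-padding to a fixed length keeps the numbers sortable as plain text, which is the usual reason for specifying a Bates-number length.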
Remote Email Retrieval
Dagger bulk-fetches email from any IMAP- or POP3-enabled system when provided with appropriate credentials.
Email can be fetched en masse or with keyword filters to avoid over-collection.
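For readers curious how keyword-filtered retrieval can work over IMAP, here is a minimal sketch using Python's standard `imaplib`. It is not Dagger's implementation: the function names, the `INBOX` folder, and the choice of the IMAP `TEXT` search key are assumptions for illustration. POP3 has no server-side search command, so on a POP3 system the same filter would have to run after download.

```python
import imaplib


def build_search(keywords):
    """Build an IMAP SEARCH string matching messages that contain ANY keyword.

    IMAP's OR key takes exactly two operands, so terms are nested:
    OR a OR b c. An empty keyword list means "fetch everything".
    """
    if not keywords:
        return "ALL"
    terms = []
    for word in keywords:
        escaped = word.replace("\\", "\\\\").replace('"', '\\"')
        terms.append(f'TEXT "{escaped}"')
    query = terms[-1]
    for term in reversed(terms[:-1]):
        query = f"OR {term} {query}"
    return query


def fetch_matching(host, user, password, keywords):
    """Return raw RFC 822 messages from INBOX that match the keyword filter."""
    messages = []
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        conn.select("INBOX", readonly=True)  # read-only: do not alter the mailbox
        _status, data = conn.search(None, build_search(keywords))
        for num in data[0].split():
            _status, parts = conn.fetch(num, "(RFC822)")
            messages.append(parts[0][1])
    return messages
```

Because the filter runs on the mail server, only matching messages cross the network, which is what keeps over-collection down.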
Personally Identifiable Information (PII) Detection
Dagger flags email that contains user-selected categories of personally identifiable information.
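One simple way to picture user-selectable PII flagging is a set of named patterns from which the user picks. The sketch below makes that assumption; the category names and the U.S.-style regular expressions are illustrative only and are not a description of how Dagger detects PII.

```python
import re

# Hypothetical category names mapped to deliberately simple U.S.-style patterns.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def flag_pii(text, categories):
    """Return the subset of the selected categories found in the text."""
    return {name for name in categories if PII_PATTERNS[name].search(text)}


body = "My SSN is 123-45-6789; call me at 555-123-4567."
flags = flag_pii(body, {"ssn", "phone", "email"})
```

A message would be flagged whenever `flags` is non-empty, and categories the user did not select are never evaluated.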
Date Identification
Dagger extracts customary date-time information such as time of creation, modification, printing, or, for emails, times sent and received. Dagger also applies a wide variety of date templates to extract dates from the text of each document and populate fields with the earliest date, latest date, and all dates that appear anywhere in the document or its metadata.
Optical Character Recognition (OCR)
- Accepts PDF, TIF, JPG, JPEG, PNG, GIF, and BMP images
- Multi-page TIF and PDF OK
- Automatic language detection (including CJK) improves accuracy
- HIPAA-compliant, BAA available
- 10,000 pages/min via mass parallelization (vs. tens of pages/min without it)
- Complete millions of pages in hours instead of weeks or months
- Bookmarks preserved from original PDF
Free Tier
- Send email to ocr@daggerdata.com with images attached
- Receive auto-reply email with OCR-ed PDFs and text files
Enterprise Tier
- ¾¢ per page = $7.50 per 1,000 pages. No charge for failures.
- Billing by folder for ease of client pass-through
- Output available for download for 30 days
- 24/7 telephone and email support
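The per-page rate and folder-level billing above reduce to simple arithmetic, sketched here with exact decimal math. The folder names, the function name, and rounding each folder to the nearest cent are assumptions for illustration; only the ¾¢ rate and the no-charge-for-failures rule come from the price list.

```python
from decimal import Decimal

RATE_PER_PAGE = Decimal("0.0075")  # ¾¢ per page = $7.50 per 1,000 pages


def invoice_by_folder(successful_pages):
    """Map each folder to its charge in dollars.

    Only successfully processed pages are passed in, since failures are
    not charged. Rounding to the cent per folder is an assumption.
    """
    return {
        folder: (RATE_PER_PAGE * pages).quantize(Decimal("0.01"))
        for folder, pages in successful_pages.items()
    }


# Hypothetical folders, one per custodian, for client pass-through.
invoice = invoice_by_folder({"ClientA/Smith": 1000, "ClientA/Jones": 12345})
total = sum(invoice.values())
```

Keeping one line item per folder lets each amount be passed through to the corresponding client or matter without further allocation.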
Named Entity Extraction
Dagger employs natural language processing to extract names of people, places, organizations, facilities, and geo-political entities in each document using technology pioneered at Stanford University.
Language Translation and Detection
Dagger detects every language in every document to sequester foreign-language documents requiring separate review treatment.
Dagger offers automated translation from some languages to English, facilitating single-language review or language-appropriate review routing.
Audio Transcription
Dagger can provide rudimentary automated transcription of audio files. While these transcriptions do not approach court-reporter quality, they may suffice for preliminary review and for routing files to a human listener only when needed.
Image Recognition (Detect Who Or What It Is)
Dagger offers all-purpose image recognition to identify who or what an image contains. Dagger can also create specialized recognition tools as needed for specific image-recognition tasks.