Processing Specifications

How We Ensure Data Integrity with Unique Control Numbers

Below is an overview of our standard specifications for processing Electronically Stored Information (ESI). These standards may change from time to time as technology advances and workflows improve. If you have any questions or concerns about the specifications defined here, please contact our professional services team at [email protected]

Numbering Settings

Numbering Type:

  • Documents will receive a unique identifier during data processing. This unique identifier is referred to as the “Control Number”.
  • By default, the Control Number is prefixed with “REL” followed by a seven-digit number padded with leading zeros (e.g., REL0000001).
  • Each record receives the next available number for its prefix.
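
The numbering rules above can be illustrated with a minimal sketch. The class and method names here (`ControlNumberer`, `next_number`) are hypothetical and are not part of any processing tool; only the default “REL” prefix and seven-digit padding come from the specification.

```python
from collections import defaultdict


class ControlNumberer:
    """Illustrative only: issues the next available Control Number per prefix."""

    def __init__(self, digits: int = 7):
        self.digits = digits
        # Maps each prefix to the last number issued for it (0 if none yet).
        self._last_issued = defaultdict(int)

    def next_number(self, prefix: str = "REL") -> str:
        self._last_issued[prefix] += 1
        # Pad the running number with leading zeros to the configured width.
        return f"{prefix}{self._last_issued[prefix]:0{self.digits}d}"


numberer = ControlNumberer()
first = numberer.next_number()        # "REL0000001"
second = numberer.next_number()       # "REL0000002"
other = numberer.next_number("ABC")   # "ABC0000001" (separate sequence per prefix)
```

Keeping one counter per prefix is what makes “next available number for that prefix” well defined when more than one prefix is in use.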

Parent/Child Numbering

  • Child records always receive sequential Control Numbers immediately following their parent record. The only exception is a retried exception file that is published to a workspace. In that case, the retried child records receive the parent’s Control Number with a suffix instead of the next sequential number.
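
As a rough sketch of this behavior, the functions below number one family sequentially and show how retried children might be suffixed. The function names, the “_” delimiter, and the four-digit suffix are assumptions made for illustration; the specification does not state the actual suffix format.

```python
def number_family(start: int, child_count: int, prefix: str = "REL", digits: int = 7):
    """Number a parent and its children sequentially, starting at `start`.

    Returns (parent_number, child_numbers, next_free_index).
    """
    def fmt(n: int) -> str:
        return f"{prefix}{n:0{digits}d}"

    parent = fmt(start)
    # Children take the numbers immediately after the parent.
    children = [fmt(start + i) for i in range(1, child_count + 1)]
    return parent, children, start + child_count + 1


def number_retried_children(parent: str, count: int, delimiter: str = "_"):
    """Hypothetical suffix format for children published on retry."""
    return [f"{parent}{delimiter}{i:04d}" for i in range(1, count + 1)]


parent, children, next_free = number_family(start=1, child_count=2)
# parent    -> "REL0000001"
# children  -> ["REL0000002", "REL0000003"]
# next_free -> 4

retried = number_retried_children(parent, count=1)
# retried   -> ["REL0000001_0001"]
```

Suffixing on retry avoids renumbering records that were already published after the parent.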

Global Deduplication

Global deduplication is applied by default when records are promoted to review. This means that only one copy of each parent record recognized during data processing is promoted to each workspace, along with its associated attachments.

Deduplication in Relativity is applied only to Level 1 non-container parent files. If a child file (Level 2+) has the same processing duplicate hash as a parent file or another child file, it is not deduplicated and is still published to Relativity. This preserves family integrity.
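
The parent-level rule can be sketched as follows. This is a simplified illustration, not Relativity’s implementation: the `Record` type, its field names, and the `global_dedupe` function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    name: str
    dedupe_hash: str
    children: list["Record"] = field(default_factory=list)


def global_dedupe(parents: list[Record]) -> list[Record]:
    """Keep the first Level 1 parent seen for each hash, with its whole family."""
    seen_parent_hashes = set()
    kept = []
    for parent in parents:
        if parent.dedupe_hash in seen_parent_hashes:
            continue  # duplicate parent: the whole family is withheld
        # Only parent hashes are tracked; child hashes are never compared.
        seen_parent_hashes.add(parent.dedupe_hash)
        kept.append(parent)
    return kept


parents = [
    Record("a.msg", "h1", [Record("a-child.docx", "h2")]),
    Record("b.docx", "h2"),   # same hash as a child above, still kept
    Record("c.msg", "h1"),    # same hash as the first parent, dropped
]
kept_names = [p.name for p in global_dedupe(parents)]
# kept_names -> ["a.msg", "b.docx"]
```

Because child hashes never enter the comparison, an attachment is always published with its parent, even when an identical loose file exists elsewhere in the data set.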

Please contact our professional services team if you need deduplication applied at the custodian level, or not at all.

Timezone

All files are processed in Coordinated Universal Time (UTC) by default. This is the time zone used to display dates and times on a processed document. The selection determines the default time zone on the processing data sources that you create and associate with a processing set, and it is applied from the processing profile during the discovery stage.

Applying a standard time zone can help normalize data sets when custodians reside in different regions. Please contact our professional services team if you need to change the default time zone.
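
To illustrate why a single time zone helps, the sketch below converts two local timestamps from different regions to UTC using Python’s standard `zoneinfo` module. The function name and the sample dates are made up for the example, and the sketch does not reflect how the processing engine itself handles dates.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def to_processing_zone(local: datetime, source_zone: str, target_zone: str = "UTC") -> datetime:
    """Interpret a naive timestamp in `source_zone` and express it in `target_zone`."""
    aware = local.replace(tzinfo=ZoneInfo(source_zone))
    return aware.astimezone(ZoneInfo(target_zone))


# Hypothetical send times recorded in each custodian's local time.
sent_new_york = to_processing_zone(datetime(2024, 1, 15, 9, 0), "America/New_York")
sent_berlin = to_processing_zone(datetime(2024, 1, 15, 15, 0), "Europe/Berlin")

# Both normalize to 14:00 UTC, so they sort and compare as the same moment.
same_moment = sent_new_york == sent_berlin
```

Without normalization, the two messages would appear six hours apart in a date-sorted review even though they were sent at the same instant.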

Embedded Objects

All child files (attachments, embedded objects, images, and other non-parent files) recognized during discovery are extracted during data processing with the exception of the following:

  • Microsoft Embedded Images
  • Email Inline Images

Some objects from specific file types may not be extractable during data processing. Please contact our team should you have any concerns, or wish to change the embedded object extraction behavior.

Extraction Settings

Email files are output as MSG files for all Outlook, Lotus Notes, and Bloomberg file types. Text is extracted from Excel, PowerPoint, and Word documents using each file’s native application. By default, OCR is performed in English on records for which no text is recognized.

DeNIST

All files found on the National Institute of Standards and Technology (NIST) list are removed prior to processing to ensure they are not promoted to review. Relativity makes new versions of the NIST list available shortly after the National Software Reference Library (NSRL) releases them quarterly, so the list will change over time.

By default, the DeNIST process does not break any parent/child groups, regardless of whether the files are on the NIST list. Please let our professional services team know if you need to disable the DeNIST functionality during processing.
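
One way to read the default behavior is sketched below: loose files whose hash appears on the NIST list are removed, while any parent/child group is kept whole. This is an interpretation for illustration only. The `denist` function, the dictionary layout, and the sample hashes are hypothetical, and the handling of edge cases in the actual tool may differ.

```python
def denist(families: list[list[dict]], nist_hashes: set[str]) -> list[list[dict]]:
    """Drop loose NIST-listed files; never split a parent/child group.

    Each family is a list of file dicts whose first element is the parent.
    """
    kept = []
    for family in families:
        is_loose_file = len(family) == 1
        if is_loose_file and family[0]["hash"] in nist_hashes:
            continue  # standalone system file: removed before review
        kept.append(family)  # families stay intact, even with NIST-listed members
    return kept


nist_hashes = {"n1"}
families = [
    [{"name": "calc.exe", "hash": "n1"}],                                       # loose, on the list
    [{"name": "mail.msg", "hash": "x1"}, {"name": "setup.exe", "hash": "n1"}],  # family, kept whole
    [{"name": "memo.docx", "hash": "x2"}],                                      # loose, not on the list
]
remaining = denist(families, nist_hashes)
remaining_parents = [family[0]["name"] for family in remaining]
# remaining_parents -> ["mail.msg", "memo.docx"]
```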

Metadata Fields

The following metadata fields are mapped to our ECA and Review workspaces by default and will be extracted where available. Please let us know if you have any questions about the fields listed below:

Name | Field Type | Is Relational
MD5 Hash | Fixed-Length Text | Yes
Family Group | Fixed-Length Text | Yes
Conversation Index | Long Text | No
Created Date/Time | Date | No
Last Modified Date/Time | Date | No
Email Received Date/Time | Date | No
Email Sent Date/Time | Date | No
Delivery Receipt Requested | Yes/No | No
Control Number Beg Attach | Fixed-Length Text | No
Control Number End Attach | Fixed-Length Text | No
File Extension | Fixed-Length Text | No
Email BCC | Long Text | No
Email CC | Long Text | No
Email From | Fixed-Length Text | No
Email To | Long Text | No
File Name | Fixed-Length Text | No
File Size | Decimal | No
Number of Attachments | Whole Number | No
Sort Date/Time | Date | No
Conversation Family | Fixed-Length Text | No
Attachment List | Long Text | No
Last Printed Date/Time | Date | No
File Type | Fixed-Length Text | No
Extracted Text Size in KB | Decimal | No
Email Subject | Long Text | No
Lotus Notes Other Folders | Long Text | No
All Custodians | Multiple Object | No
All Paths/Locations | Long Text | No
Attachment Document IDs | Long Text | No
Conversation | Long Text | No
Created Date | Long Text | No
Created Time | Long Text | No
DeDuped Custodians | Multiple Object | No
DeDuped Paths | Long Text | No
Title | Long Text | No
Author | Fixed-Length Text | No
Email BCC (SMTP Address) | Long Text | No
Email CC (SMTP Address) | Long Text | No
Child MD5 Hash Values | Long Text | No
Child SHA1 Hash Values | Long Text | No
Child SHA256 Hash Values | Long Text | No
Comments | Long Text | No
Company | Fixed-Length Text | No
Contains Embedded Files | Yes/No | No
Last Saved Date/Time | Date | No
Document Subject | Long Text | No
Unprocessable | Yes/No | No
Unified Title | Long Text | No
Track Changes | Yes/No | No
Email Created Date/Time | Date | No
Email Last Modified Date/Time | Date | No
Image Taken Date/Time | Date | No
Last Accessed Date/Time | Date | No
Meeting End Date/Time | Date | No
Meeting Start Date/Time | Date | No
Primary Date/Time | Date | No
Email Store Name | Fixed-Length Text | No
Last Saved By | Fixed-Length Text | No
MS Office Document Manager | Fixed-Length Text | No
MS Office Revision Number | Fixed-Length Text | No
Message ID | Fixed-Length Text | No
Original Author Name | Fixed-Length Text | No
Email Original Author | Fixed-Length Text | No
Original File Extension | Fixed-Length Text | No
Parent Document ID | Fixed-Length Text | No
SHA1 Hash | Fixed-Length Text | No
SHA256 Hash | Fixed-Length Text | No
Email Sender Name | Fixed-Length Text | No
Email Recipient Domains (BCC) | Multiple Object | No
Email Recipient Domains (CC) | Multiple Object | No
Email Recipient Domains (To) | Multiple Object | No
Email Sender Domain | Multiple Object | No
Email Format | Single Choice | No
Email Sensitivity | Single Choice | No
Importance | Single Choice | No
Media Type | Single Choice | No
Message Class | Single Choice | No
Message Type | Single Choice | No
Outlook Flag Status | Single Choice | No
Password Protected | Single Choice | No
Record Type | Single Choice | No
Text Extraction Method | Single Choice | No
Email Recipient Count | Whole Number | No
Email Has Attachments | Yes/No | No
Email Modified Flag | Yes/No | No
Email Sent Flag | Yes/No | No
Email Unread | Yes/No | No
Excel Hidden Columns | Yes/No | No
Excel Hidden Rows | Yes/No | No
Excel Hidden Worksheets | Yes/No | No
Excel Pivot Tables | Yes/No | No
Has Hidden Data | Yes/No | No
Has OCR Text | Yes/No | No
Is Embedded | Yes/No | No
Is Parent | Yes/No | No
PowerPoint Hidden Slides | Yes/No | No
Email Read Receipt Requested | Yes/No | No
Speaker Notes | Yes/No | No
Suspect File Extension | Yes/No | No
Document Title | Long Text | No
Email Categories | Long Text | No
Email Entry ID | Long Text | No
Email Folder Path | Long Text | No
Email In Reply To ID | Long Text | No
Email From (SMTP Address) | Fixed-Length Text | No
Keywords | Long Text | No
Last Accessed Date | Long Text | No
Last Accessed Time | Long Text | No
Last Modified Date | Long Text | No
Last Modified Time | Long Text | No
Last Printed Date | Long Text | No
Last Printed Time | Long Text | No
Last Saved Date | Long Text | No
Last Saved Time | Long Text | No
Meeting End Date | Long Text | No
Meeting End Time | Long Text | No
Meeting Start Date | Long Text | No
Meeting Start Time | Long Text | No
Message Header | Long Text | No
Native File | Long Text | No
Other Metadata | Long Text | No
Email Received Date | Long Text | No
Email Received Time | Long Text | No
Email Recipient Name (To) | Long Text | No
Email Sent Date | Long Text | No
Email Sent Time | Long Text | No
Source Path | Long Text | No
Email To (SMTP Address) | Long Text | No
Password | Long Text | No

Search Index

A dtSearch index is created in the ECA or Review workspace and populated with the extracted or OCR text generated during data processing. This is the only text that is searchable in the workspace. If you need more than the extracted text included in the dtSearch index, please reach out to our professional services team.

Disclaimer

This documentation is not an exhaustive list of the settings and options available during data processing, but it does detail many common processing specifications for ESI. Please reach out to our professional services team if you require additional information about the processing specifications for your project.
