eDiscovery Processing: Data Conversion

Once a variety of search strategies have been implemented, the documents and/or identified information may be staged for review according to the instructions of the legal team. Those instructions typically depend on the technical and human resources available to the firm or client. A client has a number of options for conducting this next phase, ranging from review of documents in their original native format to review of materials in quasi-paper formats, such as TIFF (Tagged Image File Format, usually identified by the file extension .tif) or PDF (Portable Document Format, usually identified by the file extension .pdf). These image formats became standard because they could serve both for review and as a production format that was considered unalterable. To make review more efficient, the images are often accompanied by files containing the text of each document. Document review systems built around quasi-paper formats typically link each document's images to its text and metadata and, in some cases, to a copy of the original file.

Native File Review

An alternative to converting electronic documents to images before review is to perform an initial review of documents in their native format. It has been estimated that 80% or more of reviewed material is ultimately deemed irrelevant to the legal matter, so converting everything up front means paying conversion fees on documents that will never be produced. If a converted format is preferred for production, native review lets the team convert only what is relevant, non-privileged, or otherwise designated for production.
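To make the 80% estimate concrete, the sketch below compares the cost of converting everything up front with the cost of converting only what survives native review. The page count and per-page fee used in the example are hypothetical figures chosen for illustration, not industry rates; costs are kept in integer cents to avoid floating-point rounding in the totals.

```python
def conversion_costs(total_pages: int, fee_cents_per_page: int,
                     irrelevant_share: float) -> tuple[int, int]:
    """Return (convert_everything, convert_after_review) costs in cents.

    irrelevant_share is the fraction of pages expected to be set aside
    during native review (e.g. 0.8 for the 80% estimate)."""
    # Converting before review: every collected page is billed.
    convert_everything = total_pages * fee_cents_per_page
    # Converting after review: only pages kept for production are billed.
    retained_pages = round(total_pages * (1 - irrelevant_share))
    convert_after_review = retained_pages * fee_cents_per_page
    return convert_everything, convert_after_review
```

With 100,000 pages at a hypothetical 5 cents per page and 80% set aside, `conversion_costs(100_000, 5, 0.8)` returns `(500_000, 100_000)`: $5,000 to convert everything versus $1,000 to convert only the retained 20%.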

To accommodate native file review, many processing service and software providers have developed tools that let reviewers examine native files after the metadata has been preserved and linked to each document. Some allow native files to be opened in their native application, while others display documents through viewer technology. The choice of technology should be driven by the requirements of the case and the processing constraints (scope and schedule).

Native File Productions

In addition to native file and image review, there has been recent interest in native file productions. Historically, paper productions were the most common method of providing document collections to the opposing side. As lawyers and judges have become more educated about the cost and review benefits of electronic productions, image productions have become more prevalent. Today, some regulatory bodies require productions to be made in native format. Under the proposed amendments to the Federal Rules of Civil Procedure, parties are to discuss the intended form or forms of production during the initial Rule 26(f) discovery conference. Settling production formats early in the discovery process may influence the review formats needed. Additionally, the requirement that files be made available consistent with the manner in which they were maintained (or "as ordinarily maintained") can further be interpreted to support native file production. Recent case law relating specifically to Excel spreadsheets reflects this trend. If the parties do not agree, the court will decide review and production formats based on an analysis of how the information is kept and how the production format helps secure the just, speedy, and efficient conclusion of a matter.

Understanding the Details

Whether documents are processed to images for review or for production, it is important to understand the details. The goal of creating a printable image of an electronic document is to render it in a non-modifiable form that allows all of its contents to be reviewed. When processing documents to an image format, some software and service providers use viewer technology to render the file. Viewer technology displays a variety of application files without the native applications, which can avoid significant application license fees and speed up access to a file's contents. These efficiencies come at the expense of completeness: no viewer renders all of the underlying application data. Other software and service providers use the native applications to render the information contained within the file.

User-created information can be nested within files in ways that are not immediately apparent to the reader. For example, comments can be stored in a Word document, but the print settings determine whether those comments appear when the document is printed or imaged. Similarly, comments in Excel spreadsheets may not be visible unless the print settings are specifically configured to include them. Spreadsheets may also contain entire worksheets, or ranges within them, that are hidden or protected. This information must be unhidden and unprotected so that all of the file's contents are revealed for review.
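One way to surface this nested content before imaging is to inspect the file package directly. Word .docx and Excel .xlsx files are ZIP archives of XML parts (the Office Open XML format), so a minimal Python sketch using only the standard library can flag hidden worksheets and comment parts. It assumes the part names Microsoft Office commonly writes (`xl/workbook.xml`, `xl/comments*.xml`, `word/comments.xml`); it does not cover hidden rows or columns, legacy .doc/.xls files, or other formats, and the function name `hidden_content_report` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by xl/workbook.xml
SSML_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def hidden_content_report(path):
    """Flag hidden worksheets and comment parts in a .docx or .xlsx file.

    path may be a filename or a binary file-like object. Part names are
    assumed to follow common Microsoft Office conventions."""
    report = {"hidden_sheets": [], "comment_parts": []}
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
        if "xl/workbook.xml" in names:
            root = ET.fromstring(zf.read("xl/workbook.xml"))
            for sheet in root.iter(f"{SSML_NS}sheet"):
                # state is "hidden" or "veryHidden"; if absent, the sheet is visible
                state = sheet.get("state", "visible")
                if state != "visible":
                    report["hidden_sheets"].append((sheet.get("name"), state))
        # Comment text lives in its own XML part, separate from the body
        report["comment_parts"] = [
            n for n in names
            if n == "word/comments.xml" or n.startswith("xl/comments")
        ]
    return report
```

The `veryHidden` state is worth flagging separately, because such sheets do not appear in Excel's ordinary Unhide dialog.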

Users frequently protect files or subcomponents of files (e.g., sheets or cells in a spreadsheet). Such files must be unprotected, typically by cracking the passwords. This must happen before any culling strategy, including keyword search, is applied if the responsive dataset is to be complete. Files that remain protected because their passwords could not be cracked should be segregated and reported to the client.
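The segregation step can be sketched as a triage pass that runs before any keyword search. This minimal Python example is an illustrative assumption, not any provider's actual workflow. It relies on one property of Office Open XML: a password-encrypted .docx, .xlsx, or .pptx file is stored inside an OLE compound file container rather than as a plain ZIP package, so its first bytes differ. It detects only file-level encryption (not sheet- or cell-level protection) and performs no password recovery; the function name `triage` and the reason strings are hypothetical.

```python
from pathlib import Path

ZIP_MAGIC = b"PK\x03\x04"                      # start of a normal OOXML (ZIP) package
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE compound file header
OOXML_EXTENSIONS = {".docx", ".xlsx", ".pptx"}


def triage(paths):
    """Split files into (ready_for_culling, exceptions) before keyword search.

    exceptions holds (file name, reason) pairs to route to password
    recovery and, failing that, to the report for the client."""
    ready, exceptions = [], []
    for p in map(Path, paths):
        with p.open("rb") as fh:
            header = fh.read(8)
        is_ooxml = p.suffix.lower() in OOXML_EXTENSIONS
        if is_ooxml and header.startswith(CFB_MAGIC):
            # Encrypted OOXML: the ZIP package is wrapped in an OLE container
            exceptions.append((p.name, "encrypted OOXML; attempt password recovery"))
        elif is_ooxml and not header.startswith(ZIP_MAGIC):
            exceptions.append((p.name, "unrecognized container"))
        else:
            ready.append(p.name)
    return ready, exceptions
```

Only the `ready` list would proceed to culling; anything left in `exceptions` after password recovery fails becomes part of the report to the client.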

Once the images have been created, they can be delivered along with the text extracted from each file and its metadata. Unlike a paper production, the electronic information deemed responsive by the searches and culling strategies is bundled into a "batch load" and delivered to the client or law firm. There, the electronic package is loaded into a discovery document management system, such as Concordance or Summation, which allows the litigation team to run multiple search queries to identify responsive documents and prepare case strategy. Each image in the collection is given a unique identifier, typically a Bates number. A processing software or service provider can also package this information for production. Production sets can include images with Bates numbers for tracking purposes, various endorsements specific to the case, or native files with their metadata preserved.
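Bates numbering is essentially a running page counter with a case-specific prefix. The minimal Python sketch below assigns consecutive, zero-padded numbers to every page image and records each document's beginning and ending number. The prefix `"ABC"`, the starting value, the seven-digit width, and the names `assign_bates` and `BatesRange` are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class BatesRange:
    doc_id: str
    begin: str  # Bates number of the document's first page image
    end: str    # Bates number of the document's last page image


def assign_bates(page_counts, prefix="ABC", start=1, width=7):
    """Number every page image consecutively across a production.

    page_counts: (doc_id, number_of_pages) pairs in production order.
    prefix, start and width are hypothetical, matter-specific settings."""
    ranges, n = [], start
    for doc_id, pages in page_counts:
        if pages < 1:
            raise ValueError(f"{doc_id} has no pages to number")
        first, last = n, n + pages - 1
        ranges.append(BatesRange(doc_id,
                                 f"{prefix}{first:0{width}d}",
                                 f"{prefix}{last:0{width}d}"))
        n = last + 1  # the next document continues the sequence
    return ranges
```

For example, `assign_bates([("DOC-1", 3), ("DOC-2", 1)])` numbers the first document ABC0000001 through ABC0000003 and the second ABC0000004. Because numbering runs per page image rather than per file, the begin/end pair is what ties each image back to its source document.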

Source: EDRM (edrm.net)