eDiscovery Identification: Data Scope

The Litigation Response Plan

Planning is one of the most important parts of the e-discovery process and should include outside counsel, inside counsel, corporate information technology personnel and possibly an electronic evidence consultant. In many cases, this process is driven by the generation of a litigation response plan (LRP). The purpose of the LRP is to:

  • Collect, assimilate, and document existing legal strategies, corporate infrastructure and topologies, and electronic evidence production methodologies;
  • Work with the inside counsel, outside counsel team, and the corporate information technology team to provide legal and technical strategy, including data gathering strategies, pleadings, and best practices consultation;
  • Establish a written policy to follow upon receipt of a discovery request, preservation order or other similar item;
  • Ensure that the company meets its legal obligations while minimizing the electronic discovery expense and burden; and
  • Ensure that the third-party consultant can provide expert witness testimony in the event the implementations of the electronic discovery strategies come into question.

Development of the litigation response plan should take place at the corporate site with the appropriate legal teams and knowledgeable information technology staff. Interviews with the Chief Information Officer, General Counsel or Director of Information Technology, technology support personnel, records management, and the disaster recovery team may yield a significant amount of information about the location of potentially relevant electronic documents.

Some outside counsel firms provide e-discovery consulting services, and many electronic discovery vendors have consultants to assist with designing and defining an electronic discovery project. When selecting a vendor, it is important to examine the credentials of the consultant being assigned the project. Ask for a detailed curriculum vitae and look for references, testimony experience and relevant consulting experience.

Identifying Key Witnesses and Custodians

The typical corporate infrastructure supports a large number of users as part of its everyday routine. Not everyone in a company will be relevant to a particular litigation, but the IT organization, which is designed to support the entire infrastructure, will be. In many cases, key players must be identified by opposing parties. Of course, requesting parties do not want to limit a search in a way that may cause responsive data to be missed. Traditionally, the producing party has decided who the key players are and targeted them for specific collection. The meet-and-confer conference held early in the discovery process can be used to identify the key players and the kind of data that can be expected from each.

Determine if identification should be based upon department, geography, job function or other criteria. Identify which departments and/or divisions within the organization may be responsible for subject matters identified in the document request. For example, responsibility/ownership of specific projects or contracts should be tied to the applicable departments and employees.
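The selection criteria above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the `Employee` record, the `select_custodians` helper, and the roster entries are all hypothetical, and a real roster would come from HR or directory data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    name: str
    department: str
    location: str
    job_function: str


def select_custodians(roster, departments=None, locations=None, job_functions=None):
    """Return employees matching every criterion that was supplied.

    A criterion left as None is not applied.
    """
    def matches(value, allowed):
        return allowed is None or value in allowed

    return [
        e for e in roster
        if matches(e.department, departments)
        and matches(e.location, locations)
        and matches(e.job_function, job_functions)
    ]


# Hypothetical roster, for illustration only.
roster = [
    Employee("A. Rivera", "Contracts", "Houston", "Manager"),
    Employee("B. Chen", "Contracts", "Denver", "Analyst"),
    Employee("C. Okafor", "Engineering", "Houston", "Engineer"),
]

custodians = select_custodians(roster, departments={"Contracts"}, locations={"Houston"})
print([e.name for e in custodians])  # ['A. Rivera']
```

Keeping each criterion optional lets the same helper serve whichever basis (department, geography, job function) the parties settle on.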

Determining Key Time Frames

Review the relevant pleadings and discovery requests to determine the time period relevant to the matter. Use these dates to assist in locating and culling relevant data.
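As a rough sketch of date-based culling, the following Python keeps only items dated within an inclusive range. The period boundaries, file names, and the `in_relevant_period` helper are hypothetical; in practice the dates come from the pleadings and the choice of date field (sent, created, modified) is itself a decision to document.

```python
from datetime import date


def in_relevant_period(doc_date, start, end):
    """True when doc_date falls within the inclusive range [start, end]."""
    return start <= doc_date <= end


# Hypothetical period and documents, for illustration only.
PERIOD_START = date(2004, 1, 1)
PERIOD_END = date(2005, 6, 30)

documents = [
    ("budget.xls", date(2003, 12, 31)),
    ("contract_draft.doc", date(2004, 3, 15)),
    ("status_memo.doc", date(2005, 6, 30)),
    ("followup.msg", date(2005, 7, 1)),
]

culled = [name for name, d in documents
          if in_relevant_period(d, PERIOD_START, PERIOD_END)]
print(culled)  # ['contract_draft.doc', 'status_memo.doc']
```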

Keyword Lists

As you interview each potential custodian, ask him or her about particular jargon or acronyms that may have been used in correspondence, reports, etc., to ensure that all relevant data is searched. Compile a keyword list to be used during the processing and review of the data collected.
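One way to compile the list is sketched below in Python: terms gathered from each interview are normalized, de-duplicated, and turned into a single case-insensitive search pattern. The custodian labels, the sample terms, and both helper functions are hypothetical, and a review platform's own search syntax may differ.

```python
import re


def compile_keyword_list(interview_terms):
    """Merge per-custodian terms into one sorted, de-duplicated, lowercase list."""
    merged = set()
    for terms in interview_terms.values():
        for term in terms:
            cleaned = term.strip().lower()
            if cleaned:
                merged.add(cleaned)
    return sorted(merged)


def build_pattern(keywords):
    """Build a case-insensitive, whole-word pattern matching any keyword."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


# Hypothetical interview notes, for illustration only.
interview_terms = {
    "custodian_1": ["Project Falcon", "PF-2"],
    "custodian_2": ["project falcon ", "widget recall"],
}

keywords = compile_keyword_list(interview_terms)
pattern = build_pattern(keywords)
print(keywords)  # ['pf-2', 'project falcon', 'widget recall']
print(bool(pattern.search("Re: PROJECT FALCON schedule")))  # True
```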

Identifying Potentially Relevant Document Types

Are only certain types of data relevant? Can all other types be excluded from the search? Document types are another item for negotiation during the meet-and-confer process.

Mapping the Client's Information Systems

An essential component of a successful electronic discovery project is an accurate picture of the target company's data sources. It is important to keep in mind that not all company information technology infrastructures are created equal. The hardware and software deployed to accomplish commonplace tasks such as managing company e-mail or creating data backups vary widely from organization to organization. Indeed, they likely vary within the target company if the time frame in question is broad enough, or if the company is distributed across multiple geographic locations.

It is necessary to get an accurate diagram of the type and location of all data resources throughout the organization, making sure that it reflects the arrangement in the relevant time period. This identification process implicates many types of servers with active and dynamic data (e.g., file servers, collaboration servers, e-mail servers) and many interrelated data management systems (e.g., document management systems, financial systems, disaster recovery and restoration systems). This includes servers responsible for general company data, as well as user-specific data, such as user home directories or departmental shared directories. It also includes the myriad devices that users employ to utilize that data, including desktop computers, PDAs and cell phones. Lastly, it implicates inactive data archives on various media such as hard drives, tape backups, flash drives, CD-ROMs and DVDs. All of this is further complicated by the fact that legacy data, potentially across all these categories, may exist from previous company systems within the relevant time period. The necessary hardware, software or technical expertise to access such legacy data may no longer exist within the target company.

The Network Diagram

A good place to start is to obtain a general diagram from the client's IT network administrator. This diagram should depict the types and locations of servers deployed throughout the organization. Using this as a guide, obtain a general understanding of the kind of data stored on each server and which individuals and/or departments they serve.

Determine what structure, if any, exists to tie the data on these servers to individual users. For example, do users have home directories that are mapped to server drives? If so, on what servers? Is there a size limit per user? Does the organization utilize shared folders or directories? If so, how are they assigned?
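The answers to these questions can be recorded as a simple data map. The Python below is only a sketch of one possible structure: the `DataLocation` record, the server names, the paths, and the user and department names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataLocation:
    server: str
    kind: str         # e.g. "home directory" or "shared folder"
    path: str
    assigned_to: str  # the user or department the location serves


def index_by_assignee(locations):
    """Group storage locations by the user or department they are assigned to."""
    index = defaultdict(list)
    for loc in locations:
        index[loc.assigned_to].append(loc)
    return dict(index)


# Hypothetical entries, for illustration only.
locations = [
    DataLocation("FS01", "home directory", r"\\FS01\home\jdoe", "jdoe"),
    DataLocation("FS02", "shared folder", r"\\FS02\shared\contracts", "Contracts"),
    DataLocation("FS02", "shared folder", r"\\FS02\shared\projects", "jdoe"),
]

data_map = index_by_assignee(locations)
for loc in data_map["jdoe"]:
    print(loc.server, loc.kind, loc.path)
```

Indexing by assignee makes it straightforward to list every location to collect once a custodian has been identified.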

Document Management Systems (DMS)

Does the corporation store materials in a document management system that profiles and categorizes documents on particular servers? If so, what is the name and version number of the system? Is utilization of the system required of all users? Can the system be bypassed? What enforcement mechanisms are in place to ensure that documents are stored correctly? Can the system create audit reports of access, edits, versions or copies of documents stored within it?

Data Types

While identifying where data is physically stored on the network, it is important to identify the type of data that should be found at each location, such as e-mail, Microsoft application documents, Adobe Acrobat documents, proprietary application files, etc. Certain data types will warrant more detailed inquiry so that the best collection plan can be determined.

E-Mail Systems

What types of electronic mail servers are deployed within the organization? Seek specifics regarding hardware, operating system, software name and version, location of servers, persons responsible for administering the mail system, etc. Determine the location of mailboxes of relevant custodians. Is there e-mail management software in place that has "janitorial" functions such as deleting or archiving e-mail through an automated process? What are the server retention/archive settings? What happens to an employee's e-mail, mailbox and e-mail account when he or she leaves the company? Are mailboxes restricted in size? Do e-mail stores have encryption or password protection? Does the organization allow remote access to e-mail, and if so, by what means? Can users archive e-mail outside the mail server on their local drives, other network locations, or removable media?

As these questions show, the organization's e-mail system plays a central role in the identification of relevant data.

Additional Data Sources

Beyond the servers in the organization, there are many other devices and options that provide the ability to store active data. Examples include:

  • Ability to store files/e-mail archives on local workstations or laptop hard drives
  • A Storage Area Network (SAN) where data from multiple servers is centrally stored
  • Ability to store information on removable media (floppies, CD-ROMs, DVDs, zip drives, thumb drives, etc.)
  • Digital voice mail stores and/or VoIP (Voice over Internet Protocol) stores
  • Personal Digital Assistants (Palm Pilot, BlackBerry, etc.)
  • Information on company intranet/extranet and the ability to export from these sources
  • Employee use of home computers for business matters and the storage of business information on them
  • Any information cached in the e-mail gateway
  • Do users' cell phones cache business data such as text messages or e-mails?
  • Does the company phone switch maintain relevant records?
  • Failed drives from which a forensic recovery might be possible
  • Former employee computers prior to being recycled. Are any PCs currently pending recycling?
  • Any collaborative systems (e.g. Groove Technologies, eRoom, SharePoint) that might contain relevant data

Forensic Data Capture

Have any types of activities occurred that would require the forensic copying of hard drives and servers? Have files been deleted or written over that may be potentially relevant? A request for deleted files generally carries a burden of proof that documents have not been produced (see Zubulake). Some states have restrictions on whether deleted files come into play (e.g., Tex. R. Civ. P. 196.4). When deleted files become involved, it is imperative to make a forensic bit-by-bit, sector-by-sector duplicate, otherwise known as a forensic image.

A forensic image preserves the information as it existed at the time of the acquisition. Spoliation of the data is always a major concern when hard drive data is involved in litigation. Data is routinely overwritten and purged from hard drives as part of standard operating procedures. Potentially hundreds of files or residual fragments of files that may be relevant, such as e-mail, word processing documents, internet usage files, and spreadsheets are constantly being deleted from an active computer. Preserving the data in a forensically sound manner by creating a forensic image makes an unalterable snapshot of the data, including potentially recoverable deleted information, partial instant message conversations, metadata, internet e-mail, swap files and temporary files.
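The text above does not say how an image is shown to be unchanged after acquisition. One common approach, assumed here rather than taken from the source, is to record a cryptographic hash of the image file when it is created and recompute it later. The Python sketch below uses a small temporary file as a stand-in for an image; the function names are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_unchanged(path, recorded_hash):
    """Compare the current hash of the file with the hash recorded earlier."""
    return sha256_of_file(path) == recorded_hash.strip().lower()


# A small temporary file stands in for a forensic image (illustration only).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"stand-in for image contents")
    image_path = tmp.name

recorded = sha256_of_file(image_path)          # hash noted at acquisition
before = image_unchanged(image_path, recorded)

with open(image_path, "ab") as f:              # simulate a later alteration
    f.write(b"!")
after = image_unchanged(image_path, recorded)

os.remove(image_path)
print(before, after)  # True False
```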

Forensic processes can be discussed at the meet-and-confer conference to determine their necessity (and expense) to the project.

Determining Relevance of Backup Media

Backup tape systems were created as disaster recovery systems. They are meant to restore an entire system or systems after a catastrophic event, not to restore an e-mail from a single user. Many large corporations can be managing four to five terabytes of documents and e-mail on a day-to-day basis. This is roughly the equivalent of 4 to 5 billion pages of reviewable documents.
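The page estimate can be reproduced with simple arithmetic. The conversion factor below (about 1,000 bytes per page, with decimal terabytes) is an assumption inferred from the figures in the paragraph above, not a number stated in the source; other per-page sizes, which vary considerably by file type, would give very different totals.

```python
BYTES_PER_TERABYTE = 10**12          # decimal terabyte (assumption)
ASSUMED_BYTES_PER_PAGE = 1_000       # inferred from the text's figures (assumption)


def estimated_pages(terabytes, bytes_per_page=ASSUMED_BYTES_PER_PAGE):
    """Estimate the page count for a data volume given an average page size."""
    return terabytes * BYTES_PER_TERABYTE // bytes_per_page


low = estimated_pages(4)
high = estimated_pages(5)
print(f"{low:,} to {high:,} pages")  # 4,000,000,000 to 5,000,000,000 pages
```

Making the page size a parameter keeps the estimate easy to rerun once the actual mix of document types is known.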

Interviews with information technology personnel will provide the details necessary to understand their backup procedure and schedule. Find out how the backups are performed, how often they are performed and where the tapes are kept. Obtain a list of all servers that are actually backed up. Does their backup process perform a full system copy each time or are incremental backups performed? Determine if there have been any system changes during the relevant time period. Did the company change its hardware or software? Did it start using a third-party service? Are individual hard drives backed up?
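The distinction between full and incremental backups matters because it determines which tapes are needed to restore data as of a given date. The Python sketch below assumes that each incremental backup holds the changes since the previous backup of any kind, so a restore needs the most recent full backup on or before the target date plus every incremental after it. The schedule and the `restore_chain` helper are hypothetical, and a given backup product may work differently (for example, with differential backups).

```python
from datetime import date


def restore_chain(backups, target):
    """Return the backups needed to restore data as of the target date.

    backups is a list of (date, kind) tuples, where kind is "full" or
    "incremental". An empty list means no full backup exists on or before
    the target date.
    """
    eligible = sorted(b for b in backups if b[0] <= target)
    full_indexes = [i for i, (_, kind) in enumerate(eligible) if kind == "full"]
    if not full_indexes:
        return []
    return eligible[full_indexes[-1]:]


# Hypothetical schedule: weekly full backups with daily incrementals.
backups = [
    (date(2005, 3, 6), "full"),
    (date(2005, 3, 7), "incremental"),
    (date(2005, 3, 8), "incremental"),
    (date(2005, 3, 13), "full"),
    (date(2005, 3, 14), "incremental"),
]

chain = restore_chain(backups, date(2005, 3, 8))
print([(d.isoformat(), kind) for d, kind in chain])
```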

Data restoration from backup tapes can be costly and time-consuming if an environment needs to be re-created from scratch, and it becomes more difficult the older the tapes are. In some cases, restoration is simply impossible due to the unavailability of hardware or passwords necessary to complete the process.

Many companies use backup tapes as a litigation hold device and cease recycling backup tapes when a preservation request arrives. This can be problematic because it defines the disaster recovery system as the method of preserving data. As discussed above, restoring data from backup tapes in a timely and cost-effective manner is sometimes difficult. A best-practice alternative is to pull specific backups (usually the earliest available backup and the backup from the day of the preservation request) from the disaster recovery system and set those tapes aside in response to the litigation hold. It is then recommended that the producing party send a letter to opposing counsel stating what steps have been taken and that the client is not going to alter its disaster recovery process. This action acknowledges that the client has backup tapes and is willing to preserve information appropriately.
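The tape selection described above can be sketched as follows. This Python is illustrative only: the backup dates and the `tapes_to_set_aside` helper are hypothetical, and the sketch simply returns the earliest backup alone when no backup was taken on the day of the request, leaving the choice of a substitute to counsel and IT.

```python
from datetime import date


def tapes_to_set_aside(backup_dates, request_date):
    """Pick the earliest available backup and the backup from the request date."""
    if not backup_dates:
        return []
    selected = {min(backup_dates)}
    if request_date in backup_dates:
        selected.add(request_date)
    return sorted(selected)


# Hypothetical backup dates, for illustration only.
backup_dates = [
    date(2005, 1, 2),
    date(2005, 2, 6),
    date(2005, 3, 6),
    date(2005, 3, 9),
]

held = tapes_to_set_aside(backup_dates, request_date=date(2005, 3, 9))
print([d.isoformat() for d in held])  # ['2005-01-02', '2005-03-09']
```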

Legacy Systems

Consideration must be given to any prior systems that were in place to handle information during the relevant time period. It is common for companies to migrate between technologies as more desirable means to accomplish a company's objectives are developed and come to market. Backups created using legacy systems may be incompatible with the current hardware or software in place. The hardware and/or prior versions of the software may no longer exist within the company to restore this data. Similarly, the individuals with knowledge of operation of these prior systems may have left the company. An identification of these resources is essential in order to understand the extent to which third-party vendors will be required to reach historical data if this is deemed necessary.

It is also important to understand the company's current upgrade path and the schedule for any upgrades, data migration, or data consolidation that might affect the ability to utilize currently available or recently archived data during the course of the litigation.

Offsite and Third-Party Systems

It has become increasingly popular to store data in locations away from the primary business for security, cost-efficiency or disaster-recovery purposes. These sources should be identified if they house data potentially relevant to the dispute. Examples include off-site company storage facilities, co-location data centers, third-party data warehousing, and third-party tape storage (e.g., Iron Mountain, Recall).

Documentation

Carefully document all data identification efforts.

Source: EDRM (edrm.net)