eDiscovery Collection: Guidelines

Planning

Adequate planning of the search strategy, based on the results of the Identification phase, is key to the overall effectiveness of the collection effort.

Security

At the outset of a collection effort, appropriate steps must be taken to preserve the content of the electronically stored information (ESI) and its metadata. This includes ensuring that procedures are in place to segregate privileged work product from the other data collected and produced.

As part of the requirement for chain of custody, all data collected needs to be secured by the collection agent in a manner that both (a) prohibits unauthorized access to the data and (b) tracks all attempts to access the data. As discussed in the previous section, this will also ideally include the ability to audit all access activity.
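The two requirements above, denying unauthorized access and tracking every attempt, can be illustrated with a minimal sketch. The `EvidenceStore` class, its field names, and the in-memory storage are hypothetical choices made for illustration; a real collection repository would rely on the access controls and audit facilities of its own platform.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class EvidenceStore:
    """Hypothetical in-memory store that logs every access attempt."""
    authorized: set[str]
    items: dict[str, bytes] = field(default_factory=dict)
    access_log: list[dict] = field(default_factory=list)

    def read(self, operator: str, item_id: str) -> bytes:
        allowed = operator in self.authorized
        # Record the attempt before enforcing the decision,
        # so that denied attempts are logged as well.
        self.access_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "operator": operator,
            "item_id": item_id,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{operator} may not read {item_id}")
        return self.items[item_id]
```

Logging before enforcement is the design point here: the log then reflects all attempts, not only the successful ones.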

To assure access is controlled, all collection operators should have a network logon account that is associated with their identity. A comprehensive list should also be maintained mapping the individuals involved in collection to any and all accounts that they have used in the collection process.
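One possible way to keep such a list usable is to store it as a mapping from each individual to their accounts and derive the reverse lookup from it. The sketch below assumes account names are case-insensitive; the function names are illustrative only.

```python
from collections import defaultdict


def accounts_to_operators(operator_accounts: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert an operator -> accounts mapping into account -> operators."""
    reverse: dict[str, set[str]] = defaultdict(set)
    for operator, accounts in operator_accounts.items():
        for account in accounts:
            # Assumption: account names compare case-insensitively.
            reverse[account.lower()].add(operator)
    return dict(reverse)


def shared_accounts(operator_accounts: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return accounts used by more than one operator.

    Activity on these accounts cannot be tied to a single identity.
    """
    reverse = accounts_to_operators(operator_accounts)
    return {acct: ops for acct, ops in reverse.items() if len(ops) > 1}
```

Flagging shared accounts matters because activity logged under them cannot be attributed to one person.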

A secure computer database for collected data ordinarily requires (a) a user interface for managing security and (b) administrative rights for managing all operator access rights. The number of operators with administrative rights should be kept to a minimum, and all use of those rights should be auditable.

Scope of Collection

Understanding the scope of data that is available is critical to executing a series of searches spanning all available sources of evidence. Searches need to span each of the following "silos" of data:

Online/Production Data

Data in currently running production systems, including e-mail, databases, commercial off-the-shelf (COTS) applications, or other active company records.

Offline Data

Files stored on network file shares, on local desktop or laptop file systems, on portable media such as CDs or DVDs, on portable storage devices, on external hard drives, or in Personal Storage Files such as a PST file.

Archive Data

Files stored in a corporate records management system or within an archive including e-mail and instant messaging data.

Backup Data

Files stored on backup media of any sort, including tapes, snapshots, file-based backups, backups of portable storage devices in any location (onsite, offsite, in transit, at employees' homes, or awaiting disposal/re-use).

Completeness of Collection

Understanding the type of data identified is important in determining which kinds of searches will collect the data completely. For example, searching for files from a given custodian could require finding all offline files with a "last modified by" or "created by" account of "company\joeuser." However, finding data for this same custodian in the active email system could require searching for all items whose "from," "to" or "cc" fields contain the email address "joe@company.com" in addition to the name of the custodian. In addition, the same custodian may have multiple usernames in different mailboxes or in other databases. All of this information about a custodian must be gathered to ensure that a complete search result is obtained.
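The point that one custodian is matched by different identifiers in different silos can be sketched as follows. The `Custodian` record, the metadata keys (`created_by`, `last_modified_by`, `from`, `to`, `cc`) and the account value used in the usage note are hypothetical; actual field names depend on the systems being searched.

```python
from dataclasses import dataclass, field


@dataclass
class Custodian:
    """Hypothetical profile gathering every identifier known for one custodian."""
    name: str
    accounts: set[str] = field(default_factory=set)  # file-system or domain accounts
    emails: set[str] = field(default_factory=set)


def file_matches(custodian: Custodian, file_meta: dict) -> bool:
    """Match offline files on the 'created by' / 'last modified by' account."""
    owners = {
        file_meta.get("created_by", "").lower(),
        file_meta.get("last_modified_by", "").lower(),
    } - {""}
    return bool(owners & {a.lower() for a in custodian.accounts})


def email_matches(custodian: Custodian, message: dict) -> bool:
    """Match e-mail on any address or the custodian's name in from/to/cc."""
    fields = [message.get("from", "")] + message.get("to", []) + message.get("cc", [])
    haystack = " ".join(fields).lower()
    needles = {e.lower() for e in custodian.emails} | {custodian.name.lower()}
    return any(needle in haystack for needle in needles)
```

For example, a profile such as `Custodian("Joe User", {"joeuser"}, {"joe@company.com"})` would match a file whose `created_by` is `JoeUser` and a message that only names "Joe User" in the `cc` field.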

Accuracy of Collection

Due to the nature of the complex systems in use in most companies today, files typically undergo numerous transformations throughout their life cycle. These transformations occur both at the hands of end users and automatically by the operating system or other software in use. Operating system and file-specific metadata are added or modified; file formats are transformed, encoded, decoded, and encrypted; and many other potential changes make it difficult to assess whether the evidence was actually created, modified or viewed by a particular custodian.

Below is an example of a typical file transformation:

  • When a Word document is created, the data is in a binary file, typically with a .doc extension. When that same document is attached to an email and sent across the internet, the attachment is encoded, typically as Base64 under Multipurpose Internet Mail Extensions (MIME). Other types of MIME encoding such as 7-bit and "quoted printable" are still used, along with another form of encoding called UUEncode. The receiving email system then decodes the MIME encoding and rebuilds the .doc binary file for the intended recipient's use.
  • While the majority of encoding and decoding occurs properly, sometimes a file will need to be manually decoded using native tools and procedures (e.g., from a software vendor such as Microsoft) or by using a third-party forensics solution. Regardless of the system used, there is always a risk that decoding errors will occur.
  • In the end it is important that those involved in collection recognize the fact that an element of risk is always involved whenever data translation/conversion is undertaken, and that these transformations need to be understood and mitigated in whatever way is appropriate.
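The Base64 round trip described above can be reproduced with Python's standard `email` package, and a hash comparison is one common way to check that the decoded attachment is byte-for-byte identical to the original. The addresses, subject, filename and helper names below are placeholders; this is a sketch of the idea, not a forensic procedure.

```python
import hashlib
from email import message_from_bytes, policy
from email.message import EmailMessage


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def roundtrip_attachment(data: bytes) -> tuple[str, bytes]:
    """Attach bytes to a message, serialize it, parse it back, decode it."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"        # placeholder address
    msg["To"] = "recipient@example.com"       # placeholder address
    msg["Subject"] = "Report"
    msg.set_content("See attachment.")
    # Binary attachments are Base64-encoded by default.
    msg.add_attachment(data, maintype="application", subtype="msword",
                       filename="report.doc")

    wire = msg.as_bytes()                     # what travels between mail systems
    received = message_from_bytes(wire, policy=policy.default)
    attachment = next(received.iter_attachments())
    encoding = str(attachment["Content-Transfer-Encoding"]).lower()
    return encoding, attachment.get_content()  # get_content() decodes Base64


original = bytes(range(256))                  # stand-in for a binary .doc file
encoding, decoded = roundtrip_attachment(original)
intact = sha256(decoded) == sha256(original)  # False would signal a decoding error
```

Recording the hash before and after each transformation gives the collector evidence that a conversion step did not alter the content.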

Scalability

Scalability of the collection mechanisms is paramount. Performing a comprehensive series of searches across any company's infrastructure can involve searching potentially large amounts of data, often resulting in tremendous volumes of data to be de-duplicated and culled, reviewed and redacted.

Because of the uncertainty of many data search results, in many situations it is simpler and less risky to break a large search into several smaller searches. This will help avoid running out of memory or other glitches during a search. Of course breaking things into bite-sized chunks requires careful management of the overall search process to ensure nothing is overlooked or otherwise left out.
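A minimal sketch of this approach splits the list of sources into batches, runs each batch separately, and reconciles the result against the full list so that a failed batch is visible and not silently skipped. The `search` callable stands in for whatever search tool is actually used, and the function names are illustrative.

```python
from typing import Callable, Iterator


def batches(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_batches(
    sources: list[str],
    size: int,
    search: Callable[[list[str]], list[str]],
) -> tuple[list[str], list[tuple[list[str], str]], set[str]]:
    """Run `search` per batch; return hits, failed batches, and unsearched sources."""
    hits: list[str] = []
    completed: set[str] = set()
    failed: list[tuple[list[str], str]] = []
    for batch in batches(sources, size):
        try:
            hits.extend(search(batch))
            completed.update(batch)
        except Exception as exc:
            # Keep going, but record the failure for follow-up.
            failed.append((batch, repr(exc)))
    # Reconciliation step: anything not completed still needs to be searched.
    missing = set(sources) - completed
    return hits, failed, missing
```

The collection is complete only when `missing` is empty; any entries in `failed` identify the batches to rerun.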

Auditability

Chain of Custody Records

Regardless of the collection method employed, strict chain of custody records must be maintained for all documents, data, and objects collected so that their authenticity can be assured. Without this assurance the data may not be reliable as evidence in litigation. Every collector, whether a third-party vendor, an internal corporate representative or outside counsel representative, should document procedures for accepting, storing, and retrieving documents, in the event that he or she may be called upon to testify.

Chain of custody records should be maintained for every "touch" of each item by a search operator. Because the volume of audit history that a large-scale collection project generates can be enormous, selecting tools and processes with automated audit history and the scalability to handle all the audit data is extremely important. Technologies such as Windows Event Logs or Syslog have been proven to scale adequately. Numerous native and third-party solutions exist to parse through, analyze, summarize and report on those types of data logs.
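As an illustration of an automated record for every "touch," the sketch below appends one entry per action and links each entry to the previous one by hash, so later alteration of an entry can be detected. Hash chaining is an assumption introduced here for illustration and is not prescribed by these guidelines; the field names are also hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # sort_keys gives a stable serialization, so the hash is reproducible.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_touch(log: list[dict], operator: str, item_id: str,
                 action: str, timestamp: str) -> dict:
    """Append one custody entry, chained to the previous entry's hash."""
    body = {
        "operator": operator,
        "item_id": item_id,
        "action": action,
        "timestamp": timestamp,
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS,
    }
    entry = {**body, "entry_hash": _digest(body)}
    log.append(entry)
    return entry


def verify(log: list[dict]) -> bool:
    """Recompute every hash and check that the chain is unbroken."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if entry["prev_hash"] != prev or entry["entry_hash"] != _digest(body):
            return False
        prev = entry["entry_hash"]
    return True
```

In practice the entries could equally be written to Windows Event Logs or Syslog as mentioned above; the sketch only shows the shape of a per-touch record.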

Audit History Logs

Audit history logs should ideally include a simple means of reporting on:

  1. Searches by custodian;
  2. Searches by operator;
  3. Search list;
  4. Searches by keyword/phrase/concept; and
  5. Searches by project.
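If each search is logged as one structured record, every report in the list above reduces to grouping the same records by a different field. The record layout below (`search_id`, `custodian`, `operator`, `keywords`, `project`) is a hypothetical example, not a required schema.

```python
from collections import defaultdict


def searches_by(audit_rows: list[dict], key: str) -> dict[str, list[str]]:
    """Group search IDs by one audit field (custodian, operator, keywords, project)."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for row in audit_rows:
        value = row[key]
        # A search may carry several keywords; index it under each of them.
        values = value if isinstance(value, list) else [value]
        for v in values:
            grouped[v].append(row["search_id"])
    return dict(grouped)


audit_rows = [
    {"search_id": "S1", "custodian": "joe", "operator": "alice",
     "keywords": ["merger", "acme"], "project": "P-1"},
    {"search_id": "S2", "custodian": "joe", "operator": "bob",
     "keywords": ["acme"], "project": "P-2"},
]

by_custodian = searches_by(audit_rows, "custodian")
by_keyword = searches_by(audit_rows, "keywords")
search_list = [row["search_id"] for row in audit_rows]
```

Keeping a single log and deriving the reports from it avoids maintaining several parallel lists that can drift apart.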

The chain of custody of actual data collected is an outgrowth of the tracking method employed through the identification of custodians. Correct identification record keeping coupled with correct chain of custody should create a seamless link from the targeted organizations through possible custodians, actual custodians, and finally, data collected and preserved.

Documentation

Any collection, whatever the method, should be accompanied by detailed documentation. Different collecting organizations have different chain of custody methods and tracking forms. For example, a computer forensic expert collecting hard drives will normally have chain of custody forms that resemble law enforcement documents, whereas a company shipping thousands of tapes will have documentation resembling a spreadsheet.

Original Media

Chain of custody for original media, such as hard drives, backup tapes, CDs, and DVDs, should include at a minimum:

  1. A unique media ID -- This becomes the core tracking number for all of the information extracted from the media;
  2. Date and time of receipt or collection of the evidence;
  3. The name of the person(s) collecting and/or taking possession of the evidence;
  4. A description of the type of evidence (8mm tape, hard drive, etc.);
  5. A description of what the evidence represents (Exchange server--Chicago);
  6. Any label information (exact);
  7. Serial numbers;
  8. Description of the physical location at the time of possession;
  9. Areas for transferring possession of the media within the collecting organization or to a vendor;
  10. Description of collection methodology;
  11. Detailed description of data harvested on site; and
  12. Checklists related to any on-site filtering of data during collection.
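The minimum fields listed above map naturally onto a structured record. The sketch below is one hypothetical layout; the class and field names are illustrative, and the rule that only the current holder can release the media is an added assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Transfer:
    timestamp: str
    released_by: str
    received_by: str


@dataclass
class MediaRecord:
    media_id: str                 # item 1: unique media ID
    received_at: str              # item 2: date and time of receipt
    collected_by: list[str]       # item 3: person(s) taking possession
    evidence_type: str            # item 4: e.g. "8mm tape", "hard drive"
    represents: str               # item 5: e.g. "Exchange server--Chicago"
    label_text: str               # item 6: label information, exact
    serial_numbers: list[str]     # item 7
    location: str                 # item 8: physical location at possession
    methodology: str              # item 10: collection methodology
    harvested_description: str = ""                               # item 11
    filtering_checklists: list[str] = field(default_factory=list)  # item 12
    transfers: list[Transfer] = field(default_factory=list)        # item 9

    def current_holder(self) -> str:
        # Assumption: the first listed collector holds the media initially.
        return self.transfers[-1].received_by if self.transfers else self.collected_by[0]

    def transfer(self, timestamp: str, released_by: str, received_by: str) -> None:
        """Record a hand-off; only the current holder may release the media."""
        if released_by != self.current_holder():
            raise ValueError(f"{released_by} does not currently hold {self.media_id}")
        self.transfers.append(Transfer(timestamp, released_by, received_by))
```

Rejecting a hand-off from anyone but the current holder keeps the transfer history free of gaps.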

Large Collection Projects

Large collection projects that are conducted over long periods of time with many custodians often call for a database application of some kind to track the information about collected data. This database can be part of the identification tracking system, which would ensure dynamic integration between identified custodians/key players and their collected data.

Single Point of Contact

A best practice during the collection process is to assign a single point of contact ("SPOC") to control the chain of custody record keeping. A SPOC assures that media IDs will be unique and that other information will be maintained consistently.
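The uniqueness guarantee follows from having exactly one issuer of media IDs. A minimal sketch, assuming a hypothetical ID format of a case prefix plus a zero-padded counter:

```python
class MediaIdRegistry:
    """Hypothetical single issuer of media IDs, as maintained by the SPOC."""

    def __init__(self, prefix: str) -> None:
        self._prefix = prefix
        self._next = 1
        self._issued: dict[str, str] = {}

    def issue(self, description: str) -> str:
        """Return a new ID that has never been issued by this registry."""
        media_id = f"{self._prefix}-{self._next:06d}"
        self._next += 1
        self._issued[media_id] = description
        return media_id

    def describe(self, media_id: str) -> str:
        return self._issued[media_id]
```

If several collectors generated IDs independently, collisions would have to be detected after the fact; routing every request through one registry prevents them.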

Consistent Application of Directions

Correct chain of custody must show consistent application of the directions of the identification team through the collection process. Chain of custody records should include the following:

  1. Detailed and descriptive checklists for any filtering, either manual or automated, performed on-site or at a processing facility;
  2. Any logs or printouts of the contents of a custodian's data storage showing files collected and not collected -- these logs should be maintained with the chain of custody forms;
  3. Reports detailing the progress of any automated collection application;
  4. Documentation of any refusal on the part of a custodian to release data; and
  5. Information as to selection of particular records or objects from systems such as databases.

On-Site Collection

Particular care must be taken regarding chain of custody when collecting on-site.

  1. The collector should verify the information collected by the identification team with the custodian -- corrections should be documented;
  2. Any partial collecting of data must be documented, in full, as detailed above;
  3. Chain of custody forms should include a form for the custodian to sign and date showing their participation, if any; and
  4. In the event that the custodian's work has been identified for ongoing collection, the collector must be able to identify what was collected on each visit and not collect the same, unchanged data multiple times.
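For ongoing collection (item 4 above), one way to avoid re-collecting unchanged data is to keep a manifest of content hashes from earlier visits and collect only files that are new or whose hash has changed. The sketch below holds file contents in memory for brevity; the function name and manifest layout are illustrative assumptions.

```python
import hashlib


def plan_visit(current_files: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return paths to collect on this visit and update the manifest in place.

    `manifest` maps each previously collected path to the SHA-256 of the
    content that was collected.
    """
    to_collect: list[str] = []
    for path, content in sorted(current_files.items()):
        digest = hashlib.sha256(content).hexdigest()
        if manifest.get(path) != digest:   # new file or changed content
            to_collect.append(path)
            manifest[path] = digest
    return to_collect
```

The list returned for each visit also documents what was collected on that visit, which supports the chain of custody record.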

Where search terms or methodologies are reused for more than one investigation, additional metadata, including at a minimum a "project identifier," should be stored with all search audits. The project identifier needs to be included on all reports, and the following report should ideally also be available.

Source: EDRM (edrm.net)