eDiscovery Processing: Audit

Audit and Chain of Custody

The key to an effective chain of custody is to have a set of procedures which are followed in practice. The procedures should be in writing and the steps ought to be documented. Chain of custody cannot be parsed into stages (data collection, processing, review, etc.); rather, the entire process must be taken into consideration from soup to nuts (or A to Z).

On a physical level, the simple act of restricting the number of hands that touch the evidence is the first step in avoiding handling errors and mistakes. On the technology side, the software used to handle the data should have appropriate reporting and audit capabilities. Automation with robust reporting capabilities is the best way to build a solid chain of custody. To that end, the entire process from the data collection to the review platform should be automated as much as is practical and the human role reduced.

Fortunately, most of us have seen two good models of this process. The first is the police evidence lock-up featured on every police-related television show from Dragnet to Law & Order. In each of these, a sleepy-looking officer behind a counter receives all evidence and catalogs it by item description, date of receipt, case number, the name of the officer providing it, and inventory number. When possible, the items are tagged or put in Ziploc bags labeled with a summary of the catalog information. The evidence officer then stores each item in a locked area with limited access. This is an excellent model for procedure steps.

The second model that many have experienced is the online FedEx or UPS tracking system. When a shipper uses one of these services, the shipper sends the recipient an e-mail with a tracking number. By entering the tracking number on the FedEx or UPS website, the recipient can track the progress of the package across the country or across the world. These tracking systems are updated with all key progress information, usually within minutes or a few hours of the event. This is an excellent model for transparency and auditing because interested parties can check progress on the Web whenever they like. Combining the important parts of these two models yields a best practices model for electronic discovery.

Practices to Consider

It is often appropriate for the collection team to be trained in computer forensics to ensure that collection follows forensic protocols, so that all data collected is properly preserved and no harm is done to the computer. The level of training will depend upon the complexity of the collection and the computer systems involved. The trend is toward automation of the entire collection process in order to avoid collection errors and chain of custody problems.

Divide each electronic discovery provider group into three teams or subsets. The first team is the forensic investigators. It is their job to collect the evidence and document that process. The second team is in charge of logging, inventorying and safeguarding the evidence. The third team is in charge of copying the original data, fingerprinting it (via MD5 hashing) and analyzing the data. While these teams may overlap, it is generally a best practice to keep the second team small and distinct from the other two teams. This is important because its task is very different and calls for a different skill set.

The logging and inventory personnel need to be among the most organized in the organization. The logging and inventory processes are very likely to be subject to the most challenge in litigation. When the evidence (computer or media) is physically collected, document the collection by having the collector sign a form indicating: a) the date; b) the time; c) the name of the person(s) from whom the evidence was collected; and d) a description of the item(s) collected, including unique identifiers (manufacturer name and serial number if possible, or at least the manufacturer name and model number when the serial number is not apparent). (Important note: If the evidence is shipped to the electronic discovery provider, it should only be shipped via a carrier that provides excellent shipping and tracking documentation, insurance and high reliability. For these reasons, we generally ship via companies such as FedEx or UPS.)

Bonded point-to-point carriers can also be utilized depending on security needs and cost. A copy of the form should be provided as a receipt to the person/company from whom the evidence was collected. Note that if a trained forensic investigator collects the evidence, he or she should complete a more lengthy form which also includes the address of the premises and lists the names and versions of any hardware or software tools used to make the collection. This form should also provide space for notes to capture the kinds of details that would help the investigator recall the events surrounding the collection should he or she ever need to testify.

As soon after the collection as is practical, the electronic discovery provider needs to take physical custody of the evidence. Following its written procedures, the employee or employees responsible for logging the evidence collection should be given custody of it immediately. We recommend using a database to capture the log information. (While not yet a best practice, in the ideal electronic discovery environment, this database log would be available to clients and other interested parties through a secure log-in via the Web.) The headings should include at least the following:

  • Electronic discovery identification and inventory number (we strongly recommend using a barcode labeling system)
  • Date received
  • Matter name
  • Client name
  • Client/matter number
  • Name of person/company/shipper delivering evidence
  • Description of item(s) (including manufacturer name, model number and unique identifier/serial number whenever possible)
  • MD5 Hash of each piece of media where possible (electronic fingerprint)
  • Name of person receiving evidence (Logged by)
  • Check Out (check box Yes/No)
    • If "Yes",
      • Date
      • Reason
      • Custodian name
        • Name of recipient (used when evidence is shipped from the electronic discovery provider to anyone)
        • Name of shipper
        • Shipper's tracking number
        • Date of shipment
        • Date of receipt
      • Check-in date
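
The headings above map naturally onto a record structure in the logging database. The following is a minimal sketch in Python, assuming one record per logged item with a nested list of check-out events. The class and field names (EvidenceLogEntry, CheckOut, inventory_number and so on) are illustrative choices made for this example, not part of any particular product or standard.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class CheckOut:
    """One check-out event for a logged item (names are illustrative)."""
    date_out: date
    reason: str
    custodian_name: str
    recipient_name: Optional[str] = None    # used only when evidence is shipped
    shipper_name: Optional[str] = None
    tracking_number: Optional[str] = None
    date_shipped: Optional[date] = None
    date_received: Optional[date] = None
    check_in_date: Optional[date] = None    # None while the item is still out


@dataclass
class EvidenceLogEntry:
    """One row of the evidence log, mirroring the headings listed above."""
    inventory_number: str                   # e.g. the barcode value
    date_received: date
    matter_name: str
    client_name: str
    client_matter_number: str
    delivered_by: str
    description: str
    logged_by: str
    md5_hash: Optional[str] = None          # recorded "where possible"
    check_outs: List[CheckOut] = field(default_factory=list)

    def is_checked_out(self) -> bool:
        """True if the most recent check-out has no check-in date yet."""
        return bool(self.check_outs) and self.check_outs[-1].check_in_date is None
```

Keeping the check-out events as a list rather than a single Yes/No flag preserves the full history of who held the item and when, which is the information most likely to be questioned later.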


Copying, Fingerprinting and Analyzing Original Data

As soon as practical after logging, inventorying and safeguarding the data, the original evidence should be forensically copied using a copying tool that does not change the data in any way. Note that many forms of duplication do change the data. Even booting a computer or hard drive with its usual operating system will change the data. It is important to only use software and hardware tools that are certified for non-intrusive duplication.

It is also important that these tools only be operated by persons who have been trained to operate them. As soon as possible after collection, the evidence should be handed off to a subset of the electronic discovery provider's team who are charged with logging and safeguarding the evidence. In a large organization this team should be an entirely separate set of personnel from the collections and analysis teams.

As soon as possible after collection, using a non-destructive hashing tool, an MD5 hash[1] should be obtained from the media. This hash is a unique electronic fingerprint that allows others to verify that the original evidence was not altered from that point forward and that duplicates of the media are truly identical. This is especially important in a forensic media collection. Not only can electronic discovery tools analyze files, they can also locate files and file fragments that were deleted or are in unallocated hard disk space. The MD5 hash of the entire media helps prove the authenticity of the original media and copies and lays the foundation for claims related to information found on the media that are not in the traditional file structure.

The MD5 hash of the original media should be compared to the MD5 hash of the duplicate media. This step is documented by most media copying hardware. The receipt document generated by the hardware should be kept with the original evidence, and a copy should be kept with the duplicate media.
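
Where the copying hardware does not produce such a receipt, the comparison can be reproduced in software. The sketch below uses Python's standard hashlib module and assumes the original and the duplicate are both readable as files (for example, two image files). The function names md5_of_file and duplicate_matches_original are made up for this example, and reading the original is assumed to happen behind a write blocker or other non-intrusive mechanism.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it in fixed-size chunks.

    Chunked reading keeps memory use constant, so the same function works
    for small files and for very large disk images.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fh:            # read-only, binary mode
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicate_matches_original(original_path, duplicate_path):
    """True if both files produce the same MD5 digest."""
    return md5_of_file(original_path) == md5_of_file(duplicate_path)
```

Both digests, not just the True/False result, should go into the evidence log so that the comparison can be repeated later by someone else.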

The electronic discovery process is subject to less criticism if, where possible, the original evidence is preserved and not put back into production. If the owner requires the immediate return of the evidence, this transaction should be documented in writing with a cover letter and the evidence shipped using a highly reliable carrier. The tracking number should be included in the cover letter and the return process should be detailed in the logging database.

After the forensic duplicate is made, the original should be tagged with a summary of the logged information or with an identifier that ties it back to the evidence log or both. (We recommend using a combination of written tagging on the evidence bag and a bar code.) The evidence should be placed in a sealed bag if possible. The evidence should then be secured in an environment with limited access and safeguarded from foreseeable mishaps such as fire or the accidental activation of fire sprinklers.

Whenever the original evidence is accessed, it should only be available to the small team in charge of logging and securing the evidence. Any activities involving the original evidence should be logged.

As part of the forensic duplication process, it is a best practice to create an MD5 hash of every file and .pst. When .pst or other compilation-type files are separated into messages or smaller segments, an MD5 hash should be created for every message or segment. Again this is a way to confirm that future duplicates have not been altered and it is the primary way that native-file productions can be tracked.
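
One way to picture per-file hashing is as a manifest that maps every file in a working copy to its MD5 value. The following Python sketch assumes the duplicated data is available as an ordinary directory tree. The function names are hypothetical, and extracting individual messages from a .pst is not shown because it depends on the tool used; each extracted message or segment would simply be hashed the same way once it is saved as its own file.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its MD5 hex digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):    # walk the whole tree
        if path.is_file():
            manifest[path.relative_to(root).as_posix()] = md5_of_file(path)
    return manifest
```

Storing relative paths keeps the manifest valid when the same tree is later copied to a different drive or server.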

When a matter closes, the original evidence should be either returned to the client/original owner of the information or stored along with the paper portions of the file and subject to the electronic discovery provider's data retention policy. The lawyers directing your work should be consulted regarding the disposition of working copies of data. This includes all duplicates and analysis sets, including those on the network.

Generally the attorneys will direct you to deal with this data in one of three ways. They may direct you to place the duplicates and working data into the applicable data retention scheme for the rest of the file materials, they may want it offered to the client/owner or they may ask you to destroy the data. No matter the disposition, this process should be documented in the logging database.

Unfortunately, most of us do not have access to a software tool or suite that can perform all of the forensic collection and analysis. The standard today is to use a collection of individual tools. It is a best practice to re-hash or re-fingerprint a sample of your data files every time files are put into a new tool or environment, to confirm that the files remain identical.
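
The re-hashing step can be sketched as a spot check against a previously recorded set of hashes. The Python example below assumes such a record exists as a dictionary of relative file path to MD5 value, and that the files have been copied into a new location with the same relative paths. The function name verify_sample and its parameters are illustrative only.

```python
import hashlib
import random
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sample(manifest, new_root, sample_size, seed=None):
    """Re-hash a random sample of files under new_root.

    manifest maps relative path -> expected MD5 hex digest.
    Returns the sorted list of sampled paths that are missing or whose
    digest no longer matches; an empty list means the sample verified.
    """
    rng = random.Random(seed)               # a seed makes the sample repeatable
    names = sorted(manifest)
    sample = rng.sample(names, min(sample_size, len(names)))
    mismatches = []
    for name in sample:
        path = Path(new_root) / name
        if not path.is_file() or md5_of_file(path) != manifest[name]:
            mismatches.append(name)
    return sorted(mismatches)
```

Recording the seed and the sample size alongside the result lets a third party repeat exactly the same check.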

Finally, in native-format productions where an exchange of data is being provided instead of an exchange of paper or .tif images, it is important to perform one final MD5 hash of all data files produced (as well as one of all data files received). These MD5 hashes are sometimes used as a modern equivalent of a Bates stamp. The indexes of these hashes will be crucial to determine where a given file came from.
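
A hash index of this kind can be as simple as a table of MD5 value against production identifier. The Python sketch below is one possible shape for it, assuming each produced file is available as bytes together with an identifier. The identifiers, function names and CSV column names are hypothetical and chosen only for illustration.

```python
import csv
import hashlib
import io


def build_hash_index(produced):
    """Build an index from (identifier, data bytes) pairs.

    Returns a dict mapping MD5 hex digest -> list of identifiers, so
    identical files produced more than once share a single entry.
    """
    index = {}
    for identifier, data in produced:
        index.setdefault(hashlib.md5(data).hexdigest(), []).append(identifier)
    return index


def identifiers_for(index, data):
    """Look up which produced items, if any, match the given file content."""
    return index.get(hashlib.md5(data).hexdigest(), [])


def index_to_csv(index):
    """Serialize the index as CSV text with one row per (md5, identifier)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["md5", "identifier"])
    for digest in sorted(index):
        for identifier in index[digest]:
            writer.writerow([digest, identifier])
    return buf.getvalue()
```

Given a file received later, identifiers_for answers the question of where it came from by hashing its content and looking the value up in the index.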

Benefits of Annual Audits

Every electronic discovery provider should audit its own procedures and logging methods once a year and consider augmenting them. Providers should consider having an annual audit conducted by an outside auditor from an IT consultancy, a large accounting firm or a non-competitor electronic discovery provider.

The audit has several benefits. First, it will confirm whether or not your current procedures are being followed. Second, it is an opportunity to carefully consider your procedures and whether they should be revised. Third, a positive audit report can be a powerful sales tool; note that some clients now require audit reports before retaining electronic discovery vendors. Finally, the audit process provides the opportunity to review procedures before they are questioned by opposing counsel in litigation.


  1. MD5 was developed by Professor Ronald L. Rivest of MIT. To quote the executive summary of RFC 1321:

    [The MD5 algorithm] takes as input a message of arbitrary length and produces as output a 128-bit "fingerprint" or "message digest" of the input. It is conjectured that it is computationally infeasible to produce two messages having the same message digest, or to produce any message having a given pre-specified target message digest. The MD5 algorithm is intended for digital signature applications, where a large file must be "compressed" in a secure manner before being encrypted with a private (secret) key under a public-key cryptosystem such as RSA.

    In essence, MD5 is a way to verify data integrity, and is much more reliable than a simple checksum and many other commonly used methods.

Source: EDRM (edrm.net)