Electronic discovery is about fact finding, accuracy, truth . . . and money. Corporate clients pay millions of dollars, in some cases, for the time and expertise of their lawyers and technologists. These costs are climbing, the direct result of growing quantities of discoverable electronically stored information (ESI).
IDC keeps its finger on the pulse of the world's digital data heartbeat with an annual white paper, "The Diverse and Exploding Digital Universe," sponsored by EMC Corporation. It reports that in 2007, the digital universe ("information that is either created, captured, or replicated in digital form") totaled 281 billion gigabytes, or 281 exabytes. It is expanding so fast that it is beginning to exceed storage capacities.
We can begin to comprehend these statistics in terms of e-discovery ramifications if we consider that Fortune 500 businesses generate almost 20 trillion electronic documents each year. Then there's e-mail, the greatest part of ESI. One recent estimate is that worldwide, people send about 60 billion e-memos every day. Added to them are instant messages, text messages and faxes.
The amount of discoverable electronic data in some lawsuits and regulatory inquiries is equivalent to thousands of boxes of paper documents, and it's multiplying. We all know the vast majority of it is worthless to either side. But it has to be dealt with. How do we make sense of spiraling volumes of ESI without running up impossible fees for our clients (and without losing all our hair scrambling to meet court deadlines)?
The answer in a word: pre-discovery. It's a fairly recent concept - about three years old. Already, it is becoming litigators' solution to the data explosion crisis.
Finding Our Way to the Root of the Matter
In the pioneer days of electronic discovery - circa 1998 - intrepid technologists boldly stepped into the breach, sometimes barely understanding what they were up against. Until the late 1990s, we never thought about bringing a hard drive to court. Attorneys were still researching in books, for the most part, and litigation teams literally were getting bankers' boxes of data to review. Electronic research and discovery weren't in the picture.
Then local scanning vendors had a bright idea: "Let's put all these file cabinets of paper documents on CDs." Litigation teams started to receive "file cabinets" of discovery documents on disk. Still, they had to print it all out and review it. The challenge became: How do we effectively manage electronic discovery?
Many people in the legal community dreamed up different ways to accomplish it. Everyone was running for the gold stream, and we didn't even know what the gold stream looked like. Clients were paying exorbitantly for discovery, and a lot of people who were appraising the discovery "gold" had no idea of its value. The firm - usually in an emergency - would hire data professionals to perform forensics. When they asked what it would cost, the forensics people would mull it over and come up with far-flung, inconsistent fees based on guesstimated hours, data volume and whatever technology they had available.
In one case during those days, I processed 750 megabytes of data, running it through a state-of-the-art litigation support tool. It took three weeks to process and deliver the data, and we billed the client $30,000. We were charging per page.
By 2001, e-discovery vendors started to truly understand litigators' needs. For example, in any given suit, archived e-mail from diverse custodians may include not just ordinary business messages but privileged, confidential and personal memos. Dozens or hundreds of data analysts and reviewers can thrash about in these oceans of electronic information and sort out the fishes. But the organization's objective is to spend most of its time and money on litigation strategy and case preparation, not on data sifting. By 2005, the economics of e-discovery were becoming impossible.
During the past three years, attorneys and IT professionals have gotten smart. Clients were screaming, "I give! I give!" Overwhelmed by e-discovery costs, they were settling. Attorneys weren't getting as much case work.
They now have a better way in the form of pre-discovery. Until about 2004, a lot of people laughed at the idea. The size of typical data collections up to that point made pre-discovery seem irrelevant. ESI professionals just continued to run everything through the discovery process. But with the size of data collections now, and with the changes to the Federal Rules of Civil Procedure (FRCP), strategists need to make culling decisions before they send the data to litigation support vendors.
Organizing a project properly is the key to successful pre-culling. To achieve it, it's important to bring the right tools in-house. The system you engage should actually force you to adhere to its first-pass culling procedures. It is protecting you, ensuring preservation and accuracy of the ESI.
When selecting litigation support technology, you should scrutinize its ability to facilitate pre-discovery. It should be able to effectively cull away redundant and irrelevant documents -- usually the greater portion of the preliminary data collection -- before the review team sets to work.
First, the system must put all of the ESI into a reviewable format. E-mail is the major type of electronically stored information. Right now, Microsoft Outlook .PST files account for probably 80 percent of e-mail. (In five years, we should expect to be dealing with four or five major e-mail formats including Gmail, Apple Mail and Yahoo Mail.) Another common ESI format today is .NSF, the Lotus Notes database file storage format.
Once the data is reviewable, de-duplication of the e-mail begins the pre-culling process in most situations. Effective de-duplication software can quickly eliminate redundant e-mail (for example, courtesy copies of the same e-memo sent to a dozen or more individuals) and other documents from the data collection. In the average gigabyte of data, we've found that 20 to 50 percent of the e-mail and other items are duplicates. In de-duplication, you obviously have to be sure you're discarding true duplicates, not unique material.
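One common way to make sure only true duplicates are set aside is to compute a hash over a normalized set of fields from each message and keep the first message seen for each hash value. The following is a minimal sketch in Python, not a description of any particular product. It assumes the messages have already been extracted into plain dictionaries; the field names (sender, recipients, sent, subject, body, attachments) and the function names are hypothetical.

```python
import hashlib


def message_fingerprint(msg):
    """Return a SHA-256 digest over the fields that define a true duplicate.

    The field names are hypothetical; adapt them to whatever your
    extraction step actually produces.
    """
    parts = [
        msg["sender"].strip().lower(),
        ",".join(sorted(r.strip().lower() for r in msg["recipients"])),
        msg["sent"],                       # timestamp string as extracted
        msg["subject"].strip(),
        msg["body"].strip(),
        ",".join(sorted(msg.get("attachments", []))),
    ]
    # Join with a separator unlikely to occur in the data so that
    # adjacent fields cannot run together and collide.
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def deduplicate(messages):
    """Split messages into (unique, duplicates), keeping the first copy seen."""
    seen = set()
    unique, duplicates = [], []
    for msg in messages:
        digest = message_fingerprint(msg)
        if digest in seen:
            duplicates.append(msg)   # set aside, not destroyed
        else:
            seen.add(digest)
            unique.append(msg)
    return unique, duplicates
```

The duplicates are returned rather than thrown away, so the culling decision can be documented and reversed if a question comes up later. Which fields belong in the fingerprint is a judgment call for the case team: two messages that differ in any hashed field are treated as unique.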
After you have cleaned up the junk and provided de-duped, accurate sets, you then filter and review the data. Some professionals like to de-dupe and filter at the same time - and that can work, too. It depends on the nature of the ESI involved in the case and the client's preference. But we've found it a better practice to de-dupe everything first and then filter.
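As a rough illustration of the filtering step, the sketch below keeps only messages that fall inside a date range and mention at least one search term. The filter criteria here are an assumption made for the example (date range and keywords are typical choices, but the right criteria depend on the case), and the field names (sent, subject, body) and the function name are hypothetical.

```python
from datetime import date


def filter_messages(messages, start, end, keywords):
    """Keep messages dated within [start, end] that mention any keyword.

    Assumes each message is a dict with a 'sent' date plus 'subject'
    and 'body' strings (hypothetical field names).
    """
    lowered = [k.lower() for k in keywords]
    kept = []
    for msg in messages:
        if not (start <= msg["sent"] <= end):
            continue                      # outside the relevant period
        text = (msg["subject"] + " " + msg["body"]).lower()
        if any(k in text for k in lowered):
            kept.append(msg)
    return kept


# Example call with made-up data: one message survives the filter.
sample = [
    {"sent": date(2006, 3, 1), "subject": "Merger terms", "body": "draft"},
    {"sent": date(2004, 3, 1), "subject": "Merger terms", "body": "old"},
    {"sent": date(2006, 5, 1), "subject": "Lunch", "body": "pizza?"},
]
relevant = filter_messages(sample, date(2005, 1, 1), date(2007, 12, 31), ["merger"])
```

Running a filter like this on an already de-duped set means each unique message is tested once, which is one practical reason to de-dupe first.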
Your system should be flexible and powerful. When you receive your e-mail data set, the program should be able to make a "snapshot" showing you how many messages you have; how many attachments; attachment extensions; .PST sizes; most prominent TO, FROM, CC and BCC individuals; etc. It should be able to handle loose .MSG files and files from multiple custodians. You should be able to easily extract the messages you need and quickly regenerate your .PST and .NSF files after review.
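To make the "snapshot" idea concrete, here is a minimal Python sketch of the kind of summary described above: message count, attachment count, attachment extensions, and the most prominent senders and recipients. It assumes the messages have already been extracted from the .PST or .NSF files into plain dictionaries; the field names (sender, to, cc, bcc, attachments) and the function name are hypothetical, and reading the mail files themselves is outside the scope of the sketch.

```python
from collections import Counter
import os


def snapshot(messages, top_n=5):
    """Summarize a message set: counts, attachment extensions, busiest addresses."""
    extensions = Counter()
    senders = Counter()
    recipients = Counter()
    attachment_total = 0
    for msg in messages:
        senders[msg["sender"].lower()] += 1
        # Count TO, CC and BCC separately so each role can be ranked.
        for field in ("to", "cc", "bcc"):
            for addr in msg.get(field, []):
                recipients[(field.upper(), addr.lower())] += 1
        for name in msg.get("attachments", []):
            attachment_total += 1
            ext = os.path.splitext(name)[1].lower() or "(none)"
            extensions[ext] += 1
    return {
        "messages": len(messages),
        "attachments": attachment_total,
        "extensions": dict(extensions),
        "top_senders": senders.most_common(top_n),
        "top_recipients": recipients.most_common(top_n),
    }
```

A summary like this, run before any culling decision, gives the team a baseline to compare against after de-duplication and filtering.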
Addressing Client Needs
Keep in mind that while ESI is exploding, client needs remain simple. The first thing they demand in pre-discovery, just as in old-fashioned discovery, is accuracy. We're now applying an additional process to the electronic data, which requires additional care.
The second thing they want is preservation. We need to preserve what we do with the data initially and preserve everything through the timeline, the review and the report. All along the way, internal communication among IT and other departments is vital. If mishandled, ESI can get a corporation into big trouble.
Price is probably the third most important consideration for clients. A fourth would be ease of use of the technology system they select.
With pre-discovery, lawyers can cut down review time tremendously. They're addressing the same data sets, but they're no longer sending 500 gigabytes to the vendor and paying $1.5 million; they're sending 2 gigabytes. The vendor can charge for processing just the relevant data. Pre-discovery can make ESI processing manageable and cost-effective.
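The figures above imply a rate of $3,000 per gigabyte ($1.5 million for 500 gigabytes). The short Python sketch below works through that arithmetic. It assumes, purely for illustration, that the vendor charges the same flat per-gigabyte rate for the culled set; actual vendor pricing models vary, and the function names are hypothetical.

```python
def per_gigabyte_rate(total_cost, gigabytes):
    """Implied flat rate in dollars per gigabyte."""
    return total_cost / gigabytes


def processing_cost(gigabytes, rate):
    """Cost of processing a data set at a flat per-gigabyte rate (assumed model)."""
    return gigabytes * rate


# Figures from the example: 500 GB sent to the vendor for $1.5 million.
rate = per_gigabyte_rate(1_500_000, 500)
full_cost = processing_cost(500, rate)
culled_cost = processing_cost(2, rate)    # after culling down to 2 GB
savings = full_cost - culled_cost

print(f"rate: ${rate:,.0f}/GB, culled cost: ${culled_cost:,.0f}, savings: ${savings:,.0f}")
```

Under that flat-rate assumption, processing the 2-gigabyte set would cost about $6,000 instead of $1.5 million.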