Hunting for Data: Uncovering the Hidden Gems of Structured Data in eDiscovery
Attorneys involved with e-discovery know that they need to manage many different kinds of data. However, many lawyers either don't know much about structured data or don't spend much time thinking about it. This can lead to problems throughout the life of the litigation.
Structured data is often part of the information that is relevant to a matter, and it cannot be ignored during discovery. More than that, though, this type of data may contain a treasure trove of information that can be very valuable to a litigator. The trick lies in knowing what to look for, then mining the information for precious nuggets that can inject new perspective and solid evidence into your case.
Defining Structured Data
Structured data is information that resides in discrete pieces (commonly known as fields) that are organized and grouped within a system. The data is stored in a consistent format (or structure) and can be placed into rows and columns. A simple example of structured data is a contact list. Each piece of information, such as name, phone number, address, city, state, and zip code, is stored in its own field, and together the fields make up a record for each individual contact. The contact list can be sorted by name, by state, or by any other field. According to The Sedona Conference's recently published "Database Principles," "Information stored in databases differs fundamentally from discrete unstructured data, because unstructured data files tend to be static and self-contained."
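To make the contact list example concrete, here is a minimal sketch in Python. The contacts and field names are invented for illustration. Each dictionary is one record, each key is one field, and the same records can be sorted by any field.

```python
from operator import itemgetter

# Hypothetical contact records: each dict is one record, each key is one field.
contacts = [
    {"name": "Rivera, Ana", "phone": "555-0101", "city": "Austin", "state": "TX", "zip": "78701"},
    {"name": "Chen, Wei", "phone": "555-0102", "city": "Denver", "state": "CO", "zip": "80202"},
    {"name": "Okafor, Ben", "phone": "555-0103", "city": "Boston", "state": "MA", "zip": "02108"},
]

# Because every record has the same fields, sorting by any field is one line.
by_name = sorted(contacts, key=itemgetter("name"))
by_state = sorted(contacts, key=itemgetter("state"))

print([c["name"] for c in by_name])
print([c["state"] for c in by_state])
```

An email or memo offers no comparable handle. Nothing in it is guaranteed to be "the state field," which is why unstructured files are usually reviewed document by document.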
There are several types of common business software systems that may contain structured data. These are places the legal team should be looking when they prepare for discovery:
- Web servers, including e-commerce sites, point of sale systems and inventory tracking systems;
- Enterprise resource planning systems, such as business process control and management, accounting, human resources, inventory management and customer relations management;
- Business intelligence systems, which can consist of data warehouses, reporting servers, dashboards and analytical tools;
- Content/information management systems like SharePoint, Lotus Notes, IBM FileNet, EMC Documentum and IBM InfoSphere;
- Third-party operated systems, including Software as a service (SaaS) and IT processes that are outsourced; and
- Legacy systems that are no longer used by the company but are maintained for reporting and other purposes.
Amidst all the other stressors of e-discovery, the idea of tackling structured data may feel overwhelming. But there really isn't an option. This type of data is potentially responsive and must be preserved and produced. It can play a far broader role than that, though. Structured data can help the team gauge the scope of an issue and move forward in a more knowledgeable way. Structured data can also provide the legal team with an overview that will allow them to see if there are gaps in the data. Your team can also use structured data to analyze proof from the other side.
Structured Data and E-Discovery
As an example of structured data, consider an employment case concerning hiring practices. The responding organization can mine its recruitment and staffing systems to isolate the information relevant to the particular department or job type that is the focus of the lawsuit. The system may be as large as an internal- and external-facing website that lets job applicants submit their resumes online, routes the resumes and job applications to the hiring manager and Human Resources, and follows each application until the job is awarded. Or it could be a series of spreadsheets kept by the HR department.
By relying on the data from these systems and on data analysis tools, the legal team can mine the applications and existing documentation in a systematic fashion, instead of reading through potentially thousands of job application files, to create a picture of what has occurred at the organization. The goal is to get the data you need while winnowing out the data you don't, using technology already in use in the organization.
When working with structured data like this, there are several steps you should take and questions you should ask yourself.
Step 1: Ask yourself what information is at issue and what questions you are trying to answer. Hold a brainstorming session with those most familiar with the facts of the case and create a bubble map or diagram that visually links the different relationships together. This is most effectively done on a whiteboard as you brainstorm.
Step 2: Identify and investigate the systems that contain the relevant structured data so you can find where it lives. Look at a number of factors, including the system owner, format of data, available date ranges, the backup and archive process for the system, the export capabilities for the data system, existing canned reports and user rights.
Step 3: Create a data map and define links between data. Links are pieces of data that are common to two data sets, which can be used to relate those two sets. A data map details how all the data relate to each other and how they are stored in the database.
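As a rough illustration of a link, the sketch below relates two hypothetical data sets, applicants and applications, through the field they share. The field names (applicant_id, job_id) and the values are assumptions made up for this example, not taken from any particular system.

```python
# Hypothetical data sets: field names and values are invented for illustration.
applicants = [
    {"applicant_id": 1, "name": "A. Rivera"},
    {"applicant_id": 2, "name": "W. Chen"},
]
applications = [
    {"application_id": 10, "applicant_id": 1, "job_id": "J-100"},
    {"application_id": 11, "applicant_id": 2, "job_id": "J-100"},
    {"application_id": 12, "applicant_id": 1, "job_id": "J-200"},
]


def link(left, right, key):
    """Relate two data sets on a shared field (an inner join).

    Assumes `key` is unique in `left`, as an ID field normally would be.
    """
    index = {row[key]: row for row in left}
    # Keep only right-hand rows whose key also appears on the left.
    return [{**index[r[key]], **r} for r in right if r[key] in index]


linked = link(applicants, applications, "applicant_id")
for row in linked:
    print(row["name"], row["job_id"])
```

Here applicant_id is the link. Recording which fields play that role between each pair of data sets is the core of the data map.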
Step 4: Create a data dictionary and schema. The dictionary should include table and field level lists with descriptions that detail what information is in each field. The schema, or relational diagram, you develop will help detail relationships between tables and fields.
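When the source system is a relational database, the skeleton of the data dictionary and schema can often be read from the database itself. The sketch below assumes a SQLite database and uses Python's built-in sqlite3 module. The two tables are hypothetical stand-ins for a recruitment system, and other database products expose the same information through their own catalog views.

```python
import sqlite3

# Hypothetical recruitment tables, created in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (job_id TEXT PRIMARY KEY, department TEXT, title TEXT);
CREATE TABLE applications (
    application_id INTEGER PRIMARY KEY,
    job_id TEXT REFERENCES jobs(job_id),
    applicant_name TEXT,
    status TEXT
);
""")


def table_names(conn):
    """List user tables recorded in SQLite's catalog."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [r[0] for r in rows]


def data_dictionary(conn):
    """Map each table to its (field name, declared type) pairs."""
    result = {}
    for table in table_names(conn):
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        result[table] = [(c[1], c[2]) for c in cols]
    return result


def relationships(conn):
    """List declared links as (table, field, referenced table, referenced field)."""
    links = []
    for table in table_names(conn):
        for fk in conn.execute(f"PRAGMA foreign_key_list({table})"):
            links.append((table, fk[3], fk[2], fk[4]))
    return links


print(data_dictionary(conn))
print(relationships(conn))
```

The database can supply field names, types, and declared relationships, but not the plain-language descriptions. Those still have to come from the people who own and use the system.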
Step 5: Determine how the data will be presented for review and production. Review tools can include customized applications, canned report reviews, spreadsheets and graphical reports. Production options may consist of delimited text files, common database programs, more complex database tools, replicated reports or screen captures in TIFF format.
Ideally, the legal team will be able to determine the production format early on during the Meet and Confer. Knowing the production format sooner rather than later can save you a great deal of time and effort throughout the discovery process.
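If the parties agree on delimited text files, the export itself can be simple. The following sketch uses Python's standard csv module with a pipe delimiter. The records, field names, and choice of delimiter are assumptions for illustration, and the actual layout should follow whatever the parties agree to.

```python
import csv
import io

# Hypothetical records selected for production.
rows = [
    {"application_id": "10", "applicant_name": "Rivera, Ana", "status": "hired"},
    {"application_id": "11", "applicant_name": "Chen, Wei", "status": "rejected"},
]


def to_delimited(rows, fieldnames, delimiter="|"):
    """Write records as delimited text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fieldnames, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


text = to_delimited(rows, ["application_id", "applicant_name", "status"])
print(text)

# Reading the file back confirms that nothing was lost in the export.
round_trip = list(csv.DictReader(io.StringIO(text), delimiter="|"))
```

Letting the csv module handle the writing means any value that happens to contain the delimiter is quoted automatically, which avoids misaligned columns on the receiving side.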
Step 6: Extract the relevant data into a common format for review and analysis. Identify common links between data sets. You will also need to bring the data into the discovery system. The discovery system could be an entire website created specifically for this set of data, or it could be as simple as an Excel spreadsheet that the team can work on directly. The type of discovery system you use will depend on the complexity, size and needs of the case.
Don't forget that you will need to validate the import process and the links to ensure you have maintained the integrity of your data.
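One way to approach that validation, sketched below under simple assumptions, is to compare record counts and per-record fingerprints between the source extract and the imported copy, then check that every link still points to a record that exists. The function names and sample records are hypothetical, and a real matter may call for additional checks.

```python
import hashlib


def row_fingerprint(row):
    """SHA-256 digest of one record, independent of field order."""
    canonical = "\x1f".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def validate_import(source_rows, imported_rows):
    """Compare counts and content of the source extract and the imported copy."""
    src = sorted(row_fingerprint(r) for r in source_rows)
    dst = sorted(row_fingerprint(r) for r in imported_rows)
    return {
        "source_count": len(src),
        "imported_count": len(dst),
        "match": src == dst,
    }


def orphaned_links(child_rows, parent_rows, key):
    """Return child records whose link value has no matching parent record."""
    parent_keys = {p[key] for p in parent_rows}
    return [c for c in child_rows if c[key] not in parent_keys]


# Hypothetical sample: the same records, imported in a different order.
source = [{"id": "1", "name": "A"}, {"id": "2", "name": "B"}]
imported = [{"name": "B", "id": "2"}, {"name": "A", "id": "1"}]
report = validate_import(source, imported)
print(report)
```

A mismatch in counts points to dropped or duplicated records, and a mismatch in fingerprints with equal counts points to values that changed in transit, such as truncated text or reformatted dates.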
Step 7: During the data review and analysis, you will need to satisfy discovery requests, of course. But the value of structured data can be leveraged far beyond that. You can use the data to gain invaluable insights into the case and perform a gap analysis to identify missing information. For instance, in the employment litigation, you can identify all jobs in the relevant department where a candidate was hired but no evidence exists that the other candidates were interviewed. You can use the data to quantify the facts of the case and cut down on legwork, saving time and money. You can also analyze data produced by the other side to find weak points in their case.
While all this may seem daunting, it shouldn't. Attorneys need to know and understand their data, and there are tried-and-true processes for extracting and using structured data. The right approaches will allow you to see information and patterns in ways you couldn't before.
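The gap analysis in the hiring example can be sketched in a few lines of Python. The records, field names, and status values below are invented for illustration. The idea is to find jobs where someone was hired and then list the other candidates for those jobs who have no interview record.

```python
# Hypothetical application and interview records for one department.
applications = [
    {"job_id": "J-100", "candidate": "A", "status": "hired"},
    {"job_id": "J-100", "candidate": "B", "status": "rejected"},
    {"job_id": "J-200", "candidate": "C", "status": "hired"},
    {"job_id": "J-200", "candidate": "D", "status": "rejected"},
    {"job_id": "J-300", "candidate": "E", "status": "rejected"},
]
interviews = [
    {"job_id": "J-100", "candidate": "A"},
    {"job_id": "J-100", "candidate": "B"},
    {"job_id": "J-200", "candidate": "C"},
]


def jobs_with_interview_gaps(applications, interviews):
    """Map each filled job to the non-hired candidates lacking an interview record."""
    interviewed = {(i["job_id"], i["candidate"]) for i in interviews}
    filled_jobs = {a["job_id"] for a in applications if a["status"] == "hired"}
    gaps = {}
    for a in applications:
        if a["job_id"] not in filled_jobs or a["status"] == "hired":
            continue  # Skip unfilled jobs and the hired candidates themselves.
        if (a["job_id"], a["candidate"]) not in interviewed:
            gaps.setdefault(a["job_id"], []).append(a["candidate"])
    return gaps


gaps = jobs_with_interview_gaps(applications, interviews)
print(gaps)
```

A result like this does not prove that no interview took place. It shows where the records are silent, which tells the team what to ask for next.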
Amy Dove, PMP, is manager of client services for IE Discovery. She works with clients to develop proactive enterprise-wide centralized discovery management systems, from planning and collection through processing, review, and production. Dove routinely provides guidance on extracting and producing structured data in complex business litigation. She is a certified project management professional. Dove can be contacted at email@example.com.