Metadata: The Ghosts Haunting e-Documents
Mercer Law School
By David Hricik and Chase Edward Scott
Metadata is "data about data."1 Although it sounds quite modern, one form of metadata is no doubt familiar to every lawyer: the "fax band" on a document received by facsimile that shows the time and date the fax was received, the number from which it came, and the number of pages sent. Thus, a fax band is metadata since it is data about data. And even this simple form of metadata may reveal a lot. For example, it could be used to show that a party's claim that she did not receive a document on a certain date is incorrect.
Metadata is not new, but it has become pervasive in the digital world in which lawyers (and their clients) live. Many programs commonly used in the office create data about data and then save that unseen information along with the visible text of the document in a single file. Put simply, "invisible fax bands" commonly accompany many of the electronic documents we create on a daily basis. This unseen information is typically transferred along with the document in which it is embedded unless removed prior to transmission. Thus, generally, any time the file is transmitted, the invisible "fax bands" are also sent.
But rather than simply revealing seemingly innocuous information, such as the time and date the file had been prepared, metadata often reveals much, much more. For example, many software programs permit an author to "track changes" to the text, to save "multiple undoes" in case the author later decides to "undo" revisions made long ago, or even to insert "invisible" comments into the file. Such data could reveal a wealth of information to recipients of the electronic file, potentially resulting in a significant impact on negotiating positions, litigation strategies, and numerous other sensitive scenarios.
Recently, a lawyer relayed a purportedly true story to one of the co-authors that demonstrates the potential risks of exchanging files with embedded data. He had been negotiating a contract against a well-known software maker which, for purposes of this article, will be called "Macrosoft." During negotiations, the lawyers for each side used a common word-processing program, Microsoft Word, to edit and propose revisions to the contract, and they used the program's "track changes" feature so that each side could see the specific changes proposed. They e-mailed the electronic draft, with this embedded information, back and forth between rounds of revisions. After receiving one such draft from Macrosoft's counsel, the lawyer made a few mouse clicks and, using nothing but Microsoft Word's built-in functions, revealed "hidden" internal comments from Macrosoft's business personnel concerning the terms of the contract, negotiating positions, and bottom lines. Thus, had Macrosoft later insisted that a noncompete clause was extremely important to closing the deal, the lawyer would have been able to tell whether this was true or simply a negotiating ruse. Clearly, metadata is an important consideration in today's legal environment.
This article, the first of a two-part paper, explains how metadata is created and embedded in some popular programs and analyzes what obligations, if any, lawyers have to remove this embedded material from documents that they create or send on their clients' behalf. Did Macrosoft's lawyers, for example, violate duties to their client by sending embedded data along with the text of the contract to opposing counsel? This article also provides a number of useful tips on how lawyers can remove metadata from documents created in some of the more popular office software and avoid similar situations in their own practice.
The second installment (in the next issue of the Georgia Bar Journal) will analyze the recipient's duties. If a lawyer receives a file containing embedded data that reveals confidential or privileged information of an opposing party, is the lawyer bound by the same obligations that apply when documents arrive in a misaddressed envelope or, conversely, is the lawyer free to review and use the embedded information?
The Purpose Of Metadata
Obviously, software does not embed hidden data into documents in order to cause the disclosure of confidential information. Although the type and amount of embedded data stored will vary by program, the primary function of metadata is utilitarian: it is designed to help users revise, organize, and access electronically created files. Typical metadata includes, for example, information about the person who authored the document and the location (drive and folder) where the file was saved. In addition, a file can include metadata records of past revisions. As a result, a person can examine the changes that have been made to a file and compare them visually to any handwritten revisions to ensure that they have, in fact, been made. Thus, embedded data serves a useful and legitimate purpose.
Metadata In Microsoft Word
Microsoft Word has rightly been called the "ubiquitous" software program.2 Lawyers commonly use Microsoft Word to create documents, and these files are regularly e-mailed in electronic form to clients, third parties, and opposing counsel. Unfortunately, in some respects, embedded data is just as ubiquitous in Word. Thus, the risk of electronically transferring sensitive metadata through these documents is substantial. The following outlines some of the most common kinds of embedded information found in Word documents:
File Properties Information
Some of the most basic metadata in a Word document can be viewed by looking in different menu items in Microsoft Word.3 A key location is in the "Properties" item, located in the "File" menu. The "Properties" for a particular document can reveal the author, creation dates, and other information. For example, this particular article (as of about halfway through the writing process) contained the following information under File/Properties:
The metadata on that single screen reveals that the file was created in August and was still being worked on in October 2005. It also reveals that the document was in its 44th revision (Word increments this number each time the file is saved) and had been edited for a total of 205 minutes.4 Had this document been work product for a client, and had the author transmitted the file to the client in electronic form, the client could have used this metadata to check whether the lawyer had worked on the document for as long as the lawyer's fee statement indicated. Had it been a report prepared by an expert witness and sent to opposing counsel, opposing counsel could have discerned how long the expert had spent drafting the report. Had it been a brief prepared by an undisclosed attorney and forwarded to opposing counsel, the author's identity could have been revealed.5 Metadata matters.6
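For readers curious about where these properties actually live, the following is a minimal Python sketch, not a Microsoft tool. It assumes a file in the .docx format that Word 2007 and later save by default: a .docx file is a ZIP archive of XML parts, and File/Properties values are kept in parts named docProps/core.xml and docProps/app.xml. The older binary .doc format discussed in this article stores similar information differently and is not handled here. The function name read_properties is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the property parts of an Office Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# (label, element) pairs to pull from docProps/core.xml.
CORE_FIELDS = [
    ("author", "dc:creator"),
    ("last_saved_by", "cp:lastModifiedBy"),
    ("revision", "cp:revision"),
    ("created", "dcterms:created"),
    ("modified", "dcterms:modified"),
]


def read_properties(docx):
    """Return File/Properties-style metadata from a .docx path or file object."""
    props = {}
    with zipfile.ZipFile(docx) as z:
        names = set(z.namelist())
        if "docProps/core.xml" in names:
            core = ET.fromstring(z.read("docProps/core.xml"))
            for label, tag in CORE_FIELDS:
                el = core.find(tag, NS)
                if el is not None and el.text:
                    props[label] = el.text
        if "docProps/app.xml" in names:
            app = ET.fromstring(z.read("docProps/app.xml"))
            # TotalTime records total editing time, in minutes.
            el = app.find("ep:TotalTime", NS)
            if el is not None and el.text:
                props["total_editing_minutes"] = el.text
    return props
```

Called as read_properties("draft.docx") (a hypothetical file name), it returns a dictionary of whatever author, revision, date, and editing-time entries the file holds. As footnote 4 cautions, the editing time shows how long the file was open, not necessarily how long anyone worked on it.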
Track Changes Feature
More troubling than the basic metadata found in the File/Properties screen is the other unseen data that can accompany a Word file. Foremost among these is "track changes," a Word feature that creates a record of every change made to a document. It has many uses: lawyers who exchange drafts of contracts, as mentioned in the introduction, can turn on this feature to allow prior revisions of a proposed contract to be reviewed during negotiations; word processing personnel may enable "track changes" so that they can review and ensure that they have made each handwritten edit desired by a lawyer; and so on.7
Complications can occur, however, when the author or editor of the document does not know that the "track changes" feature was turned on. Such ignorance may be commonplace because, depending on the program's settings, Word may not actually display the tracked changes on screen; the user would have to enable a specific option to view them. For example, this paragraph was written with the track changes feature enabled.8 What you are reading now is the way the paragraph looked when I finished editing it (i.e., even though "track changes" was turned on, Word did not reveal the tracked revisions on-screen). Here, though, is what the paragraph looked like when the option to view tracked changes was enabled:
If the file had been e-mailed to someone, the recipient could have easily revealed the changes and seen the revisions shown above. If this document had been a contract instead of the present article, the metadata could have revealed to an opposing party the negotiator's mental process in working through revisions previously made to key proposed terms.9 Such information could clearly be valuable to the opposing party in deciding its own negotiating strategy.
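The sketch below illustrates why a recipient needs no special tools to see tracked changes. It assumes the .docx format (Word 2007 and later), not the .doc format: in a .docx file, tracked deletions are stored as w:del elements (with the deleted words in w:delText) and insertions as w:ins elements in the part word/document.xml, each tagged with the reviser's name. The function tracked_changes is hypothetical and ignores moved text and formatting changes.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, in ElementTree's {uri}local form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def tracked_changes(docx):
    """List (kind, author, text) for tracked insertions and deletions, in document order."""
    with zipfile.ZipFile(docx) as z:
        body = ET.fromstring(z.read("word/document.xml"))
    changes = []
    for el in body.iter():  # iter() walks the tree in document order
        if el.tag == W + "ins":
            kind, text_tag = "insertion", W + "t"
        elif el.tag == W + "del":
            kind, text_tag = "deletion", W + "delText"  # deleted words live in w:delText
        else:
            continue
        text = "".join(t.text or "" for t in el.iter(text_tag))
        if text:  # skip change markers that carry no text, e.g. on paragraph marks
            changes.append((kind, el.get(W + "author", "unknown"), text))
    return changes
```

Whether or not Word is set to display markup, the deleted wording remains in the file until the changes are accepted or rejected.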
Fast Saves Feature
Another form of embedded Word data is created by the use of "Fast Saves." This feature enables the user to quickly save the document without having to take the time to perform a full save. However, "fast saves" only append the changes to the end of the document file rather than replacing the actual edited material. In other words, fast saved documents may retain information that the user believes has already been deleted. Thus, when "fast saves" are enabled, "deleted information remains hidden within the document."10 Opposing counsel who receives a file that has been created with "fast saves" enabled can easily open the document and recover all of the previous revisions.11
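The toy model below illustrates the append-only idea behind "fast saves." It is not Word's actual file format: the one-record-per-line layout, the BASE and EDIT labels, and the function names are all hypothetical, and the model assumes the text contains no "|" characters or line breaks. It shows only the general point made above: when edits are appended rather than rewritten, the displayed text reflects a deletion, but the deleted words are still physically present in the file.

```python
# A toy, append-only "document file" used only to illustrate fast saves.
# This is NOT Word's real format: each line is either BASE:<text> or
# EDIT:<old>|<new>, and the text is assumed never to contain "|" or newlines.


def full_save(text):
    """Rewrite the whole file so that only the current text is stored."""
    return ("BASE:" + text + "\n").encode("utf-8")


def fast_save(file_bytes, deleted, inserted):
    """Append an edit record to the end of the file instead of rewriting it."""
    return file_bytes + ("EDIT:" + deleted + "|" + inserted + "\n").encode("utf-8")


def render(file_bytes):
    """Replay the base text and every appended edit to get what the user sees."""
    text = ""
    for line in file_bytes.decode("utf-8").splitlines():
        kind, _, body = line.partition(":")
        if kind == "BASE":
            text = body
        elif kind == "EDIT":
            old, new = body.split("|", 1)
            text = text.replace(old, new, 1)
    return text


draft = full_save("Term: 5 years")
sent = fast_save(draft, "5 years", "2 years")
print(render(sent))                           # Term: 2 years
print(b"5 years" in sent)                     # True: the deleted words are still in the file
print(b"5 years" in full_save(render(sent)))  # False: a full save drops them
```

In this model, as in the concern described above, only a full save of the current text actually discards the deleted words.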
Comments Feature
Embedded data can also be found in Word documents in the form of "comments." Comments are an incredibly useful feature for collaboration. For example, the authors collaborated on writing this article. If one of the authors had wanted to, he could have made a comment to the other to explain why he suggested a revision, included a certain concept or needed clarification on some passage. Those comments are embedded within the file and accompany it whenever it is exchanged. Below is a screen shot of a few lines from a chapter of a book one of the authors co-wrote as seen in Word with the view set to show tracked changes and comments:
Thus, like tracked changes, the "comments" feature of Word can leave hidden data within an electronic document that may be valuable to opposing counsel.
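As with tracked changes, comments in a .docx file (Word 2007 and later; the .doc format differs) are ordinary XML. They are normally kept in a separate part, word/comments.xml, with each w:comment element carrying the commenter's name in a w:author attribute. The hypothetical helper below lists them; it is a sketch for illustration, not a replacement for reviewing the document in Word.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, in ElementTree's {uri}local form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def list_comments(docx):
    """Return (author, text) for each comment stored in a .docx file."""
    with zipfile.ZipFile(docx) as z:
        # Assumes the usual part name; its absence normally means no comments.
        if "word/comments.xml" not in z.namelist():
            return []
        root = ET.fromstring(z.read("word/comments.xml"))
    return [
        (c.get(W + "author", "unknown"), "".join(t.text or "" for t in c.iter(W + "t")))
        for c in root.iter(W + "comment")
    ]
```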
Versions Feature
A final example of a type of hidden metadata in Word is created by the software's "versions" feature. If "versions" is enabled, each time the file is saved, a new version is created and stored, leaving prior versions of the document intact. Thus, once again, if the file is transmitted to an opposing party, she could review every prior version of the document to see what changes had been made to the document.12
The Duty to Avoid Disclosing Embedded Confidential Information
All of the above-listed features of Word are useful to lawyers or their word processing personnel. Lawyers need to be aware, however, that these tools embed hidden data within the file. They also need to recognize that their word processing staff may enable certain features without the lawyer's knowledge.13 For example, if a lawyer is unaware that her secretary has enabled "track changes," and the secretary fails to appreciate the problems created by transmitting the file with the tracked changes still embedded, disaster could strike.
But the risk of unintended disclosure has always existed, just in a different form. Not too long ago, the primary risk was that a letter intended for a client would instead be mailed to opposing counsel.14 Similarly, a lawyer might have made handwritten comments on a contract proposal drafted by the other side, and, though intending to forward the document to the client for review, may have inadvertently mailed or faxed it to opposing counsel.
In the digital age, however, new methods for creating, editing, and transmitting documents have increased the risk of unintended disclosures. Instead of misaddressing envelopes, for instance, lawyers and their staff today can inadvertently send e-mail intended for a client to opposing counsel or a third party, or may accidentally forward to opposing counsel an e-mail received privately from a client.15 And, as discussed above, electronic files can reveal far more information than paper drafts did: they "can reveal a cache of information, including the names of everyone who has worked on . . . a specific document, text and comments that have been deleted, and different drafts of the document."16 Given these dangers, it is important to consider what professional duties lawyers owe their clients to safeguard this information from disclosure.
To aid this discussion, it is helpful to emphasize the distinction between confidential information that a lawyer has a professional duty to keep in confidence and information that is privileged under the attorney-client privilege. The attorney-client privilege protects against the forced disclosure of communications between the lawyer and the client.17 The privilege is a qualified one, however, because only confidential communications between the attorney and client are protected from disclosure. Thus, the privilege does not apply to information learned by the lawyer from third parties or even to the lawyer's conversations with the client if those conversations were conducted in the presence of others.18
While the attorney-client privilege protects against the forced disclosure of privileged communications to an opposing party, lawyers themselves are restricted, under their professional duty of confidentiality, from disclosing "confidential information" unless authorized to do so by their client or by judicial authority. And the "confidential information" covered by this duty is far broader than attorney-client privileged information because it encompasses all information "gained [by the lawyer] in the professional relationship with a client."19 Given this broad definition, there is a substantial risk that metadata transmitted to a third party by an attorney will contain confidential information. Accordingly, a lawyer who knows that a document contains embedded confidential information generally has a duty to remove it before transmitting the file.
But what about a lawyer who unknowingly transmits a document with embedded confidential information? Has that lawyer violated the duty of confidentiality? Some may argue that because "everyone knows" about metadata, any lawyer who fails to remove hidden confidential information has breached his or her professional duty.20 In the authors' experience, though, the opposite is true: the vast majority of the nearly 1,000 lawyers Mercer faculty have spoken to about this issue had never heard of metadata, let alone understood how to avoid creating it or how to remove it. In further support of this less-than-scientific observation, documents that contain embedded data have routinely shown up on the web; some were even posted by large-firm lawyers who ostensibly should be the most educated about embedded data. In any event, the existence of metadata and the dangers it presents for unintended disclosure are becoming more widely known. As a result, lawyers will soon, if the time has not already arrived, be unable to avoid negligence claims or defend against bar complaints by pleading ignorance of the risks that embedded information creates. Thus, attorneys should make every effort to prevent the transmission of confidential information. Some simple methods to aid in this effort are detailed in the following section.
How To Avoid Creating And How To Remove Embedded Data
There are several approaches to addressing inadvertent transmission of metadata. This section surveys some of the means to do so.21
Avoid Creating Embedded Data
Obviously, the easiest way to avoid disclosing embedded confidential information is not to create it in the first place. As an initial matter, note that simply saving a document to a hard drive or networked drive may retain information about the computer or network to which it is linked. To keep this metadata out of documents sent to third parties, attorneys should use "Save As" to re-save each document to a 3.5" floppy disk, the desktop, or a flash drive, and distribute that copy to opposing counsel.
Beyond this simple tip, Microsoft and other developers have recognized the importance of maintaining the confidentiality of metadata in certain situations and have, in response, provided users with in-program options allowing them to alter the types and amount of embedded information that will be stored in their documents. The following describes simple measures lawyers can take to avoid creating or to limit the creation of embedded data when using some of the more commonly used office programs:
Microsoft Word
Under the "Tools" menu, select "Options" and click on the "Security" tab. The resulting dialog box allows the user to encrypt the file, edit privacy options, and change the level of macro security. Checking the box "remove personal information from file properties on save" prevents the personal information associated with your computer, network, or registration information from attaching to the document. Thus, this option should be selected when the lawyer works on any potentially sensitive documents in Word that may be transmitted to outside parties.
Other information, such as the author of the document, contained in the "Summary" tab under "Properties" within the "File" menu, may also be considered sensitive and inappropriate for opposing counsel to view. The lawyer can remove any of the offending information from the document by simply deleting the entries in the text boxes and clicking "OK" to save her revisions.22
As noted above, use of the "fast saves" feature of Word can leave hidden data in the document. To turn off "fast saves," go to the "Tools" menu, select "Options," and click on the "Save" tab. Under the "Save" tab, ensure that the "allow fast saves" box is not selected.23
As also previously discussed, Word allows users to save multiple versions of the same document, thus increasing the risk for unintended disclosure of information contained in earlier versions. To determine whether any older versions of a file exist, go to the "File" menu and click on "Versions." Any old versions attached to the document will be listed by the date/time and creator of the saved version. To remove a version, simply click on the offending entry and select delete.24
Microsoft PowerPoint
Similar to Word, Microsoft PowerPoint will track, via normally hidden metadata, personal information such as the identity of the author of the document. To remove this metadata from a PowerPoint file, go to the "Tools" menu and select "Options." Under the "Security" tab, ensure that "remove personal information from file properties on save" is checked.25 To delete the user name and initials associated with the file, click on the "General" tab in this same submenu. From here, the user can simply highlight and delete the unwanted information.26
Finally, note that PowerPoint documents often contain objects embedded from other programs, which may in turn contain their own metadata. To ensure that an embedded object is metadata-free, right-click the object and select "Cut." Then select the desired slide, go to the "Edit" menu, select "Paste Special," and paste the object as a picture.27 This newly created image will be free of sensitive information concerning its source.
Microsoft Excel
Many of the same processes used to eliminate metadata from Word and PowerPoint files can also be used to eliminate personal data from Microsoft Excel workbooks. However, Excel offers several unique ways of concealing data that attorneys should be aware of before sending workbook files to opposing counsel. For instance, Excel users can hide individual sheets, rows, or columns from view. To view hidden rows and columns, press Ctrl+Shift+Spacebar to select all of the cells on the worksheet, then go to the "Format" menu, open the "Row" submenu, and select "Unhide." Repeat this process for the "Column" submenu, and use the "Sheet" submenu to unhide any hidden sheets. This should make hidden cells and sheets visible so that they can be deleted if the information they contain is found to be confidential.28
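For workbooks saved in the .xlsx format used by Excel 2007 and later (the older .xls format is not covered), a short Python sketch can list hidden items without opening Excel. It assumes hidden sheets are marked by a state attribute in xl/workbook.xml and hidden rows and columns by a hidden attribute in each worksheet part. One caveat worth knowing: a sheet can also be marked "veryHidden," and such a sheet does not appear in Excel's Unhide list at all, so the menu procedure above may miss it. The function name hidden_items is hypothetical, and it reports rows and columns by worksheet part name rather than by sheet name.

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML namespace, in ElementTree's {uri}local form.
S = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
HIDDEN = ("1", "true")  # boolean attributes may be written either way


def hidden_items(xlsx):
    """Report hidden sheets, rows, and columns in an .xlsx workbook."""
    report = {"sheets": [], "rows": [], "columns": []}
    with zipfile.ZipFile(xlsx) as z:
        workbook = ET.fromstring(z.read("xl/workbook.xml"))
        for sheet in workbook.iter(S + "sheet"):
            state = sheet.get("state", "visible")
            if state != "visible":  # "hidden" or "veryHidden"
                report["sheets"].append((sheet.get("name"), state))
        for part in sorted(z.namelist()):
            # Worksheet parts such as xl/worksheets/sheet1.xml (skips _rels/*.rels).
            if not (part.startswith("xl/worksheets/") and part.endswith(".xml")):
                continue
            ws = ET.fromstring(z.read(part))
            for row in ws.iter(S + "row"):
                if row.get("hidden") in HIDDEN:
                    report["rows"].append((part, row.get("r")))
            for col in ws.iter(S + "col"):
                if col.get("hidden") in HIDDEN:
                    report["columns"].append((part, col.get("min"), col.get("max")))
    return report
```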
Excel users can also link formulas to other workbooks. Though useful, these linked formulas may contain metadata about the workbooks to which they are linked. To remove this potentially sensitive data, highlight the cells containing the linking formula, right-click, and select "Copy"; then go to the "Edit" menu, click "Paste Special," select "Values," and click "OK." Note that this replaces the formula with its calculated result: the formula is deleted from the workbook, but the resulting data remains.29
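In the .xlsx format (Excel 2007 and later), a formula that refers to another workbook causes Excel to store an "external link" part, and that part's relationship file records the other workbook's location, which can include a drive and folder path. The hypothetical Python helper below lists those recorded locations; it assumes the standard package layout (xl/externalLinks/_rels/*.rels) and is meant only to show where such paths hide.

```python
import zipfile
import xml.etree.ElementTree as ET

# Relationship element name in the package relationships namespace.
REL = "{http://schemas.openxmlformats.org/package/2006/relationships}Relationship"


def external_link_targets(xlsx):
    """List locations of other workbooks referenced by linked formulas in an .xlsx file."""
    targets = []
    with zipfile.ZipFile(xlsx) as z:
        for part in sorted(z.namelist()):
            # e.g. xl/externalLinks/_rels/externalLink1.xml.rels
            if part.startswith("xl/externalLinks/_rels/") and part.endswith(".rels"):
                for rel in ET.fromstring(z.read(part)).iter(REL):
                    if rel.get("TargetMode") == "External":
                        targets.append(rel.get("Target"))
    return targets
```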
Removing Embedded Data Before Transmitting
While the above methods can help reduce the amount of metadata created and stored in the lawyers' electronic files, attorneys should also consider taking additional precautions to remove any other embedded information that has already made its way into a file before transmittal.30 There are a number of methods to accomplish this task. Because reasonable care is necessary to satisfy the lawyer's duty of confidentiality, the nature of the communication at issue will indicate what steps are required for particular communications or practices.
Large software makers know about the problems that unintentional transmission of metadata can create for lawyers, and no doubt others, and have updated their programs with additional functionality to avoid creating, avoid transmitting, and/or to remove this embedded data. Microsoft created a free add-in, which can be downloaded from the company's website and which can effectively eliminate most sensitive information from documents created in Microsoft Office programs even where the document was drafted with a metadata-creating feature turned on.31 (Remember: metadata has utility!) The installation of this add-in will create an additional option to "Remove Hidden Data" within the "File" menu in your Microsoft Office programs:
After selecting this option, the user will be asked to enter a file name for what will become the "clean" version of the document. Once a name is provided, the user clicks "Next" to start the scan:
When the scan is complete, a text file opens containing a summary of the scanning results.32 The end result is an effective, easy, and free solution to the problem of metadata transmission via Microsoft Office documents; you just have to remember to use it!
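The Remove Hidden Data add-in does far more than any short script can, but the basic idea of scrubbing can be sketched. The hypothetical Python function below copies a .docx file (Word 2007 and later) part by part and blanks only the author and last-saved-by entries in docProps/core.xml. It does not touch comments, tracked changes, or anything else, so it illustrates the approach rather than substituting for a real scrubber.

```python
import zipfile
import xml.etree.ElementTree as ET

CORE_PART = "docProps/core.xml"
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "dcmitype": "http://purl.org/dc/dcmitype/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
# Keep the usual prefixes on output (xsi:type values such as "dcterms:W3CDTF" rely on them).
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def scrub_author_fields(src, dst):
    """Copy a .docx file, blanking only the author and last-saved-by properties."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == CORE_PART:
                root = ET.fromstring(data)
                for tag in ("dc:creator", "cp:lastModifiedBy"):
                    el = root.find(tag, NS)
                    if el is not None:
                        el.text = ""
                data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
            zout.writestr(item, data)  # reuses each part's name and compression setting
```

Writing to a new file, rather than modifying the original in place, mirrors the add-in's approach of producing a separate "clean" copy.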
Saving a document in Portable Document Format (pdf) will also reduce the amount of metadata stored in the file. But this process does not eliminate metadata entirely.33 For many purposes, however, simply saving a document into pdf format may suffice. Documents in pdf format often cannot be easily modified, though, thereby reducing the efficiency and functionality of document exchanges using this method.
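Because pdf files carry their own metadata, a quick check before sending one can be worthwhile. The Python sketch below looks for the standard document-information entries (such as /Author and /Producer) written as literal strings in the raw file, and reports whether an XMP metadata packet appears to be present. It is deliberately simplistic and assumes uncompressed entries: values stored as hex strings, inside compressed object streams, or in encrypted files will be missed, so a clean result proves little. The function name pdf_metadata is hypothetical.

```python
import re

# Standard entries of a PDF document-information dictionary.
INFO_KEYS = ("Title", "Author", "Subject", "Keywords", "Creator",
             "Producer", "CreationDate", "ModDate")


def pdf_metadata(pdf_bytes):
    """Return (info, has_xmp) found by a naive scan of raw PDF bytes.

    Only finds entries written as uncompressed literal strings, e.g.
    /Author (Jane Doe); hex strings, compressed object streams, and
    encrypted files are not handled.
    """
    info = {}
    for key in INFO_KEYS:
        # /Key followed by a (...) string; allows backslash escapes, not nested parentheses.
        pattern = rb"/" + key.encode("ascii") + rb"\s*\(((?:[^()\\]|\\.)*)\)"
        match = re.search(pattern, pdf_bytes, re.DOTALL)
        if match:
            info[key] = match.group(1).decode("latin-1")
    has_xmp = b"<x:xmpmeta" in pdf_bytes  # start of an XMP metadata packet
    return info, has_xmp
```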
Additionally, there are also a number of commercial software "scrubbers" available for purchase.34 While these programs have differing degrees of functionality and integration with other software (such as Microsoft Outlook integration), they can all be used to scan files before they are transmitted and remove the embedded metadata.
Unintended Disclosure Agreements
A final, less technical manner to avoid the problems associated with embedded data is to have an agreement in place with opposing counsel where the parties acknowledge beforehand that any transmission of confidential embedded data is unintentional and that any documents identified as containing such information should be deleted. Obviously, the efficacy of this option relies upon the trust of counsel, and where the mere viewing of the information would "let the cat out of the bag," such agreements may be insufficient. Thus, either ensuring that embedded data is not created, or ensuring that it is stripped out before a file is sent will normally be the only effective way to address the problems of embedded data.
Conclusion
Hopefully, this article has educated the reader about what metadata is and how the lawyer should treat confidential embedded information, including some easy-to-use methods to reduce or eliminate metadata from documents created in the more popular office programs. The next installment of this article will address an issue that has split authorities: what happens if a lawyer fails to take the steps recommended in the current article and transmits a document to opposing counsel that contains metadata? Is the recipient free to look, or not?
Reprinted with permission from the Georgia Bar Journal, Volume 13, Number 5, February 2008. Copyright State Bar of Georgia. Statements expressed within this article should not be considered endorsements of products or procedures by the State Bar of Georgia.
David Hricik is an associate professor at Mercer Law School who has written several books and more than a dozen articles. He can be reached at hricik_d@mercer.edu.
Chase Edward Scott is a JD candidate at Mercer Law School hailing from Chattanooga, Tenn. Following graduation in 2009, he intends to practice Intellectual Property or Internet Law.
1 Definition of Metadata, http://wordnet.princeton.edu/perl/webwn?s=metadata (last visited Dec. 4, 2007).
2 Andrew Beckerman-Rodau, Ethical Risks from the Use of Technology, 31 RUTGERS COMPUTER & TECH. L.J. 1, 32 (2004); Brian D. Zall, Metadata: Hidden Information in Microsoft Word Documents and Its Ethical Implications, 33 COLO. LAW. 53, 53 (Oct. 2004) (describing legal profession's widespread adoption of Microsoft Word).
3 Interestingly, most metadata is stored in the last blank space of a Word document. If, for example, you select all of a Word document except its last space (which will appear to be blank) and then copy and paste that material into a new Word document, most metadata will not follow along. For a more technical discussion of how metadata is embedded in a Word document, see Zall, 33 COLO. LAW. at 54.
4 To be clear, the file could simply have been open on the screen for 205 minutes. Thus, the amount of time indicated does not necessarily mean that the file was being worked on for all of those 205 minutes.
5 The kind and amount of information stored in the "Properties" dialog box can be customized. To see whether your version has been customized, click on the "Custom" tab at the top of the "Properties" dialog box.
6 There are a number of other sources of metadata. For example, other tabs in the "Properties" dialog box depicted show where the file was stored on the author's hard drive and other information.
7 See generally, James Veach, Commutation Agreements: Drafting a Clear and Comprehensive Contract, 854 PLI/COMM 43 (2003) (noting that track changes can be used to aid in the drafting process).
8 To turn on "track changes," go to the "Tools" menu and select "Track Changes." To see whether an open document contains tracked changes, turn on track changes and then make sure that "Final Showing Markup" is selected on the "Reviewing" toolbar that appears.
9 See Zall, 33 COLO. LAW. at 55-56 (collecting hypotheticals on how metadata could harm clients and lawyers when transmitted to opposing counsel).
10 Toby Brown, Special Handling: How Paper and Electronic Files Differ, 21 GPSOLO 22, 23 (Sept. 2004). This is done by selecting "Save As" from the "File" menu, then selecting "Tools" and then "Save Options." One option is "Allow Fast Saves." Fast Saves is "very useful in the event of hardware failure because it reduces the chance of losing changes to a document." Id.
11 Id.
12 See Zall, 33 COLO. LAW. at 54-55.
13 Georgia Rule of Professional Conduct 5.3(b) requires lawyers with direct supervisory authority over a nonlawyer to "make reasonable efforts to ensure that the person's conduct is compatible with the professional obligations of the lawyer[.]"
14 See generally, Am. B. Ass'n. Formal Eth. Op. 92-368 (1992) (describing such scenarios).
15 See generally, Steven L. Nelson & Jane C. Schlicht, Upholding the Sanctity of the Attorney-Client Privilege, 77 WIS. LAW. 8 (2004) (describing hypotheticals); Emily Eichenhorn, Risks & Rewards: Resisting the Inclination to Abdicate to Technology, 63 OR. ST. B. BULL. 39, 40 (2003) (same).
16 Jason Krause, Hidden Agendas, 90 AM. B. ASS'N. J. 26 (July 2004).
17 See Bryant v. State, 651 S.E.2d 718, 725 (Ga. 2007).
18 Id.
19 Georgia Rule of Professional Conduct 1.6.
20 For example, Vincent Polley, then-chair of the ABA's Cyberspace Law Committee, has been quoted as saying that lawyers can no longer "plead ignorance when it comes to this stuff any more." Krause, 90 AM. B. ASS'N. J. at 26.
21 See generally, Carole Levitt & Mark Rosch, Making Metadata Control Part of a Firm's Risk Management, 28 L.A. LAW. 40, 40 (Mar. 2005) (describing various means to remove metadata, including some of those discussed here); Storm Evans, How to Commit Malpractice With a Computer, 29 LAW PRACT. MGMT. 56 (Mar. 2003) ("If you must e-mail or otherwise deliver a Word document, consider using macros or a utility program to strip away the metadata").
22 How to Minimize Metadata in Word 2003, http://support.microsoft.com/kb/825576/ (last visited Dec. 11, 2007).
23 For more information on "Fast Saves," visit Frequently Asked Questions About "Allow Fast Saves," http://support.microsoft.com/kb/291181/.
24 See How to Minimize Metadata in Word 2003, supra note 22, http://support.microsoft.com/kb/825576/.
25 How to Minimize the Amount of Metadata in PowerPoint 2002 Presentations, http://support.microsoft.com/default.aspx?scid=kb;EN-US;314800 (last visited Dec. 12, 2007).
26 Id.
27 Id.
28 How to Minimize Metadata in Microsoft Excel Workbooks, http://support.microsoft.com/default.aspx?scid=kb;EN-US;223789 (last visited Dec. 12, 2007).
29 Id.
30 See Beckerman-Rodau, 31 RUTGERS COMPUTER & TECH. L.J. at 32-33 (suggesting that lawyers should consider removing metadata); Gerald J. Hoenig, Technology Property, 18 PROBATE & PROP. 51 (Sept. 2004) (same).
31 To download this add-in, visit http://www.microsoft.com/downloads/details.aspx?FamilyId=144E54ED-D43E-42CA-BC7B-5446D34E5360&displaylang=en or search for "remove hidden data" on http://www.microsoft.com/downloads.
32 Control Metadata in Your Legal Documents, http://office.microsoft.com/en-us/help/HA011400341033.aspx (last visited Dec. 12, 2007).
33 See Jason Krause, Guarding the Cyberfort, 39 Ark. Law. 24, 31 (2004). Suggestions, however, that pdf files contain no metadata are incorrect. See, e.g., Hoenig, 18 PROBATE & PROP. 51; 1 RONALD E. MALLEN & JEFFREY M. SMITH, LEGAL MALPRACTICE § 2.26 (2005) (stating that conversion from Word to pdf "could eliminate meta-data information"). In fact, most pdf files do contain some metadata; thus, converting a Word file to a pdf file will simply result in different metadata being transmitted.
34 Numerous "scrubbers" can be found by searching Google for "metadata" and "scrubber."
© 2008 Mercer Law School