Although the word “open” might imply access, it often does not imply a right to transmit, reproduce, or re-use material, as is currently the case with most government open data and as recently discussed at the GovDatax event.[1] Recent laws require the federal government to make its public data available and encourage agencies to share information with one another. In practice, however, a competing group of laws restricts access to these very same data: copyright restrictions, publicity and privacy rights that might apply, and contract limitations that fill the gap when no other law is available.

Since the 1960s the U.S. government has been taking steps towards more open data. The Freedom of Information Act (FOIA),[2] enacted in 1966, gave the public the right to request access to records from any federal agency; the Chief Financial Officers (CFO) Act of 1990[3] required agencies to report detailed accounting and financial data to the Treasury; the Federal Funding Accountability and Transparency Act (FFATA)[4] of 2006 required full public disclosure of all entities or organizations receiving federal funds; and most recently the Digital Accountability and Transparency Act of 2014 (DATA Act), considered the nation’s first open data law, standardized and published federal spending data.[5]

But the real breakthrough for openness in data sharing came in January 2019 with the signing of the Foundations for Evidence-Based Policymaking Act of 2018 (Evidence Act) and the OPEN Government Data Act, which was enacted as Title II of the same statute. The Evidence Act emphasizes collaboration and coordination to advance data and evidence-building functions in the Federal Government by statutorily mandating Federal evidence-building activities, open government data, and confidential information protection and statistical efficiency;[6] the OPEN Government Data Act requires federal agencies to publish their information online as open data, using standardized, machine-readable data formats, with their metadata included in the Data.gov catalog.[7]

As a counterweight, another group of data is not open or available to the public because of privacy or national security policies. Limitations on public data are found in the Health Insurance Portability and Accountability Act (HIPAA), in rules protecting personally identifiable information (PII), and in the Family Educational Rights and Privacy Act (FERPA), as well as in rules covering information related to national security, especially if the data is military or intelligence-related.[8]

The real problem comes when excessive copyright or contractual restrictions apply to data that is available to the public. Although works of the United States federal government generally do not have statutory copyright protection,[9] these works might be subject to protection if publicity or privacy rights are involved, or if works prepared for the government by independent contractors are copyright protected.[10] Nor does this copyright exception extend to works produced by subnational governments, such as states, cities, and other municipalities.[11]

The public access policies of agencies such as the National Institutes of Health (NIH) explicitly state copyright restrictions over most of their data.[12] According to the National Library of Medicine website, publishers or authors provide all of the material available from the PubMed Central (PMC) site, and almost all of it is protected by U.S. and/or foreign copyright laws, even though PMC offers free access to it. Some of the materials are in the public domain. However, even these may contain photographs or illustrations copyrighted by commercial organizations or individuals, which may not be used without prior approval from the copyright holder.

Users also receive no explicit right or implied license to use this open data.[13] NIH content is available to be accessed, downloaded, and read. Still, transmission, reproduction, or re-use of protected material, beyond that allowed by the fair use section[14] of the copyright law, requires the written permission of the copyright holders. So, any third party that wants to use these copyrighted materials, such as a university, think tank, research institution, library, or museum, would need to review the materials in light of recent rulings of the Supreme Court and the federal courts to determine whether a fair use defense or exception is available.

NIH PubMed also forbids the use of crawlers[15] or systematic downloading of the articles available in its repositories, which limits most text and data mining (TDM)[16] research activities.[17] Crawlers and other automated processes may not be used to systematically retrieve batches of articles from the PMC website, and bulk downloading of materials from the main PMC website, in any way, is prohibited because of copyright restrictions. However, PMC offers two auxiliary services, the PMC OAI service and the PMC FTP service, that may be used for automated retrieval and downloading from the PMC archive, although they apply only to a special subset of articles. These are the only services available for automated downloading of articles in PMC.
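
For researchers who want to stay within these terms, the sanctioned route is a request to the OAI service rather than a crawl of the main site. The following is a minimal sketch of how such a request could be composed in Python, assuming the service follows the standard OAI-PMH protocol (verbs such as ListRecords and the oai_dc metadata prefix come from that protocol). The base URL and the function name build_oai_request are illustrative assumptions, not taken from PMC documentation, and should be checked against the current PMC pages before use. The sketch only builds the request URL; it does not contact any server.

```python
from urllib.parse import urlencode

# Assumption: illustrative endpoint only. Confirm the current address in
# the PMC OAI service documentation before sending any request.
PMC_OAI_BASE_URL = "https://www.ncbi.nlm.nih.gov/pmc/oai/oai.cgi"

# The six request verbs defined by the OAI-PMH protocol.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_oai_request(verb, metadata_prefix=None, from_date=None,
                      until_date=None, base_url=PMC_OAI_BASE_URL):
    """Return an OAI-PMH request URL (hypothetical helper, no network I/O)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"Unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    optional = {
        "metadataPrefix": metadata_prefix,
        "from": from_date,
        "until": until_date,
    }
    # Only include the optional arguments the caller actually supplied.
    query.update({key: value for key, value in optional.items()
                  if value is not None})
    return f"{base_url}?{urlencode(query)}"


if __name__ == "__main__":
    # Ask for records added or changed since a given date, in Dublin Core.
    print(build_oai_request("ListRecords", metadata_prefix="oai_dc",
                            from_date="2019-10-01"))
```

Even through this route, only the subset of articles that PMC makes available for automated retrieval can be obtained, so the copyright and licensing terms of each record still have to be checked before any re-use.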

In short, even though the U.S. government has moved toward a more open data-sharing policy, practice still shows noticeable limitations on the re-use of this data by parties outside the government. Third parties, such as university researchers, think tanks, libraries, archives, or museums, are allowed to read and download the content but are highly limited in reproducing, distributing, or modifying it for new purposes. While re-using materials under the copyright fair use provision could be an option, the question remains how likely a fair use defense is to succeed in court. Also, for anyone intending to engage in a text and data mining research project, the sheer number of copyright, contract, and technical restrictions would make it too complicated to apply this methodology while complying with the law. Continuing on the path of open data sharing, legislators will eventually find a balance between the interests of copyright holders and access to and re-use of the data produced by or on behalf of the government.

————————

Footnotes:

1. Event organized by the Data Coalition Organization on October 30th, 2019, in Washington D.C. 

2. https://www.foia.gov

3. https://www.gao.gov/special.pubs/af12194.pdf

4. https://www.fsrs.gov/

5. Data Act, DataCoalition, available at https://www.datacoalition.org/policy-issues/government-spending/data-act/

6. Memorandum for Heads of Executive Departments and Agencies, Executive Office of the President, Office of Management and Budget, July 2019. Available at  https://www.whitehouse.gov/wp-content/uploads/2019/07/M-19-23.pdf

7. This act makes Data.gov a requirement in the statute, rather than a policy.

8. Jennifer C. Boettcher and K. Matthew Dames, Government Data as Intellectual Property: Is Public Domain the Same as Open Access?, Online Searcher, vol. 42, no. 4, 2018, p. 42. See restrictions on data from the Department of Defense. It is important to consider these laws within the scope of cybersecurity.

9. 17 U.S.C. § 105. “Subject matter of copyright: United States Government works. Copyright protection under this title is not available for any work of the United States Government, but the United States Government is not precluded from receiving and holding copyrights transferred to it by assignment, bequest, or otherwise”.

10. See USA government site for more information https://www.usa.gov/government-works

11. Compilations of data may also be protected by copyright if the requirements of originality and creativity are met. See Feist Publications, Inc. v. Rural Telephone Service Co., 499 U.S. 340 (1991). A compilation is not copyrightable per se, but is copyrightable only if its facts have been selected, coordinated, or arranged in such a way that the resulting work as a whole constitutes an original work of authorship.

12.  PubMed Central (PMC) site, PMC Copyright notice, see at https://www.ncbi.nlm.nih.gov/pmc/about/copyright/

13. Nils Dietrich, Lucie Guibault, et al., Safe to Be Open: Study on the Protection of Research Data and Recommendations for Access and Usage (2013), http://eprints.gla.ac.uk/129335

14. 17 U.S.C. § 107. Limitations on exclusive rights: Fair use.

15.  A crawler is a program used by search engines to collect data from the internet. “When a crawler visits a website, it picks over the entire website’s content and stores it in a databank. It also stores all the external and internal links to the website. The crawler will visit the stored links at a later point in time, which is how it moves from one website to the next. By this process, the crawler captures and indexes every website that has links to at least one other website.” See definition in https://www.searchmetrics.com/glossary/crawlers/

16. Text and data mining (TDM) is a term that refers to computational processes for applying structure to unstructured electronic texts and employing statistical methods to discover new information and reveal patterns in the processed data. This process may lead to knowledge that can be found in the works being mined but has not yet been explicitly formulated. TDM has become a hugely important research tool in science and many other domains. See Maurizio Borghi, Text and Data Mining, available at https://www.copyrightuser.org/understand/exceptions/text-data-mining/; Matthew Sag, The New Legal Landscape for Text Mining and Machine Learning, 66 J. Copyright Soc’y USA, Feb. 2019, at 1–64; Bernt Hugenholtz, The New Copyright Directive: Text and Data Mining (Articles 3 and 4), Institute for Information Law (IViR), July 24, 2019, available at http://copyrightblog.kluweriplaw.com/2019/07/24/the-new-copyright-directive-text-and-data-mining-articles-3-and-4/.

17. Recent cases on fair use and text and data mining (TDM): Authors Guild v. Google, 804 F.3d 202 (2d Cir. 2015); Authors Guild v. HathiTrust, 755 F.3d 87 (2d Cir. 2014); Perfect 10, Inc. v. Amazon.com, Inc., 508 F.3d 1146 (9th Cir. 2007); Ticketmaster Corp. v. Tickets.com, Inc., No. CV997654HLHVBKX, 2003 WL 21406289, at *1 (C.D. Cal. Mar. 7, 2003); A.V. ex rel. Vanderhye v. iParadigms, L.L.C., 562 F.3d 630 (4th Cir. 2009). However, neither the Google nor the HathiTrust case addressed issues arising under contract law, laws prohibiting computer hacking, laws prohibiting the circumvention of technological protection measures (i.e., encryption and other digital locks), or cross-border copyright issues. See Matthew Sag, The New Legal Landscape for Text Mining and Machine Learning, 66 J. Copyright Soc’y USA, Feb. 2019, at 1–64.