Stephen-Gates.github.io

<p id="contribute">This is a draft. We need your help to make it better. Get involved, learn more, and help us improve the Open Definition:</p>

Introduction

The Open Definition has three key requirements for a work to be open: an open license, open access, and an open format. This page focuses on the open format:

  1. quoting the <a href="#openformat">Open Format</a> section from the Open Definition,
  2. exploring its <a href="#intent">intent</a>,
  3. <a href="#testing">testing</a> some real-world examples,
  4. defining the <a href="#meaning">meaning of special words</a>,
  5. collecting <a href="#improve">ideas for improvement</a>,
  6. providing links to <a href="#resources">related resources</a>, and
  7. letting you peek at <a href="#coming">what’s coming next</a>.

<a id="openformat">1. The Open Format defined</a>

Section 1.3 Open Format from the Open Definition version 2.0 states:

The work must be provided in a convenient and modifiable form such that there are no unnecessary technological obstacles to the performance of the licensed rights. Specifically, data should be machine-readable, available in bulk, and provided in an open format (i.e., a format with a freely available published specification which places no restrictions, monetary or otherwise, upon its use) or, at the very least, can be processed with at least one free/libre/open-source software tool.

<a id="intent">2. The Intent</a>

We want to create open knowledge. To help achieve this, the Open Format requires:

The work must be provided in a convenient form

The work must be provided in a convenient format so that it is easy to reuse. This requires the work to be published in a format that maximises knowledge sharing and reuse. The format may vary for different media types (e.g. image, text, geographic data).

The work must be provided in a modifiable form

The work must be provided in a modifiable format so it can be reused in different ways. What is an appropriate modifiable form?

No unnecessary technological obstacles to the performance of the licensed rights

(<a href="#contribute">contribution needed</a>)

Data should be machine-readable

Data is machine-readable if it is in a format that can be easily read, written, parsed and displayed by a computer.

For example:

As another example:

The appropriate machine-readable format may vary by data type. For example, a machine-readable format for geographic data may be different to a format for tabular data.
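As a rough illustration of the two points above, the sketch below reads a small table from CSV and a single point from GeoJSON using only the Python standard library. The budget rows, the column names and the coordinates are made up for this example and are not taken from any real dataset.

```python
import csv
import io
import json

# Hypothetical tabular data, as it might appear in a CSV file.
BUDGET_CSV = """department,amount
Health,120
Education,95
"""

# Hypothetical geographic data, as it might appear in a GeoJSON file.
PLACE_GEOJSON = """{
  "type": "Feature",
  "geometry": {"type": "Point", "coordinates": [153.02, -27.47]},
  "properties": {"name": "Example office"}
}"""


def total_budget(csv_text):
    """Sum the 'amount' column of a CSV document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(int(row["amount"]) for row in reader)


def point_coordinates(geojson_text):
    """Return (longitude, latitude) of a GeoJSON Point feature."""
    feature = json.loads(geojson_text)
    longitude, latitude = feature["geometry"]["coordinates"]
    return longitude, latitude


total = total_budget(BUDGET_CSV)
location = point_coordinates(PLACE_GEOJSON)
print(total)
print(location)
```

Each format needs only a few lines of parsing because its structure is explicit. The two formats are not interchangeable, though: a table of amounts fits CSV naturally, while a geometry fits GeoJSON.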

This section is based on [<a href="#okfn">OKFN</a>] and [<a href="#od-discuss">OD-Discuss</a>].

See also https://www.data.gov/developers/blog/primer-machine-readability-online-documents-and-data

Data should be available in bulk

Providing the work in bulk means the data can be accessed easily in one request.

This requirement complements the Access section of the Open Definition and together they require that:

But your data can still be open if you publish it as many individual files (however it could be argued you’re not publishing it in a convenient form).
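One way a publisher might offer many individual files in bulk is to bundle them into a single archive that can be fetched in one request. The sketch below does this in memory with Python's `zipfile` module. The file names and contents are invented for the example, and a ZIP archive is only one possible packaging choice.

```python
import io
import zipfile

# Hypothetical individual files that together make up one dataset.
FILES = {
    "budget-2013.csv": "department,amount\nHealth,110\n",
    "budget-2014.csv": "department,amount\nHealth,120\n",
    "budget-2015.csv": "department,amount\nHealth,125\n",
}


def bundle(files):
    """Pack a mapping of file name -> text into one ZIP archive (as bytes)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    return buffer.getvalue()


def list_bundle(archive_bytes):
    """Return the sorted file names stored inside a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return sorted(archive.namelist())


archive_bytes = bundle(FILES)
names = list_bundle(archive_bytes)
print(names)
```

A reuser then downloads one archive instead of discovering and requesting each yearly file separately.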

Data should be provided in an Open Format

An Open Format for data - Definition 1:

An Open Data Format is a format with a freely available published specification which places no restrictions, monetary or otherwise, upon its use.

A freely available published specification allows:

If an open data format has no restrictions, monetary or otherwise, upon its use, then:

An Open Format for data - Definition 2:

An Open Format is:

<a id="testing">3. Testing some real-world examples</a>

Is a National Budget in a PDF open?

The Open Format for data definitions above allow tabular data (e.g. a National Budget) to be published as a PDF (an open format according to the definition). However, a PDF is not a convenient form for this type of data, and the Open Definition states that “the work must be provided in a convenient and modifiable form such that there are no unnecessary technological obstacles to the performance of the licensed rights”.

So, is a PDF of a National Budget open?

Tim Berners-Lee’s 5 Star Open Data scheme says it is open and awards it 1 star.

Based on the definition of machine-readable above, a PDF of a National Budget isn’t open. (<a href="#contribute">contribution needed - is this the intent?</a>)
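The test above can be written down as a small checklist. This is only a sketch of one reading of Section 1.3 of version 2.0, not an official conformance test. The `Publication` class, its field names and the values chosen for the budget PDF are assumptions made for illustration and follow the discussion on this page.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    """Hypothetical description of how a dataset has been published."""
    name: str
    machine_readable: bool
    available_in_bulk: bool
    open_specification: bool   # freely available spec, no restrictions on use
    foss_tool_available: bool  # at least one free/libre/open-source tool


def unmet_recommendations(pub):
    """List the Section 1.3 data recommendations the publication misses."""
    unmet = []
    if not pub.machine_readable:
        unmet.append("machine-readable")
    if not pub.available_in_bulk:
        unmet.append("available in bulk")
    # Version 2.0 accepts an open format OR, at the very least, a FOSS tool.
    if not (pub.open_specification or pub.foss_tool_available):
        unmet.append("open format")
    return unmet


# Assumed values: a single PDF file with budget tables.
budget_pdf = Publication(
    name="National Budget (PDF)",
    machine_readable=False,
    available_in_bulk=True,
    open_specification=True,
    foss_tool_available=True,
)

result = unmet_recommendations(budget_pdf)
print(result)
```

Under these assumptions the PDF passes the format test but misses the machine-readable recommendation, which is the tension this example is meant to expose.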

Non-data works

It could be argued that by prefixing the second sentence of the Open Format with, “Specifically, data should…”, this means non-data works may, but are not required to:

(<a href="#contribute">Contribution needed</a>) Is it OK that these requirements are all optional for non-data works?

<a id="meaning">4. Words with special meaning</a>

Some words in the Open Definition have special meaning and are shown in bold or italics. Their meaning is defined below:

Work - denotes the item or piece of knowledge being transferred [<a href="#od">OD</a>]. Examples of a work include, but are not limited to: data, music, art, images, video, literary compositions, web pages and software.

Must, Required, or Shall - an absolute requirement [<a href="#RFC2119">RFC2119</a>].

Must Not or Shall Not - an absolute prohibition [<a href="#RFC2119">RFC2119</a>].

Should or Recommended - there may be valid reasons to ignore this requirement but the full implications must be understood and carefully weighed before choosing a different course [<a href="#RFC2119">RFC2119</a>].

Should Not or Not Recommended - there may be valid reasons when the particular behaviour is acceptable or even useful, but the full implications should be understood and the case carefully weighed before implementing any behaviour described with this label [<a href="#RFC2119">RFC2119</a>].

May or Optional - the item is truly optional [<a href="#RFC2119">RFC2119</a>].
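To see how these words apply in practice, the sketch below scans the opening of Section 1.3 for the keywords listed above and reports the requirement level of each. The `LEVELS` table is a shortened paraphrase of the definitions on this page, and matching plain words with a regular expression is an assumption of this example, not something the Open Definition or RFC 2119 prescribes.

```python
import re

# Requirement levels, paraphrased from the definitions above.
LEVELS = {
    "must not": "absolute prohibition",
    "shall not": "absolute prohibition",
    "should not": "not recommended",
    "not recommended": "not recommended",
    "must": "absolute requirement",
    "required": "absolute requirement",
    "shall": "absolute requirement",
    "should": "recommended",
    "recommended": "recommended",
    "may": "optional",
    "optional": "optional",
}

# Longer phrases come first so "must not" is not read as "must".
PATTERN = re.compile(
    r"\b(" + "|".join(sorted(LEVELS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

# Opening of Section 1.3 (version 2.0), shortened after "open format".
SECTION_1_3 = (
    "The work must be provided in a convenient and modifiable form such "
    "that there are no unnecessary technological obstacles to the "
    "performance of the licensed rights. Specifically, data should be "
    "machine-readable, available in bulk, and provided in an open format"
)


def requirement_levels(text):
    """Return (keyword, level) pairs in the order they appear in the text."""
    return [
        (match.group(1).lower(), LEVELS[match.group(1).lower()])
        for match in PATTERN.finditer(text)
    ]


found = requirement_levels(SECTION_1_3)
print(found)
```

The first sentence carries a "must" and the second only a "should", so the data-specific items are recommendations, not absolute requirements. A plain word match will also catch everyday uses of words such as "may", so the results need a human check.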

<a id="improve">5. Improving the Open Format</a>

These improvement ideas mainly come from conversations on the discussion list.

Open Format Specification

An open format specification should be:

Retain all original detail

The work should be published in a lossless and uncompressed open format so all the original detail is retained.

A common resource for tools to reference

Tools like the Open Data Census and Open Data Certificates test to see if data is published using an open format. This improvement idea seeks to harmonise the definition of the Open Format for data so that tools can all point to the Open Definition, in the same way they currently point to it for the definition of an open license and a list of conformant licenses.

<a id="resources">6. Resources</a>

Do you have another resource you’d like added below? <a href="#contribute">Make the list better</a>.

Alternative definitions and views

These links provide alternative perspectives on open formats:

Lists of Open Formats

These lists of open formats have not been assessed as being conformant with the Open Definition:

<a id="coming">7. What’s coming next?</a>

The Open Definition version 2.1 is being drafted. At the time of writing, it states:

The work must be provided in an open format. An open format is one which places no restrictions, monetary or otherwise, upon its use and can be fully processed with at least one free/libre/open-source software tool. Data must be machine-readable and should be provided in bulk where possible.
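Read with the keyword meanings from section 4, the draft wording separates hard failures ("must") from warnings ("should"). The sketch below is one possible encoding of that reading. The `Work` class, its field names and the sample values are hypothetical, and the draft text may change before version 2.1 is final.

```python
from dataclasses import dataclass


@dataclass
class Work:
    """Hypothetical description of a published work."""
    is_data: bool
    unrestricted_format: bool  # format places no restrictions on its use
    foss_tool_available: bool  # fully processable with a FOSS tool
    machine_readable: bool
    available_in_bulk: bool


def check_draft_2_1(work):
    """Return (failures, warnings) under the draft 2.1 wording."""
    failures, warnings = [], []
    # The draft joins the two format conditions with "and".
    if not (work.unrestricted_format and work.foss_tool_available):
        failures.append("open format")
    if work.is_data:
        if not work.machine_readable:
            failures.append("machine-readable")  # "must"
        if not work.available_in_bulk:
            warnings.append("available in bulk")  # "should ... where possible"
    return failures, warnings


# Assumed example: machine-readable data published as many small files.
dataset = Work(
    is_data=True,
    unrestricted_format=True,
    foss_tool_available=True,
    machine_readable=True,
    available_in_bulk=False,
)

failures, warnings = check_draft_2_1(dataset)
print(failures, warnings)
```

In this reading, a dataset that is not offered in bulk only draws a warning, while data that is not machine-readable fails outright.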

The changes:

References

<a id="okfn">[OKFN]</a> <a href="http://webarchive.okfn.org/okfn.org/201404/opendata/glossary/#machine-readable">Open Data Glossary</a> (Archived Content) by the Open Knowledge Foundation.

<a id="od-discuss">[OD-Discuss]</a> <a href="https://lists.okfn.org/pipermail/od-discuss/2015-April/subject.html#1330">A harmonised Open Format definition</a>, a discussion thread on the <a href="http://lists.okfn.org/mailman/listinfo/od-discuss">Open Definition Discussion list</a>.

<a id="od">[OD]</a> <a href="http://opendefinition.org/od/">Open Definition</a> by Open Knowledge.

<a id="RFC2119">[RFC2119]</a> <a href="http://www.ietf.org/rfc/rfc2119.txt">Key words for use in RFCs to Indicate Requirement Levels</a> by S. Bradner, IETF RFC2119.