A public sector institution that creates and publishes a dataset has no obligation to offer data users additional amenities such as conversion to a suitable format, building a special network service, translation, etc. Nor are officials obliged to ensure that the data is correct or up to date. Instead, the publisher has to explain briefly the nature of the data and document the expected frequency of updates.
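The brief description and update frequency mentioned above could be kept in a small machine-readable record published alongside the dataset. The following is only a minimal sketch under stated assumptions, not a prescribed format: the class name `DatasetDescription`, its fields and the sample values are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DatasetDescription:
    """Hypothetical minimal description a publisher attaches to a dataset."""
    title: str
    description: str        # brief explanation of the nature of the data
    update_frequency: str   # expected frequency of updates, e.g. "monthly"


def to_json(desc: DatasetDescription) -> str:
    """Serialise the description so it can be published next to the data."""
    return json.dumps(asdict(desc), ensure_ascii=False, indent=2)


# Illustrative values only; not taken from a real dataset.
example = DatasetDescription(
    title="Document register",
    description="Incoming and outgoing documents registered by the institution.",
    update_frequency="daily",
)
print(to_json(example))
```

Keeping the record this small reflects the point made above: the publisher documents what the data is and how often it changes, without promising correctness or extra services.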
Licence and fee for dataset. An open dataset must have a licence that allows it to be used and processed free of charge and without restrictions, and to be distributed either free of charge or for a fee, at the user’s discretion. Specifically, we recommend that a Creative Commons licence be selected as the licensing option11. Above all, from this list, we recommend the CC BY 3.0 licence12. This means that in licensing a work, the licensor is the author or the copyright holder, and the licensee is the public at large. The licensee has the right to copy the work (reproduce it), distribute it, perform it and communicate it to the public, and to adapt, arrange and otherwise develop it, including creating derivative works, on condition that the author is credited.
It is advisable to publish open data for free download, but the publisher has the right to charge a fee for downloading the data in cases set forth in legislation.
Principles for publishing datasets. When publishing data, a compromise between two objectives must be found:
• convenient usability and comprehensibility of the data for the data seeker and the downloader,
• simplicity of publishing data and minimizing the work expenses for the publisher.
To do so, the first task is to find the easiest, simplest and quickest way of publishing the existing data as it is, and only then to examine ways of making it more user-friendly for information seekers and downloaders. In other words, updating the data, converting it and other operations are to be tackled only once the dataset has been published.
Data can be updated and converted by a third party as well, who in turn receives the right to share the data free of charge or for a fee. The open dataset conforms to the following requirements13:
[Figure: Tim Berners-Lee’s format level scheme on a “five-star mug”. Source: http://www.cafepress.com/w3c_shop]
1. Integrity. All public data shall be made available. This includes all data not subject to personal data restrictions etc.
2. Comes from original source. The data has been gathered from the original source without modification, preserving its original format and level of detail. In the case of databases, it is not permitted to take data from a secondary database.
3. Up-to-dateness. The dataset is published as rapidly as possible to preserve its up-to-dateness.
4. Availability. The data is available to as wide a circle of users as possible with as wide a range of use as possible.
5. Machine-readability. The data has an understandable structure and can be automatically processed.
6. Avoidance of discrimination. The data is presented publicly; there is no need to register or seek access privileges in order to obtain it.
7. Use of open standards. The data is presented in an open format that is not the exclusive property of any one company or person.
8. Free licence. The data is not under copyright, patent, trademark or business secret protection. Reasonable privacy and security restrictions are permitted.
How to publish?
In what format? The main principle here is that it is much better to publish data in an inconvenient encoding than not to publish it at all on the grounds that the encoding is to be improved at some unspecified later time. Secondly, a published dataset can always be republished later in a new, better encoding.
We recommend evaluating the user-friendliness of formats and encodings on the basis of Tim Berners-Lee’s five-star system principles14: the more stars, the more user-friendly the format. Given Estonia’s circumstances, the formats could be classified as follows:
* data is available online in any format (e.g. jpeg, pdf, doc, docx, xls). The data cannot be separated from the file, or it is presented in formats oriented at proprietary software;
** data is on the website in an open format (e.g. txt, html, odt), but in unstructured form;
*** data is presented on the website in an open and free format (e.g. csv, xml, ods files);
**** the objects in the data are identified by URIs15;
***** the data are linked to other data using URIs.
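As a rough illustration of the scale above, the star level implied by a file's format could be looked up from its extension. This is only a sketch: the function name `format_stars` is hypothetical, the extension groups are simply the examples named in the list, and four or five stars cannot be inferred from an extension because they depend on URIs used inside the data.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extension groups copied from the list above; any other extension is unknown.
ONE_STAR = {"jpeg", "pdf", "doc", "docx", "xls"}
TWO_STAR = {"txt", "html", "odt"}
THREE_STAR = {"csv", "xml", "ods"}


def format_stars(filename: str) -> Optional[int]:
    """Return the star level implied by the file extension, or None if unknown.

    Four and five stars depend on URIs inside the data, so an extension
    alone can never justify more than three stars.
    """
    ext = PurePosixPath(filename).suffix.lower().lstrip(".")
    if ext in THREE_STAR:
        return 3
    if ext in TWO_STAR:
        return 2
    if ext in ONE_STAR:
        return 1
    return None


# Hypothetical file names, used only to exercise the lookup.
for name in ["budget.PDF", "salaries.csv", "report.odt", "archive.zip"]:
    print(name, format_stars(name))
```

A publisher could run such a check over the files listed on a website to get a first, approximate picture of how much of the published material is one-star data.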
Datasets are best published in formats that can be opened and processed using freeware applications. This includes .odt document files as well as some of the most common formats for structured data, such as .csv, .json and .xml.
Formats that can be opened and modified by freeware applications are also well-suited for re-use.
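Because the structured formats named above are open, a dataset published in one of them can be converted to another with freely available tools. The sketch below uses only Python's standard library to turn CSV text into JSON; the function name `csv_to_json` and the sample data are made up for illustration.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, ensure_ascii=False, indent=2)


# Made-up sample data for illustration.
sample = "institution,year,budget_eur\nExample Office,2011,1000\n"
print(csv_to_json(sample))
```

Note that every value is kept as text, since CSV carries no type information. A third party re-using the data would decide for itself which columns to treat as numbers or dates.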
Situation in Estonia
In Estonia as well, there is now considerable political will to make public sector information more re-usable. Thus the section of the government’s programme16 entitled “From E-state to I-state” has a subsection devoted to open data, “Putting the state’s e-resources in the service of citizens and companies”. The government programme explicitly promises the following:
• we will make the state’s spatial data public in modifiable form – this will give citizens and companies the possibility to develop purposeful services on the basis of government data;
• to increase transparency and inclusion and stimulate the private sector to develop new applications, we will make public data, i.e. state and local government data, machine-readable;
• we will set the aim of making databases created collaboratively between private and public sector available to companies and individuals for development.
Estonia is home to an open data community17, which has a page on Facebook. A movement called Garage48 is active in preparing services, and its motto is “less talk, more action”. The Association of Information Technology and Telecommunications initiative is also to be reckoned with: its 2011 conference “From vision to solution” focused on obstacles to developing new e-services.
In Estonia, the availability of public information can be rated exemplary. The fairly liberal Public Information Act18 makes it obligatory for a government department to release its unrestricted information to the public via its website, document register and databases, so more information is subject to being made public than in most other countries. For instance, every public sector institution must release information on its structure, salaries, document register, reports, statistics, budgets and development plans. The Public Information Act distinguishes between 32 types of information to be published. Considering that Estonia has 2,000 public sector institutions and each one of them should publish an average of ten datasets, the volume of reusable information is at least 20,000 datasets.
But unlike in most countries, the public sector in Estonia is not required to publish information in reusable form. The published datasets are not always in open formats. The primary formats used are PDF and MS Office formats (oriented at proprietary software). Thus this is predominantly one-star data.
Public sector information is stored in databases. But public databases and their open service interfaces are unspecified and thus hard to re-use. There is no legal requirement that descriptions of such registers and their services be published in the state information system’s management system (RIHA). For instance, the Government Office’s document register has been implemented in exemplary fashion: its output is XML data, but the data and