Step 1: The quickest and easiest way to make data available on the Internet is to publish it in raw form (e.g., an XML file of polling data from past elections). The data should, however, be well structured, because structure is what lets others make automated use of it. Well-known structured formats include XML, RDF, and CSV. Formats that only allow the data to be seen rather than extracted (for example, pictures of the data) are not useful and should be avoided.
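To illustrate what "automated use" can look like, here is a minimal sketch in Python that writes a few polling records as CSV and then reads them back to compute a total. The records, column names, and function names are invented for illustration and do not come from any real dataset.

```python
import csv
import io

# Hypothetical polling results; column names and values are illustrative only.
ROWS = [
    {"year": 2004, "district": "District 1", "candidate": "Candidate A", "votes": 1200},
    {"year": 2004, "district": "District 1", "candidate": "Candidate B", "votes": 950},
]


def to_csv(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def total_votes(csv_text):
    """Parse the CSV text and sum the 'votes' column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(int(row["votes"]) for row in reader)


if __name__ == "__main__":
    text = to_csv(ROWS)
    print(text)
    print(total_votes(text))
```

A third party needs only the header row to find the right column. The same figures published as a picture of a table would have to be retyped by hand.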
Step 2: Create an online catalog of the raw data (complete with documentation) so people can discover what has been posted. These raw datasets should be reliably structured and documented; otherwise their usefulness is negligible. Most governments already have mechanisms in place to create and store data (e.g., Excel, Word, and other software-specific file formats). Posting raw data with an online catalog is a great starting point and reflects the next step in the evolution of the Internet – “website as fileserver”.
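A catalog can start out very small. The sketch below, which assumes Python's standard xml.etree.ElementTree module, builds a catalog in which each dataset carries a title, a description, a format, and a URL. The element names, dataset IDs, and data.example.gov addresses are hypothetical and not part of any published standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog entries; IDs, URLs, and descriptions are illustrative only.
DATASETS = [
    {
        "id": "polling-2004",
        "title": "Polling results, 2004",
        "description": "Votes per candidate and district. One row per candidate.",
        "format": "text/csv",
        "url": "https://data.example.gov/polling/2004.csv",
    },
    {
        "id": "polling-2008",
        "title": "Polling results, 2008",
        "description": "Votes per candidate and district. One element per result.",
        "format": "application/xml",
        "url": "https://data.example.gov/polling/2008.xml",
    },
]

FIELDS = ("title", "description", "format", "url")


def build_catalog(datasets):
    """Return a <catalog> element with one documented <dataset> per entry."""
    root = ET.Element("catalog")
    for dataset in datasets:
        entry = ET.SubElement(root, "dataset", id=dataset["id"])
        for field in FIELDS:
            ET.SubElement(entry, field).text = dataset[field]
    return root


def find_by_format(catalog, media_type):
    """Return the IDs of all datasets published in the given format."""
    return [
        entry.get("id")
        for entry in catalog.findall("dataset")
        if entry.findtext("format") == media_type
    ]


if __name__ == "__main__":
    catalog = build_catalog(DATASETS)
    print(ET.tostring(catalog, encoding="unicode"))
    print(find_by_format(catalog, "text/csv"))
```

Keeping the description next to the download URL means that the documentation travels with the dataset and is not left on a separate page.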
Step 3: Make the data both human- and machine-readable:
- enrich your existing (X)HTML resources with semantics, metadata, and identifiers;
- encode the data using open and industry standards – especially XML – or create your own standards based on your vocabulary;
- make your data human-readable by either converting to (X)HTML, or by using real-time transformations through CSS or XSLT. Remember to follow accessibility requirements;
- use permanent, patterned, and/or discoverable “Cool URIs”;
- allow for electronic citations in the form of standardized hyperlinks (anchor/id links or XLink/XPointer).
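As a rough sketch of two of the points above (a human-readable rendering and citable anchors), the following Python converts a small XML file into an XHTML table and gives every row an id, so that a single result can be cited as page#id. It uses Python's ElementTree as a stand-in for an XSLT stylesheet. The source XML, its element names, and the id values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical source data; element names and ids are illustrative only.
SOURCE_XML = """<results year="2004">
  <result id="d1-a">
    <district>District 1</district>
    <candidate>Candidate A</candidate>
    <votes>1200</votes>
  </result>
  <result id="d1-b">
    <district>District 1</district>
    <candidate>Candidate B</candidate>
    <votes>950</votes>
  </result>
</results>"""

COLUMNS = (("district", "District"), ("candidate", "Candidate"), ("votes", "Votes"))


def to_xhtml(xml_text):
    """Render the XML results as an XHTML table with one anchored row per result."""
    source = ET.fromstring(xml_text)
    html = ET.Element("html", xmlns="http://www.w3.org/1999/xhtml")
    body = ET.SubElement(html, "body")
    table = ET.SubElement(body, "table")
    ET.SubElement(table, "caption").text = "Results for " + source.get("year")

    header = ET.SubElement(table, "tr")
    for _, label in COLUMNS:
        ET.SubElement(header, "th").text = label

    for result in source.findall("result"):
        # Reuse the source id as the row id so links such as page.html#d1-a resolve.
        row = ET.SubElement(table, "tr", id=result.get("id"))
        for tag, _ in COLUMNS:
            ET.SubElement(row, "td").text = result.findtext(tag)

    return ET.tostring(html, encoding="unicode")


if __name__ == "__main__":
    print(to_xhtml(SOURCE_XML))
```

Because the row ids come straight from the source data, a citation keeps pointing at the same record every time the page is regenerated.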
These steps will help the public to easily find, use, cite, and understand the data. The data catalog should explain any rules or regulations that must be followed in the use of each dataset. The data catalog itself is also “data” and should be published as structured data, so that third parties can extract information about the datasets. Thoroughly document the parts of each web page, use valid XHTML, and choose easily patterned and discoverable URLs for the pages. Finally, syndicate the catalog (using formats such as RSS) to quickly and easily advertise new datasets upon publication.
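Syndication can be sketched in a few lines as well. The example below assumes an RSS 2.0 feed built with Python's standard library, with one item per newly published dataset. The feed title, URLs, and publication date are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical new datasets; titles, links, and dates are illustrative only.
NEW_DATASETS = [
    {
        "title": "Polling results, 2008",
        "link": "https://data.example.gov/polling/2008.xml",
        "description": "Votes per candidate and district.",
        "published": datetime(2010, 1, 15, 12, 0, tzinfo=timezone.utc),
    },
]


def build_feed(datasets):
    """Return RSS 2.0 text with one <item> per dataset."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "New datasets"
    ET.SubElement(channel, "link").text = "https://data.example.gov/catalog"
    ET.SubElement(channel, "description").text = "Datasets recently added to the catalog."

    for dataset in datasets:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = dataset["title"]
        ET.SubElement(item, "link").text = dataset["link"]
        ET.SubElement(item, "description").text = dataset["description"]
        # RSS 2.0 expects RFC 822 style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(dataset["published"])

    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_feed(NEW_DATASETS))
```

Feed readers and aggregators can then pick up each new dataset without anyone having to revisit the catalog page.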