Step 1: The quickest and easiest way to make data available on the Internet is to publish the data in its raw form (e.g., an XML file of polling data from past elections). However, the data should be well-structured. Structure allows others to successfully make automated use of the data. Well-known formats or structures include XML, RDF and CSV. Formats that only allow the data to be seen, rather than extracted (for example, pictures of the data), are not useful and should be avoided.
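As a rough illustration of why structure matters, here is a minimal sketch in Python that publishes a few records as CSV and then reuses them automatically. The records, field names, and helper functions (`to_csv`, `total_votes`) are invented for this example and do not come from any real dataset.

```python
import csv
import io

# Hypothetical polling records; field names and values are invented for illustration.
ROWS = [
    {"year": 2004, "district": "A", "candidate": "Smith", "votes": 1200},
    {"year": 2004, "district": "A", "candidate": "Jones", "votes": 950},
    {"year": 2008, "district": "A", "candidate": "Smith", "votes": 1010},
]


def to_csv(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def total_votes(csv_text, candidate):
    """Automated reuse: sum one candidate's votes directly from the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(int(row["votes"]) for row in reader if row["candidate"] == candidate)


if __name__ == "__main__":
    text = to_csv(ROWS)
    print(text)
    print(total_votes(text, "Smith"))
```

A third party can run the second function without ever contacting the publisher, which is not possible when the same numbers are posted only as an image.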
Step 2: Create an online catalog of the raw data (complete with documentation) so people can discover what has been posted. These raw datasets should be reliably structured and documented; otherwise their usefulness is negligible. Most governments already have mechanisms in place to create and store data (e.g., Excel, Word, and other software-specific file formats). Posting raw data, with an online catalog, is a great starting point, and reflects the next step in the evolution of the Internet: “website as fileserver”.
Step 3: Make the data both human- and machine-readable:
- enrich your existing (X)HTML resources with semantics, metadata, and identifiers;
- encode the data using open and industry standards – especially XML – or create your own standards based on your vocabulary;
- make your data human-readable by either converting to (X)HTML, or by using real-time transformations through CSS or XSLT. Remember to follow accessibility requirements;
- use permanent, patterned, and/or discoverable “Cool URIs”;
- allow for electronic citations in the form of standardized hyperlinks (anchor/id links or XLinks/XPointers).
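Two of the points above, patterned URIs and citable anchors, can be sketched in a few lines of Python. This is only one possible approach: the base URL `http://data.example.gov`, the `/data/{topic}/{year}` pattern, and the helper names are assumptions made for the example, not an established convention.

```python
import xml.etree.ElementTree as ET

BASE = "http://data.example.gov"  # hypothetical base URL


def dataset_uri(topic, year):
    """Build a patterned, guessable URI of the form /data/{topic}/{year}."""
    return f"{BASE}/data/{topic}/{year}"


def row_id(district, candidate):
    """Derive a stable id for one row so it can be cited individually."""
    return f"{district}-{candidate}".lower()


def render_table(topic, year, rows):
    """Render rows as an (X)HTML table fragment; every row carries an id anchor."""
    table = ET.Element("table", {"id": f"{topic}-{year}"})
    for row in rows:
        tr = ET.SubElement(
            table, "tr", {"id": row_id(row["district"], row["candidate"])}
        )
        for key in ("district", "candidate", "votes"):
            td = ET.SubElement(tr, "td", {"class": key})
            td.text = str(row[key])
    return ET.tostring(table, encoding="unicode")


def citation_link(topic, year, district, candidate):
    """Combine the patterned URI with the row's id to form a citable link."""
    return f"{dataset_uri(topic, year)}#{row_id(district, candidate)}"


if __name__ == "__main__":
    rows = [{"district": "A", "candidate": "Smith", "votes": 1200}]
    print(render_table("polling", 2004, rows))
    print(citation_link("polling", 2004, "A", "Smith"))
```

Because the id in the markup and the fragment in the link come from the same function, a citation keeps pointing at the same row for as long as the URI pattern stays unchanged.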
These steps will help the public to easily find, use, cite and understand the data. The data catalog should explain any rules or regulations that must be followed in the use of the dataset. Also, the data catalog itself is considered “data” and should be published as structured data, so that third parties can extract data about the datasets. Thoroughly document the parts of the web page, using valid XHTML, and choose easily patterned and discoverable URLs for the pages. Also syndicate the data for the catalog (using formats such as RSS) to quickly and easily advertise new datasets upon publication.
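To make the "catalog as data" idea concrete, here is a minimal Python sketch that publishes the same catalog entries twice: once as JSON for third parties to extract, and once as an RSS 2.0 feed to announce new datasets. The entry fields, the example URL, and the terms-of-use text are hypothetical; a real catalog would define its own vocabulary.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical catalog entry; every value here is invented for illustration.
CATALOG = [
    {
        "title": "Polling data, 2004",
        "url": "http://data.example.gov/data/polling/2004",
        "format": "CSV",
        "terms_of_use": "Hypothetical: free reuse with attribution",
        "published": "Mon, 05 Jan 2009 00:00:00 GMT",
    },
]


def catalog_json(entries):
    """Publish the catalog itself as structured, machine-readable data."""
    return json.dumps(entries, indent=2)


def catalog_rss(entries, feed_title, feed_link):
    """Syndicate the catalog as an RSS 2.0 feed, one item per dataset."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "New datasets as they are published"
    for entry in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["url"]
        # Carry the format and usage rules along with each announcement.
        ET.SubElement(item, "description").text = (
            f'{entry["format"]}; {entry["terms_of_use"]}'
        )
        ET.SubElement(item, "pubDate").text = entry["published"]
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(catalog_json(CATALOG))
    print(catalog_rss(CATALOG, "Data catalog", "http://data.example.gov/catalog"))
```

Both outputs are generated from one list, so the feed and the downloadable catalog stay consistent with each other.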

