Step 1: The quickest and easiest way to make data available on the Internet is to publish the data in its raw form (e.g., an XML file of polling data from past elections). However, the data should be well-structured. Structure allows others to successfully make automated use of the data. Well-known formats or structures include XML, RDF and CSV. Formats that only allow the data to be seen, rather than extracted (for example, pictures of the data), are not useful and should be avoided.
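As a rough illustration of what "well-structured" can mean in practice, the sketch below writes a few polling records to CSV with an explicit header row. The records, field names, and the `to_csv` helper are all hypothetical, made up for this example; the point is only that every row follows the same named columns, so a program can read the data back without guessing.

```python
import csv
import io

# Hypothetical polling records; the field names are illustrative assumptions.
FIELDS = ["election_year", "district", "candidate", "votes"]
RECORDS = [
    {"election_year": 2004, "district": "North Ward", "candidate": "A. Smith", "votes": 1520},
    {"election_year": 2004, "district": "South Ward", "candidate": "B. Jones", "votes": 987},
]


def to_csv(records, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()          # column names make the file self-describing
    writer.writerows(records)     # every row follows the same structure
    return buffer.getvalue()


csv_text = to_csv(RECORDS, FIELDS)

# A third party can read the same file back without knowing how it was produced.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
```

A picture of the same table would carry the same numbers, but nothing like the last line could be run against it.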
Step 2: Create an online catalog of the raw data (complete with documentation) so people can discover what has been posted. These raw datasets should be reliably structured and documented; otherwise their usefulness is negligible. Most governments already have mechanisms in place to create and store data (e.g., Excel, Word, and other software-specific file formats). Posting raw data with an online catalog is a great starting point, and reflects the next step in the evolution of the Internet – “website as fileserver”.
Step 3: Make the data both human- and machine-readable:
- enrich your existing (X)HTML resources with semantics, metadata, and identifiers;
- encode the data using open and industry standards – especially XML – or create your own standards based on your vocabulary;
- make your data human-readable, either by converting it to (X)HTML or by using real-time transformations through CSS or XSLT. Remember to follow accessibility requirements;
- use permanent, patterned and/or discoverable “Cool URIs”;
- allow for electronic citations in the form of standardized hyperlinks (anchor/id links or XLinks/XPointers).
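One way to picture several of these points together is a small conversion from raw XML to an (X)HTML table in which every row keeps a stable `id`, so a citation can link straight to it with a fragment such as `#p001`. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the XML vocabulary (`results`, `precinct`, and their attributes) and the function name are assumptions invented for the example, not an existing standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw data; element and attribute names are illustrative only.
RAW_XML = """<results election="general-2008">
  <precinct id="p001" name="North Ward" votes="1520"/>
  <precinct id="p002" name="South Ward" votes="987"/>
</results>"""


def xml_to_xhtml_table(raw_xml):
    """Render the raw XML as an (X)HTML table whose rows carry citable ids."""
    root = ET.fromstring(raw_xml)
    table = ET.Element("table", {"id": root.get("election")})

    header = ET.SubElement(table, "tr")
    for label in ("Precinct", "Votes"):
        ET.SubElement(header, "th").text = label

    for precinct in root.findall("precinct"):
        # Reusing the source identifier lets a link like page#p001 cite one row.
        row = ET.SubElement(table, "tr", {"id": precinct.get("id")})
        ET.SubElement(row, "td").text = precinct.get("name")
        ET.SubElement(row, "td").text = precinct.get("votes")

    return ET.tostring(table, encoding="unicode")


html = xml_to_xhtml_table(RAW_XML)
```

The same result could be produced in the browser with XSLT; the sketch only shows that the human-readable view can be derived from the machine-readable source rather than maintained separately.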
These steps will help the public to easily find, use, cite and understand the data. The data catalog should explain any rules or regulations that must be followed in the use of the dataset. Also, the data catalog itself is considered “data” and should be published as structured data, so that third parties can extract data about the datasets. Thoroughly document the parts of the web page, using valid XHTML, and choose easily patterned and discoverable URLs for the pages. Also syndicate the data for the catalog (using formats such as RSS) to quickly and easily advertise new datasets upon publication.
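To make the syndication idea concrete, here is a minimal sketch that turns a catalog listing into an RSS 2.0 feed using only the Python standard library. The catalog entries, the `data.example.gov` URLs, and the `catalog_to_rss` function are hypothetical; a real catalog would likely carry more metadata (license, format, update frequency) than the three fields shown.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical catalog entries; titles and URLs are placeholders.
CATALOG = [
    {
        "title": "Precinct results, 2008 general election",
        "url": "https://data.example.gov/datasets/elections/2008/general",
        "description": "Vote counts per precinct, published as XML and CSV.",
        "published": datetime(2009, 5, 1, tzinfo=timezone.utc),
    },
]


def catalog_to_rss(entries, feed_title, feed_link):
    """Build an RSS 2.0 document with one item per catalog entry."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Newly published datasets"

    for entry in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["url"]
        ET.SubElement(item, "description").text = entry["description"]
        # RSS 2.0 expects RFC 822 style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(entry["published"])

    return ET.tostring(rss, encoding="unicode")


feed_xml = catalog_to_rss(CATALOG, "Data catalog", "https://data.example.gov/catalog")
```

Because the feed is generated from the same entries that back the catalog page, the catalog itself is available as structured data and new datasets are advertised as soon as they are added.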