Step 1: The quickest and easiest way to make data available on the Internet is to publish it in its raw form (e.g., an XML file of polling data from past elections). The data should still be well-structured, because structure is what allows others to make automated use of it. Well-known structured formats include XML, RDF, and CSV. Formats that only allow the data to be seen rather than extracted (for example, pictures of the data) are not useful and should be avoided.
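To illustrate what "automated use" can look like, here is a minimal Python sketch that reads a small CSV file of polling data and totals the votes per party. The column names and figures are invented for illustration and do not come from any real dataset.

```python
import csv
import io

# Hypothetical raw polling data; the columns and numbers are made up.
RAW_CSV = """year,district,party,votes
2004,1,A,1200
2004,1,B,950
2008,1,A,1100
2008,1,B,1300
"""

def votes_by_party(csv_text):
    """Sum the 'votes' column for each party in a CSV document."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["party"]] = totals.get(row["party"], 0) + int(row["votes"])
    return totals

print(votes_by_party(RAW_CSV))
```

Because the file has a header row and one record per line, a third party needs only a few lines of code to reuse it. The same figures published as an image would have to be retyped by hand.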
Step 2: Create an online catalog of the raw data, complete with documentation, so people can discover what has been posted. The raw datasets should be reliably structured and documented; otherwise their usefulness is negligible. Most governments already have mechanisms in place to create and store data (e.g., Excel, Word, and other software-specific file formats). Posting raw data with an online catalog is a great starting point and reflects the next step in the evolution of the Internet – “website as fileserver”.
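A catalog entry can be as simple as a small structured record per dataset. The sketch below builds one in XML using Python's standard library. The element names (`catalog`, `dataset`, `title`, and so on) and the URL are assumptions made for this example, not an established catalog vocabulary.

```python
import xml.etree.ElementTree as ET

def catalog_entry(title, description, url, file_format):
    """Build one <dataset> element describing a posted raw file."""
    dataset = ET.Element("dataset")
    ET.SubElement(dataset, "title").text = title
    ET.SubElement(dataset, "description").text = description
    ET.SubElement(dataset, "url").text = url
    ET.SubElement(dataset, "format").text = file_format
    return dataset

catalog = ET.Element("catalog")
catalog.append(catalog_entry(
    "Past election polling data",
    "Votes per party by district and year.",
    "https://data.example.gov/elections/polling.csv",  # hypothetical URL
    "CSV",
))

# Serialize the catalog so it can be posted alongside the raw files.
catalog_xml = ET.tostring(catalog, encoding="unicode")
print(catalog_xml)
```

Keeping the description next to the download link means the documentation travels with the dataset.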
Step 3: Make the data both human- and machine-readable:
- enrich your existing (X)HTML resources with semantics, metadata, and identifiers;
- encode the data using open and industry standards – especially XML – or create your own standards based on your vocabulary;
- make your data human-readable by either converting to (X)HTML, or by using real-time transformations through CSS or XSLT. Remember to follow accessibility requirements;
- use permanent, patterned, and/or discoverable “Cool URIs”;
- allow for electronic citations in the form of standardized hyperlinks (anchor/id links or XLinks/XPointers).
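The last two points in the list above can be sketched in a few lines of Python. This is one possible URI pattern, not a prescribed one: the host name, the topic/year/name layout, and the `district-1` identifier are all assumptions made for illustration.

```python
from urllib.parse import quote

BASE = "https://data.example.gov"  # hypothetical host

def dataset_uri(topic, year, name):
    """Build a patterned URI of the form /<topic>/<year>/<name>."""
    parts = [quote(str(p).lower().replace(" ", "-")) for p in (topic, year, name)]
    return BASE + "/" + "/".join(parts)

def citation_uri(uri, element_id):
    """Point at one identified element of a page via a fragment identifier."""
    return uri + "#" + quote(element_id)

uri = dataset_uri("Elections", 2008, "Polling Data")
print(uri)                              # https://data.example.gov/elections/2008/polling-data
print(citation_uri(uri, "district-1"))  # https://data.example.gov/elections/2008/polling-data#district-1
```

Once a reader has seen one such address, they can guess the address for another year or topic, and the fragment lets them cite a specific part of the page rather than the page as a whole.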
These steps will help the public easily find, use, cite, and understand the data. The data catalog should explain any rules or regulations that must be followed when using a dataset. The data catalog itself is also “data” and should be published as structured data, so that third parties can extract information about the datasets. Thoroughly document the parts of each web page using valid XHTML, and choose easily patterned and discoverable URLs for the pages. Finally, syndicate the catalog (using formats such as RSS) to quickly and easily advertise new datasets upon publication.
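Syndicating the catalog could look roughly like the following Python sketch, which produces a bare-bones RSS 2.0 feed with one item per dataset. The site title, URLs, and dataset shown are placeholders, and a production feed would likely carry more fields per item.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

def catalog_feed(site_title, site_url, datasets):
    """Return an RSS 2.0 document with one <item> per (title, url, published) tuple."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "Newly published datasets"
    for title, url, published in datasets:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
        # RSS expects RFC 822 style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(published)
    return ET.tostring(rss, encoding="unicode")

# Hypothetical catalog contents.
feed_xml = catalog_feed(
    "Example Data Catalog",
    "https://data.example.gov/",
    [("Past election polling data",
      "https://data.example.gov/elections/2008/polling-data",
      datetime(2009, 6, 1, tzinfo=timezone.utc))],
)
print(feed_xml)
```

Anyone subscribed to the feed would then learn about a new dataset as soon as it is added to the catalog.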