
School Finance Data

The following is a letter I sent to members of the Colorado General Assembly regarding HB10-1036, which calls for school districts to publish financial data on-line.  A good thing, but how the data is published is important.  This issue is timely in light of a recent Denver Post article on school district spending.  The bill will be up for hearing on Thursday, March 11, 2010.  The hearing starts at 1:30 pm in Senate Committee Room 354, and the bill is the second item on the Senate Education Committee's calendar.  You may listen to the hearing here.
My issue is with the fiscal analysis and the assumption that schools will (or should?) publish district financial data as PDF.  I am not trying to pick on PDF; my issue is with the PDF creation process.  (As with many things, the problem is a user issue, not a technology issue.)  Please read my letter and, if you are so moved, contact the committee members (list here) and the Senate sponsor, Senator Chris Romer.
– – – – – – – –
Dear Senator or Representative:
 
I am writing you today in support of HB10-1036, the Public School Financial Transparency Act.  However, I would encourage the Colorado General Assembly to more explicitly recommend the use of open standards-based technologies when publishing government data.  This bill is a good step in furthering financial transparency and in increasing public accessibility to financial data.  As the bill declares, all Coloradans have an interest in knowing how moneys are being expended in the pursuit of quality public education.  A critical issue for public accessibility to financial data is how the data is published on the Web.  The fiscal note to HB10-1036 states that “it is assumed that financial documents can be electronically converted to portable document format (PDF) . . . and posted online at minimal cost.”  This statement is correct in both respects.  There is no doubt that providing PDF versions of these documents on-line would be a step in the right direction and would give citizens access to information that is not easily accessible today.  Furthermore, PDF is one of the most flexible and capable human-readable electronic formats yet devised.  However, in many cases the process of creating PDFs limits the usefulness of the data contained in the PDF.  Therefore, publication of PDFs to the exclusion of other formats limits the value of government data.
 
I support this bill's intended goal of giving citizens access to public information that would otherwise be relatively inaccessible.  But open government data and transparency are about more than accessibility.  In fact, the W3C e-Government Interest Group's (e-Gov IG) draft document on “Publishing Open Government Data” states that “sharing government data enables greater transparency; delivers more efficient public services; and encourages greater public and commercial use and re-use of government information.”  What PDF provides in accessibility it can lack in usability and re-usability.  That is why PDF-only publication, or strong reliance on PDF versions of government data, should be augmented with other formats.  I do not wish to belabor the pitfalls of publishing open government data in PDF.  Instead, I want to share what steps can be taken to provide complete openness and transparency in government information.
 
The W3C, the Sunlight Foundation, and other open government advocates recommend that governments use open standards-based technologies when publishing data.  Furthermore, in some cases the data or information that is converted to PDF is already in an open format, such as XML.  The W3C e-Gov IG's Publishing Open Government Data document makes the following initial recommendations for publishing government data:

Step 1: The quickest and easiest way to make data available on the Internet is to publish the data in its raw form (e.g., an XML file of polling data from past elections). However, the data should be well-structured. Structure allows others to successfully make automated use of the data. Well-known formats or structures include XML, RDF and CSV. Formats that only allow the data to be seen, rather than extracted (for example, pictures of the data), are not useful and should be avoided.
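To make Step 1 concrete, here is a minimal Python sketch of publishing records in a well-structured, extractable format like CSV rather than as a picture of the data.  The field names and polling figures are invented for illustration:

```python
import csv
import io

# Hypothetical polling-data records -- one record per row, every field named.
rows = [
    {"county": "Denver", "year": 2008, "ballots_cast": 285000},
    {"county": "Boulder", "year": 2008, "ballots_cast": 160000},
]

def to_csv(records):
    """Serialize records to CSV, a well-known format others can parse automatically."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["county", "year", "ballots_cast"])
    writer.writeheader()  # the header row documents the structure
    writer.writerows(records)
    return buf.getvalue()

print(to_csv(rows))
```

Because the structure is explicit, anyone can load this file into a spreadsheet, a database, or a script without re-keying the numbers.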

Step 2: Create an online catalog of the raw data (complete with documentation) so people can discover what has been posted. These raw datasets should be reliably structured and documented, otherwise their usefulness is negligible.  Most governments already have mechanisms in place to create and store data (e.g., Excel, Word, and other software-specific file formats). Posting raw data, with an online catalog, is a great starting point, and reflects the next-step evolution of the Internet – “website as fileserver”. 

 Step 3: Make the data both human- and machine-readable:  

  • enrich your existing (X)HTML resources with semantics, metadata, and identifiers;
  • encode the data using open and industry standards – especially XML – or create your own standards based on your vocabulary;
  • make your data human-readable by either converting to (X)HTML, or by using real-time transformations through CSS or XSLT.  Remember to follow accessibility requirements;
  • use permanent, patterned and/or discoverable “Cool URIs”;
  • allow for electronic citations in the form of standardized (anchor/id links or XLINKs/XPointers) hyperlinks.  
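As a rough illustration of Step 3, the sketch below renders the same records in two views: open, structured XML for machines and a simple XHTML table for people.  The expenditure categories and amounts are invented, and a real site would use CSS or XSLT for the human-readable transformation; here Python's standard library stands in:

```python
import xml.etree.ElementTree as ET

# Hypothetical expenditure records (categories and amounts are invented).
records = [("Transportation", 120000.0), ("Instruction", 950000.0)]

def to_xml(items):
    """Machine-readable view: open, well-structured XML."""
    root = ET.Element("expenditures")
    for category, amount in items:
        rec = ET.SubElement(root, "expenditure", category=category)
        rec.text = f"{amount:.2f}"
    return ET.tostring(root, encoding="unicode")

def to_xhtml(items):
    """Human-readable view of the same data: a plain XHTML table fragment."""
    table = ET.Element("table")
    for category, amount in items:
        row = ET.SubElement(table, "tr")
        ET.SubElement(row, "td").text = category
        ET.SubElement(row, "td").text = f"{amount:.2f}"
    return ET.tostring(table, encoding="unicode")

print(to_xml(records))
print(to_xhtml(records))
```

The point is that both views are generated from one authoritative dataset, so the human and machine versions can never drift apart.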

These steps will help the public to easily find, use, cite and understand the data. The data catalog should explain any rules or regulations that must be followed in the use of the dataset. Also, the data catalog itself is considered “data” and should be published as structured data, so that third parties can extract data about the datasets. Thoroughly document the parts of the web page, using valid XHTML, and choose easily patterned and discoverable URLs for the pages. Also syndicate the data for the catalog (using formats such as RSS) to quickly and easily advertise new datasets upon publication.   
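Publishing the catalog itself as structured, syndicated data can be sketched as below.  The dataset title and URL are hypothetical examples, and a real RSS 2.0 feed would carry more channel metadata (link, description, publication dates):

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog entries: (title, url, description) -- invented examples.
datasets = [
    ("FY2010 General Fund Expenditures",
     "http://example.gov/data/fy2010-expenditures.csv",
     "Raw expenditure data in CSV form"),
]

def catalog_to_rss(items):
    """Emit the data catalog as a minimal RSS 2.0 feed, so third parties
    can discover new datasets automatically as they are published."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "District Data Catalog"
    for title, link, desc in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "description").text = desc
    return ET.tostring(rss, encoding="unicode")

print(catalog_to_rss(datasets))
```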

The ultimate goal is to make any data published by government both human- and machine-readable.  Machine readability is important because it allows interested parties to more easily parse the data.  Furthermore, machine readability is important because it helps create opportunities for citizens and organizations to develop new and creative tools that give the data even greater value.
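To illustrate why machine readability matters, suppose a district published the (invented) CSV below; a third party could then aggregate it in a few lines of code, something impossible with a scanned image of the same table:

```python
import csv
import io

# Hypothetical published CSV -- school names and amounts are invented.
published = """school,category,amount
Lincoln Elementary,Instruction,500000
Lincoln Elementary,Transportation,75000
Washington Middle,Instruction,650000
"""

def total_by_category(csv_text):
    """Sum spending per category across all schools in the published file."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["category"]] = totals.get(row["category"], 0) + int(row["amount"])
    return totals

print(total_by_category(published))  # totals per spending category
```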
 
The use of PDF in government and in the private sector is persistent.  Therefore, it is highly advisable that when a PDF is created, steps be taken to include metadata, file attachments, and other features that will add value to the document and make the data in the PDF more machine-readable.  If PDF is going to be the dominant form of publication, then the creation process should aim for greater interoperability, to further the goals of usability and re-usability.
 
Again, I support this legislation’s goal of creating transparency and openness in public school finances.  However, I would strongly encourage the Colorado General Assembly to more explicitly recommend the use of open standards based technologies when publishing any government data.   


The Power of Linked Data

I recently began participating in the W3C eGov Interest Group and learned about the concept, or practice, of connecting data to other data available on the Web.  It is a concept that Tim Berners-Lee calls “linked data”.  The potential is huge.  I don't fully understand the details, but I was pointed to the following presentation by Tim Berners-Lee at the TED Conference.  The video introduces the concept and the underlying potential.

Link to the video: http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html

Transparency – when a good thing goes bad!

In an October 9th article in The New Republic, Lawrence Lessig authored an important critique of the transparency movement, primarily asking: transparency is all fine and good, but to what end?

Lessig's article, along with a related article in The New Yorker by Jill Lepore that takes a critical look at the history of scientific management, points out that what seems to be a good idea at one time may not be in the long run, especially if it goes unexamined.  As the saying goes, “the road to hell is paved with good intentions.”  Thus, without careful review, the rush to transparency may lead to unintended consequences that we do not want and may jeopardize the good that could have been achieved by opening up government data.

In fact, Lessig points out the non-contextual nature of raw data.  Without context, data can say anything.  An example that comes to mind is the Denver Post's effort to publish the salaries of all Colorado state government workers in the Post's data center (the information is no longer available).  As a pure transparency matter, such publication is not a bad idea.  However, this data cannot tell the whole story alone.  The data that was released simply stated the name of the employee, the employee's agency or department, the employee's state classification, and salary.  The data did not say if the employee worked for a full year or not.  The data did not say if the employee was promoted or demoted during the year.  The data did not say what the employee's job was.  The data was without context.  So yes, a few department of transportation drivers appear to have huge salaries.  Why?  They drive snow plows and work lots of overtime.  Was the information useful?  Yes.  But without the context it does not tell the whole story.

Neither Lessig's article nor this post on O'Reilly Radar, which highlights both Lessig's and Lepore's articles, condemns open government data and transparency; rather, they point to the need for goals and for asking the important question of “what is the end?”  I support transparency fully and think government does need to open up more information.  But we do need to think about what the benefits and the costs will be.  And if we are republishing, redistributing, or mashing up any of the data for use as a policy tool, we must provide context and avoid bias whenever possible.

Legislink.org – legislative links made readable

Finding legislative materials is often half the battle of staying informed on what lawmakers are doing.  Sites like opencongress.org and govtrack.us are aimed at making congressional information more accessible and do a good job at achieving this goal.  A new project, legislink.org, recently started with the same aim of making legislative and statutory information more accessible.  Legislink goes about this task differently, by creating human-readable URLs that direct the user to the legislative information found on the government's site.

Legislink is not necessarily a better product, just a different way of getting to important information.  (I should disclose that I have contributed to the legislink project on bringing it to the state level in Colorado.)  What I think legislink does extremely well is get the user to the source of the information: the government.  Republishing legislative material on a new site like Opencongress or govtrack is fine, but why not go to the source of the information first?

Legislink does this by creating URLs that can be cited much more easily than the official URLs provided by legislative sites.

For instance, the official Colorado legislative URL for House Bill 08-1266 (a bill I also worked on) is http://www.leg.state.co.us/clics/clics2008a/csl.nsf/fsbillcont3/A254528A18722054872573D1006E26DA?Open&file=1266_enr.pdf.  The legislink URL is http://legislink.org/us-co?HB-08-1266.  I think that is easier, but you be the judge.  In addition, legislink allows users to go directly to a specific section of a piece of legislation, such as section 8 of HB08-1266: http://legislink.org/us-co?HB-08-1266-8.  If a bill of interest is long, or if you are looking for a specific section of a bill, this feature is extremely helpful.
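For illustration, the kind of mapping a service like this performs might be sketched as follows.  The parsing rules, the lookup table, and the section-anchor scheme here are my assumptions for the sketch, not legislink's actual implementation:

```python
import re

# Hypothetical lookup table: a short, patterned citation is mapped to the
# official document's URL.  (This is a sketch, not legislink's real code.)
OFFICIAL_URLS = {
    ("us-co", "HB", "08", "1266"):
        "http://www.leg.state.co.us/clics/clics2008a/csl.nsf/fsbillcont3/"
        "A254528A18722054872573D1006E26DA?Open&file=1266_enr.pdf",
}

def resolve(state, citation):
    """Parse a citation like 'HB-08-1266' or 'HB-08-1266-8' (bill plus an
    optional section number) and return the official URL, if known."""
    m = re.fullmatch(r"([A-Z]+)-(\d+)-(\d+)(?:-(\d+))?", citation)
    if not m:
        return None
    chamber, session, number, section = m.groups()
    url = OFFICIAL_URLS.get((state, chamber, session, number))
    if url and section:
        url += "#section-" + section  # assumed anchor scheme for deep links
    return url

print(resolve("us-co", "HB-08-1266"))
```

The value is in the pattern: a citizen can construct or remember the short citation, while the unwieldy official URL stays hidden behind the lookup.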

As I state above, legislink is not necessarily the answer, just as opencongress and govtrack are not the answer.  But these sites are steps in the right direction toward making government information more accessible.

So in the spirit of Chris Brogan, I must applaud Joe Cramel, a retired IT manager for the US Congress, for getting it right by creating the legislink project.  I also encourage others to participate in building and expanding legislink by joining the conversation and contributing to the effort on the project wiki.