Thursday, October 22, 2009

Free Address Validation Tool

Today I am announcing the release of another freebie: the Address Validation Tool. It is an online application for geocoding and validating address information in a semi-automated fashion. It is built with the Google Maps API and the Google geocoding engine and is suitable for handling small to medium volumes of data.

The geocoded geographic coordinates of points of interest can be adjusted manually by repositioning the location marker on the map (latitude and longitude will be updated from the corresponding map coordinates). Address and accuracy code details can also be edited manually before saving the record. All saved records can be processed into CSV, KML or GeoRSS output format on completion. Individual records in the input data are identified with a sequence number which is maintained throughout the entire process to facilitate easy reconciliation of the output file with the original information.

Geocoded information is classified according to accuracy, e.g. “address level accuracy”, “street level accuracy”, “town/city level accuracy”, etc. Counts of records in each accuracy level are maintained during the process and all saved points can be previewed on the map at any time.
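
The per-accuracy counting described above can be sketched in a few lines of Python. This is illustrative only, not the tool's actual code; the record fields and accuracy labels are assumptions for the example.

```python
from collections import Counter

# Hypothetical saved records: sequence number, geocoded address, accuracy level.
saved_records = [
    {"seq": 1, "address": "1 Example St, Melbourne VIC", "accuracy": "address"},
    {"seq": 2, "address": "Collins St, Melbourne VIC", "accuracy": "street"},
    {"seq": 3, "address": "Melbourne VIC", "accuracy": "town/city"},
    {"seq": 4, "address": "2 Sample Rd, Sydney NSW", "accuracy": "address"},
]

# Tally how many saved records fall into each accuracy level.
accuracy_counts = Counter(r["accuracy"] for r in saved_records)
print(accuracy_counts)
```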

Address validation is a 3 step process:

Step 1. Paste the list of addresses or locations to be geocoded and validated into the text area in the “Input” tab and click the “Press to Start/ Reset!” button to initiate the process.

Step 2. Edit the geocoded information in the “Edit” tab and save the results (one record at a time). The “Save” button saves the current record and geocodes the next one from the input list. Any text and accuracy code edits will be saved as well. Use the “Next” button to skip to the next record on the input list without saving (skipped records will not be included in the final output file).

Step 3. Generate output from the saved records to reconcile with the original information. CSV is the most obvious choice for updating the original dataset. Although the KML and GeoRSS outputs generated by the tool can be used with Google Maps or Google Earth without further edits, it is recommended that you update the content of at least the "title" and "description" elements to improve the presentation of the information.
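
The role of the sequence number in reconciliation can be sketched as follows. This is a minimal illustration (not the tool's actual code); the column names and values are assumptions.

```python
import csv
import io

# Hypothetical saved records: (sequence number, geocoded address, lat, lng, accuracy).
# Record 2 was skipped with the "Next" button, so it is absent from the output;
# the surviving sequence numbers still match rows in the original input list.
saved = [
    (1, "1 Example St, Melbourne VIC", -37.8136, 144.9631, "address"),
    (3, "Melbourne VIC", -37.8140, 144.9633, "town/city"),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["seq", "address", "lat", "lng", "accuracy"])
writer.writerows(saved)
print(buf.getvalue())
```

Joining this output back to the original spreadsheet on the `seq` column recovers the pairing with the source rows, even when some records were skipped.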

Useful tips:
  • Include a “country” field in the input data to improve geocoding accuracy if you are getting too many results from incorrect parts of the globe.
  • You have a chance to preview saved locations and to make final positional adjustments by selecting any of the options from the “Show saved records by accuracy:” pull-down menu in the “Edit” tab. Please note that all markers displayed on the map can be moved; however, any changes in latitude and longitude coordinates will be saved automatically and cannot be undone.
  • The composition of address detail will differ depending on the geocoding accuracy level. For ease of further processing, avoid mixing various accuracy levels in the final output file if you intend to split address details into components.
  • Geocoded address information is written into the CSV file as a single text field, but it can be split further using a spreadsheet's “Data/Text to Columns” function if you require individual address components as separate fields.
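
The programmatic equivalent of that “Data/Text to Columns” step is a simple split on the delimiter. The field order below is an assumption for illustration; real geocoded addresses will vary with accuracy level, which is why the tips above advise against mixing levels in one file.

```python
# A hypothetical geocoded address, returned as one comma-separated text field.
geocoded = "1 Example St, Melbourne, VIC, 3000, Australia"

# Split the single field into separate address components.
street, suburb, state, postcode, country = [p.strip() for p in geocoded.split(",")]
print(street, postcode)
```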

The Address Validation Tool is a replacement for my earlier application - Bulk Geocoder - which was also built with the Google geocoding engine. Since Google's terms of use changed earlier this year, it is now prohibited to run fully automated batch geocoding using the free Google service. To comply with those restrictions, this new tool allows geocoding only one point at a time. And if I interpret the wording correctly, the geocoded information itself can only be used with Google applications.

I have submitted this application as my second entry in the MashupAustralia contest (the first one was Postcode Finder). I hope that it will be a handy resource to help improve the spatial accuracy of data released for this competition and beyond. Any comments, feedback and suggestions are greatly appreciated!

Friday, October 16, 2009

MashupAustralia contest update

The MashupAustralia contest I mentioned in my earlier post has been running for a week and a bit now. There are only five entries so far (in descending order, from the newest to the oldest):

Your Victoria Online: a Google Maps based application to locate the nearest Emergency Services, Government Services, Schools, Public Internet etc.

Victorian Schools Locator: a Google Maps based application created with map maker; it shows the locations of over 2,000 schools in Victoria.

Broadband Locator: the site uses Google Maps and Street View to display address information - visitors can enter an address and the application will show what broadband services are available in their area.

Geocoded List of Medicare Office Locations: a geocoded ATOM feed of Medicare offices.

Postcode Finder: my first entry into the contest – with postcode and suburb boundaries and Victorian Police Stations as POI. I am planning to add more POI eventually. Unfortunately, the data supplied for the contest is all over the place and cannot be just “plugged in” without major rework (that is, to show consistent information or reasonable spatial accuracy).

Judging by the number of visitors coming to my page from the contest site and the number of entries to date, the contest is not as widely embraced as many may have hoped, but these are still early days. Hopefully, my blog can bring a bit of extra publicity for this contest. It is, after all, a very worthy cause.

The closing time for lodging entries into the contest has been extended to 4PM Friday, 13th November 2009, which gives plenty of time for building complex applications. There will also be a number of mashup events over the next few weeks which should bring plenty of exciting new entries.

I can already claim one “consolation prize” in this contest – being the first entrant into the competition! It does not come with any formal accolades nor a cheque, but this will do me just fine. I am not really in contention for any prizes. Just wait till you see what is cooking in garages around the country and what the master chefs - the cream of the Australian GIS industry - will soon start to serve!

Wednesday, October 14, 2009

Mapping Stimulus Projects in Oz

Last month, in my post on Google tools for the public sector, I provided a few examples of how Australian government departments and organisations are using Google Maps to present various information. Today, another interesting example: a map showing where and on what projects the billions of dollars committed by the government in the economic stimulus package are being spent. Information is available for six different expenditure categories: education, community infrastructure, road and rail, housing, insulation and solar. Zoom to your local area to find out what is actually happening in your neighbourhood with the allocated money.

Some of the information available on the Nation Building - Economic Stimulus Plan site has also been released under a Creative Commons - Attribution 2.5 Australia (CC-BY) licence and can be freely used for various mashups and analyses. In particular, you can access information on all current community infrastructure and road and rail projects across Australia. And if you have a great idea on how to use this data, you can enter the MashupAustralia contest for great prizes. It is run by the Government 2.0 Taskforce for a limited time.

Tuesday, October 13, 2009

Ed Parsons on Spatial Data Infrastructure

I recently attended the Surveying & Spatial Sciences Institute Biennial International Conference in Adelaide and was privileged to see Ed Parsons’ presentation. For those who don’t know Ed, his bio describes him as “… the Geospatial Technologist of Google, with responsibility for evangelising Google's mission to organise the world's information using geography, and tools including Google Earth, Google Maps and Google Maps for Mobile.” He delivered a very enlightening and truly evangelistic presentation outlining his views on the best approach to building Spatial Data Infrastructures. The following paragraphs summarise the key, thought-provoking points from the presentation – with some comments from my perspective.

The essence of Ed’s position is that the currently favoured approach of building highly structured, complex-to-the-n-th-degree “digital libraries” to manage spatial information is very inefficient and simply does not work. There is a much better framework to use – the web – which is readily available and can deliver exactly what is needed by the community, and in a gradual and evolutionary fashion rather than as a pre-designed and rigid solution.

I could quote many examples of failed or less-than-optimal implementations of SDI initiatives in support of Ed’s views. There is no doubt that there are many problems with the current approach. New initiatives are continuously launched to overcome the limitations of previous attempts to catalogue collections of spatial information. And it is more than likely that none of the implementations is compatible with the others. The problem is that metadata standards are too complex and inflexible, and data cataloguing software is not intelligent enough to work with less-than-perfectly categorised information. I recently had first-hand experience with this. I tried to use approved metadata standards for my map catalogue project, hoping they would make the task easier and the application fully interoperable, but in the end I reverted to adding my own “interpretations and extensions” (proving, at least to myself, that a “one-size-fits-all” approach is almost impossible). I will not even mention the software issues…

Ed argued that most SDI initiatives are public sector driven, and since solution providers are primarily interested in “selling the product”, by default it all centres on the data management aspect of the projects. In other words, the focus is on producers rather than users, on Service Oriented Architecture (SOA) rather than on “discoverability” of relevant information. All in all, current SDI solutions are built on the classic concept of a library, where information about the data (metadata) is separated from the actual data. Exactly as in a local library, where you have an electronic or card-based catalogue with book titles and respective index numbers, and rows of shelves with books organised according to those catalogue index numbers. For small, static collections of spatial data this approach may work, but not in the truly digital age, where new datasets are produced in terabytes, with a myriad of versions (e.g. temporal datasets), formats and derivations. And this is why most SDI initiatives do not deliver what is expected of them at the start of the project.

Ed made the point that it is much better to follow an evolutionary approach (similar to how the web developed over time) rather than a strict, “documentation driven” process, as is the case with most current SDI projects. The simple reason is that you don’t have to understand everything up-front to build your SDI. The capabilities may evolve as needs expand, and you can adjust your “definitions” as you discover more and more about the data you deal with – in an evolutionary rather than prescriptive way. It is a very valid argument, since it is very, very hard to categorise data according to strict rules, especially if you cannot predict how the data will evolve over time.

[source: Ed Parsons, Google Geospatial Technologist]

The above table contrasts the two approaches. On one side you have traditional SDIs with strict OGC/ISO metadata standards and web portals with search functionality - all built on Service Oriented Architecture (SOA) principles and with SOAP (Simple Object Access Protocol) as the main conduit of information. Actually, the whole setup is much more complex as, in order to work properly, it requires a formalised “discovery” module - a registry that follows the Universal Description, Discovery and Integration (UDDI) protocol - and a “common language” for describing available services (that is, Web Service Description Language, or WSDL for short). And IF you can access the data (a big “if”, because most public access SDI projects do not go that far), it will most likely be in the “heavy duty” Geography Markup Language (GML) format (conceived over a decade ago but still mostly misunderstood by software vendors as well as potential users). No wonder that building an SDI based on such complex principles poses a major challenge. And even in this day and age, the performance of an SDI constructed this way may not be up to scratch, as it involves very inefficient processes (“live” multidimensional queries, multiple round trips of packets of data, etc.).

On the other side you have the best of the web, developed in an evolutionary fashion over the last 15 years: unstructured text search capabilities delivered by Google and other search engines (dynamically indexed and heavily optimised for performance), simple yet efficient RESTful services (according to Ed Parsons, not many are choosing to use SOAP these days) and simpler and lighter data delivery formats like KML, GeoRSS or GeoJSON (which have a major advantage: their content can be indexed by search engines, making the datasets discoverable!). As this is a much simpler setup, it is gaining widespread popularity amongst “lesser geeks”. The US government data portal is the best example of where this approach is proving its worth.
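
The appeal of the “lightweight” formats is easy to demonstrate: a complete, valid GeoJSON feature is a few lines of plain JSON that any client, or a search-engine crawler, can parse without WSDL, UDDI or a GML schema. The point and property values below are illustrative only.

```python
import json

# A single point of interest as a GeoJSON Feature.
# Note GeoJSON coordinate order: [longitude, latitude].
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [144.9631, -37.8136]},
    "properties": {"title": "Melbourne GPO", "description": "Example point"},
}
print(json.dumps(feature))
```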

The key lesson is: if you want to get it working, keep it simple and do not separate metadata from your data, so the information is easy to discover. And let the community of interest define what is important rather than prescribing a rigid solution upfront. The bottom line is that Google's strength is in making sense of the chaos that is cyberspace, so it should be no surprise that Ed is advocating a similar approach to dealing with the chaos of spatial data. But can the solution really be so simple?

The key issue is that most of us, especially scientists, would like a definite answer when we search for the right information. That is: “There are 3 datasets matching your search criteria” rather than: “There are 30,352 datasets found, the first 100 closest matches are listed below…” (i.e. the “Google way”). There is always that uncertainty: “Is there something better or more appropriate out there, or should I accept what Google is serving as the top search result? What if I choose an incomplete or not the latest version of the dataset?” So the need for a highly structured approach to classifying and managing spatial information is understandable, but it comes at a heavy cost (in both time and money) and in the end it can serve only the needs of a small and well-defined group of users. “The web” approach can certainly bring quick results and open up otherwise inaccessible stores of spatial information to the masses, but I doubt it can easily address the issue of “the most authoritative source” that is so important with spatial information. In the end, the optimal solution will probably be a hybrid of the two approaches, but one thing is certain: we will arrive at that optimal solution by evolution and not by design!

Saturday, October 3, 2009

Mashup Australia Contest

A few days ago, the Australian Government 2.0 Taskforce announced an open invitation to any "able and willing body" to create mashups with nominated datasets from various Federal and State jurisdictions in Australia. It is a contest, so there will be prizes for the winning entries:
* $10,000 for Excellence in Mashing category
* $5,000 for Highly Commendable Mashups
* $2,500 for Notable Mashing Achievements
* $2,000 for the People’s Choice Mashup prize
* $2,000 for the Best Student entry
* $1,000 bonuses for the Transformation Challenge

Anyone in the world is eligible to enter, but prizes will only be awarded to individuals from Australia or to teams where at least one member has Australian credentials. The contest is open from 10am October 7 to 4pm November 6, 2009 (Australian Eastern Standard Time - GMT+11.00).

I will be entering at least two of my applications that have been running on my site for the last couple of years and are already used by quite a few people. These are: the bushfire incidents map (part of my larger Natural Hazards Monitor concept) and Postcode Finder, with links to Australian demographic information from the Australian Bureau of Statistics. If you have an application that is suitable for presenting information nominated in the contest rules and need an Australian representative on the team, or would like to access some of the data from my site, I invite you to partner with me in this competition.

Region shaken by tragic events

Several terrible tragedies struck our region in the last few days. I was travelling for most of the week, so only now am I able to put together a few words regarding the events. There were deadly tsunamis on the islands of Samoa, Tonga and American Samoa - caused by an earthquake in the early morning of 30 September - and later that day, two devastating earthquakes in Indonesia killed thousands. It is very sad news… Despite all the scientific advancements and sophisticated technology to monitor and predict such events, as well as significant infrastructure investment in warning systems, humanity is still very vulnerable to natural disasters.

My thoughts go out to the families of those killed, and to all injured and affected by these events.