Monday, July 25, 2011

Shp data and Fusion Tables

A lot of public-domain geographic data is distributed in SHP format. However, the Fusion Tables application supports geographic data only in KML format. Google has recognised the opportunity and now links to a “translator/loader” application, Shpescape, that facilitates uploading SHP files into Fusion Tables. Shpescape is built with the GeoDjango framework and is aimed at converting and loading that vast resource of GIS data from SHP format into Fusion Tables. It should improve uptake of Fusion Tables by the GIS community as well as the broader application development community.

The concept behind Shpescape is great but, for now, it fails in terms of performance. I tried the application with a modest-size SHP dataset (40MB) and the result was not impressive. It took an extremely long time to upload the data to the server, process it into KML and load it into Fusion Tables (just short of an hour!). I know from my own experiments that converting SHP into KML takes only a few seconds with a basic PHP script. Allowing for download and upload time (since two separate servers are involved), the whole process should be finished in a matter of minutes, not almost an hour. The biggest disappointment was that the algorithm used in Shpescape enforces generalisation of polygons and does not process “point for point” from SHP to KML [correction: it is actually an undesirable Fusion Tables feature and not Shpescape’s fault]. It resulted in some polygons being converted incorrectly and/or corrupted in the process (as per image below).

Shpescape will work with small SHP files with simple geometries but, as it stands, Fusion Tables is unable to handle full-resolution datasets. It may therefore be better to generalise SHP files before loading them into Fusion Tables via Shpescape.
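Generalising a ring before upload does not need much code. The sketch below is a plain Douglas-Peucker simplification in Python. It is only an illustration, not the algorithm that Shpescape or Fusion Tables actually uses, and the function names and the tolerance value are my own assumptions. It works on a list of (x, y) tuples and leaves reading the SHP file to whatever library you prefer.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b; all are (x, y) tuples."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment (e.g. a closed ring whose first and last
        # points coincide): fall back to plain point-to-point distance.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Douglas-Peucker simplification, written iteratively.

    Keeps the first and last points and every point that lies further
    than `tolerance` (in coordinate units) from the simplified line.
    """
    n = len(points)
    if n < 3:
        return list(points)
    keep = [False] * n
    keep[0] = keep[-1] = True
    stack = [(0, n - 1)]
    while stack:
        start, end = stack.pop()
        index, dmax = -1, 0.0
        for i in range(start + 1, end):
            d = point_segment_distance(points[i], points[start], points[end])
            if d > dmax:
                index, dmax = i, d
        if index != -1 and dmax > tolerance:
            keep[index] = True
            stack.append((start, index))
            stack.append((index, end))
    return [p for p, k in zip(points, keep) if k]


if __name__ == "__main__":
    # Hypothetical ring with two redundant points on its bottom edge.
    ring = [(0, 0), (1, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
    print(simplify(ring, tolerance=0.1))
```

Note that each ring is simplified on its own here, so a boundary shared by two neighbouring polygons is not guaranteed to stay identical on both sides.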

Related Posts:
Converting shapefile to KML
Converting csv data into shapefile


Unknown said...

Try using Quantum GIS for this conversion; it's a very convenient open source GIS program.

At least on my pc, the shp->kml conversion was very simple and quick.

All Things Spatial said...

Indeed Eric, it’s a very easy process with QGIS – I tried it myself earlier today! I have to flag this option in the related post. Thanks for bringing it up.

If only I could figure out how to add text to be converted to KML at the same time (for info windows). I know I can add a column to the attribute table but I am not sure how to populate it with relevant HTML tags and text. There seems to be a limit of 255 characters. Any suggestions?

Josh said...

Hey - thanks for the post and good feedback on shpEscape.

A few comments:
- The site isn't officially supported by Google; just something I built one weekend

- In terms of performance, I do some row-by-row analysis which is definitely not optimal, but also helps ensure compatibility with Fusion Tables (for example, the requirement that all geometries are less than 1M characters in length after conversion to KML, which means I have to simplify some of them).

- If the geometry is valid and doesn't require simplification, it just uses GDAL/OGR to do the shp->KML conversion, so there shouldn't be any additional corruption of geometries from the shpEscape code

- Good point about slowness: As of v1.9 GDAL supports a Fusion Tables driver which I'll use instead of manually going through each row. One advantage (in my mind) of shpEscape is it adds some attributes for each feature that let you more easily style things in Fusion Tables, but most people may not care about this so speed should be the default

Thanks again for the comments.

All Things Spatial said...

Hi Josh and thanks for your comment.

Having spent more time playing with Fusion Tables, I now better understand the limitations. Indeed, it’s not shpEscape that is the issue but rather Google Fusion Tables itself. It generalises some polygons while plainly “refusing” to process others. The 1 million character limit is a real issue that may hold back some potential users since it requires additional processing of input files. And when a row fails, the rest of the rows are not loaded either.

I believe shpEscape has great potential, especially if you allow for some “user input” (e.g. selection of attributes to transfer) and strictly implement checks for the 1 million character limit per cell so all polygons are loaded. But the challenge will be to maintain topological consistency (i.e. if one polygon is generalised and the adjoining one is not, it may result in mismatched boundaries).
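To illustrate the kind of check I have in mind, here is a rough Python sketch that builds the KML for a single outer ring and flags rows that would exceed the limit before anything is uploaded. The function names are hypothetical, the limit is simply the 1M figure quoted above, and counting characters with len() is my assumption about how Fusion Tables measures a cell.

```python
# Limit as quoted in the comments above; how Fusion Tables counts it
# exactly (characters vs bytes) is an assumption in this sketch.
KML_CELL_LIMIT = 1_000_000


def ring_to_kml_polygon(ring):
    """Build a KML Polygon string from one outer ring of (lon, lat) pairs."""
    coords = " ".join(f"{lon},{lat}" for lon, lat in ring)
    return (
        "<Polygon><outerBoundaryIs><LinearRing><coordinates>"
        f"{coords}"
        "</coordinates></LinearRing></outerBoundaryIs></Polygon>"
    )


def rows_over_limit(rings, limit=KML_CELL_LIMIT):
    """Return the indexes of rings whose KML is not shorter than `limit`."""
    return [
        i for i, ring in enumerate(rings)
        if len(ring_to_kml_polygon(ring)) >= limit
    ]


if __name__ == "__main__":
    small = [(0, 0), (1, 0), (1, 1), (0, 0)]
    # Artificially long ring, only to trigger the check.
    big = [(1.234567, 2.345678)] * 100_000
    print(rows_over_limit([small, big]))
```

Rows flagged this way could be generalised (or split) first, so that one oversized polygon does not stop the remaining rows from loading.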

I amended my post based on your comment and my further investigations. Keep me posted when you have the next version ready!