Tuesday, April 30, 2013

Maps for Excel

The idea of integrating maps and spreadsheets is not new: both Microsoft Excel and Lotus 1-2-3 (for those who remember!) had maps embedded as far back as the 1990s. Yet those maps never gained widespread popularity amongst spreadsheet users. On the other hand, those who really needed to map their data were prepared to pay good money for the functionality. Some integrators did very well out of this, selling their plug-ins for up to $10,000 a user. Now Microsoft is making another attempt to integrate Excel and maps in the 2013 version of the product, positioning it as part of its BI solution. Reportedly, it will be able to handle “big data” to the tune of "one million rows of data from an Excel workbook." The new functionality will allow users to geocode data as well as view geospatial and temporal data by creating thematic and heat maps.

[image courtesy of Excel Blog]

Google already offers an integrated spreadsheet, Fusion Tables and map solution through its Docs application, so this is just catch-up for Microsoft. If history is anything to judge by, mapping data in spreadsheets will not become a mainstream use of Excel, simply because users will need to geocode all those “millions of records” in the first place, and there is no indication so far as to how much this service will cost. Also, anyone who deals with geocoding records knows that it is not as simple as “pressing a button”. On top of that, the challenges of “processing of spatial data for 1 million points” have not really been solved without a fairly complex server setup, so the chances of doing anything useful with such a vast amount of information, even within a desktop version of Bing Maps, appear rather limited (in fact, some users have already commented about the performance). Spatial education within the general population is another factor: general knowledge is increasing but is still not enough to deal with issues of “projections and datums” for spatial data and more advanced forms of analysis, beyond simple presentation of points on a map…

In my opinion, Microsoft would have a much better chance of success developing specific tools in support of specific business activities, with narrowly focused spatial functionality, rather than providing a generic DIY mapping capability. The generic approach failed in the past, yet it seems to be preferred by the big players, including Google. Time will tell whether the outcome will be different this time.

Revisiting thematic map theory

There has been a long-standing view that choropleth maps (from the Greek words choros, space, and plethos, multitude; often simply called thematic maps) should not be used to present absolute values, such as counts of persons per postcode, unless the mapped areas are of similar size. Ideally, only normalised values should be mapped (for example, ratios such as the density of people per unit of postcode area). People have been conditioned to perceive that “bigger means more”, so when we present information in a spatial context the effect of the size of spatial units should be eliminated for a meaningful comparison.

For example, two postcodes may have exactly the same number of people residing in them, so both would be coloured the same way on a thematic map, but if one polygon is significantly larger than the other, it would give an impression of greater importance (i.e. larger size = more important). The argument goes that using ratios rather than absolute values eliminates the influence of area, so that the map becomes meaningful by accurately portraying the distribution of features within each area. In our example, if we use the ratio of counts to area size, the larger polygon would have a smaller density value and hence would be coloured in a lighter shade, “demoting” the importance of that area in relation to the other.
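The two-postcode example can be put into numbers with a minimal Python sketch. The postcode names, person counts and areas below are made up purely for illustration, and the linear `shade` scale is just one simple assumption about how values might be turned into colour intensity.

```python
# Hypothetical postcodes: identical person counts, very different areas.
postcodes = [
    {"name": "A", "persons": 5000, "area_km2": 2.0},
    {"name": "B", "persons": 5000, "area_km2": 50.0},
]


def density(record):
    """Persons per square kilometre (count normalised by area)."""
    return record["persons"] / record["area_km2"]


def shade(value, lo, hi):
    """Linear colour intensity: 0.0 is the lightest shade, 1.0 the darkest."""
    if hi == lo:
        return 0.5  # all values equal, so every polygon gets the same mid shade
    return (value - lo) / (hi - lo)


counts = [p["persons"] for p in postcodes]
densities = [density(p) for p in postcodes]

# Shading by absolute count: both polygons receive the same colour.
count_shades = [shade(v, min(counts), max(counts)) for v in counts]

# Shading by density: the larger polygon drops to the lightest shade.
density_shades = [shade(v, min(densities), max(densities)) for v in densities]

for p, c, d in zip(postcodes, count_shades, density_shades):
    print(p["name"], "count shade:", c, "density shade:", d)
```

Mapped by count, the two polygons are indistinguishable in colour and only their size differs; mapped by density, the large polygon "B" is demoted to the lightest shade.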

[Source: Wikipedia]

There are all sorts of other concerns about the inadequacy of thematic mapping for the presentation of spatial information, including the more recent one that maps in Mercator projection, i.e. Google and other online maps, are particularly bad because they distort areas more and more the further you move from the equator. There is also a general concern that, since users “get clues” as to hierarchical importance from both the value attached to a polygon (shown with colour intensity) AND the size of the respective polygon, they may not be able to interpret correctly the “hierarchy” of information being presented. That is, users are potentially unable to rank the order from “the smallest” to “the largest” because of conflicting messages presented on a map: large polygons with small absolute values (light colours) and small polygons with large absolute values (dark colours), or vice versa. Yet many novice cartographers are either unaware of these issues or are totally ignoring the “experts”, happily mapping absolute values on Google maps.
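To give a feel for the size of the Mercator distortion mentioned above, here is a rough Python sketch. It assumes the spherical Mercator formulas used by typical online maps, where the local area scale relative to the equator is 1 / cos²(latitude); the function name and the sample latitudes are only illustrative.

```python
import math


def mercator_area_inflation(lat_deg):
    """Local area exaggeration of spherical Mercator relative to the equator.

    Linear scale at latitude phi is sec(phi) in both directions,
    so areas are inflated by sec(phi) squared.
    """
    phi = math.radians(lat_deg)
    return 1.0 / math.cos(phi) ** 2


# Sample latitudes chosen only for illustration.
for lat in (0, 30, 45, 60, 70):
    print(f"latitude {lat:>2} deg: areas drawn about "
          f"{mercator_area_inflation(lat):.1f}x too large")
```

Under this assumption a polygon at 60 degrees latitude is drawn roughly four times larger in area than an identical polygon at the equator, which is the effect the critics of Mercator-based thematic maps have in mind.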

However, as it turns out, there is some validity in this non-expert approach to thematic mapping. In particular, “…psychological studies have shown that size and color (or at least, size/value and size/hue) are a ‘separable’ combination: that is, variations in the size of graphic elements do not considerably interfere with our ability to determine their color [sequence], or vise-versa. So, theoretically, there should be no problem with using choropleths on a Mercator projection: the distorted areas shouldn’t mess up our ability to determine a region’s color, which is what choropleth mapping is all about.”

So, there you go: since only colour counts in thematic mapping and users are able to distinguish colour hierarchy independently of the size of polygons, it should be OK to use absolute values with unequal polygons after all! Happy mapping!