Wednesday, May 15, 2013

Security standard for developers

I have been writing quite a lot about standards recently - mostly about their deficiencies - but today a few words about one standard that is really worth investing time and effort in: the application security standard ISO/IEC 27034.

It has been around for about 18 months, so it is a relatively new concept, but it has already been adopted by the largest software development companies. Chances are it will become prominent enough to be demanded from contractors developing solutions for government and larger corporate clients.

From the introduction to the Standard:
ISO/IEC 27034:
a) applies to the underlying software of an application and to contributing factors that impact its security, such as data, technology, application development life cycle processes, supporting processes and actors; and
b) applies to all sizes and all types of organizations (e.g. commercial enterprises, government agencies, non-profit organizations) exposed to risks associated with applications.

The purpose of ISO/IEC 27034 is to assist organizations in integrating security seamlessly throughout the life cycle of their applications by:
   
• providing concepts, principles, frameworks, components and processes;

• providing process-oriented mechanisms for establishing security requirements, assessing security risks, assigning a Targeted Level of Trust and selecting corresponding security controls and verification measures;

• providing guidelines for establishing acceptance criteria to organizations outsourcing the development or operation of applications, and for organizations purchasing from third-party applications;

• providing process-oriented mechanisms for determining, generating and collecting the evidence needed to demonstrate that their applications can be used securely under a defined environment;

• supporting the general concepts specified in ISO/IEC 27001 and assisting with the satisfactory implementation of information security based on a risk management approach; and other standards.

A good starting point for learning about secure coding practices is a series of free courses and video presentations offered by the Software Assurance Forum for Excellence in Code (SAFECode). You can read more about the Standard in a very interesting article by Lia Timson (The Age): If it's worth coding, it's worth securing.

Tuesday, May 14, 2013

Rift in GIS ranks

There is a troubling development in the GIS community that threatens to flare up old animosities and split participants along the lines of big versus small as well as proprietary versus open source. At the centre of it are the Open Geospatial Consortium (OGC) and its members, representing the open source camp, minor proprietary solution vendors and the 800 pound gorilla of GIS – ESRI. Ultimately, the whole incident could undermine, one way or another, OGC's hold on all things spatial…

Firstly, a short history of OGC, from my perspective. In the beginning OGC was just an obscure “public benefit” organisation focused on developing and promoting open and interoperable standards to facilitate the exchange of spatial information. There was a real problem that required urgent attention – that is, finding an easy way to exchange vast amounts of valuable spatial data stored in “walled gardens” and in obscure proprietary formats. So, many were prepared to put in a lot of effort, in their spare time as well as in “employer paid time”, in order to come up with a solution to this problem. This early period attracted a lot of prominent participants from government organisations and academia across the world, as well as support (often financial) from several industry sectors hoping for significant cost savings in the future. It was a community wide initiative with roots firmly established in the open source movement.

When OGC standards started gaining traction and wider acceptance amongst key buyers of GIS technologies, the organisation attracted members from the proprietary solutions camp as well. OGC then reached the peak of its relevance. A new era started where a united GIS community worked together towards a common goal – achieving full interoperability of spatial data, regardless of whether the data originated in proprietary or open source formats, or was created using proprietary or open source tools.

However, the relevance of OGC and its initiatives is now under threat - from outsiders, as rapid technological advancement demands more flexible, easier to implement and performance focused standards (like those developed by organisations and individuals operating on the fringes of GIS - the OpenStreetMap initiative and GeoJSON are the two most prominent examples), but also from within its own ranks, as the organisation is forced to deal with the conflicting interests of its members.

In particular, ESRI has put forward a new GeoSpatial REST API standard for endorsement by OGC. It covers a wide set of web services including catalogue, map, feature, imagery, geometry, geoprocessing, and geocoding web services. However, a significant number of OGC members have lodged an objection to it and demand that the document not be adopted as an official OGC standard.

The essence of the objection is that the proposed standard is not a community designed scheme but rather just documentation of an implementation of proprietary web services by a single commercial vendor. The objectors argue that no other vendors will be able to build solutions against that standard and that, if endorsed, it will dilute the relevance of existing OGC standards. If it becomes an OGC endorsed standard, there is a danger that buyers will demand that all vendors of spatial solutions support it in their products, hence giving an unfair advantage in the market to vendors of ESRI solutions. The objections are fully documented on the OSGeo website.

Unless a workable compromise can be found, whichever way the OGC decision goes it will alienate a significant group of its members and, ultimately, split the GIS community. That is, it will upset either open source and non-ESRI vendors or ESRI and thousands of its affiliates whose livelihood is almost exclusively dependent on ESRI’s fortunes. If things get rough, ESRI is dominant enough in the market to go its own way, without the blessing of OGC. But will OGC be able to retain its relevance without such a prominent member as ESRI? Potentially some troubling times ahead for the GIS industry...

Wednesday, May 8, 2013

Geoscience Australia new catalogue

Geoscience Australia has just released a new, map based catalogue to help visitors to the site discover and access the diverse range of data, products and services available from the organisation. The Discovery and Delivery System is in beta release for public testing so, unfortunately, not everything is working properly in Internet Explorer. However, you will still be able to find a lot of goodies there like, for example, over 100,000 historic Landsat-7 and Landsat-5 images over Australia captured between 2000 and 2010. These can be viewed as WMS overlays on the map or can be downloaded as data (in NetCDF format, but be careful, each scene is more than a gigabyte in size).


The mapping functionality of the catalogue is built with the Google Maps API. The backdrop map is a standard satellite overlay but users have a choice of several custom layers provided by GA, such as a topographic map, land cover map, geology map or gravity map.

The catalogue lists publications as well as data for use with GIS software. The advanced search option allows refining search criteria by specific theme, geographic location and time frame. Links provided in search results take users directly to a download page. The catalogue is a convenient way to discover what is available from Geoscience Australia although, at first, the choice may appear a bit overwhelming!


Tuesday, April 30, 2013

Maps for Excel

The idea of integrating maps and spreadsheets is not new - both Microsoft Excel and Lotus 1-2-3 (for those who remember!) had maps embedded as far back as the 1990s. Yet, those maps never gained widespread popularity amongst spreadsheet users. On the other hand, those who really needed to map their data were prepared to pay good money for the functionality. Some integrators did very well out of this, selling their plug-ins for up to $10,000 a user. Now Microsoft is making another attempt to integrate Excel and maps in the 2013 version of the product. Microsoft is positioning this as part of its BI solution. Reportedly, it will be able to handle “big data” to the tune of "one million rows of data from an Excel workbook." The new functionality will allow users to geocode data as well as view geospatial and temporal data by creating thematic and heat maps.

[image courtesy of Excel Blog]

Google is already offering an integrated spreadsheet, Fusion Tables and map solution through their Docs application, so this is just a catch up for Microsoft. If history is anything to judge by, mapping data in spreadsheets will not become a mainstream use of Excel – simply because users will need to geocode all those “millions of records” in the first place, and there is no indication so far as to how much this service will cost. Also, all those who deal with geocoding records know that it is not as simple as “pressing a button”. Not to mention that the challenges of “processing of spatial data for 1 million points” have not quite been solved yet without a fairly complex server setup, so the chances of doing anything useful with such a vast amount of information, even within a desktop version of Bing Maps, appear rather limited (in fact, some users have already commented on the performance). Spatial education within the general population is another factor – general knowledge is increasing but is still not enough to deal with issues of “projections and datums” for spatial data and more advanced forms of analysis, beyond simple presentation of points on a map…

In my opinion Microsoft would have a much better chance of success developing specific tools in support of specific business activities, with narrowly focused spatial functionality, rather than providing a generic DIY mapping capability. The generic strategy failed in the past but it seems to be the preferred approach of big players, including Google. Time will tell if the outcome will be different this time.

Revisiting thematic map theory

There has been a long standing view that choropleth maps (from the Greek words choros - area, and plethos – multitude, or simply thematic maps) should not be used to present absolute values like, for example, counts of persons per postcode, unless mapped areas are of similar size. Ideally, only normalised values should be mapped (for example, proportions such as density of people per postcode area). People have been conditioned to perceive that “bigger means more” so, when we present information in a spatial context, the effect of the size of spatial units should be eliminated for a meaningful comparison.

For example, two postcodes may have exactly the same number of people residing in them, so both would be coloured the same way on a thematic map, but if one polygon is significantly larger than the other, it would give an impression of greater importance (i.e. larger size = more important). The argument goes that using ratios rather than absolute values eliminates the influence of area, so that the map becomes meaningful by accurately portraying the distribution of features within each area. In our example, if we use the ratio of counts to area size, the larger polygon would have a smaller density value and hence would be coloured in a lighter shade, “demoting” the importance of that area in relation to the other.
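To put numbers on the two-postcode example, here is a minimal sketch. The postcode labels, populations and areas are made up for illustration, and the `Postcode` class and `shade_order` function are my own names, not part of any mapping library.

```python
from dataclasses import dataclass


@dataclass
class Postcode:
    name: str         # hypothetical label, not a real postcode
    population: int   # absolute count of residents
    area_km2: float   # polygon area in square kilometres

    @property
    def density(self) -> float:
        # Normalised value: people per square kilometre
        return self.population / self.area_km2


def shade_order(postcodes):
    """Return names ordered from lightest to darkest shade (lowest to highest density)."""
    return [p.name for p in sorted(postcodes, key=lambda p: p.density)]


# Same population, very different polygon sizes
small = Postcode("A", 10_000, 5.0)
large = Postcode("B", 10_000, 50.0)

# Mapped as absolute counts both polygons get the same colour;
# mapped as density the larger polygon gets the lighter shade.
order = shade_order([small, large])
```

With these made-up figures the small polygon comes out ten times denser than the large one, so the large polygon is the one that gets "demoted".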

[Source: Wikipedia]

There are all sorts of other concerns about the inadequacy of thematic mapping for presentation of spatial information, including the more recent one that maps in Mercator projection, i.e. Google and other online maps, are particularly bad because they introduce extra distortion of areas the further you move from the equator. Not to mention there is a general concern that since users “get clues” as to hierarchical importance from both the value attached to a polygon (shown with colour intensity) AND the size of the respective polygon, they may not be able to interpret correctly the “hierarchy” of information being presented. That is, users are potentially unable to rank the order from “the smallest” to “the largest” because of conflicting messages presented on a map: large polygons with small absolute values (light colours) and small polygons with large absolute values (dark colours), or vice versa. Yet, many novice cartographers are either unaware of these issues or are totally ignoring the “experts”, happily mapping absolute values on Google maps.
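The Mercator distortion mentioned above can be quantified. This is a rough sketch only, assuming the spherical Mercator projection used by most online maps, where a small area at a given latitude is drawn larger than the same area at the equator by a factor of 1/cos²(latitude). The function name is mine, for illustration.

```python
import math


def mercator_area_scale(lat_deg: float) -> float:
    """Area exaggeration on a spherical Mercator map relative to the equator.

    The linear scale factor is 1/cos(latitude) in both directions,
    so the area factor is its square.
    """
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


# Exaggeration at a few sample latitudes (the sign of the latitude does not matter)
scales = {lat: mercator_area_scale(lat) for lat in (0, 30, 60)}
```

So a polygon at 60 degrees latitude is drawn at roughly four times the area of an identical polygon on the equator, which is exactly the kind of size "clue" the concern above is about.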

However, as it turns out, there is some validity in this non-expert approach to thematic mapping. In particular, “…psychological studies have shown that size and color (or at least, size/value and size/hue) are a ‘separable’ combination: that is, variations in the size of graphic elements do not considerably interfere with our ability to determine their color [sequence], or vise-versa. So, theoretically, there should be no problem with using choropleths on a Mercator projection: the distorted areas shouldn’t mess up our ability to determine a region’s color, which is what choropleth mapping is all about.”

So, there you go – since only colour counts in thematic mapping and users are able to distinguish colour hierarchy independent of the size of polygons, it should be ok to use absolute values with unequal polygons after all! Happy mapping!

Monday, April 29, 2013

Update on Google Adsense saga

For years I took Google at face value and accepted that they will “do no evil” but recent developments prompted me to apply some scrutiny to the relationship. And I am finding it quite disturbing that Google is not playing fair anymore…

As mentioned in another post, a few months back Google started regularly cutting my share of earned Adsense revenue. I complained and after two months I finally got a response from Google:

…This is due to a larger portion of your received clicks being identified as invalid. Please be assured your account is still in good standing.


Intriguing, but it gets better… In a fashion typical for the company, it is me and the visitors to my site that are at fault – me by putting ads too close to the map (apparently) and visitors by using the wrong browsers… (but I am assured to be “still in good standing”!).

"…We suggest placing ads at least 150 pixels away from the map units." Really? And what about those ads which are actually on the map using standard Google Maps API functionality?

…These invalid clicks were generated by users who accidentally clicked on ads, specifically from users who have visited your pages with certain browsers.


But then there is an admission that:


…Users visiting with those browsers are experiencing issues with our ad rendering, which causes users to unintentionally click on ads when they are navigating pages that contain map frames that span the majority of the page and content. We are working on the ad rendering issue internally, and should have it resolved shortly.

Wow, that is a revelation. So, the problem has been known to Google for at least the last 5 months (i.e. this is how long Google “is acting on it” by deducting payments from my account). But there is a good chance the problem was there for a long time and that Google engineers knew about it much earlier than 5 months back (I did not make any changes to my site for quite a while)… And nothing has been done about it so far!

All in all, Google has been providing a defective service that generated billions in revenue from invalid clicks in the past and, when the problem was finally discovered, it is taking many months to rectify it. In my book, and undoubtedly for many, the reputation of Google is getting quite close to the gutter. The company had better come clean on this…

I will be asking Google to provide detailed logs of dates and counts of disallowed clicks and to demonstrate which pages specifically contribute to the invalid clicks problem. That would provide at least some transparency about how Google operates… Otherwise, one will always wonder: is this a genuine problem or just a quick way of plugging the revenue outflow? Actually, are there any advertisers who had their expenses refunded at the end of the month because of invalid clicks? Anyway, it looks like my relationship with Google will keep deteriorating rather than improving...


Thursday, March 21, 2013

GIS metadata standard deficiencies

My recent post on the GIS standards dilemma generated quite a lot of interest so, as a follow up, I am publishing today a post explaining my position in more detail and illustrating the deficiencies of one of the standards with concrete examples.


The conception of the GIS metadata standard was a long awaited breakthrough and raised hopes across the entire spatial community that it would finally be possible to have a consistent way of describing the vast amount of geographic information created over the years as well as new data generated on a daily basis. The expected benefits of the standard were far reaching because it would allow consistent cataloguing, discovery and sharing of all the information. In other words, if successfully implemented, it would deliver a great economic benefit for all. The concept of Spatial Data Infrastructure (SDI) was born… That was more than a decade ago.

Fast forward to 2013. Creation of an SDI has been a holy grail of the GIS community for quite a long time so why, despite all the good intentions and millions of dollars poured into various initiatives, do we still not have one in Australia? Why do we not even have in place the first element of that infrastructure – a single catalogue of spatial data? In my opinion the answer is simple – because we are trying to act on a flawed concept.

At the core of the problem is a conceptual flaw in the underlying metadata standard that makes it impossible to successfully implement any nationwide or international SDI. In other words, the SDI concept will never work beyond a closely controlled community of interest, with a “dictatorship like” implementation of rules that go far beyond the loosely defined standards. Until that flaw is widely acknowledged we cannot move forward. Any attempt to build a national SDI, or even a simple catalogue, based on the flawed ISO 19115 standard is bound to fail and is a total waste of money. The reason why follows...

For years many were led to believe “follow the standard and everything will take care of itself”. But a reality check provides a totally different picture. For a start, it took years to formalise the Australian profile of the ISO 19115 standard. Then everybody started working on their own extensions because it turned out to be quite hard to implement the standard in a meaningful way for all data types, as well as for historical data for which many details are missing. But the true nature of the problem lies somewhere else...

You see, the standard prescribes the structure of the metadata record, that is, what information should be included, but to a large degree, it does not mandate the content. The result is a “free text” like entry for almost everything that is included in a metadata record. Just to illustrate, access constraint is specified as “legal” and “use” related, and both are limited to the following categories: “copyright, patent, patentPending, trademark, license, intellectualPropertyRights, restricted, otherConstraints”. But the information is optional so that metadata element may also be empty. Now, consider a case of a user who tries to find free data… impossible.
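The “find free data” case can be sketched in a few lines. This is only an illustration under my own assumptions: the record layout (plain dictionaries with a `constraints` list) and the dataset titles are hypothetical, and only the restriction categories come from the list quoted above.

```python
# Restriction categories as listed above
RESTRICTION_CODES = {
    "copyright", "patent", "patentPending", "trademark", "license",
    "intellectualPropertyRights", "restricted", "otherConstraints",
}


def constraint_status(record: dict) -> str:
    """Classify a record by what its constraint element can actually tell us."""
    codes = record.get("constraints") or []
    invalid = [c for c in codes if c not in RESTRICTION_CODES]
    if invalid:
        raise ValueError(f"not a listed restriction category: {invalid}")
    if not codes:
        return "unspecified"      # optional element left empty
    if set(codes) == {"otherConstraints"}:
        return "see free text"    # the real answer is buried in prose
    return "constrained"


# Hypothetical records, for illustration only
records = [
    {"title": "Dataset A", "constraints": ["copyright"]},
    {"title": "Dataset B"},
    {"title": "Dataset C", "constraints": ["otherConstraints"],
     "note": "Creative Commons Attribution"},
]

statuses = [constraint_status(r) for r in records]
```

Note that no branch can ever return “free”: a structured query can tell you that something is constrained, or that it cannot tell, and nothing more.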

Inclusion of so much free text in metadata information means the key benefit of creating a structured metadata record in the first place is almost entirely lost. Yes, it describes the dataset it refers to but in a totally unique way, which means searching a collection of records can only be limited to very generic criteria – in practice, with any certainty to only time and location (ie. a bounding box for the dataset). The problem is compounded if you start looking across different collections of metadata records, created and maintained by different individuals, with a different logic of what is important and what is not… But don’t blame the creators of metadata records for this – the standard does not prescribe the content in the first place!

The second problem is that the current metadata standard is applied primarily to collections (like, for example, TOPO-250K Series 3 topographic vector data for Australia or its raster representation) but it is generally not applied to individual data layers within a collection (which, in the case of TOPO-250K Series 3 data, would be any of the 92 layers that comprise the collection). Therefore, a simple search for, say, “road vector data in Australia” will not yield any results unless you resort to the free text search option and “roads” happens to be specifically mentioned somewhere within the metadata record (more on this below).

Not to mention that it would be almost impossible, from a practical point of view, to apply the metadata in the existing format to individual features or points making up that feature. This aspect of information about spatial data, especially important for the data originators and maintainers, has been totally overlooked by the creators of the metadata standard. 

Then there is the data user perspective. The key benefit of a comprehensive metadata record is that it provides all the relevant information enabling the user to, firstly, find the data and, secondly, decide whether it is fit for the intended purpose. In the most general terms, users apply “when, where and what” criteria to find the data (not necessarily in that specific order). In particular, they specify the reference date (relatively well defined in metadata records so, the least of the problems), location (which is limited only to a bounding box but the data footprint concept is also addressed within the existing metadata standard) and some characteristics of the dataset … and this is where things are not so great because each data type will have its own set of characteristics and these are mostly optional in an ISO 19115 compliant metadata record (so may not be implemented at all by data providers).

Take for example cases where users are interested in “2m accuracy roads dataset for Bendigo, Vic”… or “imagery over Campbelltown, NSW acquired no later than 3 months ago and with under 1m resolution”. It is virtually impossible to specify search criteria in this way so the users have to fit their criteria to information that is captured in metadata. That is, location becomes the bounding box constraint, time criterion becomes date constraint (either specific or as a range from – to) and the characteristics of datasets can only be specified as keywords…
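Here is what that kind of search boils down to in practice. It is a minimal sketch, not any real catalogue's API: the `Record` class, the `search` function, the dataset titles and all coordinates are made up for illustration (the query box is only roughly where Bendigo sits).

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    title: str
    bbox: tuple      # (west, south, east, north) in decimal degrees
    ref_date: date   # reference date of the dataset
    abstract: str    # free text description


def bbox_intersects(a, b):
    """True if two (west, south, east, north) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(records, bbox, start, end, keywords):
    """Where + when + keywords: all a typical metadata search can express."""
    hits = []
    for r in records:
        text = f"{r.title} {r.abstract}".lower()
        if (bbox_intersects(r.bbox, bbox)
                and start <= r.ref_date <= end
                and all(k.lower() in text for k in keywords)):
            hits.append(r.title)
    return hits


# Hypothetical catalogue content
catalogue = [
    Record("Regional roads", (143.5, -37.5, 145.0, -36.0), date(2012, 6, 1),
           "Road centrelines, positional accuracy 2 m."),
    Record("Statewide transport", (141.0, -39.0, 150.0, -34.0), date(2011, 1, 15),
           "Transport network including roads; accuracy varies."),
]

query_box = (144.2, -36.85, 144.4, -36.7)   # illustrative box, not surveyed
period = (date(2010, 1, 1), date(2013, 12, 31))

by_theme = search(catalogue, query_box, *period, ["roads"])
by_accuracy = search(catalogue, query_box, *period, ["2m accuracy"])
```

The keyword “roads” returns both records, including the one whose accuracy is unknown, while “2m accuracy” returns nothing at all because the record's author happened to write “accuracy 2 m”. The accuracy requirement cannot be expressed as anything better than a lucky string match.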

And this leads me to the final point - the need for ISO 19115 compliant metadata in the first place. Since the only truly comprehensive way to find what you are looking for is to conduct a free text search, the structured content of the metadata record is redundant. The result would be exactly the same if the information were compiled into “a few paragraphs of text”. That is the essence of the argument that Ed Parsons, Geospatial Technologist at Google, presented to the Australian spatial community as far back as 2009 but which remains mostly ignored to this day…

There is only one practical use for all the metadata records already created. You can dump the entire content of the catalogues, the ones that contain the information about the data you care about, onto your own server and reprocess it to your liking into something more meaningful, or just expose it to Google robots so that the content can be indexed and becomes discoverable via standard Google search. Unfortunately, this totally defeats another implied benefit of SDI - that metadata records will be maintained and updated at the source and that there will be no need for duplication of information…

I believe it is time to close the chapter on a national SDI and move on. Another failed attempt to create “an infrastructure that will serve all users in Australia” cannot be reasonably justified. The bar has to be lowered to cater only for the needs of your own community of practice. Which also means, you have to do it all by yourself and according to your own rules (ie. most likely creating your own metadata standard). That’s the only way to move forward.



GIS standards dilemma