In 1930s Australia, one of the most important products of large-scale farming was sugar cane. Although the plant is not native to Australia, the climate, the vast amount of available land and the cheap labour made the country ideal for industrial farming. Sugar cane made up a large proportion of the country’s exports to the rest of the world, and was a vital part of the economy following the Great Depression. So valuable was the crop that anything threatening its growth was a critical issue that needed to be dealt with quickly and effectively. As the amount of land turned over to sugar cane grew, so did the incidence of cane beetles.

The beetles’ larvae fed on the roots of the plants, destroying the plant above the surface. This created major problems for the cane industry and reduced harvests.

The response to the sugar cane crisis was the introduction of an animal that was expected to prey on the cane beetle – the cane toad:

The Cane Toad: Ugly

In 1935, just 102 cane toads were released into the wild in a bid to control the beetle population. Now, 75 years later, there are more than 200 million cane toads in Australia, and their impact on native species has been massive, pushing certain animals like the quoll close to extinction in some areas, and devastating industries in unforeseen ways.

That’s the problem with introducing a new species into an existing infrastructure: you don’t necessarily know what the ultimate results will be.

Chaos Theory

Chaos theory describes unpredictable behaviour in complex systems. A small change in the initial conditions of a system can have unforeseen consequences that seem out of all proportion to the cause. Chaos theory is pretty cool – if a little mind-blowing – and uses probabilistic calculations to build models that can forecast likely outcomes from a starting position. You see examples of chaotic systems every day: clouds, rivers, turbulence around bridges. You’ve probably also seen movies that use chaos as a premise or plot device. Chaos is pretty much everywhere. Scratch the surface of any seemingly stable system, and you will see that the stability is an approximation that hides massive turbulence underneath.
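To make "a small change in the initial conditions" concrete, here is a minimal sketch using the logistic map, a standard textbook example of a chaotic system. It is my own choice of illustration and has nothing specifically to do with toads or search engines; the function names and the starting values are arbitrary assumptions.

```python
def logistic_map(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x) and return the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def divergence(x0_a, x0_b, steps=60):
    """Absolute gap between two trajectories at every step."""
    a = logistic_map(x0_a, steps)
    b = logistic_map(x0_b, steps)
    return [abs(p - q) for p, q in zip(a, b)]


if __name__ == "__main__":
    # Two starting points that differ by one part in ten million.
    gaps = divergence(0.2, 0.2000001)
    for step in (0, 10, 20, 30, 40):
        print(f"step {step:2d}: gap = {gaps[step]:.6f}")
```

The two runs start almost identically, yet within a few dozen steps the gap between them is as large as the values themselves. That is the sense in which a chaotic system can look stable at any one moment while being unpredictable over time.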

Google’s results are a chaotic system, especially since Caffeine was launched. Small changes to content, newly discovered links, or errors that spread system-wide can lead to massive changes in the results that a user sees. In most cases, though, a user only sees a given result set once, at a single point in time, so the results appear stable. They’re not. Here’s a graph showing the changes in ranking within the top 20 sites for a particular keyword over a month or so:

keyword rankings over time: unpredictable

Google apparently considers more than 1000 variables in calculating relevance. A small difference in any one of them, for any one of the thousands of websites competing for a particular search term, can have a huge effect on the results. When a vertical is highly competitive and a lot of activity is happening, the system becomes inherently unstable.

Rel=Canonical

I could pick from dozens of examples where a small change at Google has led to unforeseen consequences, but here’s one that is pretty current. In February 2009, Google introduced rel=canonical to allow webmasters to specify which version of a page that could be reached by multiple URLs was the correct one to display to users in the search results. This, like many Google changes, was a short-term fix for an issue that, in the longer term, they would want to be able to handle algorithmically. By watching how the tag was used, Google would be able to pick up signals within websites that could then be used to determine the canonical page on sites that did not make use of the technique.

It soon became apparent, though, that there was a loophole in the way rel=canonical could be used: a spammer could inject code into a website to add a rel=canonical <link> tag to a page, telling Google that the correct version of the page being crawled was actually on a separate domain. For example, one might add the following code to this web page:

Spamming with Rel Canonical

This would tell Google that the content they should display for a particular post was on SEOmoz, rather than here. In theory, their blog would rank in place of mine. One application might be an affiliate working on “behalf” of a retailer who injects rel=canonical into every product page of the retailer’s website, directing the search engines to the version of the page hosted on the affiliate’s own domain. Add in some clever cloaking to hide the tag from anyone accessing the site from an IP address associated with the business you were hijacking, and you’d be able to steal a significant chunk of their business. It’s vaguely similar to the 302 hijacking bug that Google had a few years ago.
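For webmasters who want to check their own pages for this kind of injection, a rough audit could look something like the sketch below. It uses only the Python standard library; the class name, the function name and the example.com / example.net addresses are placeholders I have made up for illustration, and the same-host comparison is a deliberate simplification (it would also flag a legitimate canonical that points from a non-www host to its www equivalent).

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.canonicals.append(attr["href"])


def off_domain_canonicals(html, page_url):
    """Return canonical URLs whose host differs from the page's own host."""
    finder = CanonicalFinder()
    finder.feed(html)
    own_host = urlparse(page_url).netloc.lower()
    suspicious = []
    for href in finder.canonicals:
        host = urlparse(href).netloc.lower()
        # A relative href has no host, so it stays on the same site.
        if host and host != own_host:
            suspicious.append(href)
    return suspicious


if __name__ == "__main__":
    # Hypothetical injected tag pointing at a different domain.
    page = ('<html><head><link rel="canonical" '
            'href="http://attacker.example.net/post"></head></html>')
    print(off_domain_canonicals(page, "http://www.example.com/post"))
```

Run against a crawl of your own site, anything this returns is a canonical tag sending search engines to someone else’s domain, which is worth investigating unless you put it there yourself.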

Matt Cutts warned followers via Twitter, which sparked a big discussion, and also blogged about what Google is doing to sort out the issue.

Overall, though, it’s a warning that within a system as complex as the web, a small change can have a massive impact, and yet another reminder to webmasters that they need to maintain absolute security over their content and websites.
