Dealing with legacy content

The biggest problem faced by large organisations with numerous content providers is legacy content. How do you deal with ageing content on a website with little in the way of central control?

At Headscape we work with a lot of organisations that have content-heavy websites updated by large numbers of content providers. In many cases these sites have little in the way of central editorial control and so quickly become bloated with huge amounts of legacy content.

This amount of content creates some serious problems.

The problem with legacy content

With a lot of different people adding content to the website but few considering whether old content needs to be removed, it is not unusual for some sites to have hundreds of thousands of pages. This creates two distinct problems: the ‘needle in a haystack’ scenario and out of date content.

A needle in a haystack

With so much content on the website it becomes increasingly hard for users to find what they are looking for. Navigation becomes sprawling and difficult to use. Searches return so many results that the chances of a user finding something they want are significantly reduced. In short, users are left trying to find the proverbial needle in a haystack.

Needle in a haystack

Image provided courtesy of Shutterstock (Timothy Boomer)

Out of date content

With so much content on the website it is hard to ensure that everything is up to date. Old news stories and event listings long since past are only the tip of the iceberg. There is also content that is no longer accurate or now presents the organisation in the wrong light.

Although in theory each content provider should be responsible for ensuring that their own content is up-to-date, this simply doesn’t work in practice. People leave, are too busy or simply forget to check the relevance of content regularly.

In an ideal world there would be a team of central editors checking the pages on a regular basis to ensure the content is still relevant. However, there are rarely the resources to do so. Even when there is a central editorial team they are normally too busy checking new content to worry about stuff already online.

Website showing out of date event

The other problem central editorial teams face is that when they suggest removing content they encounter political objections. Many content providers are defensive about their content even if they do not maintain it properly. They don’t like the idea of others telling them what they can and cannot have online.

The solution proposed by many content strategists would be a complete audit of the site. However, this involves checking every single page and that is just not practical in most cases. It also doesn’t solve the problem of politics. What is required is a solution that is automated.

An automated solution

An automated solution is good for two reasons. First, it doesn’t require anybody to manually check all of the pages. Second, it doesn’t require one person to tell another that their content is going to be taken down. The whole thing just happens. People are much more likely to agree to an automated policy for content control than they are to being singled out as somebody who hasn’t maintained their content properly.

So how would this automated approach work in practice?

Automated review points

Essentially a review of a particular webpage would occur when certain criteria are met. This review could happen automatically or manually depending on your preference. However, in either case it requires your content management system to be able to identify pages that have reached a certain age (or a certain time since they were last reviewed). In most cases this is something that already exists in a CMS or could easily be added.
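As a rough illustration, a time-based review point could be sketched like this. The `Page` record, its field names and the one-year default are my own assumptions for the example rather than features of any particular CMS.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Page:
    # Hypothetical fields; a real CMS will name and store these differently.
    url: str
    owner_email: str
    last_reviewed: date


def pages_due_for_review(pages, today, max_age=timedelta(days=365)):
    """Return the pages whose last review is older than max_age."""
    return [p for p in pages if today - p.last_reviewed > max_age]


pages = [
    Page("/news/summer-fair", "events@example.org", date(2022, 6, 1)),
    Page("/about/contact", "office@example.org", date(2024, 1, 15)),
]
due = pages_due_for_review(pages, today=date(2024, 3, 1))
print([p.url for p in due])  # only the page last reviewed in 2022
```

The review interval is a parameter so that fast-changing sections of a site can be checked more often than slow-changing ones.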

An alternative to time-based review points would be traffic-based ones. These are designed to remove content that is rarely used rather than content that is out of date. This review point would be triggered if the traffic to a page falls below a certain threshold over a given period. This would indicate that the page is of little interest and is simply making it more difficult for the majority of people to find what they are after.
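A traffic-based trigger might look something like the sketch below. It assumes you can export page views per URL for the period from your analytics package. The function name, the sample figures and the threshold of 50 views are all made up for illustration.

```python
def low_traffic_pages(views_by_url, min_views):
    """Return URLs whose views over the reporting period fall below min_views."""
    return sorted(url for url, views in views_by_url.items() if views < min_views)


# Hypothetical page views for the last 90 days.
views_last_90_days = {
    "/support/printer-setup": 4200,
    "/support/fax-modem-1998": 3,
    "/support/reset-password": 15800,
}
flagged = low_traffic_pages(views_last_90_days, min_views=50)
print(flagged)
```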

Image of the word policy being highlighted with pen

Image provided courtesy of Shutterstock (Aaron Amat)

This is a lesson Microsoft had to learn with its support pages. They had support pages for every conceivable issue. However, instead of helping users most of this content just cluttered up the site and made it harder for users to find what they really wanted. In the end they removed less frequented pages and their customer satisfaction shot up.

How often you review pages, or how low you set the traffic trigger, is entirely up to you. It will depend on how often your site or organisation changes and how much you want to ask of your content providers.

When a page is identified for review, an email is sent out to the owner of this page (either manually or automatically) asking them to check the page. Ideally this should simply involve the content provider logging into the CMS and editing the page in question. A simple check box saying that the page is up-to-date is all that is required. If that is not possible, a reply by email saying that the page is up-to-date would be just as good.
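If you automate the email, building the message could be as simple as the following sketch. It uses Python's standard `email.message.EmailMessage` class. The sender address, the wording and the `build_review_email` helper are assumptions of mine, and actually sending the message (for example with `smtplib`) is left out.

```python
from datetime import date
from email.message import EmailMessage


def build_review_email(owner_email, page_url, confirm_by,
                       sender="web-team@example.org"):
    """Build (but do not send) a review request for one page."""
    msg = EmailMessage()
    msg["From"] = sender  # hypothetical sender address
    msg["To"] = owner_email
    msg["Subject"] = f"Please review your page: {page_url}"
    msg.set_content(
        f"The page {page_url} is due for review.\n\n"
        "Please log in to the CMS, check the content and tick the\n"
        f"'this page is up-to-date' box by {confirm_by}.\n\n"
        "If you take no action, the page will be marked for cleanup.\n"
    )
    return msg


msg = build_review_email(
    "events@example.org", "/news/summer-fair", date(2024, 4, 1)
)
print(msg["Subject"])
```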

Sample email

If the content provider fails to identify the page as up-to-date within a set time period, this triggers a cleanup event (see below). Notice the default here. At the moment the defaults on the majority of websites mean that if the content provider does nothing, the content remains online. This approach turns that on its head. No action leads to content being marked for cleanup.
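The inverted default is easy to express in code. In this sketch the 30-day grace period and the function and parameter names are assumptions. The point is simply that a missing confirmation, rather than an explicit decision, is what triggers the cleanup.

```python
from datetime import date, timedelta


def needs_cleanup(notified_on, confirmed_on, today, grace=timedelta(days=30)):
    """True when the owner has not confirmed the page within the grace period.

    confirmed_on is None if the owner never ticked the 'up-to-date' box.
    """
    if confirmed_on is not None and confirmed_on >= notified_on:
        return False  # the owner took action, so the page stays as it is
    return today - notified_on > grace  # doing nothing leads to cleanup


print(needs_cleanup(date(2024, 1, 1), None, date(2024, 2, 15)))
print(needs_cleanup(date(2024, 1, 1), date(2024, 1, 10), date(2024, 2, 15)))
```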

What happens when a cleanup is triggered?

How you choose to handle the cleanup of webpages is up to you. However, here is my recommended process:

Mark the page as being old content

The first step would be to mark the content as old and potentially out of date. This can be done by automatically inserting a banner at the head of the main content telling the user that this content is potentially out of date. Below is an example of how this might look.

Example notification banner

You might wish to also send an email update to the content owner of that page saying that the page has been marked as out of date.
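How the banner gets inserted will depend on your CMS templates, but the idea can be sketched in a few lines. The `outdated-banner` class name, the wording and the `with_outdated_banner` helper are all hypothetical.

```python
from html import escape


def with_outdated_banner(body_html, last_reviewed):
    """Prepend a warning banner to the main content of a page."""
    banner = (
        '<div class="outdated-banner" role="note">'
        f"This page was last reviewed on {escape(last_reviewed)} "
        "and may be out of date."
        "</div>"
    )
    return banner + "\n" + body_html


html = with_outdated_banner("<p>Summer fair details</p>", "1 June 2022")
print(html)
```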

Remove the page from the site’s navigation

If the content provider still hasn’t checked the page after a set period, you might then choose to trigger a further event that removes the page from the navigational structure of the site. This will reduce the clutter that users need to navigate through to find the page they want. However, those who really want to access these pages can still find them via search.

Remove the page from the search results

Of course there is also the option to prevent pages being returned in search results too. It can be hard to find the right page when searching a large site simply because of the amount of content being returned. If a piece of content is out of date then it makes sense not to return it in the search results.
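The three steps so far amount to a simple escalation schedule. The sketch below is one way to model it. The stage names and the 30 and 60 day waiting periods are assumptions, so pick whatever suits your organisation.

```python
from datetime import date
from enum import Enum


class Stage(Enum):
    LIVE = 0
    BANNER = 1          # marked as potentially out of date
    OUT_OF_NAV = 2      # also removed from the site's navigation
    OUT_OF_SEARCH = 3   # also removed from the site's search results


# Hypothetical schedule: days since cleanup was triggered -> stage reached.
SCHEDULE = [(0, Stage.BANNER), (30, Stage.OUT_OF_NAV), (60, Stage.OUT_OF_SEARCH)]


def cleanup_stage(triggered_on, today):
    """Return the furthest stage whose waiting period has elapsed."""
    if triggered_on is None:
        return Stage.LIVE  # cleanup was never triggered
    elapsed = (today - triggered_on).days
    stage = Stage.LIVE
    for days, reached in SCHEDULE:
        if elapsed >= days:
            stage = reached
    return stage


print(cleanup_stage(date(2024, 1, 1), date(2024, 2, 5)))
```

At the final stage the page appears in neither the navigation nor the search results.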

The Dell search results page showing 22000 results

This effectively orphans the page but keeps it online. You may wonder what the point of this is. Surely you would be better off deleting the page entirely?

Delete the page altogether

There are mixed opinions about deleting content entirely. On the surface it seems like the most logical thing to do. If content is horribly out of date or is rarely visited, what is the point of it being online?

As I see it there is no harm in keeping it online if it is clearly labelled as out of date and it no longer prevents users from finding content they really want. However, removing it can be damaging.

For a start there may be third-party links to that page, let alone hard-coded links within your own website. The last thing you want to present a user with is a ‘page not found’ error.

The only time I would recommend removing a page entirely is when the user can be automatically redirected to an alternative page that serves their needs better.
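That rule can be written down as a small lookup. In this sketch the redirect table and the `resolve_retired_page` function are hypothetical. A page is only treated as deleted if a better alternative has been recorded for it, in which case the visitor gets a permanent (301) redirect. Otherwise the old, labelled page carries on being served.

```python
def resolve_retired_page(path, redirects):
    """Return (status, location) for a request to a retired page."""
    target = redirects.get(path)
    if target is not None:
        return 301, target  # a better page exists, so send the user there
    return 200, path        # no alternative, so keep serving the labelled page


# Hypothetical mapping of deleted pages to their replacements.
redirects = {"/events/summer-fair-2019": "/events/summer-fair"}

print(resolve_retired_page("/events/summer-fair-2019", redirects))
print(resolve_retired_page("/news/old-press-release", redirects))
```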


I am not suggesting that this approach is perfect. There is nothing to stop a content provider simply ticking the ‘this page is up-to-date’ box without properly reviewing the content. However, it does put the onus on the content provider to take action. This should automatically remove huge amounts of content from the site without battling with each content provider individually.