Lou Rosenfeld shares how the search terms used on our websites can reveal a lot about our users.
Paul: So joining me today is Lou Rosenfeld. Good to have you on the show, Lou.
Lou: Thanks, Paul.
Paul: So just in case one of the three people in the world who have never heard of you is listening to this show right now, do you want to give us a bit of an introduction, just a little bit about who you are and how you came to be in the world that is the web?
Lou: Sure. I once was a librarian, and I moved into information architecture at a time when it was sort of seen as librarianship for the web. Sometime in the mid-nineties I co-wrote a book with Peter Morville called Information Architecture for the World Wide Web for O’Reilly, which is now in its third edition, and a lot of people look to that book when they want to learn about information architecture. I have been involved in a lot of things in the IA community and, more recently, in the broader UX (user experience) community, and one of those things is as a publisher of user experience books. My company is called Rosenfeld Media, with seven titles out. Hopefully the eighth or ninth will be my new book, which I am co-writing with Marko Hurst, on site search analytics.
Lou: So that’s what we are going to be talking about today, and hopefully the book will be out by the end of the year.
Paul: Ah, that’s brilliant stuff. You have just done a virtual seminar for Jared Spool on this subject as well, and it is something that piqued my interest, so I thought it would be great to get you on to talk about it. In some ways you are very honoured, if I may say so, Lou, because you are the first of our one-off interviews that aren’t part of the main Boagworld show, because we are not doing that at the moment. But it was such an exciting area, and something that really interested me, that I was really keen to get you on. So why don’t you tell us how analytics generally, not just search analytics, informs the user experience? Why should we care about this? What difference does it make?
Lou: Well, a lot of people who do user experience work, both on the design side and on the evaluation and research side, are not necessarily all that comfortable with numbers. In fact many of us are in this neck of the woods of the profession because there is not a huge pressure to do statistical analysis, and I think that is too bad. One of the things that really hurts us is that we are numbers-averse. With something like site search analytics you can actually learn quite a bit that will help inform your design work with just an Excel spreadsheet and a little bit of data that you already have. If you have a search engine, there is some way of gathering the data, and you may already be gathering it somewhere. And you don’t have to be a statistician. In fact one of the really interesting things about this type of data is that it is not all numbers; it is actually very semantically rich. What I am talking about with site search analytics is that we are basically harvesting the search queries that users execute on our own site search engines. They are telling us in their own words what it is they want from us, so we are not just doing statistical analysis, we are actually looking at the semantic nature of what their interests are and what their information needs are.
Lou: That is interesting because it is a little different from most analytics, so if you shy away from analytics you might think about taking a special look at site search analytics. And if you are an analytics person, what I have found, Paul, is that most analytics people don’t pay much attention to this area either. Some are certainly good at SEO and at looking at the web searches that are drawing people to a particular site, but once people are searching on the site it is sort of someone else’s table, if you will. So site search analytics is kind of like the orphan child of two fields that don’t pay much attention to it, web analytics and user research, and I am hoping that our book helps change that a bit.
Paul: So let’s be clear what we are talking about here. We are talking about taking the search terms that people have searched for on your site and analysing them. To what end? What benefits do you get from that?
Lou: Well, there are so many that I almost don’t know where to begin, but here are a few basic ones. One is that you can very quickly sort the queries from most frequent to least frequent. A query that was searched three thousand times last week might be your most frequent query, and at the other end, in the long tail, are the ones that were only used once. If you know what that short head is, you can really improve the user experience overall by improving performance for those top few most frequent queries. I wish we could do this visually, but if you map it out it looks something like a hockey stick curve. It is called the Zipf distribution. What you start seeing is that your top ten most frequent queries may account for something like ten or twenty percent of all the search activity in a given time period, and with the top twenty or thirty queries you are still talking about a pretty huge volume of all search activity. So by doing things like adding best bet search results for the most frequent queries, or by looking for really common queries that are finding nothing and plugging the content gaps, or by improving the metadata and labelling that content as it should be so that it gets found, you can very quickly make a very big improvement. And it gets even better than that, Paul, because if you do that type of tuning for the most frequent queries, the ones that people most care about, and do it on a regular basis, say every month, you have a great tuning process that adjusts your site’s performance to your users’ needs. The more you do things like that, the more you can avoid what for me is the dirtiest word in the industry, and that is redesign.
Paul: [laughs] yeah absolutely.
Lou: So the more tuning we can do, the more we are going to fight off the urge to throw a million dollars or pounds at a big problem that we are really taking on the wrong way. So tuning over redesign any day. Site search analytics is a great component in the tuning process.
Paul: Yes, I talk about evolution rather than revolution: instead of redesigning, which is this huge undertaking, continually evolving your site. Anything that helps to inform that has got to be incredibly valuable.
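A minimal sketch of the short-head analysis Lou describes, assuming the search log has already been exported as a plain list of query strings (the sample log and the function name short_head are made up for illustration). It sorts queries from most to least frequent and reports how much of all search activity the top few account for.

```python
from collections import Counter

def short_head(queries, top_n=10):
    """Return (query, count, cumulative_share) for the top_n most frequent queries."""
    # Normalise case and whitespace so "Returns" and "returns " count as one query.
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    total = sum(counts.values())
    rows, running = [], 0
    for query, count in counts.most_common(top_n):
        running += count
        rows.append((query, count, running / total))
    return rows

# Hypothetical log: one entry per search executed on the site.
log = (["returns"] * 5 + ["size chart"] * 3 + ["gift card"] * 2
       + ["store hours", "sku 4471"])

for query, count, share in short_head(log, top_n=3):
    print(f"{query:12} {count:3} {share:.0%}")
```

The same counts could be produced by sorting a column in Excel; the point is only that the top of the list is short enough to review by hand every month.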
Paul: But something that a lot of people may already be doing is looking at the results they are getting from Google and the analytics they are getting there. What are the advantages of looking at your own internal search rather than the results that have been generated by Google?
Lou: Well, if you look at the most common Google or web-wide keywords that are bringing people to your site, often what you are going to see as the top ones are the name of your organisation or some variant on it. So let’s say you get rid of those, because they are not that interesting, and you look at the more open-ended searches where people happen not to be looking for you specifically. There is certainly going to be a lot of overlap with the searches people use to find their way round your site, but the sense I have is that, first of all, the people who are searching your site have more specific needs. They already know something about you. It may be that they are a different type of user. We don’t only care about bringing people to our site and making sure they get there; we also worry about the people who are native to our site, who may be repeat visitors and may already be loyal customers. We care about them, because retaining a customer is a lot less expensive than recruiting one.
Lou: So what can we do to learn about their needs specifically? I have a theory, and I haven’t really been able to prove it yet, but I think the queries that come into your own site search engine are going to be more specific and more finely grained than those coming in from the web. That being the case, they are almost like a predictor of what web searches are going to be in the future. In other words, the assumption is that people’s searches get more specific over time, so you could probably use your site search terms to help you figure out more specific and less expensive keywords to bid on in AdWords.
Lou: So there might be a secret little approach you could take there that does a better job than bidding on the general search terms in Google, which are going to be really expensive and not really going to help you that much.
Paul: I guess there is also an element of the fact that for somebody to arrive at your site from Google on a particular search term, your site already has to have content that is listed on Google for the term they typed in. With an internal search engine they could quite easily type in something that isn’t a term you use on your website, and as you said earlier you need to plug that gap. But you are never going to get that from Google, because it wouldn’t have referred them to your site if you did not have any content relating to that particular term. Does that make any sense?
Lou: That is absolutely right, and there is another important way that you are going to benefit from analysing searching within your site. Again, we don’t just care about getting people to a site, we care about their experience once they are there. One of the things we can learn about is where navigation fails. So let’s imagine that you sell thirty different products on your site and each product has its own main page, its overview page. It is really interesting to do apples-to-apples comparisons of those pages and the types of queries they generate. Let me put it in a slightly better way: look at your product overview pages, and look at the queries that start from those pages.
Lou: You start learning about patterns of information needs once people have found their way to a particular product, and you may see that with that kind of deep, horizontal or contextual navigation, which you are generating raw, you can start seeing patterns where people are saying, I am on your product page and I don’t see the navigation that is going to get me to the review page or to the forums page or whatever.
Lou: So maybe there are links there but they are not prominent enough, or you are not labelling them well, or maybe those links aren’t there at all. Now you can start coming up with some hypotheses about what the problem is. You can just think about it and say, I trust my hypothesis here that we are missing links, or you can start doing some qualitative research; you could do some user testing to validate your hypotheses. It depends how you are going to make your decisions, but you have got some great hypotheses. And by the way, that’s what analytics is really for. It is not going to tell you why, and it is not going to validate your hypotheses; it is going to help you come up with good hypotheses that are data driven. Analytics tells you what is happening. It tells you about behaviour and not why things are happening, and that’s where you really need to bring in qualitative research.
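One way to look at the queries that start from a given page, sketched under the assumption that the search log records the page each search was launched from (many logs can supply this via the referring URL, but that is an assumption about your setup). The page paths, queries and the function name queries_by_start_page are hypothetical.

```python
from collections import Counter, defaultdict

def queries_by_start_page(events, top_n=3):
    """Group searches by the page they were started from and list the top queries per page."""
    by_page = defaultdict(Counter)
    for page, query in events:
        by_page[page][query.strip().lower()] += 1
    return {page: counts.most_common(top_n) for page, counts in by_page.items()}

# Hypothetical (start_page, query) pairs taken from a search log.
events = [
    ("/products/widget", "reviews"),
    ("/products/widget", "Reviews"),
    ("/products/widget", "forum"),
    ("/products/gadget", "manual"),
]

report = queries_by_start_page(events)
for page, top in report.items():
    print(page, top)
```

If "reviews" keeps turning up as a top query from product pages, that is a hypothesis about a missing or poorly labelled link, which user testing can then confirm or reject.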
Paul: Yes, that was really the next question I was going to ask: how does analytics sit alongside traditional approaches to improving usability, which basically means user testing?
Paul: Traditionally, when we want to improve the user experience we turn to user testing. We sit users down, we show them stuff, and there is real value in that kind of back and forth dialogue, which you don’t get from analytics. So you are saying they perform different roles?
Lou: They absolutely perform different roles, and this is one of the things that I am finding in my consulting. I’m a publisher, but I still have to make a living, so I still do a lot of consulting, and I am seeing organisations that have incredible staff and resources in their analytics groups and, separately, in their user research groups, and too infrequently the twain meet. There is a big disconnect there that they are suffering from. Many organisations have just great analytics now, with great tools like Omniture, but they can only really know what is happening; they cannot really know why things are happening. The disconnect is that they don’t bring in the people who can do the qualitative user research, take the next step, actually do some testing, try to learn about these hypotheses and show which ones are actually real. Sometimes it is really straightforward. Say you want to do task analysis, which is a qualitative approach. Like any other user testing, it is not cheap to do, but if you informed the kinds of tasks you are going to test by looking at your top queries you would be doing a better job. It is going to help you devote that expensive work, that qualitative research budget, in a more efficient and effective way. What about when you are developing personas? Why not, if you can, get audience-segmented queries and start building those in as an expression of the information needs within each persona? You know, talking about this reminded me of a great story from years ago. I believe it was at Lands’ End, a US clothing retailer,
Lou: and they were looking in their search logs and saw a preponderance of SKUs, product IDs, but those codes were not on their website, and they were really confused.
Lou: They didn’t know why those were there, but they knew what to do, which was to start including SKUs on product pages so people would be able to get that information right away. But then they followed up the site search analytics work with an ethnographic study, where they went out into the field and watched how their customers were interacting with the information in the columns. What they found, and this was probably about ten years ago, was that people were not comfortable using the website catalogue to do research. They were used to the printed catalogue: it is familiar, it is high-res, it is easy to use. So they would do their shopping by browsing the printed catalogue, but then they did not want to use the mail order or 800-number ordering systems. Instead they went onto the Lands’ End website, entered the SKUs and did their shopping there.
Paul: aaahhhh yes
Lou: So there are interesting things that all types of data can tell us when folded in with user research, and site search analytics is certainly no exception to that.
Paul: Hmm. The thing is that collecting the data is the easy part. There are so many great tools out there, many of them free, that enable you to collect this search data or other analytics data. Interpretation, on the other hand, is much harder. I think a lot of people faced with this kind of information are a little bit overwhelmed about where to start or how to get information out of it. Earlier you were saying that this is relatively easy, but it doesn’t feel like it when you are faced with it. So what should we be looking at to better understand how our sites are being used? What should we be doing with this data?
Lou: Right. Here’s the beauty of what I described earlier, the Zipf distribution: it really promotes scalability in terms of your efforts. So if you have an hour, I recommend looking at those top ten queries
Lou: and seeing what is going on, even just testing them out and seeing how they are performing. That is something you can do in a very small amount of time; maybe you only need the top five. It is not a lot of work and it has a really big impact.
Paul: Sorry, when you say testing them out, what do you mean by that? How would you test them out?
Lou: So you can take those queries and just go ahead and search for them on your site
Paul: and see what’s returned
Lou: and see what’s returned. Do you think they are returning good search results? Do you think there are important things that are being missed? If so, why? Start testing it out. And actually, there is a really great case study that Vanguard did. Vanguard is a US-based financial services company, and they have really invested heavily in this. We are actually profiling their work in our book. There is a presentation of the case study and it is really eye-opening. I will give you the URL, Paul, so maybe you can share it along with the podcast,
Lou: but I don’t have it at my fingertips right now
Paul: no that is fine, that’s OK
Lou: your question was?
Paul: essentially yes, that, erm no I have forgotten it myself now [laughs] brain’s gone dead
Lou: I was just going to say, there is not only this issue that a little bit of work will go very far. A lot of times people are overwhelmed when they see the analytics reports, and part of the problem is that those reports are canned. Some of them are pretty universally useful and interesting regardless of what kind of organisation you are in, so it is good to see things like your most recent queries or which queries are failing, retrieving zero results. But I really encourage people to get at the data, roll their sleeves up, and basically wade in and play with it. So get beyond the canned reports. Even if you just get your hands on a couple of hundred of your top queries, put them in Excel and play with them. By play with them I mean looking for clusters or categories, and for things that might emerge, like wow, there are some unique outliers here, there are interesting queries, as Lands’ End did when they found a lot of SKU searches in their logs. There is no right or wrong way to do it. The idea is just to experiment and do what the statisticians call exploratory data analysis, so you are literally just playing with the data. You might even chart it in Excel and see what comes from it.
Lou: So one thing I encourage people to do is to try to categorise the data. In other words: gee, it seems like there are a lot of queries here about physical places; maybe our organisation has different offices or campuses or buildings. Look for things that seem to be people, or for different topics that emerge. What you start doing is forcing yourself to get very close to the way users are thinking, because you are looking at what their needs are. It is also a good way of looking at what sort of metadata your site ought to have and what kinds of content types people seem to be asking for. It might even help you prioritise your next content migration, because you start getting a sense of the really important content types that people seem to be requesting when they are searching. There are other things you might delve into. You might see a lot of queries that are dates. I know the Financial Times saw that, and they now support sorting and filtering search results by date. One of the things the Financial Times does, and it is a great example, is look for spikes in names of people and companies,
Lou: and when they see that all of a sudden this person is being searched for a lot, they compare those names, the spikes, to the recent week of editorial coverage, and if there is a discrepancy they bring this up to the editorial board
Lou: and they say, hey, we are getting a lot of searches for so-and-so or this company, and then the editors can decide if they want to have their beat reporters look into it. In a way it is almost a way of predicting the future.
Paul: So it is even more than informing the website, it is actually informing their editorial policy
Lou: Absolutely and of course those things are increasingly one and the same in many organisations
Paul: Yes absolutely
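The interview does not say how the Financial Times actually does this, so the following is only a rough sketch of the idea: flag terms searched far more than usual this week that do not appear in recent coverage. The thresholds (factor, min_count), the sample data and the function name find_spikes are all assumptions for illustration.

```python
from collections import Counter

def find_spikes(this_week, previous_weeks, covered, factor=3.0, min_count=5):
    """Return (term, count, usual_weekly_count) for spiking terms absent from coverage."""
    current = Counter(q.strip().lower() for q in this_week)
    baseline = Counter()
    for week in previous_weeks:
        baseline.update(q.strip().lower() for q in week)
    n_weeks = max(len(previous_weeks), 1)
    covered = {name.lower() for name in covered}
    spikes = []
    for term, count in current.most_common():
        usual = baseline[term] / n_weeks  # average weekly count in the baseline
        # A spike: enough searches to matter, and well above the usual level.
        if count >= min_count and count > factor * max(usual, 1.0) and term not in covered:
            spikes.append((term, count, usual))
    return spikes

# Hypothetical data: this week's queries, two earlier weeks, and names already covered.
this_week = ["acme corp"] * 12 + ["jane doe"] * 9 + ["markets"] * 20
previous_weeks = [
    ["acme corp"] * 2 + ["markets"] * 18,
    ["acme corp"] * 2 + ["markets"] * 22,
]
covered = {"Jane Doe"}

print(find_spikes(this_week, previous_weeks, covered))
```

Here "jane doe" is spiking but already covered, and "markets" is popular but not unusual, so only "acme corp" would be brought to the editors.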
Lou: When you do this over time you see a very strong seasonal effect in many organisations’ cases. You might even find that at different times of day people are looking for different types of information, and that can inform things like what you put on your main page or on other high-traffic pages of your site. And by the way, one of the nice things about doing that is that it helps you start beating down decision makers who want the main page to be about them. It is a political thing, and if you can bring data that shows what users really want to those types of discussions, you have a much better likelihood of heading off political battles over prime real estate on your site.
Paul: Yeah, absolutely. Do you think the trouble with analytics is that it can be read in so many different ways? Do you think there are occasions where your analytics can be misleading in understanding a site’s user experience?
Paul: If so, where do things often go wrong? What should we be looking out for?
Lou: Well, one of the problems is that there is no one tool that will solve all of our problems, and I am the first to say that site search analytics is not the be-all and end-all. It is one thing that should influence our decision making, alongside a nice robust collection of research methods, qualitative and quantitative, behavioural and attitudinal, that should make up our research toolkit. That said, when you are interpreting data, one of the real mistakes I think we make is that we leave that interpretation to one person for a huge organisation. I am a big proponent of merging the quantitative data that is often in the hands of a single analyst with the type of user research we are doing in other areas. In other words, put the data in the hands of users.
Lou: So when I was describing a moment ago looking at top queries, clustering and sorting them and playing with them yourself, what I actually recommend is that you have a bunch of users or subjects do that.
Lou: So if I am going to learn something about what my taxonomy ought to be, or what types of metadata I should support, or what my content types might be, I can do that myself, but I would rather do something along the lines of a modified card sort
Lou: with five or ten users. Maybe I do not have to do that every month; maybe I do it every year or two. But when I put that data in the hands of users, that is just a beautiful hybrid of the best of quantitative and qualitative research.
Paul: Hmm, yes, I like that a lot, combining those approaches. That makes a lot of sense to me. We have talked very much so far about search analytics, but obviously a lot of the principles you are talking about apply more broadly to general analytics.
Paul: But I am interested in your thoughts on even broader or related analytics tools, things like polls and surveys. You see a lot of organisations that have feedback widgets, or that run surveys and polls on their websites, and I am always a little unsure of the value of these things. On one hand I can see
Lou: I am too
Paul: oh you are too? I am glad it is not just me, so what is your perspective on that kind of thing
Lou: Well, in general I think all tools have their purposes, and the real problem comes when we try to make one tool do everything: a hammer that not only hammers nails but somehow puts in screws and so forth.
Lou: So I am really not a big fan of relying on polls and surveys as a way to get a comprehensive view of users’ needs, because of the self-selection bias they introduce right off the bat.
Paul: What, you mean because you tend to get polarised results? People are either exceptionally happy and feel the need to tell you or, more commonly, they are just very pissed off.
Paul: If they are in the middle ground they are not likely to bother filling in a survey or a poll.
Lou: Right. So I would say, well, that is useful information, but take it with a grain of salt, like you just described, and I would then want to have a few other methods. Each of these methods is a lens on reality, but an incomplete lens. It’s the blind men and the elephant: nobody has the full picture, so you need to have a few. That said, something like polling or asking for feedback can be done more intelligently if you think about doing it in a really contextual way. So for example, if someone does a search, you have seen these widgets that say, did you find what you were looking for?
Lou: I think that is a better way of doing it, because you are not just asking did you like us
Paul: Yes, absolutely yes
Lou: It is not, before you leave, were you happy? Of course you are going to get really polarised, maybe too open-ended data from that. But if it is focused and contextual then you get better data, and you can also ask for a little more. If someone says, I did not find what I was looking for when I did a search, that is a really good time to ask for more information, like, what did you want to find when you did this search? Then you are actually closing a pretty important feedback loop, not just between the user and the site: you can take that feedback and send it back to the appropriate content owners in the organisation and draw them into the process. One of the big problems is that we have great content, but our content owners don’t seem to grasp the fact that their content is part of a much larger collection of information, and that’s why they don’t bother following labelling guidelines or titling guidelines or applying metadata well. So if we can start showing them that their content isn’t just their content but part of something bigger, part of a natural marketplace of information that makes up the site, and show them that their information is not being found when it should be, with some data that suggests they are falling down on the job, then we have a much better chance of getting them to follow the content authoring, labelling and tagging policies and procedures that we may have set up for them. A lot of organisations try to force these on people and it never seems to work too well, but if you can educate people by showing them data that suggests their content would succeed if they did something differently, like following policies, then it actually works.
Paul: Yes, there is something very powerful about presenting, not users, internal stakeholders with data to back up your arguments about what they need to be doing. There is one question I want to end with, and I am really interested to hear your perspective on it as someone who spends a lot of time with analytics. It is almost a moral question. We can gather a huge amount of analytical data on our users, and there are even tools, I don’t know if you have come across ClickTale,
Lou: hmmm hmm
Paul: that can go further and record user actions, so you can see people using and moving around the site. Is there a line here as to what is acceptable to do and what is not? It is almost a moral question in a way, and I was just interested in your perspective on how much we can pry into users’ behaviour.
Lou: Right, that is a really good one, Paul. In fact, in terms of search analytics, there was a really interesting yet somewhat unpleasant case about three years ago that made the front page of the New York Times; you may have come across it. Some AOL researchers had a whole bunch of search data from the AOL web search engine and they released it for research purposes. However, that data had people’s user IDs in it, not their names but their user IDs, and this is hundreds of thousands of users and millions of queries. What people started doing within a day was to grab the data from the database and search for just one ID, and by looking at all the queries associated with that one user ID they could figure out who that user was
Paul: Gee whizz.
Lou: as well as what they were searching for. In fact the New York Times reporters tried doing it themselves, and they identified a woman in Georgia, here in the States, and they called her up and said, is this you? And she said yes. I am sure it was a disquieting moment for her
Lou: and much more disquieting for the individuals who were doing things like searching for child pornography.
Lou: So I think the issue is that if you are going to do this work you have to be really careful to look at it as an exploration of collective behaviour. You have to be very careful to block opportunities for data to leak out, or for someone else to get hold of data in a way that will show individual behaviours and help identify who an individual is. With our book, what we are doing is looking at sets of data that tell us nothing about individuals, only looking at it collectively, and that is how we think you should do it. Let’s worry about serving the majority of the people who are visiting our site, the major audiences, and not worry so much about individuals, and let’s protect their privacy.
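One possible way to keep the analysis collective, sketched here as an illustration rather than as the method used for the book: collapse the log to per-query counts, drop user IDs entirely, and suppress any query issued by only a handful of distinct users. The min_users threshold, the sample records and the function name aggregate_queries are assumptions, and a threshold like this reduces the risk of identifying someone without guaranteeing anonymity.

```python
from collections import defaultdict

def aggregate_queries(records, min_users=5):
    """Turn (user_id, query) records into {query: search_count} with no user IDs,
    keeping only queries issued by at least min_users distinct users."""
    users_per_query = defaultdict(set)
    searches_per_query = defaultdict(int)
    for user_id, query in records:
        key = query.strip().lower()
        users_per_query[key].add(user_id)
        searches_per_query[key] += 1
    return {
        query: searches_per_query[query]
        for query, users in users_per_query.items()
        if len(users) >= min_users
    }

# Hypothetical raw log: six different users share one query, one user repeats a rare one.
records = [(f"u{i}", "returns policy") for i in range(6)]
records += [("u1", "jane doe 12 elm street")] * 4

print(aggregate_queries(records))
```

The rare, potentially identifying query disappears from the output even though it was searched four times, because only one person searched for it.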
Paul: How does that relate, mind, when you start looking at e-commerce sites, which obviously use analytics heavily to recommend products and so on? I am always a bit torn over that one. On one hand it obviously provides a real benefit to users, and I quite like the fact that when I go to Amazon it will remind me that the latest Battlestar Galactica DVD is out because it knows I like that. But there is a line, isn’t there, where that analytics data stops being useful and becomes creepy. I am not really sure where that line is. It is funny, that, isn’t it?
Lou: I think it has so much to do with how much you as a user or customer trust that organisation.
Lou: Although they drive me nuts as a publisher, Amazon is great to its customers, and I think they manage to use that data in ways that delight us because they have some really smart and really careful designers and researchers who are sensitive to these types of issues, and they have got a great track record of customer service. We would feel a little differently, maybe, if we went to whatever large organisation we are uncomfortable with at the moment, whether it is the government or, I don’t know, I can’t think of a good example. But I think it comes down to your personal feelings about who is using that information. In many respects I would like to see more organisations doing just that. Imagine your university experience if you could see what other courses were being taken by the people taking your courses. You could learn a little bit more in a very disciplined way, and that would be a delightful way to use that information. But do you trust that institution? Hopefully we trust the institution that is educating us in that particular example.
Paul: One would hope so [laughs]. Lou, that has been absolutely brilliant. It has been so fascinating to think through some of the power of the data that we are collecting. We are all collecting this data, but I don’t think we are utilising it or know what to do with it, so it has been absolutely fascinating to talk to you. Thank you so much for taking the time to come on the show, and hopefully we will get you back again soon.
Lou: Paul, it’s a pleasure. I really appreciate the opportunity. Thanks very much.
Paul: Thank you.
About Lou Rosenfeld
He has been instrumental in helping establish the field of information architecture, and in articulating the role and value of librarianship within the field.
Lou has helped such organizations as PayPal, AT&T, Caterpillar, Ford, Microsoft and the CDC make their information easier to find.
He is co-author of Information Architecture for the World Wide Web, considered the bible of the field, and has been a regular contributor to Web Review, Internet World, and CIO magazines.
Lou is co-founder of the Information Architecture Institute and helped found the Information Architecture Summit. He blogs regularly at www.louisrosenfeld.com, and tweets even more regularly @louisrosenfeld.