A Universal University Search – Development in Jeans and a T-Shirt

During my time as a student I was faced with a great many challenges which involved some form of searching. Lots of it was academic, relying on the Library, Google Scholar and (gasp, horror, revoke his grades) Wikipedia. However, a lot of it was about more mundane stuff. Where exactly is AR1101? Who is John Smith, and what’s his phone number? What’s for lunch today?

The problem was that to find this information, you already had to know where to find it. Maps of the University are available on the Portal… if you know where to look. Phone numbers can be looked up… if you know the address for the service. The weekly menu is on the Portal… if you know where to look.

We’re left with a simply astonishing number of things which people may want to know about, but which are locked away as an image of a screenshot embedded in a Word document stored 14 levels down in Portal behind a page which nobody has access to, unless you happen to have asked for it. Rooms, events, books, journals, the Repository, blogs, people, news posts, lecture notes, the weekly menu and more are all available somewhere within the depths of a system. So, in traditional Nick fashion, I spent a few minutes in the shower this morning working out how to fix it whilst being refreshed by some particularly minty shower gel.

The first step is getting the data in the first place. For some things this is easy: it took a few minutes for some of my colleagues to work out how to dump data from the Repository. Blogs won’t be far behind, since they expose their data as RSS feeds. For locations and events we’re building new data stores, which we can expose however we want, and stuff from the Library will be exposed through Jerome. Unfortunately getting data from Portal in any kind of useful format is about as easy as getting data from the bottom of the Mariana Trench. Which has been stored in the middle of a concrete box. Written on toilet paper. Backwards. Using a microscope and a toothpick. In Swahili.
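For the blogs, turning an RSS feed into something a search indexer can chew on could look roughly like this. It's a minimal sketch using only Python's standard library; the sample feed, the `rss_to_documents` name and the `title`/`url`/`body` field names are all made up for illustration and aren't taken from the real Blogs service.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS 2.0 feed standing in for a real blog feed (hypothetical).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Hello world</title>
      <link>http://blogs.example/hello</link>
      <description>First post.</description>
    </item>
  </channel>
</rss>"""


def rss_to_documents(feed_xml):
    """Turn each <item> in an RSS feed into a flat dict ready for indexing."""
    root = ET.fromstring(feed_xml)
    documents = []
    for item in root.iter("item"):
        documents.append({
            # findtext falls back to the default when the child element is missing
            "title": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
            "body": item.findtext("description", default=""),
        })
    return documents
```

Calling `rss_to_documents(SAMPLE_FEED)` gives back a list with one dict per post, which is about the shape you'd want before handing things to an indexer.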

Aside from Portal, we’ve got a good set of data to be getting on with. During our playing around with Sphinx for Jerome I discovered that it supports distributed indexes, making it perfect for institution-wide searching. There is no requirement to build one super-index. We simply create a new search index for each individual service, then combine them at query time. This works across multiple instances of the search daemon, and even across multiple machines. For example, Jerome will be running its own search service which we can include in Universal Search. Should one individual search machine fail or experience problems, only that part of the overall system is affected, so if we hypothetically take Blogs offline for maintenance all the blog posts will silently disappear from search (with no fuss). Once it comes back they reappear as though nothing had happened.
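To make the "lots of small indexes, combined later" idea a bit more concrete, here's a toy model in plain Python. This is emphatically not how Sphinx does it internally and it doesn't touch the Sphinx API at all; the backend functions, their scores and the `universal_search` name are all invented for the example. It only shows the behaviour described above: ask every service, skip the ones that are down, merge whatever comes back.

```python
def library_backend(query):
    # Hypothetical stand-in for Jerome's search service.
    return [(0.9, "Book: Searching for Dummies")]


def blogs_backend(query):
    # Pretend Blogs has been taken offline for maintenance.
    raise ConnectionError("blogs search is down")


def directory_backend(query):
    # Hypothetical stand-in for the staff directory.
    return [(0.7, "Person: John Smith")]


def universal_search(query, backends):
    """Query every backend and merge the hits, best score first.

    `backends` maps a service name to a function returning (score, title) pairs.
    A backend that raises ConnectionError is skipped without any fuss.
    """
    merged = []
    for name, search in backends.items():
        try:
            hits = search(query)
        except ConnectionError:
            continue  # that service just drops out of the results
        merged.extend((score, name, title) for score, title in hits)
    merged.sort(key=lambda hit: hit[0], reverse=True)
    return merged
```

Passing all three backends to `universal_search` returns the library and directory hits and quietly leaves the blogs out, which is the failure mode we want.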

Another bonus of Sphinx is that we can include various attributes within the index which are returned with results. This means we can include additional details alongside the result itself. For example, a library book in the search results comes with its title and author, a lookup in the directory comes with a phone number, and a room code comes with a map. All these attributes can be mixed and matched, so we can return genuinely useful results for everything and then take a user off for more information if they so choose.
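As a rough illustration of mixing and matching attributes, the sketch below models a search result as a title plus a bag of extra fields, and picks which extras to show based on the kind of result. The `Result` class, the kinds (`book`, `person`, `room`) and the attribute names are assumptions for the example rather than the actual Universal Search schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Result:
    kind: str                  # e.g. "book", "person" or "room" (hypothetical kinds)
    title: str
    attrs: Dict[str, str] = field(default_factory=dict)


# Which extra attributes are worth showing for each kind of result (assumed).
DISPLAY_FIELDS = {
    "book": ["author"],
    "person": ["phone"],
    "room": ["map_url"],
}


def render(result):
    """Build a one-line summary: the title, then any attributes we actually have."""
    extras = [
        f"{name}: {result.attrs[name]}"
        for name in DISPLAY_FIELDS.get(result.kind, [])
        if name in result.attrs
    ]
    return " | ".join([result.title] + extras)
```

A result with no extra attributes still renders as just its title, so a half-populated index degrades politely rather than falling over.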

And what of the particularly minty shower gel which inspired this? The CWD is named after whisky, lncn.eu has biscuits, Jerome has locomotives and Universal Search is kicking off with v1.0α “Peppermint”. Coming soon to a university near you.