Data-driven. Iterative. Awesome.

If you’re a member of staff at the University you will soon be hearing loads more about the Directory, the planned replacement for the University’s phone search system and staff profiles.

Whilst the Directory itself is rather cool, how it’s been built is of somewhat more interest. First of all, it’s driven entirely by data from other sources. The Directory itself doesn’t store any data at all, save for a search index. This means that unlike the old staff profiles on the corporate website it helps to expose bad data where it exists — since we soft-launched the Directory we’ve been barraged by requests from people to ‘fix their profile’, when in fact the thing that needs ‘fixing’ often lies at a far higher level. In some cases people have had misspelt job titles sitting in the University’s HR system for years, data which has now been corrected. This cycle of exposing bad data, rather than automatically or manually patching it up at the Directory end, helps the University to have better data as a whole, making lives easier and people happier.

Secondly, the Directory is a perfect example of why iterative development rocks. The very first version of the Directory arrived over a year ago, and since then has been improved to include semantic markup, a new look, faster searching, staff profiles, more data sources, open data output formats and more. Over the last couple of weeks as it’s started to be integrated with the corporate website it’s been subject to even more refining, fixing formatting, typos, incorrect data and more. These changes happen quickly – a new version is released with minor changes almost daily – and are driven almost exclusively by real users getting in touch and telling us what they think needs doing.

The upshot of doing things this way, harnessing data that already exists and letting people feed back as quickly as possible, is products and services which reach a usable state far faster, are a closer match to user requirements, and help to improve other systems which are connected or which exist in the same data ecosystem.

Told you it was awesome.

The Toolchain: First Pass

Today I’ve been kicking around the ICT office with Alex, figuring out how to make Jenkins (our wonderful CI server) build and publish the latest version of the CWD with all the bells and whistles like compilation of CSS using LESS, minification, validation of code and so-on. As part of this we managed to fix a couple of bits and pieces which had been bugging me for a while, namely the fact that GitHub commit notifications weren’t working properly (fixed by changing the repository URI in the configuration) and the fact that Campfire integration wasn’t working (fixed by hitting it repeatedly with a hammer).

This brought me to thinking about how our various things tie in together, so I set about charting a few of them up. After a while I realised the chart had basically expanded into a complete flowchart of the various tools and processes that hang together to keep the code flowing in a steady stream from my brain – via my fingers – into an actual deployment on the development server. Since it may be of interest to some of you, here’s a pretty picture:

This is (approximately) the toolchain I currently use for Orbital, including rough details of what is being passed around

The beauty of this is that the vast majority of the lines happen completely by themselves — I get to spend my days living in the small bubble of my local development server and dipping in and out of Pivotal Tracker to update stories. The rest is magically happening as I work, and the constant feedback through all our monitoring and planning systems (take a look at SplendidBacon for an epic high-level overview) means that the rest of the project team and any project clients can see what’s going on at any time.

Directory Data

If you haven’t guessed it already, we love data in open formats. Good quality, easily accessible data makes our lives easier, and causes children across the nation to beam with joy at the idea that they won’t have to copy a table from a Word document buried in a Zip file attached to an email.

In a continued drive to make all our data 5-star quality, I’m pleased to announce that we’ve made a few improvements to our Staff Directory beta. In addition to getting hold of people’s profiles in HTML using your browser (for example, see mine) you can now request them in three other delicious formats: JSON, RDF/XML and vCard.

The first two, JSON and RDF/XML, will make the developers amongst you über happy. You can request them either by sending the appropriate HTTP Accept header to the usual URI for a person (http://lncn.eu/me/{account_name} is the canonical one), where application/json or application/rdf+xml will get you what you desire. Alternatively, you can hit up http://lncn.eu/me/{account_name}.json or http://lncn.eu/me/{account_name}.xml for the same thing.
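
To make that concrete, here’s a minimal sketch in Python of both approaches (the account name is a placeholder, and I’m assuming the requests library is available):

    import requests

    # Placeholder account name for illustration; substitute a real one.
    profile_uri = "http://lncn.eu/me/jbloggs"

    # Option 1: content negotiation via the HTTP Accept header.
    as_json = requests.get(profile_uri, headers={"Accept": "application/json"})

    # Option 2: ask for a specific representation by extension.
    as_rdf = requests.get(profile_uri + ".xml")

    print(as_json.json())
    print(as_rdf.text[:200])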

The vCard format is of more interest to all, and provides a stupidly easy way to get a person’s details into your address book. By visiting http://lncn.eu/me/{account_name}.vcf (or by clicking the link at the bottom of any Directory profile) you’ll be given that person’s vCard, presenting their name, job title and contact details all in the industry-standard, machine-readable format. It’s literally a matter of one or two clicks (or taps) to get information from the Directory into your computer’s (or phone’s) address book. If you want, you can download mine to see what I mean.

Unlocking Gateway

My most recent project (following on from Jerome, and slotting in around the rest of the Summer’s “oh God, students are coming back, fix everything” mayhem) has been to look at Gateway; more specifically, to take Gateway and give it some extra awesome based around exploratory work we did with MPath. Here’s a quick breakdown of what you can expect when it’s turned on.

Very pretty. Very fast.

We’ve moved from a CWD 2.3 based design to our brand new CWD 3.0. This gives us a huge number of improvements in just about every area: layout, typography, readability, accessibility, compatibility, mobile readiness, new JavaScript frameworks, massively improved speed optimisations and more. We’ve compressed files and shaved off unnecessary bytes from almost the entire framework, making it load astonishingly quickly even over mobiles.

CWD 3.0 is also served over a blazingly fast content delivery network. Specifically we’re using Rackspace Cloud Files, who pipe their content to end users over the Akamai network. Put simply, this means that content such as the styling and images is delivered to your browser from a point much closer in internet space, regardless of where you are. If you want to access the Gateway from Bhutan then instead of serving all the content from a box in Lincoln some of it will come from whichever one of Akamai’s 84,000 servers happens to be closest. The result is a blisteringly fast experience, and since a lot of the Akamai servers are hooked straight into providers’ networks then it’ll still be quick even on mobile devices.

As mobile as your mobile.

We’ve made sure that Gateway optimises itself on the fly for most modern mobiles, and since it uses the CWD for its underlying design it’ll instantly take advantage of future improvements as we deploy them. We’ve sat down with our desks full of phones and tablets and tested things to make sure they’re easily read and are simple enough to use with just one finger.

Smarter.

Gateway now runs on a brand new system, meaning we can give it some extra smarts. If you visit Gateway and we’ve noticed there’s a problem with Blackboard then you’ll be told about it, saving you from clicking a link and waiting whilst it fails to load. It can tell you the local weather forecast, show you which trains are running late and even give you notices specific to your location, all in one place.

It’s all about you.

Sign in to the Gateway or use any of the services using Single Sign-In and it’ll gather all kinds of information you might find useful and display it for you. Your next lecture, assessment deadlines, how many library books you’ve got out and more are right at your fingertips.

Rock solid.

Gateway has moved from one very resilient platform to an even more resilient one. Located off-campus on a world-class hosting platform, it can survive snow, flood and even builders cutting through power lines, providing you with updates even when everything else is going wrong.

The Re-Architecting of Jerome

Over the past few days I’ve been doing some serious brain work about Jerome and how we best build our API layer to make it simultaneously awesomely cool and insanely fast whilst maintaining flexibility and clarity. Here’s the outcome.

To start with, we’re merging a wide variety of individual tables (strictly speaking Mongo calls them collections, but I’ll stick with tables for clarity) – one for each type of resource offered – into a single table which handles multiple resource types. We’ve opted to use all the fields in the RIS format as our ‘basic information’ fields, although obviously each individual resource type can extend these with its own data if necessary. This has a few benefits: first of all we can interface with our data more easily than before, without needing to write type-specific code which translates things back to our standardised search set. As a byproduct we can optimise our search algorithms even further, making search far more accurate and following generally accepted algorithms for this sort of thing. Of course, you’ll still be able to fine-tune how we search in the Mixing Deck.
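
As a rough sketch of what a merged table might look like (the field names below are guesses based on common RIS tags, not Jerome’s actual schema), using Python and pymongo:

    from pymongo import MongoClient

    # One collection for every resource type, rather than one per type.
    resources = MongoClient()["jerome"]["resources"]

    # RIS-style 'basic information' fields shared by every resource...
    book = {
        "TY": "BOOK",
        "TI": "An Introduction to Library Systems",
        "AU": ["Bloggs, Joe"],
        "PY": "2010",
        # ...plus type-specific extensions where a source needs them.
        "extra": {"isbn": "978-0-00-000000-0", "shelfmark": "025.04 BLO"},
    }

    article = {
        "TY": "JOUR",
        "TI": "Discovery Interfaces in Academic Libraries",
        "AU": ["Bloggs, Jessie"],
        "PY": "2011",
        "extra": {"journal": "A Made-Up Journal", "doi": "10.0000/example"},
    }

    resources.insert_many([book, article])

    # Type-agnostic querying: one query covers books, articles and anything else.
    for item in resources.find({"AU": "Bloggs, Joe"}):
        print(item["TY"], item["TI"])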

To make this even easier to interface with from an admin side, we’ll be strapping some APIs (hooray!) on to this which support the addition, modification and removal of resources programmatically. What this means is that potentially anybody who has a resource collection they want to expose through Jerome can do so; they just need to make sure their collection is registered, to prevent people flooding it with nonsense that isn’t ‘approved’ as a resource. Things like the DIVERSE research project can now not only pull Jerome resource data into their interface, but also push into our discovery tool and harness Jerome’s recommendation tools. Which brings me neatly on to the next point.
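
Purely as an illustration of the idea (the endpoint, payload and authentication below are entirely hypothetical, since the actual API hasn’t been published yet), pushing a resource from a registered collection might look something like this:

    import requests

    # Hypothetical endpoint and key, purely for illustration; the real
    # Jerome API may well look different.
    JEROME_API = "https://jerome.example/api/resources"
    COLLECTION_KEY = "your-registered-collection-key"

    resource = {
        "TY": "JOUR",
        "TI": "An Example Pushed from an External Collection",
        "AU": ["Bloggs, Joe"],
        "PY": "2011",
    }

    response = requests.post(JEROME_API, json=resource,
                             headers={"Authorization": COLLECTION_KEY})
    response.raise_for_status()
    print("Resource registered:", response.status_code)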

Recommendation is something we want to get absolutely right in Jerome. The amount of information out there is simply staggering. Jerome already handles nearly 300,000 individual items and we want to expand that to way more by using data from more sources such as journal tables of contents. Finding what you’re actually after in all of this can be like the proverbial needle in a haystack, and straight search can only find so much. To explore a subject further we need some form of recommendation and ‘similar items’ engine. What we’re using is an approach with a variety of angles.

At a basic level Jerome runs term extraction on any available textual content to gather a set of terms which describe the content, very similar to what you’ll know as tags. These are generated automatically from titles, synopses, abstracts and any available full text. We can then use the intersection of terms across multiple works to find and rank similar items based on how many of these terms are shared. This gives us a very simple “items like this” set of results for any item, with the advantage that it’ll work across all our collections. In other words, we can find useful journal articles based on a book, or suggest a paper in the repository which is on a similar subject to an article you’re looking for.
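
A minimal sketch of that first layer (the term extraction below is just naive tokenisation for the sake of a runnable example; Jerome’s real extraction is rather cleverer):

    def extract_terms(text):
        """Naive stand-in for term extraction: lowercase words over three letters."""
        return {word for word in text.lower().split() if len(word) > 3}

    def similar_items(target, catalogue):
        """Rank catalogue items by how many extracted terms they share with the target."""
        target_terms = extract_terms(target["text"])
        scored = []
        for item in catalogue:
            shared = target_terms & extract_terms(item["text"])
            if shared:
                scored.append((len(shared), item["title"]))
        return sorted(scored, reverse=True)

    book = {"title": "Software Design", "text": "Patterns and principles of software design"}
    others = [
        {"title": "Clean Architecture", "text": "Structuring software with design principles"},
        {"title": "18th Century Needlework", "text": "A history of embroidery and needlework"},
    ]
    # Only the software title shares any terms with the target, so it is the
    # only suggestion returned.
    print(similar_items(book, others))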

We then also have a second layer very similar to Amazon’s “people who bought this also bought…”, where we look over the history of users who used a specific resource to find common resources. These are then added to the mix and the rankings are tweaked accordingly, providing a human twist to the similar items: suppressing results which initially seem similar but which in actuality don’t have much in common at a content level, and pushing results which are related but which don’t have enough extracted terms for Jerome to infer this (for example, books which only have a title and for which we can’t get a summary) up to where a user will find them more easily.
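
The core of that second layer, assuming usage history is just a list of (user, item) pairs, could be sketched as:

    from collections import Counter

    def also_used(target_item, usage_log):
        """'People who used this also used': count co-occurring items across
        the histories of users who touched the target item."""
        users = {user for user, item in usage_log if item == target_item}
        counts = Counter(item for user, item in usage_log
                         if user in users and item != target_item)
        return counts.most_common()

    log = [("alice", "A"), ("alice", "B"), ("bob", "A"), ("bob", "C"), ("carol", "B")]
    print(also_used("A", log))  # [('B', 1), ('C', 1)]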

Thirdly, there’s the “people on your course also used” element, which is an attempt to make a third pass at fine-tuning the recommendations using data we have available on which course you’re studying or which department you’re in. This is very similar to the “used this also used” recommendation, but operating at a higher level: we analyse the borrowing patterns of an entire department or course to extract both titles and semantic terms which prove popular, and then boost these titles and terms in any recommendation result set. Using this only as a ‘booster’ in most cases prevents recommendation sets from being populated with every book ever borrowed, whilst at the same time providing a more relevant response.
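
The ‘booster’ behaviour might look something like this (the weighting is an arbitrary number for illustration, not Jerome’s actual tuning):

    def boost_by_course(ranked, course_favourites, weight=1.5):
        """Multiply the score of any result whose title is popular on the
        user's course, without ever adding or removing results."""
        return sorted(
            ((score * weight if title in course_favourites else score, title)
             for score, title in ranked),
            reverse=True,
        )

    ranked = [(3.0, "Software Design"), (2.8, "Clean Architecture")]
    # The course favourite overtakes the slightly higher raw score.
    print(boost_by_course(ranked, course_favourites={"Clean Architecture"}))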

So, that’s how we recommend items. APIs for this will abound, allowing external resource providers to register ‘uses’ of a resource with us for purposes of recommendation. We’re not done yet though: recommendation has another use!

As we have historical usage data for both individuals and courses, we can throw this into the mix for searching by using semantic terms to actively move results up or down (but never remove them) based on the tags which both the current user and similar users have actually found useful in the past. This means that (as an example) a computing student searching for the author name “J Bloggs” would have “Software Design by Joe Bloggs” boosted above “18th Century Needlework by Jessie Bloggs”, despite there being nothing else in the search term to make this distinction. As a final bit of epic coolness, Jerome will sport a “Recommended for You” section where we use all the recommendation systems at our disposal to find items which other similar users have found useful, as well as which share themes with items borrowed by the individual user.

Student as Producer, meet Nucleus

I’ve not done a theoretical, academic(ish) blog post for a while, choosing instead to focus on the more technical sides of what I’m doing. However, that doesn’t mean that what we’ve been doing is driven purely by the technology.

What I’m talking about in this blog post is our Nucleus platform – a collection of data stores, APIs and authentication mechanisms which, when put together, allows anybody within the University to interact with data in exciting new ways. Of particular interest is how Nucleus meshes with Student as Producer, our new institution-wide pedagogy. Put simply, Student as Producer is all about empowering students to become part of the production and provision of teaching and learning, rather than just consumers. Students are involved in research, course creation and much more on the academic side. It’s already seen some awesome results, and it’s becoming a massive part of how Lincoln does business.

So, how does Nucleus fit in? The answer lies in the potential to unlock the University’s inner workings for students to mash up as they like. At the moment if the University doesn’t offer a service, students can’t do anything about it. Want a way to automatically renew books if nobody else has requested them? Nah, can’t do that. Want to mash up room availability with your classmates’ timetables to find a perfect study session and a room to put it in? Tough.

Understandably, as a former student, I don’t think this is good enough. So part of our Nucleus platform is about opening up as much of this data and functionality as we can to anybody who wants to have a go. Obviously it’s still held within an appropriate security framework, but we believe that if a student can come up with a better (or different) way of doing something, they should be encouraged every step of the way.

We’ve got some really exciting stuff coming down the pipeline to help us offer support and resources to students (and staff) who want to explore the possibilities. Stay tuned!

So, what’s going on?

Good question. It’s been a while since I’ve blogged, so here’s a really quick overview of what I’m currently working on, pretending to work on, worked on but haven’t done anything with, or planning to work on.

  1. Linking You, our current JISC project on institutional identifiers. Finishing up next week, and currently causing Alex and me epic amounts of beating our heads against the desks.
  2. Jerome, our other JISC project on making libraries slightly more awesome.
  3. Zendesk Phase 2, including bits and pieces of integration work to make it smoothly flow through everything else we’re doing.
  4. Nucleus (and assorted fluff), our epic store of everything, being brushed up, pinned down and fully documented.
  5. Authentication being made even cooler, and more reliable, along with support for more stuff like SAML.
  6. GAME, our application management environment, being made more awesome.
  7. Room Bookings will be coming over the summer, allowing people to find and book rooms faster than ever before.
  8. Lots of QR Code goodness all over the place, including on room labels (this hooks up to room bookings for added goodness).
  9. Possibly a bit of hardware hacking in the Library with RFID stuff.
  10. CWD updates to version 3. Faster, lighter, more accessible and generally good.
  11. Total ReCal rollout to replace our legacy Timetable system (we hope).
  12. Replacing the legacy phone book with the new one (we hope).
  13. Data, data, data.
  14. A bit of mucking around with telephony, just for kicks.
  15. Taking another look at our Student Communications project to try and address a few annoyances.

Now Linking You to more places

It’s with great joy that I announce a fix for one of the more niggling little bugs in Linking You, a glitch caused by us failing to correctly encode our own strings when we pass them to be minified. What this meant, put simply, was that we’d ignore anything in a URI after the first ampersand or hash symbol.

This is no longer the case, and all your URIs which are minified in future will work exactly as expected. A friendly reminder to anybody using the API: please make sure you’re correctly encoding URIs when you pass them as a parameter, or things won’t work properly.
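
For example, here’s a minimal sketch of the difference correct encoding makes (the uri parameter name is an assumption for illustration; check the API documentation for the real call):

    from urllib.parse import urlencode

    long_uri = "http://www.lincoln.ac.uk/page?section=news&topic=library#latest"

    # Wrong: the raw '&' and '#' are interpreted as part of the API request
    # itself, so everything after them is silently dropped.
    bad_request = "http://lncn.eu/api?uri=" + long_uri

    # Right: percent-encode the URI before passing it as a parameter.
    good_request = "http://lncn.eu/api?" + urlencode({"uri": long_uri})
    print(good_request)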

On a related heads up for API users (including all end users who are using lncn.eu/api to minify addresses in things like Twitter clients), we will soon be requiring user-specific API keys for these to function. More details next week!

There’s not an app for that.

Recently there’s been a lot of noise made about mobile applications for universities and colleges. Apparently what students want to see is a dedicated app for their institution, providing them with bits and pieces of information on just about everything. There are plenty of examples: a quick search of the iTunes App Store reveals several universities which are keen for you to download their slice of application goodness. Entire products have sprung up to address this market, and some places have even gone all out and written their own.

All this is good. After all, who wouldn’t want to be able to check things like their timetable, their library fees and the state of the university’s IT services from their phone? What could be cooler than tapping a button and being told where your nearest free PC or copy of a book is? We like the concept so we’re having a look at mobile stuff, especially given that according to our analytics an appreciable fraction of our users are now trying to access services from their mobile.

However, we’re not entirely convinced about the route of apps. Sure they let you hook straight into things like geolocation and local storage, but with HTML5 so can a website. Apps also need to be made for the whole range of devices out there. iOS and Android are the big players, but you’re then cutting out Blackberry, Windows Phone 7, WebOS and Symbian devices. Apps also have an approval process to go through, or if not then they have a slightly complex installation route. There’s also a requirement either to pay someone a lot of money to make an app, or to spend a lot of money on in-house development.

All this means that we’re steering away from apps as much as possible, but we still want to make sure we provide kick-ass mobile services. “How?” I hear you cry. The answer is amazingly simple – we’re going back to mobile-optimised websites.

How Staff Directory Search Works

I’ve had a couple of people ask how my lunchtime project today actually works behind the scenes, so here’s the lowdown in easily-digestible speak. I should point out that I am relying heavily on two frameworks which we’ve already built at Lincoln. These are Nucleus – our heavy-lifting data platform – and the Common Web Design – our web design and application framework. These two gave me a massive head-start by already doing all of the hard work, such as extracting data from our directory and making the whole thing look great. Now, on with the technology.
