The Genome project

http://genome.ch.bbc.co.uk

I like a good list. I like a well-constructed and clear database that is, when all is said and done, the optimal expression of an extensive list. I’ve produced a lot of lists in my time, personally and professionally, and I’ve had a hand in producing a number of databases that have aimed to help people find things in a form that is useful to them, and I’ve worked a lot with databases good, bad and middling. And so it is that I’m delighted to see the publication of the BBC Genome Project, a database built out of listings data for the Radio Times 1923-2009. It’s a great list and a great database.

Back in 2006 I put together a funding bid to digitise the Radio Times 1923-1991. It was a serious proposal, put together in consultation with the Radio Times, and the product of a lot of thought and calculation. It didn’t receive the funding we sought, and reading the document now I can see that if it had been put into practice it would have been a disaster. It asked to do too much in too short a space of time for too little money, and its proposed solution for getting over the third party rights issues – an optimistic licensing scheme – was a guaranteed failure. However, the bid was turned down not for any of these reasons but because the would-be funder was uncertain of the educational value of a digitised Radio Times with database (yes, that’s what they actually thought) and because they couldn’t see why the BBC or the Radio Times couldn’t pay for it themselves. Which was a reasonable thought, of course.

So we wind forward through time to 2014, and a Radio Times database has become a reality, courtesy of the BBC. It’s not a digitised Radio Times, however. A wise decision was made not to attempt to go down that route, on account of all the complexities of ownership and clearances that would be required, not the least of which is that the BBC no longer owns the Radio Times – it was acquired by Immediate Media in 2011.

Instead what they have produced is a plain database derived from the listings information for BBC radio and television programmes that have been broadcast since 1923. So no articles, advertisements, illustrations, letters or Roger Woddis poems, but what you do get is the core information about each programme as it was planned to be when the weekly magazine went to press. Of course programmes sometimes change from what was advertised, through overrunning, last-minute cancellations and the like, and the BBC is asking for people to contribute corrections to the Genome database – corrections of fact, and corrections of text, since the database has been created through a process of Optical Character Recognition (i.e. scanned from the pages themselves and then converted into text). The crowd will take over where the machines leave off.


Each record supplies date, time of broadcast, title of programme, synopsis, credits and channel. The Radio Times has covered non-BBC programmes since 1991, but Genome is restricted to BBC programmes. There are plenty enough of those – currently the database boasts records of 4,423,653 programmes, taken from 13,212 issues or 350,622 scanned pages.
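For anyone who thinks in data structures, the shape of a Genome record as described above can be sketched in a few lines of Python. This is only an illustration: the class name, field names and sample values are my own assumptions, not the BBC's actual schema. The small helper simply divides the published totals to give a rough sense of scale.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class ListingRecord:
    """Hypothetical shape of one listing; field names are illustrative only."""
    broadcast_at: datetime   # date and time of broadcast, as printed
    title: str
    channel: str
    synopsis: str = ""
    credits: List[str] = field(default_factory=list)


def average_per(total_programmes: int, units: int) -> float:
    """Average number of programme records per issue or per page."""
    return total_programmes / units


# Totals quoted in the text above.
PROGRAMMES = 4_423_653
ISSUES = 13_212
PAGES = 350_622

# A made-up record, purely to show the structure.
example = ListingRecord(
    broadcast_at=datetime(1960, 1, 1, 18, 0),
    title="Example programme",
    channel="Example channel",
)

per_issue = average_per(PROGRAMMES, ISSUES)
per_page = average_per(PROGRAMMES, PAGES)
print(f"{per_issue:.1f} programmes per issue, {per_page:.1f} per page")
```

The point of the sketch is how little there is to each record: six fields, repeated several million times.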

The searching is admirably clear, with advanced searching options by date, time and medium, and browsing by medium, year or issue. Search results allow you to refine by channel and to sort results by relevance or oldest/newest first. Fascinatingly, when the database was announced and millions started making use of it, the thing many chose to look up was what was being broadcast on the day of their birth. I don’t think the good folk at the BBC were expecting such an eventuality, and it does seem odd for people to seek out first programmes that they most definitely did not see. Just for the record, I can report that nothing was being broadcast at the time of my birth, because there weren’t early morning programmes on BBC television in those dim and distant days, but just as soon as broadcasting did start that day the first two programmes were two educational programmes on the history of cinema. So maybe there’s something in this birthday-searching lark after all.

Genome has been warmly welcomed and much used already. It follows on from an earlier BBC effort in the mid-2000s to make its in-house Infax database available online, free to all. That service was taken down after a year or so because people complained about some personal information being released. The BBC is on safer ground with Genome, because it is based on published information, though some sensitive information has been removed, not least people’s addresses or other contact details.

But among all the praise few have noted what is perhaps Genome’s most significant feature. The database provides a single web page, or URL, for every single programme listed. It’s not quite a record for each individual programme as produced, because repeats are given as separate records, but this is a huge step forward for the BBC in creating a definitive listing of its broadcast output with a unique address for each. It has such a system in place for current programmes, which can be found under /programmes on the BBC website. The aim of /programmes is “to ensure that every TV & Radio programme the BBC broadcasts has a permanent, findable web presence”, and the next stage in Genome development must surely be to make its records comply with /programmes to create, eventually, a single database encompassing all of the BBC’s output – the perfect list.

From such a list great things will come, since it can become the backbone for a web infrastructure that delivers (where possible and with rights and licences permitting) the BBC’s broadcast archive, past and present. Genome isn’t just a great database – it’s laying the foundations for the BBC as an archive for the nation.
