This is the text of a blog post on the archiving of news which I wrote recently for the British Library’s Newsroom blog. Wherever possible – or wherever it interests me – I’m reproducing texts here which have been written on other platforms.
The news that The Independent and The Independent on Sunday are to close down, and the spin-off newspaper i to be sold off to Johnston Press, has led to much discussion about the future of newspapers. Is the decision to cease print and develop the independent.co.uk website a sign of the death of print newspapers? Was The Independent squeezed out of the market and hence a failure, or is the shift to web-only a timely strategic move, supported by growing use of what is apparently a profitable website, which other newspapers will inevitably follow in turn? Can a newspaper brand survive when it no longer has a newspaper?
Here at the British Library we have seen many newspapers come and go. We take in 1,400 newspaper publications of one sort or another every week, but we have some 35,000 newspaper titles listed on our catalogue. A lot of newspapers have ceased publication over the past 400 years, and The Independent is merely the latest. But the gradual transference of a news industry from print to digital has major implications for what we collect in this area, and how. It is something that we are studying closely.
At present we acquire newspapers in print form. We would like to acquire more newspapers digitally, and newspapers today are produced in digital form (PDF usually) from which the print copy is then generated. But these digital copies are not published as such, consequently they do not qualify as ‘publications’ and cannot be collected under Legal Deposit. Of course some newspapers are available in digital facsimile form on their websites, which we could collect via our web archiving operation, but the PDFs that are available are of lower image quality (less suitable for preservation), and there is a shrinking number of these, as newspapers turn instead to aggregation services such as PageSuite to deliver their digital copies for them.
So why not just archive the websites? They have the same content as the newspapers, don’t they? Well, no they don’t. We recently conducted a study in regional newspapers and their web equivalents, to see how similar content was between the two forms. We found that the typical UK regional newspaper, in any one week, had roughly 40% of stories unique to print, 15% unique to online, and 45% that the forms shared (we looked only at news editorial, not advertising, arts coverage or sport). Some newspapers are closer to their web equivalents than others, but in general the two forms are not the same, and are diverging all the more.
Print issue date
No. of news stories
% unique to print
% unique to online
% shared by both
Analysis of stories published by selected UK regional weekly newspapers
Nevertheless, we are archiving the websites. Since 2013, when non-print Legal Deposit legislation was introduced, enabling the British Library and the other Legal Deposit libraries in the UK and Ireland to start archiving the UK Web, we have been gathering in UK news sites. And whereas we archive most UK sites once a year, for news sites we are archiving them on an either daily or weekly basis. It’s not just newspaper sites, however – we are capturing web-only news sites, community journalism sites (hyperlocals), news broadcasters’ sites, news parody sites and more. We have 1,800 on our list so far, and we are still adding titles. It’s not easy to keep track of all of the UK news sites, because there is no definitive list. The numbers keep shifting. Publications such as Benn’s Media Directory keep a track of registered news publications on an annual basis, and the handy Local Web List database tries to identify all of the self-produced community journalism sites out there, but, frankly, trying to keep the Web in bibliographic order is like herding ants.
All newspapers and other serial publications in the UK are assigned a unique identifier, the ISSN (similar to the ISBN for books). The ISSN underpins the collecting of newspapers under Legal Deposit in this country, not least because it is key to issuing barcodes. So it is that we manage to keep track with nearly every UK newspaper as it is published.
The problem is that websites don’t need barcodes. There is no operational ID system for pinpointing a news site, beyond knowing the URL, and that doesn’t help when it comes to keeping track of all different variants, spin-offs (blogs, microsites etc) and other changes that a news site and its digital family may go through. And how do you define what a news website is in any case, if you start to include blogs, hyperlocals, forums and so on?
Then there is the problem of referring to ‘news websites’, as though capturing these through archiving would be the answer. Newspapers were once the sole form in which printed journalism appeared. Today, newspapers have become apps. News is produced by publishers across diverse platforms which draw their material from content management systems. A newspaper is one such output; a website another; a mobile application another. To build the news archive of the future, we might need to think less of capturing print or digital publications, and more about preserving the engines that have generated them (and the digital content fed into such engines). Then we could generate how the news looked to anyone at a particular place and point in time, according to the applications and devices that would have been available to them.
And that highlights another problem for the future archiving of news. A newspaper is predicated on the understanding that all who purchase it will share in the same news. They might have differing opinions about that news, yet they still share in it.
But news is not like that anymore. News is tailored to the individual, who has increasing editorial control over what news matters to them, as our news content strategy notes. Twitter feeds, Facebook news aggregation, multiple TV news channels, all put the selection of news in the hands of the consumer (even if the algorithms of Facebook and Google do much of that tailoring for us). No one sees the same news any more. This is why the archiving of Twitter is such a conundrum. There is no one Twitter out there – everyone’s experience of it is different. How are we to recreate such an experience, to make our future news archives valid?
Theorising over the nature of news is all very interesting, but we have to make practical decisions to ensure that we archive newspapers, their digital derivatives and competitors in a multimedia news market, as thoroughly and efficiently as we can. We continue to take in most UK newspapers in print, and we are close to doing the same with news websites. We archive TV and radio news, albeit selectively. It would help a lot if we knew if and when newspapers are to disappear, but no one does.
Or maybe one person does. Speaking to the Leveson enquiry on 25 April 2012, Rupert Murdoch gave his thoughts on the future of print newspapers.
Every newspaper has had a very good run … It’s coming to an end as a result of these disruptive technologies … I think we will have both [internet and print news] for quite a while, certainly ten years, some people say five. I’d be more inclined to say 20, but 20 means very small circulations.
And that could be it. Newspapers will last a generation. It may not be an even decline, because what applies to the nationals is not necessarily the same as the regionals. The UK’s national newspapers, of which there are around 30, attract a particular kind of advertiser and have the potential to reach out internationally (which is what The Independent‘s owners are pinning their hopes upon). For regional newspapers, of which there are around 100 dailies and 1,200 weeklies, there seems to be a different model operating, one where local advertising combined with consumer loyalty may keep some titles in print for a good while yet, even as the same titles increase their digital foothold (the recent ranking of UK media publications for 2015 by SimilarWeb shows regional publisher like Kentonline, Birmingham Mail and Liverpool Echo among those titles with the highest percentage of mobile traffic share).
However long it takes, time is running out for print newspapers. It will be our job to ensure that we continue to serve the needs of UK research by increasingly gathering news in its digital forms. We need a smooth transition, with consistency of representation. The challenge is that the mechanisms to do so are not fully in place as yet. The technologies are; the means to ensure capture, identification and continuity are not. But we’re working on it.