The Corpus of Nineteenth-Century Newspaper English (CNNE)

The English used in nineteenth-century newspapers is of considerable interest to linguists for at least two reasons. First, studies of late twentieth-century English have shown that newspaper language is comparatively responsive to the type of language change known as colloquialization, that is, a tendency for some written genres to become more similar to informal speech in their linguistic make-up. This raises the question of whether such tendencies can also be identified in nineteenth-century newspaper English. Second, the newspaper became an increasingly central written genre in nineteenth-century Britain. Owing to factors such as advances in printing technology, the repeal of taxes, and increases in literacy, newspapers were bought and read by a far higher proportion of the British population in 1900 than had been the case 100 years previously. Consequently, nineteenth-century newspaper English is of central importance not only for understanding how twentieth-century English developed, but also for describing the English of the 1800s in its own right.

The Corpus of Nineteenth-Century Newspaper English (CNNE) will enable scholars to study the language of English newspapers from the nineteenth century; in addition, the division of the corpus into two periods makes it possible to trace language change across the 1800s. The period division reflects a number of extralinguistic changes in the middle of the nineteenth century that had important consequences for the newspaper business, such as the repeal of the so-called Taxes on Knowledge (the stamp duties on newspapers and the customs and excise duties on paper) in 1855 and 1861 respectively, and the formation of the Press Association in 1868. As CNNE contains texts from the decades before as well as after these changes, the corpus can be used to study their potential effect on newspaper language.

CNNE is being compiled by Erik Smitterberg as part of his project “Colloquialization in Late Modern English”, funded by the Royal Swedish Academy of Letters, History and Antiquities, with financial support from the Knut and Alice Wallenberg Foundation. Newspaper texts for the corpus are selected from the online database 19th Century British Library Newspapers; the selected PDF files are converted to machine-readable text with the aid of OCR software, complemented by manual proof-reading. An interim aim of the compilation process is to produce a pilot version of CNNE that can be used for the project during 2011. The corpus will be enlarged at a later stage.

Page 5 from Lloyd’s Illustrated London Newspaper, 27 November 1842.