As a publishing house committed to innovation and improvement in language technology, we offer special terms for academic use. If you would like to license Collins material for academic research within a university research group, please contact us with full details of your affiliation, what the data will be used for, the nature of the research and how many people will require access to it. An annual fee applies, and you will be required to sign an agreement governing usage.
We work closely with language professionals and recognise the value of corpora to the language technology community. This knowledge has led us to develop a unique resource. At the heart of the Collins business is our 4.5-billion-word corpus of the English language, first pioneered in the 1980s and now the biggest of its kind in the world. The Collins corpus tracks new words, language trends and usage by continuously monitoring newspapers, online articles, everyday speech, reports and presentations. As part of News Corporation, Collins has unique access to international media content, ensuring that the corpus represents real-time, global English. The Collins Corpus for Linguistic Research provides objective evidence about the English that most people read, write, speak and hear every day of their lives. This ground-breaking initiative opens up new opportunities in language analysis for commercial language research.
All our datasets have been linguistically annotated and part-of-speech tagged by our in-house lexicographers. This allows powerful grammatical analyses to be carried out over more than 4.5 billion words of modern English.
Dynamic and evolving
The corpus keeps growing. Every year we add new components to reflect the evolving nature of English. Static corpora such as the BNC, by contrast, date rapidly and capture only a snapshot of English from several years ago. New material always arrives in the same stable format, so you can use it to update your existing collection.
A customizable corpus
We appreciate that researchers have particular requirements, and we aim to serve each client by tailoring the corpus accordingly: if you need a particular mix of sources or dates, we will try to accommodate you.
News categorization by subject field
Text-typing of news stories by subject field (such as sport, the stock market and politics) allows much finer-grained linguistic analysis.
All accented characters, source tags and newspaper metadata tags have been standardized across all sources.
Foreign-language corpora
Collins also holds corpora for other major European languages.
If you are interested in finding out more, please contact us to discuss your requirements.