2.2 Comparative overview of the corpora quoted



The following tables give additional information on the British National Corpus (BNC) and the Freiburg-Lancaster-Oslo-Bergen Corpus of British English (F-LOB), which are used by Hoffmann and by Smith & Seoane, respectively.



                           | BNC                          | FLOB
---------------------------+------------------------------+-----------------------------
Language variety           | British English              | British English
Size (words)               | 100 million                  | 1 million
Period                     | Compilation: 1991-1995       | Compilation: 1991-1996
                           | End of data collection: 1993 | End of data collection: 1991
Written/spoken proportion  | written (90%), spoken (10%)  | written (100%)
Linguistic annotation      | POS-tagged                   | POS-tagged
Manual check               | no                           | checked manually
Corpus software            | XAIRA, BNCweb, CQP           | WordSmith*

*The Freiburg-Lancaster-Oslo-Bergen Corpus can be used with stand-alone concordancers such as WordSmith.



PART-OF-SPEECH TAGGING

Part-of-speech tagging (see Glossary for definitions of the main terms) is the most common kind of grammatical annotation, normally performed by automatic POS-taggers. The POS-tagging tools and tagsets used for the BNC and the FLOB are different.



                      | BNC                                | FLOB
----------------------+------------------------------------+-----------
Tagset                | CLAWS 5                            | CLAWS 8
Number of POS-tags    | 61 POS-tags and 30 ambiguity tags  | 141 tags
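The BNC count above includes ambiguity tags. In the BNC's CLAWS 5 annotation, an ambiguity tag such as AJ0-NN1 joins two tags with a hyphen where the tagger could not decide between two readings; the BNC documentation describes the first tag as the more probable one. A search therefore has to decide whether such tokens count as matches. The following Python sketch is one possible way of handling this; the function names split_ambiguity_tag and matches are hypothetical and not part of XAIRA, BNCweb or CQP.

```python
from typing import List


def split_ambiguity_tag(tag: str) -> List[str]:
    """Return the candidate CLAWS 5 tags encoded in a BNC tag.

    Assumption: an ambiguity tag joins two plain tags with a hyphen
    (e.g. "AJ0-NN1"), the first being the more probable reading.
    A plain tag such as "NN1" comes back as a one-element list.
    """
    return tag.split("-")


def matches(tag: str, wanted: str, strict: bool = False) -> bool:
    """Decide whether a token's tag counts as the tag `wanted`.

    strict=True accepts only an exact match, so ambiguous tokens are
    excluded; otherwise either reading of an ambiguity tag is accepted.
    """
    if strict:
        return tag == wanted
    return wanted in split_ambiguity_tag(tag)


print(matches("AJ0-NN1", "NN1"))               # True
print(matches("AJ0-NN1", "NN1", strict=True))  # False
```

The non-strict mode favours recall (ambiguous tokens are kept and can be checked manually), while the strict mode favours precision.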


The difference in the number of tags already shows that the two tagsets are not interchangeable, even though both the BNC and the FLOB were annotated with CLAWS taggers and some basic word-class categories receive the same tag in both. Compare the tags for general adverbs and singular common nouns:



                      | BNC  | FLOB
----------------------+------+------
General adverb        | AV0  | RR
Singular common noun  | NN1  | NN1


These differences matter especially when a search is carried out in two different corpora (for instance, one corpus for developing the query and the other for testing it): any query that contains part-of-speech tags has to be converted from one tagset to the other.
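Converting a POS-based query between the two corpora amounts to rewriting each tag through a mapping from CLAWS 5 (BNC) to CLAWS 8 (FLOB) tags. The Python sketch below is a hypothetical illustration only: it assumes CQP-style attribute tests written as pos="TAG", and its mapping contains just the two tag pairs compared above. A usable mapping would have to cover both full tagsets, and some tags have no one-to-one equivalent in the other tagset.

```python
import re

# Hypothetical, deliberately incomplete CLAWS 5 -> CLAWS 8 mapping,
# limited to the two pairs compared in the table above.
C5_TO_C8 = {
    "AV0": "RR",   # general adverb
    "NN1": "NN1",  # singular common noun
}

# Assumed query syntax: CQP-style tests such as [pos="AV0"].
# Patterns with wildcards (e.g. pos="NN.*") do not match this
# expression and are therefore left unchanged.
POS_TEST = re.compile(r'pos="([A-Z0-9]+)"')


def convert_query(query: str, mapping: dict) -> str:
    """Rewrite every pos="TAG" test in `query` using `mapping`.

    Raises KeyError for a tag without a known equivalent, so that an
    unconverted tag never slips silently into the target-corpus query.
    """
    def replace(match: re.Match) -> str:
        tag = match.group(1)
        if tag not in mapping:
            raise KeyError(f"no equivalent for tag {tag!r}")
        return f'pos="{mapping[tag]}"'

    return POS_TEST.sub(replace, query)


print(convert_query('[pos="AV0"] [pos="NN1"]', C5_TO_C8))
# [pos="RR"] [pos="NN1"]
```

Raising an error rather than passing unknown tags through is a design choice: a BNC tag left in a FLOB query would simply return no hits, which is easy to mistake for a genuine result.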


The part-of-speech tagsets for these two corpora can be accessed at:

