This section provides additional information on the British National Corpus (BNC) and the Freiburg-Lancaster-Oslo-Bergen Corpus of British English (F-LOB, hereafter FLOB), which are used by Hoffmann and by Smith & Seoane, respectively.
| | BNC | FLOB |
| Language variety | | |
| Size (words) | | |
| Period | | |
| Written/spoken proportion | | |
| Linguistic annotation | | |
| Manual check | | |
| Corpus software | | |
* The Freiburg-Lancaster-Oslo-Bergen Corpus can be used with independent concordancers, such as WordSmith.
PART-OF-SPEECH TAGGING
Part-of-speech (POS) tagging (see the Glossary for definitions of the main terms) is the most common kind of grammatical annotation and is normally performed by automatic POS-taggers. The BNC and the FLOB use different POS-tagging tools and tagsets.
| | BNC | FLOB |
| Tagset | | |
| Number of POS-tags | | |
The difference in the number of POS-tags already indicates that the two tagsets are not identical, although the basic word-class categories are normally marked in the same way. Note that both the BNC and the FLOB are annotated with CLAWS taggers. Compare the POS-tags for general adverb and singular common noun:
| | BNC | FLOB |
| General adverb | | |
| Singular common noun | | |
These differences matter especially when a search is carried out across different corpora (one for training and one for testing the query), because any query that contains part-of-speech tags must first be converted from one tagset to the other.
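The conversion step can be sketched as a simple lookup table. The snippet below is only an illustration: the function `convert_query`, the space-separated query format, and the two tag pairs in `BNC_TO_FLOB` are assumptions made for this example and should be checked against the official tagset documentation before use. In practice a coarse tag in one tagset may correspond to several finer tags in the other, so a one-to-one table like this is a simplification.

```python
# Illustrative mapping from BNC tags to FLOB tags.
# ASSUMPTION: these pairs are examples only; verify them against the
# tagset documentation for each corpus before relying on them.
BNC_TO_FLOB = {
    "AV0": "RR",   # general adverb (assumed correspondence)
    "NN1": "NN1",  # singular common noun (assumed identical in both)
}


def convert_query(query, mapping):
    """Translate a space-separated sequence of POS tags.

    The query format is hypothetical; real corpus tools use their own
    query syntax. Unknown tags raise an error instead of being passed
    through silently, so that gaps in the mapping are noticed.
    """
    converted = []
    for tag in query.split():
        if tag not in mapping:
            raise KeyError(f"No mapping defined for tag {tag!r}")
        converted.append(mapping[tag])
    return " ".join(converted)


flob_query = convert_query("AV0 NN1", BNC_TO_FLOB)
```

Raising an error on an unmapped tag is a deliberate choice here: a query that silently keeps a tag from the wrong tagset would simply return no hits in the second corpus, which is easy to misread as a linguistic finding.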
The part-of-speech tagsets for these two corpora can be accessed at: