1. Chapter summary



Welcome to the webpage accompanying Chapter 8 "Using web-based data for the study of global English" by Marianne Hundt!


In her chapter, Marianne Hundt introduces two basic approaches to using the World Wide Web for linguistic research.




These two Web-based approaches can help to establish how often a certain structure appears in a specific variety of English. Moreover, slightly exotic varieties of English such as Bangladeshi or Pakistani English, which are not yet represented in standard corpora, can also be studied.
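To make "how often a certain structure appears" concrete, here is a minimal Python sketch that counts a pattern in text samples and normalizes the count per million words, so that samples of different sizes can be compared. The sample sentences, the variety labels and the search pattern ("different to") are invented for illustration and do not come from the chapter; the tokenizer is deliberately crude.

```python
import re

def per_million(pattern, text):
    """Return the number of matches of `pattern` per million word tokens."""
    # Crude tokenizer (assumption): runs of letters and apostrophes count as words.
    tokens = re.findall(r"[A-Za-z']+", text)
    if not tokens:
        return 0.0
    hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return hits * 1_000_000 / len(tokens)

# Hypothetical toy samples; real work would use text collected from the Web.
samples = {
    "variety_a": "We are different from them. This is different from that.",
    "variety_b": "We are different to them. This is different from that.",
}

for name, text in samples.items():
    print(name, per_million(r"\bdifferent to\b", text))
```

With samples this small the normalized figures are meaningless as statistics; the point is only that raw counts need to be related to the amount of text searched before varieties are compared.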

The Internet offers several advantages over standard corpora. First, it is far larger than any standard corpus: the Brown Corpus, a standard corpus of present-day American English, consists of 1,014,312 words (cf. Brown Corpus Manual), whereas the WWW offers an almost unlimited amount of text that can be used for linguistic studies. Second, standard corpora quickly go out of date, which limits their usefulness for research on recent or ongoing change; the Internet, by contrast, has created and is still creating new text types such as e-mails, chat-room discussions and blogs, which can themselves be studied. Third, as mentioned above, slightly more ‘exotic’ varieties of English can be investigated. At present, however, the disadvantages seem to outweigh the advantages.

Strictly speaking, the Internet is not a corpus in the sense of “a large and principled collection of natural texts” (Biber et al. 1998: 4). It is unorganized, and search results are not reliable: two or more identical searches are likely to produce different results, so a search is not repeatable. Commercial search engines also influence the results in non-linguistic ways: they cannot access all webpages, are locally biased and build up user profiles that affect what is returned. Moreover, not all material found in a particular domain has been produced by speakers from that region, which distorts the overall statistics. The biggest problem is the occurrence of duplicate documents: exactly the same PDF can be found on two or more webpages, so the number of hits retrieved in a Google search is usually not very revealing when it comes to linguistic detail.
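The effect of duplicate documents on hit counts can be illustrated with a short Python sketch. It assumes the pages have already been downloaded as plain text and treats two pages as duplicates only if their whitespace-normalized text is identical; the example pages and the search phrase are invented, and real near-duplicate detection would need something more tolerant.

```python
import hashlib

def count_hits(documents, phrase):
    """Return (raw, deduplicated) counts of `phrase` across `documents`."""
    raw = 0
    deduplicated = 0
    seen = set()
    for text in documents:
        n = text.lower().count(phrase.lower())
        raw += n
        # Fingerprint the whitespace-normalized text to spot exact copies.
        normalized = " ".join(text.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            deduplicated += n
    return raw, deduplicated

# Hypothetical pages: the first two are the same document hosted on two sites.
pages = [
    "The results are different to those reported earlier.",
    "The results are different to those reported earlier.",
    "Our findings are different to theirs.",
]

raw, unique = count_hits(pages, "different to")
print(raw, unique)
```

Here the raw count overstates the evidence because one document is counted twice, which is the kind of inflation a plain hit count from a search engine cannot reveal.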
