Figure 2: Google.co.uk advanced search option
This example search aims to determine the frequencies of the constructions lest he be (lest + subjunctive), lest he is / was (lest + indicative) and lest he should be (modal periphrasis). As in the case study in Hundt's chapter, the different constructions were entered in the advanced search field of Google.co.uk. Each combination was searched for within four different domains in order to obtain results for the following varieties: British English (.uk), American English (.us), New Zealand English (.nz) and Australian English (.au).
The respective top-level domains can be set at the bottom of the advanced search option page.
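The same domain restriction can also be written directly into the query string with Google's `site:` operator. The following is a minimal sketch of how the queries for this example search could be assembled. The names `DOMAINS`, `PHRASES`, `build_query` and `build_url` are illustrative assumptions, as is the decision to treat "lest he is" and "lest he was" as two separate queries; the source does not state how the two indicative forms were combined.

```python
from urllib.parse import urlencode

# Top-level domains used as rough proxies for the four varieties.
DOMAINS = {"BrE": ".uk", "AmE": ".us", "NzE": ".nz", "AusE": ".au"}

# Search phrases (assumption: "is" and "was" are queried separately).
PHRASES = ["lest he be", "lest he is", "lest he was", "lest he should be"]


def build_query(phrase, domain):
    """Return an exact-phrase query restricted to one top-level domain."""
    return f'"{phrase}" site:{domain}'


def build_url(phrase, domain, base="https://www.google.co.uk/search"):
    """Return a search URL with the query passed in the q parameter."""
    return base + "?" + urlencode({"q": build_query(phrase, domain)})


# One query per variety and phrase.
queries = {
    (variety, phrase): build_query(phrase, domain)
    for variety, domain in DOMAINS.items()
    for phrase in PHRASES
}

for key, query in queries.items():
    print(key, query)
```

The quotation marks force an exact-phrase match, which corresponds to entering the construction in the exact-phrase field of the advanced search page.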
Domain | “lest he be” | “lest he is/was” | “lest he should be”
.uk (BrE) | 514.000 (286) | 2.990.000 (44) | 792.000 (103)
.us (AmE) | 290.000 (109) | 10 (10) | 278.000 (25)
.nz (NzE) | 34.000 (55) | 9 (9) | 70.500 (16)
.au (AusE) | 229.000 (137) | 10 (10) | 247.000 (27)
Table 1: Example search I with Google.co.uk (11.09.2013)
The table above shows the results retrieved by Google.co.uk, that is, the frequencies of the individual constructions in the four varieties of English. The numbers in brackets give the number of hits that Google classes as most relevant. For illustrative purposes, the total hit counts have been converted into percentages in the following table.
Domain | “lest he be” | “lest he is/was” | “lest he should be”
.uk (BrE) | 11.965 % | 69.6 % | 18.436 %
.us (AmE) | 51.055 % | 0.002 % | 48.943 %
.nz (NzE) | 32.533 % | 0.009 % | 67.458 %
.au (AusE) | 48.108 % | 0.002 % | 51.89 %
Table 2: Percentages of example search hits: total number of results
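The conversion from hit counts to percentages is a simple row-wise normalisation: each count is divided by the sum of the three counts for its domain. A minimal sketch, using the total hit counts from Table 1; the names `TABLE_1` and `row_percentages` are illustrative assumptions, and the rounding to three decimal places is chosen to match the tables here.

```python
# Total hit counts from Table 1 (11.09.2013), per domain:
# (subjunctive, indicative, modal periphrasis)
TABLE_1 = {
    ".uk": (514_000, 2_990_000, 792_000),
    ".us": (290_000, 10, 278_000),
    ".nz": (34_000, 9, 70_500),
    ".au": (229_000, 10, 247_000),
}


def row_percentages(counts, digits=3):
    """Convert one row of hit counts into percentages of the row total."""
    total = sum(counts)
    return tuple(round(100 * count / total, digits) for count in counts)


for domain, counts in TABLE_1.items():
    print(domain, row_percentages(counts))
```

The same function can be applied to the bracketed counts in Table 1, for example `row_percentages((55, 9, 16))` for the .nz row.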
The calculated percentages give a better impression of how the search items are distributed across the different varieties of English. Table 2 suggests that British English prefers the indicative, whereas American English prefers the subjunctive. Nevertheless, these results are not representative, as they contain many near-identical hits. In order to provide a more representative picture, the following table shows the percentages of “the most relevant results”, from which very similar hits have been omitted by Google's built-in duplicate content filter. This filter is applied automatically when the search is run: “In order to show you the most relevant results, we have omitted some entries very similar to the 35 already displayed. If you like, you can repeat the search with the omitted results included” (Google.co.uk). It must be kept in mind, however, that this filtering already distorts the search results, not least because we do not know how Google chooses its results (cf. Croft, Metzler & Strohman 2009). As expected, the numbers in Table 3 differ markedly from those displayed in Table 2.
Domain | “lest he be” | “lest he is/was” | “lest he should be”
.uk (BrE) | 66.051 % | 10.162 % | 23.788 %
.us (AmE) | 75.694 % | 6.944 % | 17.361 %
.nz (NzE) | 68.75 % | 11.25 % | 20 %
.au (AusE) | 78.736 % | 5.747 % | 15.517 %
Table 3: Percentages of example search hits: with omitted results
In general, these results present a very homogeneous picture, with each variety preferring the subjunctive over the modal periphrasis and the modal periphrasis over the indicative. Beyond that, the variation within the respective columns is too small for any discernible pattern to emerge. Furthermore, these results are only a momentary snapshot. Hit counts retrieved with Google are not stable: it is practically impossible to obtain the same results twice, which should rule out Google as a tool for serious scientific research. Consider, for instance, Table 4, which shows the results of a second search conducted on 19 September 2013. They differ from those obtained on 11 September 2013 because pages are constantly being added, updated or deleted, and because the indexing and search strategies may have been modified in the meantime.
Apart from this lack of reproducibility, it must also be kept in mind that a commercial search engine is not tailored to the specific needs of linguists. With Google it is difficult to search for a particular linguistic structure, and the accuracy of the retrieved results (i.e. precision and recall) leaves much to be desired, owing to automatic normalization processes, duplicates and the like. In addition, Google provides only limited access to metadata, which can be important, for instance, for a diachronic study of a particular language feature.
Domain | “lest he be” | “lest he is/was” | “lest he should be”
.uk (BrE) | 493.000 (287) | 2.920.000 (39) | 889.000 (103)
.us (AmE) | 279.000 (105) | 10 (10) | 271.000 (25)
.nz (NzE) | 36.100 (54) | 10 (10) | 66.600 (14)
.au (AusE) | 221.000 (131) | 11 (11) | 234.000 (27)
Table 4: Example search II with Google.co.uk (19.09.2013); numbers in brackets: most relevant results
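The size of the discrepancy between the two searches can be expressed as the relative change in each hit count. The following minimal sketch compares the total hit counts of Table 1 and Table 4; the names `SEARCH_1`, `SEARCH_2`, `relative_change` and `compare` are illustrative assumptions.

```python
# Total hit counts per domain: (subjunctive, indicative, modal periphrasis).
SEARCH_1 = {  # Table 1, 11.09.2013
    ".uk": (514_000, 2_990_000, 792_000),
    ".us": (290_000, 10, 278_000),
    ".nz": (34_000, 9, 70_500),
    ".au": (229_000, 10, 247_000),
}
SEARCH_2 = {  # Table 4, 19.09.2013
    ".uk": (493_000, 2_920_000, 889_000),
    ".us": (279_000, 10, 271_000),
    ".nz": (36_100, 10, 66_600),
    ".au": (221_000, 11, 234_000),
}


def relative_change(old, new):
    """Percentage change from the first to the second search."""
    return 100 * (new - old) / old


def compare(first, second):
    """Return the rounded percentage change for every cell, per domain."""
    return {
        domain: tuple(
            round(relative_change(old, new), 1)
            for old, new in zip(first[domain], second[domain])
        )
        for domain in first
    }


changes = compare(SEARCH_1, SEARCH_2)
for domain, row in changes.items():
    print(domain, row)
```

For example, the .uk count for “lest he should be” rises from 792.000 to 889.000, an increase of roughly 12 % within eight days.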
Two major problems of the Internet have been demonstrated thus far: the issue of duplicate documents and the unreliability of the search results. These obstacles make it necessary to manually post-edit the retrieved results in order to gain a detailed overview of pages which might have a distorting effect on the final results due to duplication or inaccessibility.
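Part of this post-editing can be prepared automatically by flagging candidate duplicates for manual inspection. The following is only a sketch under simple assumptions: hits are available as (URL, text snippet) pairs, and two hits count as duplicates if their snippets are identical after case and whitespace normalisation. The function names and the sample records are hypothetical placeholders, not retrieved data.

```python
import re
from collections import defaultdict


def normalise(text):
    """Lower-case the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text).strip().lower()


def group_duplicates(hits):
    """Group URLs whose snippets are identical after normalisation.

    Only groups with more than one URL are returned, i.e. the
    candidates that should be checked by hand.
    """
    groups = defaultdict(list)
    for url, snippet in hits:
        groups[normalise(snippet)].append(url)
    return {text: urls for text, urls in groups.items() if len(urls) > 1}


# Hypothetical placeholder records for illustration only.
sample_hits = [
    ("http://example.org/a", "Lest he  be forgotten"),
    ("http://example.org/b", "lest he be forgotten "),
    ("http://example.org/c", "lest he should be late"),
]

for text, urls in group_duplicates(sample_hits).items():
    print(text, urls)
```

Exact matching after normalisation will miss near-duplicates that differ by a few words, so the output is a starting point for manual post-editing rather than a replacement for it.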