First, the user must select the search engine through which WebCorp performs the search. The program offers a choice of six search engines; Google, the largest of them, is the default.
The next choice is between two case options. The default setting, Case Sensitive, respects capitalization in the query, whereas Case Insensitive ignores it.
The user can also choose between three different output formats: HTML Tables (KWIC), HTML and Plain Text (KWIC). KWIC stands for Key Words In Context and is the standard output format for corpora.
The default setting is HTML, which is illustrated in Figure 7. The two KWIC formats are presented in Figures 8 and 9.
The Web addresses option lets the user decide whether to display the Web addresses of all pages accessed while processing the query. With the full list, the user can open the pages that yielded no concordance lines and check personally whether they are of any use for the search at hand. By default, only the URLs of pages with retrieved concordance lines are shown.
WebCorp also provides a setting for the concordance span. This determines how many words are shown to the left and the right of the search term. The user can either choose up to 20 words or set the span to a maximum of 200 words.
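The interplay of the concordance span and the case option can be illustrated with a small sketch. This is not WebCorp's implementation; the function name kwic, its parameters, and the simple whitespace tokenization are assumptions made purely for illustration.

```python
def kwic(text, term, span=5, case_sensitive=True):
    """Return (left context, keyword, right context) tuples for each hit.

    Hypothetical helper: 'span' is the number of words kept on each side.
    """
    tokens = text.split()  # naive whitespace tokenization (an assumption)
    lines = []
    for i, token in enumerate(tokens):
        # Ignore surrounding punctuation when comparing with the search term.
        word = token.strip('.,;:!?"()')
        if case_sensitive:
            hit = word == term
        else:
            hit = word.lower() == term.lower()
        if hit:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append((left, token, right))
    return lines


sample = "The nuclear plant was shut down. A Plant grows in the garden."
for left, keyword, right in kwic(sample, "plant", span=2, case_sensitive=False):
    # Right-align the left context so that the keywords line up in a column.
    print(f"{left:>15}  {keyword}  {right}")
```

With case_sensitive=True, only the lower-case occurrence of plant would be reported; a larger span widens the context on both sides.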
The number-of-pages-to-retrieve option, as the name implies, limits the number of pages retrieved after each search. The lower the setting, the faster the results are produced. The default setting is 100 pages per search.
Like the major commercial search engines, WebCorp allows the user to limit a search to a specific site domain and country. For this purpose, the program offers a selection of the most popular domains and countries, so the user does not need to know the exact abbreviations.
Apart from specifying the site domain, it is also possible to specify a textual domain, provided Open Directory is used as the search engine. The Open Directory Project is an online open content directory, which means that essentially anyone can copy or modify any information listed within it. It assigns webpages to different categories, which provides the basis for WebCorp's textual domain feature. The different categories can be seen in Figure 14.
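What a site domain restriction amounts to can be sketched as a check on the host name of each retrieved URL. The function in_domain below is a hypothetical helper, not part of WebCorp, and the URLs are invented examples.

```python
from urllib.parse import urlparse


def in_domain(url, suffix):
    """Return True if the URL's host lies within the given domain suffix.

    Hypothetical helper: 'suffix' may be written with or without a leading dot,
    for example ".uk" or "ac.uk".
    """
    host = urlparse(url).hostname or ""
    suffix = suffix.lstrip(".").lower()
    # Match the domain itself or any host below it, but not hosts that
    # merely happen to end in the same letters.
    return host == suffix or host.endswith("." + suffix)


urls = [
    "http://www.bbc.co.uk/news",
    "http://example.com/uk/index.html",
    "http://www.bham.ac.uk/",
]
uk_only = [u for u in urls if in_domain(u, ".uk")]
print(uk_only)
```

Checking the host name rather than the whole URL keeps a page such as example.com/uk from being counted as a .uk page.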
The last advanced search option in WebCorp is the word filter. It allows the user to narrow down a general search term by naming a context for it. If one searches, for example, for the term plant but is not interested in anything biological, the filter can be used to exclude pages featuring the word plant in a botany-related context and to include pages in which the search term is accompanied, for instance, by the word nuclear. In this case, one simply has to type, for example, “flower-nuclear” in the word filter box.
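The effect of such a filter can be sketched as follows. The sketch only illustrates the idea of required and excluded context words; it does not reproduce WebCorp's filter syntax, and the function passes_word_filter with its include and exclude parameters is an assumption made for illustration.

```python
import re


def passes_word_filter(page_text, include=(), exclude=()):
    """Keep a page only if it has every 'include' word and no 'exclude' word.

    Hypothetical helper: matching is done on whole lower-cased words.
    """
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    has_all_required = all(w.lower() in words for w in include)
    has_no_excluded = not any(w.lower() in words for w in exclude)
    return has_all_required and has_no_excluded


pages = [
    "The nuclear plant was shut down.",
    "This plant has a red flower.",
    "A nuclear plant beside a flower meadow.",
]
kept = [p for p in pages
        if passes_word_filter(p, include=["nuclear"], exclude=["flower"])]
print(kept)
```

Only the first page survives: the second lacks the required word nuclear, and the third contains the excluded word flower.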