CREATING A POS-TAG QUERY
Exercise 1: Create a POS-tag-based query for the split infinitive with one adverb between the infinitive marker to and the bare infinitive of a verb (to+adverb+infinitive).
- Find the relevant POS-tags in the BNC tagset:
Click The CLAWS 5 tagset in the left menu column on the main query page.
From the table of POS-tags, choose the appropriate tags for “infinitive marker to” (TO0), “general adverb” (AV0) and “the infinitive form of lexical verbs” (VVI). Go back to the BNCweb main page.
- Click Simple Query Syntax help to the right of the query window to open instructions on making a query.
Find the answers to the following questions in the PDF file:
- how to define the selected POS-tags
- how to enter a combination of POS-tags or words and POS-tags (the separate elements should be divided by a blank space; an underscore character ( _ ) should be added before a POS-tag).
- Go back to the Standard query page and type a query for the split infinitive in the query window: “_TO0 _AV0 _VVI”
- Before running your query, check the default query settings:
- in Query mode select Simple query (ignore case). This will include instances of the split infinitive at the beginning of a sentence
- in Number of hits per page you can either define how many hits per page will be displayed or select Count hits if you do not want to be provided with a list of the search term's individual instances and are solely interested in the overall number of retrieved hits.
- in Restriction choose None in order to search the whole corpus, without restricting the query to either the written or the spoken section
- Click Start query
- The top bar of the query result page shows the statistics for the query: number of hits, dispersion across texts and normalized frequency (per million words)
Your query "_TO0 _AV0 _VVI" returned 4097 hits in 1577 different texts (98,313,429 words [4,048 texts]; frequency: 41.67 instances per million words)
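The normalized frequency in the result line is the number of hits divided by the size of the searched section, scaled to one million words. The short Python sketch below recomputes it from the figures reported above. The function names `per_million` and `text_share` are illustrative, and expressing dispersion as the percentage of texts that contain at least one hit is an assumption made here for illustration, not necessarily the measure BNCweb uses in its own statistics.

```python
def per_million(hits: int, words: int) -> float:
    """Normalized frequency: hits per one million words."""
    return hits / words * 1_000_000


def text_share(texts_with_hits: int, total_texts: int) -> float:
    """Illustrative dispersion measure (assumption): percentage of texts with at least one hit."""
    return texts_with_hits / total_texts * 100


# Figures copied from the BNCweb result line for "_TO0 _AV0 _VVI" (whole corpus).
hits, words = 4097, 98_313_429
texts_with_hits, total_texts = 1577, 4048

print(f"{per_million(hits, words):.2f} instances per million words")  # 41.67 instances per million words
print(f"{text_share(texts_with_hits, total_texts):.1f}% of texts contain a hit")
```

The same two functions can be reused for every restricted query below by plugging in the hit count and word count of the selected section.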
Exercise 2: Think of other modifiers that can appear between “to” and the infinitive. Create queries to check whether your assumptions are correct.
You may check the following options: a split infinitive with an ordinal numeral (ORD), a wh-adverb (AVQ) or the negative particle not.
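Because only the middle slot of the pattern changes, the variant queries can be built mechanically from the _TO0 ... _VVI frame used in Exercise 1. The Python sketch below only prints the query strings to paste into the query window; the `MODIFIERS` dictionary and the helper `split_infinitive_query` are hypothetical names that simply mirror the options listed in the exercise. Note that a POS-tag takes a leading underscore, whereas a plain word such as not does not.

```python
# Middle elements to try between the infinitive marker and the bare infinitive.
# POS-tags take a leading underscore; a plain word form (here: not) does not.
MODIFIERS = {
    "ordinal numeral": "_ORD",
    "wh-adverb": "_AVQ",
    "negative particle": "not",
}


def split_infinitive_query(middle: str) -> str:
    """Insert one middle element into the _TO0 ... _VVI frame."""
    return f"_TO0 {middle} _VVI"


for label, middle in MODIFIERS.items():
    print(f"{label:<18} {split_infinitive_query(middle)}")
```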
Exercise 3: Make a query for the split infinitive with the option of either an adverb or the negative particle not in between the infinitive marker and the bare infinitive:
- Learn how to enter optional query elements by using Simple Query Syntax help (the alternatives must be enclosed in round brackets and separated by a vertical bar (|), with no blank spaces between them)
- Check if there is a POS-tag for “not” (XX0)
- First, enter your query with not as a word: “_TO0 (_AV0|not) _VVI”;
- Next, enter the query as a combination of POS-tags: “_TO0 (_AV0|_XX0) _VVI”
- Run both queries and check if the same number of hits is retrieved.
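To see why the two queries should retrieve the same instances whenever not carries the tag XX0, it can help to model the matching on a toy scale. The Python sketch below is a hypothetical, much-simplified model of the Simple Query Syntax elements used so far (a bare _TAG, a plain word, word_TAG, and alternatives in round brackets); it is not BNCweb's actual implementation, and the tagged sentence is invented for illustration.

```python
Token = tuple[str, str]  # (word form, CLAWS 5 tag)


def element_matches(element: str, token: Token) -> bool:
    """Toy model: does one query element match one tagged token?"""
    word, tag = token
    if element.startswith("(") and element.endswith(")"):
        # Alternatives such as (_AV0|not): any one option may match.
        return any(element_matches(option, token) for option in element[1:-1].split("|"))
    if element.startswith("_"):
        # Bare POS-tag such as _AV0.
        return tag == element[1:]
    if "_" in element:
        # word_TAG such as more_AV0: word form and tag must both match.
        form, wanted_tag = element.rsplit("_", 1)
        return word.lower() == form.lower() and tag == wanted_tag
    # Plain word form, compared case-insensitively as in "ignore case" mode.
    return word.lower() == element.lower()


def find(query: str, tokens: list[Token]) -> list[str]:
    """Return the word sequences that match the space-separated query elements."""
    elements = query.split()
    hits = []
    for start in range(len(tokens) - len(elements) + 1):
        window = tokens[start:start + len(elements)]
        if all(element_matches(e, t) for e, t in zip(elements, window)):
            hits.append(" ".join(word for word, _ in window))
    return hits


# Invented example sentence with hand-assigned CLAWS 5 tags.
sentence: list[Token] = [
    ("I", "PNP"), ("want", "VVB"), ("to", "TO0"), ("really", "AV0"),
    ("understand", "VVI"), ("and", "CJC"), ("to", "TO0"), ("not", "XX0"),
    ("forget", "VVI"),
]

print(find("_TO0 (_AV0|not) _VVI", sentence))   # ['to really understand', 'to not forget']
print(find("_TO0 (_AV0|_XX0) _VVI", sentence))  # ['to really understand', 'to not forget']
```

In this toy model both queries return the same two hits because not is the only word tagged XX0 here; in the corpus itself, compare the actual hit counts as the exercise asks.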
INVESTIGATING POS-TAGGING ERRORS
Exercise 4: Make a POS-tag-query for “to more than double”.
- Using the descriptions of POS-tags under The CLAWS 5 tagset, try to define POS-tags for all the elements of the phrase. For example, how can more be tagged? As a general adverb (AV0) or as another part of speech?
- Consulting Simple Query Syntax help, create a query as a combination of words with assigned POS-tags (in order to search for a word as a certain part of speech, it should be entered as “word_POS-tag” without any space).
- Make a query as follows: to as the infinitive marker, followed by more as a general adverb, than as a subordinating conjunction and double as the infinitive of a lexical verb. Your query should look like this: “to_TO0 more_AV0 than_CJS double_VVI”.
- Run the query. Does it retrieve the construction needed?
- To check how a word is tagged in the corpus, enter it as a simple query (e.g., than) and, in the concordance lines on the query result page, point at the word to see its POS-tag (than is tagged as a subordinating conjunction, CJS). You can do the same for the other words of the target expression to check whether alternative POS-tags are assigned (e.g., more can also be tagged as a determiner, DT0).
- Using the information on the possible POS-tags of the target words, try creating a query that allows either the general adverb tag (AV0) or the determiner tag (DT0) for more.
- Run the query – does it retrieve the structure needed?
- Think of any other ways to create a POS-tag-query in order to find “to more than double”. Check your assumptions. If you are not successful, enter to more than double as a simple phrase query and check how all the elements are tagged: “to_PRP more_DT0 than_CJS double_AV0”
- Can you find any reasons for such tagging?
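The mismatch can be reproduced in a few lines. The Python sketch below checks a query made only of word_TAG elements (optionally with alternatives in round brackets) against the tagging that the exercise reports for the phrase in the BNC. The function `matches` is a hypothetical simplification for this one phrase, not BNCweb's matching engine.

```python
Token = tuple[str, str]  # (word form, CLAWS 5 tag)

# Tagging of the phrase as reported in the exercise: to_PRP more_DT0 than_CJS double_AV0
phrase: list[Token] = [("to", "PRP"), ("more", "DT0"), ("than", "CJS"), ("double", "AV0")]


def matches(query: str, tokens: list[Token]) -> bool:
    """Check word_TAG elements (alternatives in round brackets) position by position."""
    elements = query.split()
    if len(elements) != len(tokens):
        return False
    for element, (word, tag) in zip(elements, tokens):
        options = element.strip("()").split("|")
        if not any(option.split("_", 1) == [word, tag] for option in options):
            return False
    return True


for query in (
    "to_TO0 more_AV0 than_CJS double_VVI",
    "to_TO0 (more_AV0|more_DT0) than_CJS double_VVI",
    "to_PRP more_DT0 than_CJS double_AV0",
):
    print(matches(query, phrase), query)
```

Only the last query, which repeats the corpus tagging, succeeds: as long as to is tagged PRP and double AV0, no query built around TO0 and VVI can retrieve this instance, however the tag of more is specified.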
RESTRICTING THE QUERY TO SELECTED PORTIONS OF THE BNC
To compare the frequency and distribution of the split infinitive in the spoken and written parts of the corpus, or to search only within the written part, you can use either the simple Restriction option below the query window or the advanced Query options on the Standard query page.
Exercise 5: Compare the frequency of the split infinitive in written and spoken parts of the corpus by means of separate queries
- Enter your query in the search box on the Standard query page (as in Exercise 1)
- Select Written texts from the Restriction option below the query window and click Start query
Your query "_TO0 _AV0 _VVI" in written texts returned 2981 hits in 1118 different texts (87,903,571 words [3,140 texts]; frequency: 33.91 instances per million words)
- Repeat the same query with a restriction to spoken texts.
Your query "_TO0 _AV0 _VVI" in spoken texts returned 1064 hits in 448 different texts (10,409,858 words [908 texts]; frequency: 102.21 instances per million words)
- Compare the normalized frequencies (per million words) of the split infinitive in both sections.
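Comparing normalized rather than raw frequencies matters here because the two sections differ greatly in size (about 87.9 million vs. 10.4 million words). A minimal Python sketch, using only the hit and word counts reported above:

```python
def per_million(hits: int, words: int) -> float:
    """Normalized frequency: hits per one million words."""
    return hits / words * 1_000_000


written = per_million(2981, 87_903_571)  # about 33.91
spoken = per_million(1064, 10_409_858)   # about 102.21
ratio = spoken / written

print(f"written: {written:.2f}, spoken: {spoken:.2f}, spoken/written: {ratio:.1f}")
```

On these figures the split infinitive is roughly three times as frequent per million words in the spoken section as in the written section, even though the written section yields almost three times as many raw hits.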
Exercise 6: Variant 1: Compare the frequency of the split infinitive in the academic writing section and in the demographically sampled spoken section.
Run the query for the split infinitive in the academic writing section:
- On the Standard query page, select Written restrictions from the Query options in the column menu on the left and go to Restricted Range of Written Texts.
- Under Derived text type in the restrictions for the written part of the corpus, choose Academic prose.
- Enter your query for the split infinitive
- Run the query
Your query "_TO0 _AV0 _VVI" restricted to "Derived text type: Academic prose" returned 416 hits in 153 different texts (15,778,028 words [497 texts]; frequency: 26.37 instances per million words)
- Go back to the Standard query page. Select Demographically sampled from the section General Restrictions for Spoken Texts and run the query again.
Your query "_TO0 _AV0 _VVI" restricted to "Text type: Demographically sampled" returned 193 hits in 79 different texts (4,233,962 words [153 texts]; frequency: 45.58 instances per million words)
- Compare the frequency of the split infinitive in both sections.
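A difference in normalized frequency does not by itself show that the difference is reliable. One common way to test this in corpus linguistics is the log-likelihood (G²) statistic for one item's frequency in two corpora of different sizes. The Python sketch below applies it to the two results above; it is an added illustration, not a statistic taken from the query output, and the thresholds 3.84 (p < 0.05) and 10.83 (p < 0.001) are the usual chi-square critical values for one degree of freedom.

```python
import math


def log_likelihood(hits_a: int, words_a: int, hits_b: int, words_b: int) -> float:
    """Log-likelihood (G2) for comparing one item's frequency in two corpora."""
    total_hits = hits_a + hits_b
    total_words = words_a + words_b
    # Expected hits in each corpus if the item were equally frequent in both.
    expected_a = words_a * total_hits / total_words
    expected_b = words_b * total_hits / total_words
    g2 = 0.0
    for observed, expected in ((hits_a, expected_a), (hits_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Academic prose vs. demographically sampled speech (figures from the results above).
g2 = log_likelihood(416, 15_778_028, 193, 4_233_962)
print(f"G2 = {g2:.2f}")
print("significant at p < 0.001" if g2 > 10.83 else "not significant at p < 0.001")
```

On these figures G² lies well above 10.83, so the higher rate in demographically sampled speech is unlikely to be due to chance alone.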
Exercise 6: Variant 2: Compare the frequency of the split infinitive in the academic writing section and in the demographically sampled spoken section.
- Run the query for split infinitives without any restrictions
- On the query results page, select Distribution in the drop-down menu at the upper right and click Go. This produces a chart of frequencies by section, register and other categories, together with normalized frequency and dispersion (available in the basic statistics option). The tool also offers a cross-tabulation option for comparing and statistically analyzing any combination of these categories.