KEYWORD ANALYSIS OF FAMOUS CHURCHILL SPEECHES
Contents
Keyword Analysis of Famous Churchill Speeches
1.3.2 Observations: Keyword Analysis Issues
1.3.3 Frequency and Association Strength
Keyword Analysis of Famous Churchill Speeches
1.1 Introduction
Throughout the ages, and notably in the WWII era, guiding lights have arisen in our midst. One of them has been called the man who stood firm for a nation and a world, guiding both through the ravages of Nazi invasion in the 1940s. Sir Winston Churchill has been called one who stamped the world with victorious history, and it appears that he did it all through words alone, powerful words (Sinclair 1991, p. 71).
Indeed, it seems irreverent to disagree when considering the daunting obstacles facing Britain at the point when Neville Chamberlain stood as its Prime Minister (PM). Winston Churchill would soon be installed in that powerful post, which he would use to inspire not just a nation in need, the one he admiringly called “our island,” but also a world that was about to be lost into an ‘abyss’ (Churchill 1940).
The aim of this study is therefore to ask not ‘if’ this masterful orator inspired us, but ‘why’ and ‘how’: how did Churchill’s words inspire us, the nation of Britain and the world, to unprecedented action? A keyword analysis is used to investigate these questions. Although there are many options, including the British National Corpus (BNC), this analysis uses AntConc, designed by Laurence Anthony (http://ota.ahds.ac.uk/documents/creating/dlc/index.htm). The tool has been challenging to learn, yet well worth the effort thanks to the quality of the results it generates through its extensive million-word English corpus.
Based on my keyword analysis with this free software tool, I have come to a satisfying and plausible conclusion: Churchill’s war speeches are masterfully crafted with particular words that are powerfully combined to encode messages that move us. These are recurring patterns of mainly single words whose imparted meaning seems to be an unmistakable sense of strength and solidarity.
Section one carries out the first of two necessary steps for the Churchill texts (speeches): generating the keyword list. Comparisons are attempted with other speakers, including other British PMs of the WWII and Cold War eras, the American President Theodore Roosevelt, and Lord Byron of Harrow Academy, to identify similarities and differences and to determine whether the Churchill war speeches lacked or possessed aspects comparable to those of others in his position as a world leader facing the presence or potential of war. Section two carries out the second step, a concordance analysis, using some of the tool options in AntConc; section three begins the comparisons, offering illustrations through captured images of the results and focusing on parts of speech, namely nouns with some verbs. Additionally, the metadata of the speakers is used to compare the node (Churchill) with the reference corpora: metadata comparison might explain why Churchill’s speeches are more or less like the others on points of contrast such as education, social class, nationality, etc.
1.2 Keyword Analysis
1.2.1 Background: The aim of a keyword analysis in Corpus Linguistics is to identify the words used disproportionately often in corpus texts, because these words might be meaningful. However, frequency alone is not enough, and a concordance analysis therefore usually follows, as this module’s instructor, Anthony Lawrence, affirms (unit 7). He explains that the proportion of overlap between any two keyword rankings, each derived from a different metric, can be examined both for all the keywords and for the top one hundred keywords. The extent of the overlap indicates how similar or different the metrics are. High overlap suggests that the two metrics are almost identical, while low overlap suggests that one of the metrics is not appropriate.
1.2.2 Keywords: Texts contain particular word patterns that recur and are key to, or representative of, that text; these are called ‘keywords.’
1.2.3 Collocation. Corpus linguistics draws quantitative associations between two words or phrases, called a collocation: ‘co’ meaning ‘with’ and ‘location’ indicating words that occur within a specific location. Collocation is the presence of a relationship between words that appear together, whereas keywords are usually defined according to the difference in their relative frequencies. The keyness metric is expected to represent the extent of that difference in frequency. The difference between ‘collocation analysis’ and ‘keyword analysis’ is that “Whereas collocation analysis measures the strength of the relationship between a node in its immediate environment, keyword analysis aims to establish whether there is a statistically significant relationship between a word and the texts” (Anthony, unit 7). In essence, keyword analysis uses statistical methods to identify which words are particularly associated with a corpus and can, therefore, reflect the author’s identity (i.e., relate to his metadata profile). The metadata of the text’s producer is therefore essential for making valid claims about the speaker or the text.
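The overlap comparison described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `top_n_overlap`, the two sample rankings, and the choice to divide by the size of the smaller slice are assumptions made for the example, not part of the module material or of AntConc.

```python
def top_n_overlap(ranking_a, ranking_b, n=100):
    """Share of items that the top-n slices of two keyword rankings have in common."""
    top_a = set(ranking_a[:n])
    top_b = set(ranking_b[:n])
    # Assumption: divide by the smaller slice so that short rankings are not penalised.
    denom = min(len(top_a), len(top_b))
    return len(top_a & top_b) / denom if denom else 0.0


# Hypothetical rankings of the same keywords under two different metrics.
by_metric_one = ["we", "shall", "fight", "island", "enemy"]
by_metric_two = ["shall", "we", "victory", "war", "island"]

overlap = top_n_overlap(by_metric_one, by_metric_two, n=5)
print(f"Overlap of the top 5: {overlap:.0%}")
```

A value close to 1 would suggest that the two metrics rank keywords in much the same way; a value close to 0 would suggest that they disagree.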
1.2.4 Validity: Validity in the form of appropriate comparisons ensures the keyword list is correct, so carefully selecting a reference text is tedious yet essential to make comparisons between the Node/studied text and the reference.
For my analysis, a collection of speeches produced by British PMs has been grouped together and makes up most of the reference corpus. The post of Prime Minister serves as the point of comparison, since its holders all share similar metadata (education, social class, nationality, etc.). The metadata of the other speakers is less obviously comparable, although Lord Byron of Harrow was a politician, a poet, and an alumnus of the same ‘Harrow’ Academy as the young Winston. The American president Theodore Roosevelt shares ‘some’ of Churchill’s national identity, given that Churchill’s mother was American; aside from nationality the two are quite similar, yet nationality would appear relevant.
1.2.5 The Reference Corpus. The reference corpus is composed of three sub-categories: ten speeches produced by British PMs, dating from 1918 to the 1990s; one speech delivered by the American president, Theodore Roosevelt; and two poems by Lord Byron, author of ‘Don Juan’. The two poems included here, ‘The Bronze Age’ and ‘Darkness’, seem relevant as they pertain to war-like conditions such as pestilence and devastation.
1.2.5.1 British PM Speeches: The speeches produced in this group were chosen because their producers were all PMs and because they have an underlying theme: peace and security with the potentiality of war. This has been used as a general benchmark in generating a reference corpus: dates of production span the pre and post WWII era, extending into the Cold War era.
The compilation of the sub-corpora is believed to be a valid reference because of the broad span in time coupled with the general, centralized theme of peace and security, implicating war potentialities, including concerns with fortifying the economy with technology, providing national health care, rearmament, etc.
1.2.5.2 The Node Corpus. According to the National Churchill Museum, amongst others, “Winston Churchill is widely considered to be one of the greatest speakers of the twentieth century. Among his most famous speeches were those in 1940 when Churchill rallied a nation with his words and optimism.” Although Churchill made only five broadcasts to the nation during the onset of WWII (summer of 1940), all of his speeches display his masterful way with words. They transmit his mindset, his determination, and commitment, and they infused his countrymen with his confidence (www.winstonchurchill.org). The Node corpora contain four of his war speeches produced in the summer of 1940.
1.3 Methodology
The free software AntConc was employed to analyze the keywords. This entailed generating a Keyword List, followed by a comparison between the corpora using the ‘Collocates’ tool, and then a Concordance Analysis using the ‘Concordance’ and ‘Concordance Plot’ tools (Laurence Anthony, https://doi.org/10.1017/S0261444808005247). Moreover, because my knowledge of Keyword Analysis is limited to this module, I made use of Anthony Lawrence’s class syllabus alongside Laurence Anthony’s tutorials, including ‘AntConc 3.4.0: no. 8: Collocates Tool’, tutorial 2: ‘Concordance Tool’ and tutorial 3: ‘Advanced Features’, accessed online via YouTube.
1.3.1 Starting Analysis
Downloading the AntConc tool onto my computer desktop set me on my way for my Keyword Analysis. From there, I generated my Keyword List, following this procedure:
- After selecting and preparing a Node and a Reference Corpus, saved as individual ‘.txt’ files and stored in two separate folders called ‘Churchill_War_Speeches’ and ‘War_speeches_poems,’ I loaded both folders containing the individual texts into AntConc;
- Clicking on ‘WORD LIST,’ I had the software compile two word lists from the data;
- Using the word list, I clicked on the ‘Collocates’ tool, which searches for collocates: words which tend to appear close to, or are closely associated with, the search term (node word);
- I then chose the search terms of interest from the word list, which was ‘thinned,’ meaning that I ‘top sliced’ it and used only the first 100 words, to make searches more manageable given the length of my list: 2086 lines;
- Referring now to my ‘thinned’ word list, I entered the search terms, starting with ‘we’ in the ‘Search Term’ box, and clicked ‘START’;
- Next, I set the ‘span’ to the left and right of the search term (‘we’), bearing in mind that AntConc’s default is five words left and five words right, which is probably the setting most commonly used to search for collocates.
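The procedure above can be approximated outside AntConc with a short Python sketch. It is an illustration under stated assumptions rather than a description of how AntConc works internally: the helper names (`tokenize`, `word_list`, `collocates`) and the simple regular-expression tokenizer are hypothetical, and the sample sentence stands in for the corpus files.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def word_list(tokens):
    """Frequency-ranked word list: (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


def collocates(tokens, node, left=5, right=5):
    """Count every word found within the span around each hit of the node word."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - left)
        # Words to the left of the hit plus words to the right, excluding the hit itself.
        window = tokens[start:i] + tokens[i + 1:i + 1 + right]
        counts.update(window)
    return counts


# Short sample standing in for the node corpus.
sample = "We shall fight on the beaches, we shall fight on the landing grounds"
tokens = tokenize(sample)

print(word_list(tokens)[:3])                        # top of the word list
print(collocates(tokens, "we", 5, 5).most_common(3))  # strongest collocates by raw count
```

Raw collocate counts of this kind are only a starting point; the association measures discussed later weigh them against the overall frequency of each word.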
‘We’ was chosen for its apparent significance based on its ‘overuse’: it appears in the 6th position of the list, and it is quite frequent at 164 occurrences.
Alternatively, searches can be modified (words, case, regex, and advanced), Laurence affirms. However, he does not recommend more elaborate word searches, warning that these actually “introduce a lot of complications when interpreting the results, so staying with ‘single words’ is probably safer” (1:45 video). The modifiers are shown in Table 1 below.
KEYWORD | RANK | FREQ | STRENGTH (log-likelihood) |
Great | 541 | 8 | 4.15 |
Very | 550 | 9 | 4.28 |
Powerful | 482 | 3 | 4.97 |
grievous* | 143 | 1 | 5.97 |
harmful* | 147 | 2 | 5.97 |
Table 1: Modifiers
AntConc then generates the collocates for ‘we’: there are 363 collocate Types and 900 collocate Tokens, spread over more than 500 lines. A sorting approach was therefore used to see whether accuracy and frequency changed. The results obtained from the ‘sorted’ list (Collocations_SORTED_we.txt) include some interesting findings for nouns, verbs, modifiers, and prepositions (see Tables 2 and 3).
KEYWORD | RANK | FREQ | STRENGTH |
Affected | 452 | 3 | 5.55 |
Face | 464 | 3 | 5.97 |
Fail | 465 | 3 | 5.55 |
Hope | 471 | 3 | 4.55 |
Could | 526 | 6 | 4.75 |
Will | 524 | 5 | 2.06 |
Should | 532 | 6 | 4.38 |
Shall * | 569 | 28 | 6.1 |
Question | 484 | 3 | 4.75 |
Expect | 500 | 4 | 5.75 |
fight * | 555 | 12 | 5.47 |
Conquer | 535 | 7 | 7.19 |
Table 2: Verbs in Sorted AntConc Search
KEYWORD | RANK | FREQ | STRENGTH |
Beat | 35 | 1 | 1.80 |
Believe | 38 | 1 | 3.97 |
Defend | 82 | 1 | 4.38 |
Doubt | 96 | 1 | 4.97 |
Able | 369 | 2 | 3.51 |
Allow | 370 | 2 | 5.38 |
Hoped | 409 | 2 | 5.97 |
Fighting | 392 | 2 | 2.80 |
Table 3: Rarer Verbs in Sorted AntConc Search
The collocate search drew slightly more results for another search term, ‘our’: between 564 and 575 hits once sorted (575 Types, 1640 Tokens). Although sorting did not yield higher results for ‘we’, some patterns emerged that appear to distinguish the producer of the Node corpus.
1.3.2 Observations: Keyword Analysis Issues
In reporting my observations for the Keyword Analysis, it seems relevant to mention what has been remarked on by other linguists. The field of Corpus Linguistics, and with it Keyword and Concordance Analysis, owes much to two particular pioneers who made thought-provoking observations. When it comes, for example, to collocates (single words or phrases recurring together), we can be reminded of a somewhat predictable phenomenon: “You shall know a word by the company it keeps” (Firth 1957 [1968]: 179).
Accurate, rapid identification of collocates is relatively new: technology transformed corpus analysis by enabling the digitization of corpora. Earlier linguists, such as Firth, saw word patterns which were as yet untraceable without the aid of computer software such as BNCweb and AntConc, so accurate identification and quantification were tedious and error-prone. Despite these limitations, Firth’s observations of co-occurring word patterns prompted further advances in the study of collocation and meaning. The University of Birmingham’s own John Sinclair (1965-2000), professor of modern English language, developed the methodology towards its present-day form. Present-day terminology, for example, originates with Sinclair’s terms ‘node’ and ‘span,’ referring respectively to the investigated word whose collocates we are in search of and to the number of ‘lexical items’ on either side of the node (Anthony Lawrence, unit 5); the words found in the immediate environment defined by the span are the collocates. Identifying the collocates, says Anthony, can be approached by studying an interesting concept in the node corpus, and such searches can be expanded by modifying the parameters, that is, by altering the number of words to the left and right of the search term. While Sinclair (1966) prescribed a span of four words to the left of the node and four to the right (4:4), other tools use alternative settings, including BNCweb, which defaults to three (3:3).
Using Orwell’s first line from ‘1984’, Anthony explains why the former Sinclairian settings complicate things:
“It was a bright cold day in April, and the clocks were striking thirteen” (Orwell, 1949).
Essentially, all the words in the line are collocates under Sinclair’s formula, which leaves much to be desired when searching for meaning: many of them are function words, such as (in)definite articles (a, the), conjunctions (and, that), some prepositions (in, at) and some verbs (was, is), and all texts are full of them. Measuring the associations between words is therefore essential in corpus-linguistic word analyses. The strength of the association, the log-likelihood or the ‘amount of glue between words’, can be assessed via two approaches: a quantitative approach to collocation, and a qualitative approach via concordance analysis, which facilitates the identification of co-occurrence patterns by centering the node word on the computer screen (as in AntConc’s Concordance Tool). The two are most useful when used together. Keyword analysis, for example, identifies word patterns, and a follow-up concordance analysis is then useful for uncovering their purpose or meaning in the text.
1.3.3 Frequency and Association Strength
Keyness is the term for the value of the log-likelihood or chi-square statistic. It thus provides an indicator of the importance of a keyword as a descriptor of the content of the text. Studies concerned with the ‘amount of glue’, or association strength, between words have warned against equating collocate frequency with ‘strength’. In Anthony’s example, ‘dangerous + thing’ has 51 occurrences and ‘dangerous + substances’ only 38; yet it is the latter pair which is more strongly associated, indicating that frequency is not the sole identifier. This has to do, in fact, with the frequency of the individual words.
\[
\frac{(\text{NormFreq in SC} - \text{NormFreq in RC}) \times 100}{\text{NormFreq in RC}}
\]
SC = small corpus
RC = reference corpus
NormFreq = normalized frequency
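As a worked illustration, the percentage difference between the normalized frequency in the small corpus (SC) and in the reference corpus (RC) can be sketched in Python. The function names, the per-million normalization base, and the counts and corpus sizes below are all hypothetical, chosen only to make the arithmetic easy to follow; they are not figures from this study.

```python
def normalized_frequency(raw_count, corpus_size, per=1_000_000):
    """Frequency per `per` words (per million is an assumption; any common base works)."""
    return raw_count / corpus_size * per


def percent_diff(norm_freq_sc, norm_freq_rc):
    """(NormFreq in SC - NormFreq in RC) x 100 / NormFreq in RC."""
    if norm_freq_rc == 0:
        raise ValueError("The word does not occur in the reference corpus.")
    return (norm_freq_sc - norm_freq_rc) * 100 / norm_freq_rc


# Hypothetical counts: 50 hits in a 10,000-word SC, 200 hits in a 100,000-word RC.
nf_sc = normalized_frequency(50, 10_000)    # about 5,000 per million
nf_rc = normalized_frequency(200, 100_000)  # about 2,000 per million
difference = percent_diff(nf_sc, nf_rc)
print(round(difference, 1))
```

With these invented figures the word is about 150% more frequent in the small corpus than in the reference corpus; a negative result would indicate relative underuse.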
Co-occurrences with ‘dangerous’ (frequencies: 5.621 (SC), 33.87 (RC), 5.621 (RC)):
dangerous + substances = 2.73% with dangerous
dangerous + thing = 0.15% with dangerous
A very low overlap between two rankings indicates that the keyword rankings differ considerably: a keyword may be at position 20 in one ranking, and the same keyword may be at position 100 in the other. “For substances, 2.73% of all occurrences are preceded by dangerous, while for liaisons almost a fifth (18%) of all instances co-occur with dangerous.” Table 4 below shows keywords and their frequency rankings together with their strengths in terms of keyness.
KEYWORD | RANK | FREQ | STRENGTH |
courage | ? | 1 | 4.38 |
dominion | 95 | 1 | 3.65 |
fate | 116 | 1 | 4.97 |
governments | 139 | 1 | ? |
confidence | 379 | 2 | 4.16 |
fighting | 392 | 2 | 2.80 |
power | 428 | 2 | 3.06 |
Table 4: Rarer: Sorted NOMINAL Collocates of ‘we.’
Based on the total number of words in the corpus and the frequency of two individual words, it is possible to calculate the number of times we would expect those two words to co-occur in a given span (Miller, unit 5). This calculation assumes that all words co-occur randomly, yet language in real-life communication is not random, so measures calculated this way are only approximations.
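A minimal sketch of this expected-frequency calculation follows. It uses one common approximation, (frequency of node × frequency of collocate × span) / corpus size, which is assumed here rather than taken from the module material; the function names and all counts are hypothetical. The same sketch derives a Mutual Information score as the base-2 logarithm of observed over expected co-occurrences.

```python
import math


def expected_cooccurrences(freq_node, freq_collocate, corpus_size, span=10):
    """Co-occurrences expected by chance, assuming words are distributed randomly.

    `span` is the total window size, e.g. 5 words left + 5 words right = 10.
    """
    return freq_node * freq_collocate * span / corpus_size


def mutual_information(observed, expected):
    """MI score: log2 of observed over expected co-occurrences."""
    return math.log2(observed / expected)


# Hypothetical figures: node occurs 100 times, collocate 50 times, in 20,000 words.
expected = expected_cooccurrences(100, 50, 20_000, span=10)
mi = mutual_information(observed=10, expected=expected)
print(expected, mi)
```

An observed count well above the expected one gives a positive MI score, which is read as attraction between the two words; because the randomness assumption does not hold for real language, the score is best treated as a rough guide.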
The CHI-SQUARE TEST:
A comparison of the observed co-occurrence frequency with the expected frequency is useful in providing a measure of association between words, and this is possible via the Chi-square test.
The Chi-square test can be used together with the measures offered by the BNC, which calculates association through log-likelihood and Mutual Information (MI); these two approaches are useful for measuring collocations. The test is available in the BNC via the ‘statistics’ menu, and AntConc offers a ‘Log-likelihood’ option.
Keyness is assessed through the p-value of the frequency difference, as measured by a statistical test, usually log-likelihood or Chi-square. Table 5 below shows the chi-square values for the keywords. The rank represents the position a keyword takes in the list, and it is read together with the strength value. For instance, the keyword ‘we’ has a frequency of 22 and a strength of 3.07, and it was ranked 565. The keyword ‘battle’ shows the lowest rank value in the table (3.59, identical to its strength), while ‘all’ records the lowest strength (2.63).
KEYWORD | RANK | FREQ | STRENGTH |
us | 510 | 4 | 3.06 |
they | 538 | 7 | 2.78 |
we* | 565 | 22 | 3.07 |
all | 539 | 8 | 2.63 |
ourselves* | 549 | 9 | 5.18 |
victory | 493 | 3 | 4.09 |
enemy | 499 | 4 | 3.32 |
island | 507 | 4 | 3.97 |
battle | 3.59 | 5 | 3.59 |
war | 523 | 5 | 3.16 |
Table 5: Nouns in Sorted AntConc Search: ‘We.’
Although function words predominate, some previously unnoticed patterns seem to emerge in the nouns, modifiers, and verbs.
NOUNS | MODIFIERS | VERBS |
We (6th) | our (11th) | Will (20th) |
All (16th) | they are (19th) | Would (42nd) |
British (36th) | Very | Force (50th) |
Army (49th) | many | May (58th) |
War (43rd) | great | Must (59th) |
Us (52nd) | | Shall (63rd) |
Battle (56th) | | Fighting (79th) |
House (57th) | | Should (81st) |
Enemy (60th) | | Could (97th) |
Forces (86th) | | |
Island (87th) | | |
Table 6: Word Patterns
Observed together, these patterns across the parts of speech appear to convey the speaker’s inner world, his mindset. The word list generated by AntConc was extensive, 2086 lines long, so a ‘top-slicing’ approach was employed, using the top 100 words; this allowed me to analyse a ‘thinned’ list and make some observations:
The most frequent words, appearing at the top of the list, are function words: ‘the’ in first place; ‘and’ (2nd), occurring 396 times; ‘of’ (3rd); ‘to’ (4th, 298 times); ‘in’ (5th, 217 times); then, unusually, the pronoun ‘we’ (6th), followed by ‘have’ (7th) with 146 occurrences and ‘that’ (9th) with 140.
Aside from function words such as (in)definite articles (a, an, the), conjunctions (and, that), and prepositions (of, in), there is a high prevalence of nouns. The first nominal item, appearing very early in the list (6th), is the pronoun ‘we’, with 164 occurrences in the word list (and 22 in the sorted collocate search, Table 5), which catches my attention. This raises the question of whether the frequency of ‘we’ is significant and meaningful.
A Keyword Analysis is often followed by a Concordance Analysis to uncover the meanings of words identified during the Keyword Analysis. The two are therefore complementary. To pursue this next phase, the AntConc ‘Concordance Tool’ was employed. The aim was to identify patterns and meaning in the node corpus’s keyword list.
2.1. Methodology
To verify the presence of these particular keywords, the AntConc ‘Plot Tool’ was useful in making comparisons across the texts, and some results have been captured with the Windows 10 Snip Tool for illustration. The Keyword Analysis has two phases: phase one was presented in section one, and phase two, the concordance analysis, is needed to gain more specific information about particular words of interest, such as the more common terms. AntConc’s Concordance tool, unlike the BNC equivalent, allows us to upload our own corpora.
The advantage of this tool is that it allows us to search for a word/phrase of interest in the corpus and then provides visual images that make the differences between the node and reference corpora quite noticeable.
Opening up the ‘Concordance Tool,’ we can type the target word into the search box and then click ‘start’ or hit ‘return,’ says Laurence Anthony. This prompts the software to search through the corpus, find all the hits for the search term, and display them in the middle of the screen, showing the words on both the left and the right of the search term.
The search can be expanded by clicking on the search window and increasing the number to, for example, 100, which includes 100 words on either side. The number of hits appears at the top, and the hits appear in the same order as in the corpus. This ordering makes it challenging to see the patterns, so it is best to ‘SORT’ the results (Laurence Anthony, tutorial no. 2: How to use the Concordance Tool), which is done by using the KWIC Sort option. AntConc has three ‘sort’ levels: one word to the right, two words to the right, and three words to the right, meaning that the results are ordered by the first, second, and third word to the right, allowing more patterns to appear. This tool made patterns clear in searches for some interesting words, including ‘WE,’ ‘FIGHT,’ ‘HOUSE,’ ‘ENEMY,’ and ‘ISLAND.’
Figure 1: ‘We’ concordance image patterns
In a corpus, the display of all hits for a search item is referred to as a concordance. A concordance is usually presented on screen in a format called Key Word in Context (KWIC). In this format, the search item is centered, and the immediate co-text is provided to its left and right. Figure 1 illustrates what a concordance may look like, although it is just a sample and not a full concordance. Every line in the concordance is sorted using the first word to the left of the search item; that is, the concordance is displayed in alphabetical order of the first word to the left. In this illustration, the sort option makes it easier to identify the words that modify the search item. It is also possible to sort to the right, using the first, second, third, and fourth words after the search item. When investigating a corpus, the way the search output is displayed and configured plays a significant role.
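The KWIC display and the three-level sort to the right can be imitated with a short Python sketch. It is an approximation for illustration only: the function names `kwic` and `sort_right`, the three-word context width, and the simple tokenizer are assumptions, and the sample is an abridged line from one of the 1940 speeches rather than the full node corpus.

```python
import re


def kwic(text, node, width=3):
    """Return (left, hit, right) context tuples for each hit of the node word."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((left, tok, right))
    return lines


def sort_right(lines, levels=(1, 2, 3)):
    """Order lines by the 1st, then 2nd, then 3rd word to the right of the node."""
    def key(line):
        right = line[2]
        return tuple(right[n - 1].lower() if len(right) >= n else "" for n in levels)
    return sorted(lines, key=key)


sample = ("We shall fight in France, we shall fight on the seas and oceans, "
          "we shall defend our island")

# Print each line with the node word aligned in a central column.
for left, hit, right in sort_right(kwic(sample, "we")):
    print(f"{' '.join(left):>25}  {hit}  {' '.join(right)}")
```

Sorting on the words to the right groups ‘we shall defend’ apart from the two ‘we shall fight’ lines, which is the kind of pattern the sorted concordance is meant to bring out.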
Fight Concordance Images are as shown below.
Figure 2: Fight Concordance
If we consider photos that are taken in events, and we store them in a database, then an image retrieval system that can be deemed to be appropriate should produce improved search results for the photos. In this case, an image retrieval system that is called ontology annotated image retrieval system has the ability to achieve precision in retrieval results. This is due to the fact that the ontology concept employs the reduction of the semantic gap in the process of retrieving. The significant point in forming an ontology is that it doesn’t contain many concepts unless they are really needed. In the event of fixing the classes and sub-classes of the ontology, there is a tendency of increasing confusion and relation in an exponential way as the concepts used increases. The second part involves dealing with the annotation of more than 600 photos that a photographer took in a function using this semantic concept-based technology. In this case, the annotations ought to be transferred in formats that are called RDF. This purpose can be achieved using an annotation editor where the real distribution of the images can be specified in text form.
The image for house concordance is shown below.
Figure 3: House Concordance
The content-based method is commonly applied in image retrieval. It works by retrieving images depending on what the photos contain by utilizing the image Metadata or Metadata that is human attached. However, it should be noted that social annotation is a time-consuming and challenging process, and therefore, it warrants the automation of the process of retrieval. The image search is refined by ensuring that the user is participating in the retrieval process by requesting them to incessantly mark each search result as either irrelevant, relevant, or neutral. This is the method that is commonly referred to as relevance feedback method. When a comparison is being made between a given image and a database image, what is relied on is the distance measure. This measure examines the contrast and the closeness of two given images in various parameters like texture, color, shape, and spatial locations. Therefore, if we obtain a zero-value distance, then it means that the photos being investigated are a perfect match. If the value obtained is above zero, then the photos are similar in different ways.
The image below shows enemy concordance.
TEXT TO BE ADDED about patterns.
The island concordance image is shown below.
To see where a result appears in the corpus, hover the cursor over the search hit (‘we’) until a pointing finger appears; clicking on the node word makes the software jump to the File View tool, which shows where the word is in the corpus. Using searching and sorting together can sometimes enhance the patterns. If we enter ‘fight,’ for example, we could then type in ‘fighting’ or ‘fought’ to get more results for this lemma.
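Searching for several forms of one lemma in a single pass can be sketched with a regular expression. The pattern name `LEMMA_FIGHT`, the helper `lemma_hits`, and the sample sentence are hypothetical and serve only to illustrate the idea; this is not AntConc’s own regex syntax or search code.

```python
import re

# Matches 'fight', 'fights', 'fighting' and 'fought' as whole words, ignoring case.
LEMMA_FIGHT = re.compile(r"\b(?:fight(?:s|ing)?|fought)\b", re.IGNORECASE)


def lemma_hits(text, pattern=LEMMA_FIGHT):
    """Return every matched word form, lower-cased, in order of appearance."""
    return [m.group(0).lower() for m in pattern.finditer(text)]


# Invented sentence used only to exercise the pattern.
sample = "We fought before, we are fighting now, and we shall fight on."
hits = lemma_hits(sample)
print(hits)
```

Collecting all forms under one search avoids running and then merging three separate concordances for the same lemma.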
Other Concordance samples are shown below.
FIGHT Concordance
FIGHTING Concordance
‘Fighting’ concordance
‘fought’ concordance
‘fought’ concordance
To enhance results, searches can also be carried out using a ‘Wild Card.’ Two wild card examples are given below. The wildcard setting is displayed on the computer screen when using the corpus software. In the second wildcard, the meaning of the words is displayed on the far left side of the computer screen while on the far right side are shown the keywords.
_Concordance_Wild_Card_
_Concordance_Wild_Card A2
References
Ball, C.N., 1994. Automated text analysis: Cautionary tales. Literary and Linguistic Computing, 9(4), pp.295-302.
Dipert, J.J., Hoffman, J.H., Barez, C.K., Nunez, R., Toulon, S. and Olsavsky, T., adidas-Salomon (USA) Inc, 2008. Golf club head. U.S. Patent Application 29/286,254.
Gabrielatos, C., and Marchi, A., 2011, November. Keyness: Matching metrics to definitions. In Theoretical-methodological challenges in corpus approaches to discourse studies and some ways of addressing them.
Hoffmann, S., Evert, S., Smith, N., Lee, D., and Berglund Prytz, Y., 2008. Corpus Linguistics with BNCweb: A Practical Guide (English Corpus Linguistics). Frankfurt am Main: Peter Lang.
Hunston, S., 2002. Corpora in applied linguistics. Ernst Klett Sprachen.
Sinclair, J., 2003. Reading Concordances.
Knowles, G., and Don, Z.M., 2004. The notion of a “lemma”: Headwords, roots, and lexical sets. International Journal of Corpus Linguistics, 9(1), pp.69-81.
Sinclair, J., 2005. Corpus and Text-Basic Principles in Developing Linguistic Corpora: a Guide to Good Practice, ed. M. Wynne.
Sinclair, J.M., 1991. Words and phrases. Corpus, concordance, collocation, pp.70-75.