Descriptor: "CORPORA" / Journal: language & computers - Searchworks@Jio Institute Digital Library Search Results

1. Dispersions and adjusted frequencies in corpora: further explorations.

Author: Gries, Stefan Th.
Subjects: *CORPORA, *LINGUISTIC analysis, *LINGUISTICS, *PSYCHOLINGUISTICS, *COMPARATIVE grammar, *PHILOSOPHICAL analysis
Abstract: In order to adjust observed frequencies of occurrence, previous studies have suggested a variety of measures of dispersion and adjusted frequencies. In a previous study, I reviewed many of these measures and suggested an alternative measure, DP (for 'deviation of proportions'), which I argued to be conceptually simpler and more versatile than many competing measures. However, despite the relevance of dispersion for virtually all corpus-linguistic work, it is still a very much under-researched topic: to the best of my knowledge, there is not a single study investigating how different measures compare to each other when applied to large datasets, nor is there any work that attempts to determine how different measures match up with the kind of psycholinguistic data that dispersions and adjusted frequencies are supposed to represent. This article takes exploratory steps in both of these directions. [ABSTRACT FROM AUTHOR]
Published: 2010

2. Summary and further work.

Subjects: *COMPUTATIONAL linguistics, *TAGS (Metadata), *SPANISH language, *NATURAL language processing, *CORPORA, *CZECH language
Abstract: The article presents an overview of the efficiency of a morphological-syntactic system for tagging language pairs of Russian-Czech, Portuguese-Spanish, and Catalan-Spanish words. It cites the need of having a reference grammar book, greater amount of text for lexicon learning, annotated training corpora of source language, and a dictionary in the system to provide more accuracy. It notes the significance and benefit of minimal morphology encoding of paradigms in language pairs.
Published: 2010

3. Languages, corpora and tagsets.

Subjects: *NATURAL language processing, *PORTUGUESE language, *SPANISH language, *CATALAN language, *RUSSIAN language, *CORPORA, *CZECH language, *TAGS (Metadata)
Abstract: The article provides an overview of the Czech, Russian, Portuguese, Spanish, and Catalan languages as well as various types of corpora and tagsets that were used in a natural language processing (NLP) experiment. It outlines the properties and provides comparison of the aforementioned languages in terms of inflection, morphology, and lexicons. It cites various corpora for the NLP of Czech, Russian, Catalan, and Portuguese such as Stat and Raw. It notes several types of NLP tagsets.
Published: 2010

4. How representative are the `Philosophical Transactions of the Royal Society' of 17th-century scientific writing?

Author: Moessner, Lilo
Subjects: *TECHNICAL writing, *CORPORA, *LINGUISTIC analysis, *HISTORICAL linguistics, *COMPARATIVE studies
Abstract: The focus of the paper is on the notion of representativeness. It is approached from three different angles. In the first section, representativeness as a (desirable? possible?) properly of linguistic corpora is discussed. Then the point of view is narrowed down to the R (for `representative') in ARCHER, and here in particular to the register `science'. In the following empirical part, a multidimensional analysis of English science texts of the 17th century is presented. It is based on a corpus which comes in equal parts from ARCHER and from other sources. The comparative analysis reveals major differences between the sub-corpora. They are interpreted in section 4 as different degrees of representativeness. The last section contains a summary and the conclusion that the linguistic structure of English science texts of the 17th century is not fully represented by a random sample of texts from the Philosophical Transactions. [ABSTRACT FROM AUTHOR]
Published: 2009

5. On the use of split infinitives in English.

Author: Calle-Martín, Javier and Miranda-GarcIa, Antonio
Subjects: *ADVERBS (Grammar), *ENGLISH language, *ENGLISH pronouns, *TMESIS, *CORPORA, *PROTOTYPE (Linguistics), *INFINITIVAL constructions
Abstract: A split infinitive construction denotes a particular type of syntactic tmesis in which a word or phrase, especially an adverb, occurs between the infinitive marker to and the infinitive of the verb. Although rare from a statistical viewpoint, the earliest instances of the split infinitive date back to the 13th century, in which a personal pronoun, an adverb or two or more words could appear in such environments (Visser 1984, II: 1038-1045). Its use drops drastically throughout the 16th century, but it begins to gain ground again in the 19th century, hence resisting the severe criticisms of grammarians. Nowadays, however, a search for these types of constructions in a present-day English corpus reveals that the prejudice against split infinitives is receding. Therefore, this paper investigates the actual use of the construction in different corpora with the following objectives: a) to provide the statistics of the construction from a historical perspective; b) to analyse the type of adverbs occurring in these contexts; c) to offer a taxonomy of the adverb from a functional perspective; d) to investigate the combined effect of stress and rhythm in the development of the construction; and e) to review the actual use of a prototype splitting in present-day English usage.' [ABSTRACT FROM AUTHOR]
Published: 2009

6. Re-analysing the semi-modal ought to: an investigation of its use in the LOB, FLOB, Brown and Frown corpora.

Author: Degani, Marta
Subjects: *MODALITY (Linguistics), *AMERICAN English language, *VERBS, *CORPORA, *ENGLISH language -- Variation
Abstract: Most of the research on modality in English has been devoted to the study of core modals (Bybee et al 1994; Kemenade 1993; Palmer 1979, 1986; Plank 1984; Roberts 1985; Traugott 1989; Warner 1990). As a consequence, semi-modals have been kept in a state of relative marginality. This holds particularly true in the case of ought to, as confirmed by the lack of substantial work concerning this semi-modal. The present paper addresses the need to fill this gap by providing a description of ought to in British and American English from a short-term diachronic perspective. The study takes a top-down approach since it starts from the working hypothesis that ought to, like other modal verbs, has been gradually undergoing a process of "subjectification" (Traugott 1989, 1995). The hypothesis will be tested on the four corpora constituting the so-called Brown family'(LOB, FLOB, Brown and Frown), so as to identify any possible short-term diachronic changes in the British and American varieties under scrutiny. In order to measure how and to what extent the phenomenon of `subjectification" has affected ought to, the analysis will be carried out along semantic and syntactic lines.' [ABSTRACT FROM AUTHOR]
Published: 2009

7. Global English — Global Corpora: Report on a panel discussion at the 28th ICAME conference.

Author: Hundt, Marianne
Subjects: *UNIVERSAL language, *ENGLISH language education, *CORPORA
Abstract: The article reports on the panel discussion regarding the role of corpus linguistics in the study of English as a global language. It notes that the discussion was held during the 28th International Computer Archive of Modern English (ICAME) conference and was chaired by Marianne Hundt. It cites the views and position statements of panel members Pam Peters, Anna Mauranen, and Joybrato Mukherjee on topics including English as an international lingua franca (EIL) and norms for global English.
Published: 2009

8. Awful adjectives: a type of semantic change in present-day corpora.

Author: Göran Kjellmer
Subjects: *ADJECTIVES (Grammar), *SEMANTICS, *VARIATION in language, *CORPORA, *DISCOURSE analysis
Abstract: Semantic change observable in isolated linguistic items is both frequent and interesting in itself. More interesting, perhaps, are cases of structural change, i.e. cases where one and the same tendency can be discerned in a related group of words. This paper uses modern corpus material in order to sketch the development of one such group, words meaning frightening', and suggests that they all follow the same trend in the direction of impressive, overwhelming' although they differ with respect to how far they have advanced along that route. The semantic changes of some 25 words in the chosen area are studied in detail, and their development is illustrated with corpus material. One of the conclusions of the study is that their rate of semantic progress is partly dependent on the time when they entered the semantic field. The paper deals with the adjectives in the group and leaves the adverbs, although equally interesting, out of account for a later investigation. [ABSTRACT FROM AUTHOR]
Published: 2009

9. Discourse presentation in EFL textbooks: a BNC-based study.

Author: Christoph Rühlemann
Subjects: *TEXTBOOKS, *ENGLISH as a foreign language, *DISCOURSE analysis, *SPEECH, *LEXICAL grammar, *CORPORA
Abstract: Following corpus-linguistic research which has shown the representation of certain lexico-grammatical features in EFL textbooks to be at variance with their use in native English, this paper aims to explore the match or mismatch of discourse presentation (often referred to as `speech reporting') in conversation and its representation in EFL textbooks. The analysis of selected textbooks shows that textbook representation is overwhelmingly concerned with indirect and, to a much lesser extent, narratised mode but not direct mode, the free categories and representation of voice. Further, textbooks promote quotatives typical of written registers but not informal everyday speech. Specifically, I show that discourse presentation in EFL textbooks features essential parallels with a written register, namely journalistic writing. The concluding section considers implications for EFL teaching. [ABSTRACT FROM AUTHOR]
Published: 2009

10. A corpus-based analysis of invariant tags in five varieties of English.

Author: Columbus, Georgie
Subjects: *CORPORA, *DISCOURSE markers, *GRAMMAR, *ENGLISH language, *ENGLISH as a foreign language, *DISCOURSE analysis
Abstract: Discourse markers are a feature of everyday conversation — they signal attitudes and beliefs to their interlocutors beyond the base utterance. One particular type of discourse marker is the invariant tag (InT), for example New Zealand and Canadian eh. Previous studies of InTs have clearly described InT uses in individual language varieties. Such studies have focused on sociolinguistic features and on sociolinguistic functions of single markers. However, InTs as a class have not yet been fully described, and the variety of approaches taken (corpus- as well as survey-based) means that cross-varietal or cross- linguistic comparison cannot be conducted with the results thus far. This study investigates inTs in five varieties of English from a corpus-based approach. It lists the utterance-final InTs available in NZ, British, Indian, Singapore and Hong Kong English through their occurrences in their respective International Corpus of English (ICE) corpora, and compares frequency of usage across the varieties. The quantitative analysis offers a clearer overview of the InT class for descriptive grammars, and clarifies some usage aspects for ESL/EFL pedagogy. Finally, the results offer an insight into the global status of InTs in English. [ABSTRACT FROM AUTHOR]
Published: 2009

11. Exploring change in the system of English predicate complementation, with evidence from corpora of recent English.

Author: Rudanko, Juhani
Subjects: *ENGLISH language -- Variation, *ENGLISH infinitives, *SEMANTICS, *ENGLISH adjectives, *CORPORA
Abstract: There are robust grammatical differences between to infinitive and to -ing complements in English, but some predicates have exhibited variation between the two patterns. This study examines one such predicate, the adjective accustomed, and the focus is on the period around the end of the nineteenth century, when the to -ing pattern was starting to emerge as a rival to the to infinitive pattern. The period is studied on the basis of the third part of the Corpus of Late Modern English Texts. Attention is paid to extraction as a syntactic factor bearing on complement selection. From a semantic point of view, the notion of a sense of choice on the part of the referent of the subject is then examined as a semantic property, and it is argued that lack of a sense of choice was associated with the emerging pattern. The article also inquires into the complement selection of the adjective in a corresponding corpus of present-day English, showing that the to -ing pattern is now the rule even in contexts linked to a lack of choice. At the same time, the adjective has become much less frequent in the language. [ABSTRACT FROM AUTHOR]
Published: 2009

12. A multi-dimensional analysis of a learner corpus.

Author: Van Rooy, Bertus and Terbianche, Lize
Subjects: *CORPORA, *ENGLISH essays, *COMPOSITION (Language arts), *WRITTEN English, *DISCOURSE analysis
Abstract: The present study reports on a multi-dimensional analysis (Biber, 1988) of the Tswana Learner English (TLE) corpus, together with the Louvain Corpus of Native English Essays (LOCNESS). A new multidimensional model is extracted, since the similarities between nativeness and non-nativeness mask differences between linguistic features to such an extent that it is not possible to come to a complete understanding of such differences using the standard 1988 model. A basic five factor model was extracted. Dimension I can be taken to capture advanced literacy, specifically as far as complex noun phrase structure is concerned, with the function of expressing information densely. Dimension 2 can he regarded as an indication of transparency and Dimension 3 captures a range of informal style features. The features that group together as Dimension 4 represent a style of writing that is more nuanced and precise and as a provisional label, we propose con textualisation of information. Dimension 5 can be regarded as the persuasive dimension in student writing, a feature that has been identified as a very important characteristic by Biber and Grabe (1987), and also in our own study of student writing. The most striking differences between the two corpora are on Dimensions / and 4. LOCNESS shows more advanced literacy than the TLE, and also contextualises information more extensively than the TLE. On the other dimensions, both corpora contain essays that display the various different styles available, showing that as a register, student writing allows for some internal stylistic variation independent of whether the writers are native or non-native speakers of English. The results confirm the usefulness of the multidimensional model, particularly to the extent that a new model is extracted. Substantial overlap between some of the dimensions in this study and dimensions in other models indicate that multidimensional modals are sensitive to particular kinds of feature groupings, which should be taken as evidence in favour of the general validity of this kind of approach. [ABSTRACT FROM AUTHOR]
Published: 2009

13. Weaving web data into a diachronic corpus patchwork.

Author: Kehoe, Andrew and Gee, Matt
Subjects: *WEB databases, *HISTORICAL linguistics, *CORPORA, *WEB search engines, *POLYGLOT texts, selections, quotations, etc., *LEXICOLOGY
Abstract: This paper offers a reassessment of the role of web data in diachronic linguistic analysis. We introduce the diachronic search facilities provided by the WebCorp Linguist's Search Engine, including the use of a new `heat map' graph for the analysis of changes in collocational patterns over time. We illustrate how web data can be used to supplement data from standard corpora in lexicological studies. Our focus is on the vogue phrase credit crunch and the paper compares examples from standard corpora (BNC, Brown, LOB, Frown, LOB) with those found in web-accessible newspaper texts. Contrary to previous studies, we do not rely on the web solely for the most up-to-date usage examples. Instead, we show how web-accessible texts dating back to the beginning of the 20th Century can be used to fill gaps in and sharpen the picture provided by standard corpora. [ABSTRACT FROM AUTHOR]
Published: 2009

14. Change and constancy in linguistic change: How grammatical usage in written English evolved in the period 1931-1991.

Author: Leech, Geoffrey and Smith, Nicholas
Subjects: *CORPORA, *WRITING instruction, *GRAMMATICALIZATION, *COLLOQUIAL language, *AMERICAN English language, *ENGLISH language, *TWENTIETH century
Abstract: The creation of the Lanc-31 corpus (familiarly known as B-LOB - 'Before LOB') completes a trio of matching corpora of standard written British English 1931'- 1961 - 1991 on the model of the Brown corpus. The short-term history of English in the twentieth century can therefore now be examined using three equidistant broadly-sampled and comparable corpora of the written language, and it is possible to trace how far trends of change already observed in the comparison of LOB (1961) and F-LOB (1991) have themselves been undergoing change over the period in question. We will present in outline the recent history of a considerable range of grammatical features insofar as it can be learned from frequency counts from these three equivalently-sampled corpora. In many cases examined, the trend of increasing or decreasing frequency observed in the later period (1961-91) is found to be a continuation of a similar trend in the earlier period (1931-61).2 In other cases there is change in the rate or direction of change. In other words, there is both constancy and change in the rate of change. We provide tentative explanations of these changes, where appropriate, in terms of grammaticalization, colloquialization, Americanization and densification. Comparable developments in American English, based on analysis of the equivalent Brown and Frown corpora, are traced for the 1961-92 period, and provide insight into the relation between the two regional varieties, mostly showing AmE trends to be in advance of those for BrE. [ABSTRACT FROM AUTHOR]
Published: 2009

15. Tis well known to barbers and laundresses: Overt references to knowledge in English medical writing from the Middle Ages to the Present Day.

Author: Hiltunen, Turo and Tyrkkõ, Jukka
Subjects: *MEDICAL writing, *CORPORA, *MIDDLE English language, *DISCURSIVE practices, *EPISTEMICS, *LEXICAL grammar, *LEXICON
Abstract: The discursive representation of knowledge, the fundamental objective of scientific inquiry, reflects underlying epistemic conditions of scientific thought (Bates 1995). Knowledge is communicated in scientific writing by means of lexical choice, discourse conventions and the organization of information. Over the long history of vernacular medicine, the writers of each era — from scholasticism and empiricism to evidence based medicine — have had their own perspectives on knowledge, revealed by the discursive practices they employed. Lexical items referring to the concept of knowledge (e.g. knowledge, information, doctrine) are investigated from the late Middle English period to Present-day English. We analyze variation and change in the lexicon of knowledge and analyze the discursive contexts in which the terms appear, showing how these have changed over time in different subgenres within learned medicine. The study makes use of several medical corpora with a total word count of roughly one million words: the MEMT is used for the Middle English period, and a selection of texts from the EMEMT corpus (articles from the Philosophical Transactions and other contemporary medical texts) represent the Early Modern English period. For the PDE period, we use a selection of research articles from academic journals and texts from the Medicor. [ABSTRACT FROM AUTHOR]
Published: 2009

16. A reassessment of the syntactic classification of pragmatic expressions: the positions of you know and I think with special attention to you know as a marker of metalinguistic awareness.

Author: Van Bogaert, Julie
Subjects: *SEMANTICS, *EXPRESSIVE behavior, *FRAMES (Linguistics), *LANGUAGE awareness, *CORPORA
Abstract: This paper wishes to point out some limitations to the way in which the syntactic classification of pragmatic expressions has traditionally been handled. As an alternative, it proposes a syntactic classificatory system that pivots on the notion of scope. This alternative approach is applied to corpus data of you know and I think. An attempt is made to establish connections between the pragmatic expressions' syntactic behaviour on the one hand and their functional properties on the other hand and to compare the findings for both expressions. In so doing, special attention is devoted to local you know, a specific syntactic use of this pragmatic expression, which is found to correlate with a particular function, viz, that of marking metalinguistic awareness. The findings of this study may have implications for the way you know is commonly viewed, especially by laypeople, but also in scholarly settings. [ABSTRACT FROM AUTHOR]
Published: 2009

17. Discourse linguistics meets corpus linguistics: theoretical and methodological issues in the troubled relationship.

Author: Virtanen, Tuija
Subjects: *DISCOURSE, *CORPORA, *LINGUISTICS, *DISCOURSE analysis, *THEORY of knowledge
Abstract: Discourse linguistics and corpus linguistics have an uneasy relationship because of their inherent ontological and epistemological differences. Yet it is a steady relationship going back well into corpus-linguistic history, and one that both fields are highly motivated to maintain despite its many hazards and challenges. Singling out five complementary dimensions of discourse, understood here in a broad sense, this paper shows that not all of them will be equally accessible to users of corpus methods. Two fundamental aspects of discourse are identified as particularly challenging to corpus-linguistic enquiry, i.e. the distinction between product-and process-oriented approaches; and the status of the primary notion of context. The latter raises the issue of authenticity, suggesting a need to rethink what we mean by the notion. The important methodological distinction between a corpus-based and a corpus-driven approach to discourse serves to highlight key issues in the joint history of discourse linguistics and corpus linguistics. The paper is rounded off with a discussion of the benefits to be gained by a combination of discourse linguistic and corpus linguistic approaches and methods: each party can complement the other in constructive ways; to uncover new aspects of discourse that may suggest a reconsideration of our present understanding, and disclose our tacit assumptions about it. [ABSTRACT FROM AUTHOR]
Published: 2009

18. Corpus linguistics meets sociolinguistics: the role of corpus evidence in the study of sociolinguistic variation and change.

Author: Mair, Christian
Subjects: *CORPORA, *SOCIOLINGUISTICS, *VARIATION in language, *JAMAICANS, *ENGLISH language, *SYNONYMS, *VERBS
Abstract: The contribution opens with a general discussion of the relationship between sociolinguistics and corpus-linguistics. The point is made that while the concerns of these two traditions in the study of linguistic variability and variation were rather different at the outset they have meanwhile developed in such a way as to make co-operation fruitful and, indeed, necessary. This point is illustrated from the author's own work on the recently completed Jamaican component of the International Corpus of English. The variables analysed are the use of person(s) as a synonym for people, the presence or absence of subject-verb inversion in questions, the modals of obligation and necessity, negative and auxiliary contraction and,finally, the use of the "new" quotative be like. [ABSTRACT FROM AUTHOR]
Published: 2009

19. Introduction Corpus Linguistics: Refinements and Reassessments.

Author: Renouf, Antoinette and Kehoe, Andrew
Subjects: *CORPORA, *ENGLISH language
Abstract: The article discusses various reports published within the issue, including one by Christian Mair on the issues arising between corpus linguistics and sociolinguistics, one by Karin Aijmer on English modality in the light of translation correspondences across parallel corpora, and another by Magnus Ljung on the use of linguistic model of spoken interaction in conducting pragmatic reassessment of expletive interjections.
Published: 2009

20. Creating corpora from spoken legacy materials: variation and change meet corpus linguistics.

Author: Beal, Joan C.
Subjects: *CORPORA, *INFORMATION resources, *VARIATION in language, *SPEECH
Abstract: Contrasting the aims and methodologies of corpus linguists and variationists, Charles Meyer writes that the latter `have been more interested in spoken language' and `have tended to collect data for private use and have not generally made public their data sets' (2006: 169). Since the advent of sociolinguistics in the 1960's, individual scholars and research teams have been amassing recordings of spoken data, often for the purpose of investigating variation across a limited number of linguistic features. Surprisingly little of this material has, however, been made accessible to the wider community of scholars. As John Widdowson points out, `much of this data remains hidden and inaccessible, scattered in numerous, often obscure, repositories' (2003: 81). What is more, these valuable legacy materials are often kept in inadequate storage facilities, and in obsolescent media, leading to the danger of them being lost forever. The Newcastle Electronic Corpus of Tyneside English (NECTE) was created with the aid of a Resource Enhancement Grant from the then AHRB with the primary objective of `rescuing' legacy materials from the Tyneside Linguistic Survey collected c.1969 and creating an accessible corpus by combining these with more recently-collected data from the Phonological Variation and Change project, collected c.1994. More specifiIcally, the resultant corpus was designed to be of use to as wide a range of end-users as possible and therefore available in a number of formats: sound, phonetic transcription, orthographic transcription and grammatical mark-up. The challenges posed by this project, and the ways in which the project team overcame them, will be the main focus of this paper, and should provide useful pointers to anybody intending to embark on creating a corpus of spoken language, whether from legacy materials or from newly-collected data. The topics to be covered are: (i) ethical and legal issues surrounding the making accessible of data collected in an era before ethics review or the UK's 1998 Data Protection Act; (ii) the challenges involved in gathering metadata and digitising `old' audio material; (iii) standards of transcription and mark-up. Finally, there will be some discussion of plans to process other `legacy' materials, and progress made towards developing common standards, as set out in Kretzschmar et.al. (2006). [ABSTRACT FROM AUTHOR]
Published: 2009

21. A contrastive look at English and Dutch (negative) imperatives.

Author: Van Olmen, Daniel
Subjects: *CONTRASTIVE linguistics, *CORPORA, *DISCOURSE analysis, *PRAGMATICS, *SPEECH act theory (Communication), *GRAMMAR, *TRANSLATIONS, *ENGLISH language
Abstract: Somewhat surprisingly, the imperatives of the Germanic neighbor-languages of English and Dutch have not yet been compared in a systematic, corpus-based way. This paper is a first step towards such a contrastive study. Firstly, it looks at the frequencies of imperative subtypes in the spoken part of the International Corpus of English — Great Britain and in a comparably compiled Northern Dutch corpus out of the Spoken Dutch Corpus. These quantitative results raise questions about the use of imperative discourse markers in both languages, about the grammaticalization of hortatives and about alternative linguistic means of expression. A second section focuses on negative imperatives in English and Dutch. It provides a pragmatic analysis of prohibitives from the perspective of speech act theory and examines their translations in a two-way parallel corpus of plays. The English and Dutch negative imperatives are found to have roughly the same illocutionary profile. The parallel corpus data reveals a difference in correlation between prohibitives in both languages. It is argued that this distinction is part of the explanation for the frequency facts. [ABSTRACT FROM AUTHOR]
Published: 2009

22. Parser-based analysis of syntax-lexis interactions.

Author: Lehmann, Hans Martin and Schneider, Gerold
Subjects: *SYNTAX (Grammar), *CORPORA, *PASSIVE voice, *LEXICOLOGY, *ENGLISH language -- Verb, *GRAMMAR
Abstract: In this paper we present a corpus-driven approach to the detection of syntax-lexis interactions. Our approach is based on the output of a syntactic parser. We have parsed the British National Corpus and constructed a database of lexical dependencies. Such a large-scale approach allows for a detailed investigation of patterns and constructions associated with individual lexical items found in argument positions. We then address the methodological problems of such an approach. precision errors (unwanted instances) and recall errors (missed instances) and offer a detailed evaluation. We investigate the interaction between syntax and lexis in verb-subject and verb-object structures as well as the active-passive alternation. We show that our approach provides relatively clean data and allows for a corpus-driven investigation of rare collocations. [ABSTRACT FROM AUTHOR]
Published: 2009

23. Digital Editions for Corpus Linguistics: Representing manuscript reality in electronic corpora.

Author: Honkapohja, Alpo, Kaislaniemi, Samuli, and Marttila, Ville
Subjects: *CORPORA, *ELECTRONIC information resources, *DIGITAL resources for research, *XML (Extensible Markup Language), *MANUSCRIPT reproduction, *TRANSCRIPTION (Linguistics), *CODICOLOGY
Abstract: This paper introduces a new project, Digital Editions for Corpus Linguistics (DECL), which aims to create a framework for producing online editions of historical manuscripts suitable for both corpus linguistic and historical research. Up to now, few digital editions of historical texts have been designed with corpus linguistics in mind. Equally, few historical corpora have been compiled from original manuscripts. By combining the approaches of manuscript studies and corpus linguistics, DECL seeks to enable editors of historical manuscripts to create editions which also constitute corpora. The DECL framework will consist of encoding guidelines compliant with the TEl XML standard, together with tools based on existing open source models and software projects. DECL editions will contain diplomatic transcriptions of the manuscripts, into which linguistic, palaeographic and codicological features will be encoded. Additional layers of contextual, codicological and linguistic annotation can be added freely to the editions using standoff XML tagging. The paper first introduces the theoretical and research-ideological background of the DECL project, and then proceeds to discuss some of the limitations and problems of traditional digital editions and historical corpora. The solutions to these problems offered by DECL are then introduced, with reference to other projects offering similar solutions. Finally, the goals of the project are placed in the wider context of current trends in digital editing and corpus compilation. [ABSTRACT FROM AUTHOR]
Published: 2009

24. Caribbean ICE corpora: Some issues for fieldwork and analysis.

Author: Deuber, Dagmar
Subjects: *CORPORA, *DISCOURSE analysis, *GRAMMAR, *ENGLISH language, *SOCIOLINGUISTICS, *CONVERSATION analysis
Abstract: In the Caribbean, English forms the upper segment of speech continua ranging from the Standard to the broadest Creole of each territory; social and stylistic factors correlate with the linguistic range. This paper explores the implications of this for the Caribbean components of the International Corpus of English (ICE). The first issue addressed is how the most informal category of texts that field- workers are required to record for the corpus, conversations, can be made to fit into the segment of the continuum that can be described as English. It is shown that a compromise between the demands of recording `English' and recording `conversations' can be reached. The paper then goes on to discuss analytical approaches to grammatical variation in the Caribbean ICE corpora, demonstrating that the data can be fruitfully examined by a combination of quantitative and discourse analytic methods where corpus linguistics is closely integrated with sociolinguistics. [ABSTRACT FROM AUTHOR]
Published: 2009

25. Distinctive words in academic writing: A comparison of three statistical tests for keyword extraction.

Author: Paquot, Magali and Bestgen, Yves
Subjects: *ACADEMIC discourse, *CORPORA, *SPEECH, *KEYWORDS, *DISTRIBUTION (Probability theory)
Abstract: Most studies that make use of keyword analysis rely on log-likelihood ratio or chi-square tests to extract words that are particularly characteristic of a corpus (e.g. Scott and Tribble 2006). These measures are computed on the basis of absolute frequencies and cannot account for the fact that "corpora are inherently variable internally" (Gries 2006: 110). To overcome this limitation, measures of dispersion are sometimes used in combination with keyness values (e.g. Rayson 2003; Oakes and Farrow 2007). Some scholars have also suggested using other statistical measures (e.g. Wilcoxon-Mann- Whitney test) but these techniques have not gained corpus linguists' favour (yet?). One possible explanation for this lack of enthusiasm is that statistical tests for keyword extraction have rarely been compared. In this article, we make use of the log- likelihood ratio, the t-test and the Wilcoxon-Mann-Whitney test in turn to compare the academic and the fiction sub-corpora of the British National Corpus and extract words that are typical of academic discourse. We compare the three lists of academic keywords on a number of criteria (e.g. number of keywords extracted by each measure, percentage of keywords that are shared in the three lists, frequency and distribution of academic keywords in the two corpora) and explore the specificiiles of the three statistical measures. We also assess the advantages and disadvantages of these measures for the extraction of general academic words. [ABSTRACT FROM AUTHOR]
Published: 2009

26. Establishing the EU: The representation of Europe in the press in 1993 and 2005.

Author: Marchi, Anna and Taylor, Charlotte
Subjects: *BRITISH newspapers, *CORPORA, *RATIFICATION of treaties, *DISCOURSE analysis, TREATY on European Union (1992)
Abstract: This paper investigates how the European Union was represented in three British newspapers over two different time periods: 1993 and 2005. The Treaty on European Union, which led to the creation of the European Union, was signed in 1992 and entered into force in 1993. The Treaty Establishing a Constitution for Europe was signed in 2004, and like the Maastricht Treaty was subject to ratification. However, unlike the Maastricht Treaty, it was rejected in referendums in France and the Netherlands in 2005 and therefore was not implemented. These two events were chosen for their importance in the history of the European Union and because they allow for a diachronic comparison of the construal of Europe in the British press. Two sub-corpora were used in the study, the first, SiBoI93, contains approximately 92 million tokens from three broadsheet British newspapers collected in 1993 and the second, SiBol5, contains approximately 150 million tokens collected from the same sources in 2005. Each of these corpora covers the year after the signing of the treaties and therefore the period in which the ratification was discussed. The corpora were investigated using Corpus-Assisted Discourse Studies (CADS) which involves a shunting between quantitative and qualitative analytical approaches and starting points (see, for example, Partington 2004, forthcoming; Baker 2006). Our findings show that while there is no simplistic positive to negative reversal of evaluation, there is certainly a marked decrease in the newsworthiness of Europe and the European Union, and the problem the European Union faces is primarily one of visibility. [ABSTRACT FROM AUTHOR]
Published: 2009

27. Research on fiction dialogue: Problems and possible solutions.

Author: Axeisson, Karin
Subjects: *IMAGINARY conversations, *NARRATION, *CORPORA, *SPEECHES, addresses, etc., *STATISTICS
Abstract: It seems quite clear that there must be differences in the language of fiction dialogue and the narrative parts of fiction, but fiction is often treated in corpus linguistics as if it were a homogenous genre, and there is very little quantitative research on the language of direct speech in modern fiction. The main problem is that corpora are seldom annotated for direct speech, and if they are, the mark-up may be difficult to use. If there is no mark-up for direct speech, corpus query matches have to be categorized and sorted manually as being inside or outside direct speech, and the proportion of direct speech in the corpus fiction texts needs to be investigated using, for example, statistical methods. As these procedures are time consuming, there is a need for specially designed corpora where direct speech is annotated. Another problem is how to define direct speech, slightly different definitions may be applied depending on whether comparisons are to be made to the narrative parts of fiction or to real-life speech. A further problem is that existing corpora are not usually sampled with a view to providing a representative sample of fiction dialogue, it seems important that samples are taken from different kinds of books and from different parts of books. [ABSTRACT FROM AUTHOR]
Published: 2009

28. On the face of it: How recurrent phrases organize text.

Author: Levin, Magnus and Lindquist, Hans
Subjects: *PHRASE structure grammar, *DISCOURSE analysis, *PRAGMATICS, *COMPARATIVE grammar, *CORPORA
Abstract: This study concerns the text-organizing functions of the recurrent phrases on the face of it, on its face and in (the) face of. It is argued that the development of the pragmatic meanings is related to grammaticalization theory. It is demonstrated that the two former phrases often occur in constructions where they are followed by a hedge and a refutation (on the face of it ... this seems ... But...). In (the) face of mainly organizes text through its connection with negative evaluations (e.g. in the face of opposition). The new Time and COCA corpora compiled by Mark Davies are shown to be useful complements to the BNC. These corpora show that on its face is typical of American English, and that the article-less variant in face of is rare, and possibly decreasing in use. [ABSTRACT FROM AUTHOR]
Published: 2009

29. Self-reference and mental processes in early English personal correspondence: A corpus approach to changing patterns of interaction.

Author: Palander-Collin, Minna
Subjects: *LETTER writing, *HISTORICAL linguistics, *CORPORA, *REFERENCE (Linguistics), *WRITTEN communication
Abstract: This paper explores linguistic variation and change in the way letter writers position themselves in their letters, the focus being on gentlemen's self-reference (I) in sixteenth- and eighteenth-century personal correspondence. Self-reference is understood as a linguistic feature relevant to the identity and interpersonal functions of language. The aim is to identify broad changes in patterns of self- reference, using corpus tools, in the data extracted from the Corpus of Early English Correspondence and the Corpus of Early English Correspondence Extension. The results indicate that self-reference and self-referential mental processes increased from the sixteenth to the eighteenth century. It is suggested that this development may relate to increasing stance marking and involvement observed in the history of English from 1650 onwards, particularly as self-referential mental expressions often serve interpersonal functions. [ABSTRACT FROM AUTHOR]
Published: 2009

30. A diachronic perspective on changing routines in texts.

Author: Tanja Rütten
Subjects: *HISTORICAL linguistics, *DISCOURSE analysis, *PRAGMATICS, *CORPORA, *LINGUISTIC analysis
Abstract: This paper explores the role of exhortation in religious discourse and is intended as a case study of higher-level pragmatic analyses of texts by way of corpora and corpus methodology. Whereas diachronic pragmatic descriptions usually focus on the development of single (and isolated) items to characterize the history of a text or genre, this study accesses texts on the level of entire subsections. A diachronic outline of exhortation as one such section in religious instructive discourse shows that it undergoes a process of dissolution and integration: it falls out of use as a complex component of instructive discourse in favour of smaller, more integrated elements of exhortation. A diachronic, contrastive view of exhortation in sermons and treatises shows that this development is sensitive to time as well as genre. [ABSTRACT FROM AUTHOR]
Published: 2009

31. The pragmatics of knowledge and meaning: Corpus linguistic approaches to changing thought-styles in early modern medical discourse.

Author: Taavitsainen, Irma
Subjects: *PRAGMATICS, *CORPORA, *MIDDLE English language, *MEDICAL writing, *KNOWLEDGE transfer, *DISCOURSE analysis, *SCHOLASTICISM (Theology), *HISTORICAL linguistics, *HISTORY
Abstract: Pilot studies on the new specialized corpora with comprehensive materials, Middle English Medical Texts 1375-1500 and Early Modern English Medical Texts 1500-1700, have already shown that the lines of development in the medical register are diversified, and a dynamic picture emerges. This study relates to the dissemination of knowledge and the negotiation of meaning across a wide selection of early modern medical texts. References to authorities with specific details are typical of the top genres of teaching and research in scholasticism and continue in the early modern period, but become adapted to new functions in more popular texts. In contrast, general references marking vagueness in medieval texts and occurring more frequently in texts for heterogeneous audiences acquire special meanings connected with the rising importance of discourse communities in the top genres of the seventeenth century. My approach is connected with historical pragmatics and historical discourse analysis. Corpus linguistic methods are applied to detect the overall trends and to locate relevant passages for qualitative analysis. For a more detailed micro study, a keyword analysis is employed, with the frequencies of proper names as the prime point of interest. [ABSTRACT FROM AUTHOR]
Published: 2009

32. Historical corpus pragmatics: Focus on speech acts and texts.

Author: Kohnen, Thomas
Subjects: *HISTORICAL linguistics, *CORPORA, *PRAGMATICS, *SPEECH acts (Linguistics), *DISCOURSE analysis, *HISTORY of the English language
Abstract: This paper gives an overview of the major research in historical corpus pragmatics on speech acts and texts, adding new suggestions and insights. The section on speech acts deals with corpus-based diachronic descriptions of speech acts in the history of English and the methodological problem of retrieving speech acts in diachronic corpora. The section on texts contains a short overview of corpus-based descriptions of genres in the history of English and addresses the problems connected with the corpus-based analysis of functions and sections in texts. The final section, combining both perspectives, gives suggestions for a history of English as a history of genres and speech acts. [ABSTRACT FROM AUTHOR]
Published: 2009

33. Chapter 1: Introduction.

Subjects: *CORPORA, *AMERICAN English language, AUSTRALIAN English language
Abstract: Chapter 1 of the book "Modals and Quasi-Modals in English," by Peter Collins is presented. It explores the results of corpus-based study for the meanings of modals and quasi-modals in three parallel corpora which includes the contemporary of British English (BrE), American English (AmE), and Australian English (AusE). It highlights the three corpora which include British (ICE-GB) and Australian component (ICE-AUS) of the International Corpus of English and corpus of American English (C-US).
Published: 2009

34. Appendix.

Subjects: *ENGLISH language, *CORPORA, *MODALITY (Linguistics)
Abstract: A variety of tables for modals and quasi-modals in the British component of the International Corpus of English (ICE-GB), Australian component of the International Corpus of English (ICE-AUS), and corpus of American English (C-US) used in the book "Modals and Quasi-modals in English" is presented.
Published: 2009

35. Summary and Conclusions.

Author: Egan, Thomas
Subjects: *LINGUISTICS, *CORPORA, *LINGUISTIC analysis, *CONSTRUCTION grammar, *COMPARATIVE grammar, *LANGUAGE & languages
Abstract: Chapter 8 of the book "Language and Computers: Studies in Practical Linguistics" is presented. It contains a summary and presents conclusion on the whole study. It mentions that the book attempts to link cognitive linguistic theory to empirical evidence in the form of corpus data, which were categorized according to similarities and differences in the distribution of various constructions. The author declares that the approach in the book is usage-based and it adheres to the content requirement.
Published: 2008

36. The novel features of text. Corpus analysis and stylistics.

Author: Widdowson, Henry G.
Subjects: *STORY plots, *CORPORA, *LITERARY style, *LINGUOSTYLISTICS, *LINGUISTICS, *LITERATURE
Abstract: This paper takes up the problematic stylistic issue that Michael Stubbs addresses in his study of Conrad's Heart of Darkness of the relationship between the analysis of a literary work and its interpretation. Inspired by his example, and applying his 'quantitative stylistic methods', I go in search of textual patterns and connections in the text of Conrad's novel other than those he has noted, and consider what possible significance they might have. The findings I come up with reveal features of the text that I would not otherwise have consciously noticed. Whether these simply serve as explicit confirmation of a subliminal literary awareness, or prompt new interpretative possibilities, is an open question. But there is no direct correlation between textual findings and literary effects. The precision of the analysis of the text does not lead to any greater precision in the interpretation of the novel, but on the contrary leads to a heightened recognition of the necessary variability and elusiveness of literary significance. [ABSTRACT FROM AUTHOR]
Published: 2008

37. Linking the verbal and visual: new directions for corpus linguistics.

Author: Carter, Ronald and Adolphs, Svenja
Subjects: *CORPORA, *COMPUTER software, *COMPUTATIONAL linguistics, *LINGUISTICS, *COMMUNICATION, *LANGUAGE & languages
Abstract: This paper discusses an ongoing research project to investigate the compilation of a small corpus and the development of appropriate software tools that enable a more multi-modal approach to language data. The research draws on recent experience developed in the development of spoken corpora to explore alignments of the verbal and the visual and, as a starting point, does so with particular reference to gestures in communication and the role of head nods in particular. Issues of appropriate data capture and description are discussed alongside questions about the nature of language necessarily raised by language research that goes beyond the textual. [ABSTRACT FROM AUTHOR]
Published: 2008

38. Chapter 1 Introduction.

Author: Kaunisto, Mark
Subjects: *LINGUISTIC analysis, *LANGUAGE & languages, *CORPORA, *COMPUTERS, *LINGUISTICS
Abstract: The article presents chapter 1 of the book “Language and Computers: Studies in Practical Linguistics,” written by Mark Kaunisto. It serves as an introduction of the book which discusses the linguistic phenomena that developed rapidly as the corpus linguistics opened up new avenues for language research.
Published: 2007

39. Lexical semantics for software requirements engineering — a corpus-based approach.

Author: Lindmark, Kerstin, Natt Och Dag, Johan, and Willners, Caroline
Subjects: *LEXICOLOGY, *SEMANTICS, *COMPARATIVE linguistics, *CORPORA, *COMPUTER software, *ENGINEERING
Abstract: In companies that constantly develop new software releases for large markets, there continually arrive new requirements, written in natural language that may affect the development work. Before any decision is made about the requirements, these must be analysed and understood, and related to the current set of implemented and queued requirements. This task is time-consuming owing to the high inflow of requirements, and decision-making would be facilitated by any support that would reduce the requirements analyst's workload. One of the main tasks is finding requirement duplicates and requirements with similar content and different NLP methods have been tried for this. Simple word matching is one of the methods used for linkage between requirements. If links could be set up not only between words, but also between concepts at different semantic levels, the chances of finding content-corresponding requirements would be greater. One goal of this project is to establish a terminology for requirements as well as to establish (Wordnet-type) semantic relations between terms, in order to enable multi- level linkage. For this purpose, we use a corpus consisting of 1,932 authentic software requirements, written in English of varying grammatical and stylistic quality. First, term candidates were extracted using the WordSmith Keyword function, with BNC Sampler as reference corpus. To find out whether there is any terminology specific to the 'requirements' sub-domain of the 'software' domain, the documentation associated with the software to which the requirements relate was also used as a reference (separately). Then, lexico-semantic patterns according to Hearst (1992) were used to find hyponymy- hyperonymy relations, and to confirm manually established relations. These analyses were performed on the text both 'as is' and, reducing noise somewhat, after POS-tagging by means of the Brill tagger (Brown Corpus tag-set). The results so far suggest that corpus- based methods are of importance to the management or requirements analyses. [ABSTRACT FROM AUTHOR]
Published: 2007

40. Exploiting the Corpus of East-African English.

Author: Schmied, Josef
Subjects: *ENGLISH language, *CORPORA, *LINGUISTIC analysis, *LANGUAGE & languages
Abstract: This contribution summarises the current status and future plans for the Corpus of East African English (ICE-EA), which is part of the International Corpus of English' (cf Greenbaum 1996, Schmied 1996 and forthcoming). It concentrates on methodological problems that are related to the exploitation tools on several discovery levels, starting from the individual corpus text through the entire ICE-EA up to the World-Wide Web as a corpus (Schmied 2005). It shows that the investigations using ICE corpora for comparative theoretical and practical analyses of second-language varieties of English has hardly started, but also that interesting new research opportunities are available. [ABSTRACT FROM AUTHOR]
Published: 2007

41. The path from learner corpus analysis to language pedagogy: some neglected issues.

Author: Nesselhauf, Nadja
Subjects: *CORPORA, *LINGUISTIC analysis, *FOREIGN language education, *LANGUAGE & education, *EDUCATION
Abstract: The analysis of learner corpora is often expected to contribute to language pedagogy, but the question of how exactly results from learner corpus studies can be turned into suggestions for language teaching has received practically no attention so far. In this paper, the necessity of discussing this question is pointed out and some directions the discussion could take are outlined While the typical result of a learner corpus study is the identification of a number of features that are particularly difficult for a certain learner group, it is argued here that suggestions for language teaching should not be based exclusively on the criterion of difficult and that, moreover, the criterion of difficulty itself needs refining in order to be truly relevant. The discussion of these theoretical issues is based on results from a learner corpus study investigating the use of collocations by advanced learners. [ABSTRACT FROM AUTHOR]
Published: 2007

42. The structure of corpora in SLA research.

Author: Cowan, Ron and Leeser, Michael
Subjects: *CORPORA, *LINGUISTIC analysis, *SECOND language acquisition, *LANGUAGE acquisition, *BILINGUAL education, *BILINGUALISM
Abstract: The field of Second Language Acquisition currently uses L2 error corpora to supplement its primary methodology — grammatical judgment and production tests -for investigating specific hypotheses about the development of interlanguages. This paper argues that large L2 corpora structured according to specific criteria would allow researchers to investigate with greater precision the contribution of the native language in the evolution of interlanguages as well as concepts such as overgeneralisation and end-state grammars. Furthermore, hypotheses posed about interlanguage development after native-like attainment, such as so-called U-shaped development, could be verified. The considerations for building L2 corpora of the future are illustrated by considering the validity and reliability of a multi-level corpora of errors produced by Spanish and Korean speakers learning English. [ABSTRACT FROM AUTHOR]
Published: 2007

43. Student writing of research articles in a foreign language: metacognition and corpora.

Author: Bianchi, Francesca and Pazzaglia, Roberto
Subjects: *RESEARCH, *STUDENTS' language, *LANGUAGE & languages, *METACOGNITION, *CORPORA
Abstract: The aim of this paper is to describe the creation and use of an ESP corpus for teaching Italian undergraduate students how to write research articles in English. A metacognitive approach to reading-comprehension processes, integrated with corpus tools, was considered a prerequisite to writing activities. The students were guided in the analysis of the ways in which psychology experimental papers are structured. A corpus of psychology articles divided into moves was created (462,772 words; annotated and non-annotated parallel versions) and used in class to teach students how to analyse concordances for lexico-grammatical, and rhetorical reference in writing activities. Thanks to the availability of corpus concordances, even those students who had never used English when speaking or writing managed to produce scientific paragraphs which were grammatically and stylistically acceptable. Although the annotated files could not be used in class, the individual annotation activity carried out by the students had the primary pedagogic function of increasing awareness of the macro-structure and function-form relationships in experimental articles. [ABSTRACT FROM AUTHOR]
Published: 2007

44. NP-internal functions and extended uses of the 'type' nouns kind, sort, and type: towards a comprehensive, corpus-based description.

Author: De Smedt, Liesbeth, Brems, Lieselotte, and Davidse, Kristin
Subjects: *NOUNS, *PARTS of speech, *CORPORA, *LINGUISTIC analysis, *LANGUAGE & languages
Abstract: In this paper we investigate the various constructions containing one of the three main type nouns sort, kind and type. Basing ourselves on data from the COBUILD corpus and COLT corpus, we first present a subclassification of the main type noun constructions, which owes a lot to but also expands on Denison (2002) and Aijmer (2002). In comparison with the categories proposed in the current literature, we advocate finer distinctions mainly within the NP-internal uses of type nouns, by positing fundamental structural and semantic distinctions between head uses on the one hand and modifier uses (attributive and semi-suffix) and postdeterminer uses on the other. The subjectified qualifying uses and discourse marker uses of type nouns, by contrast, have been covered rather extensively in the literature. From the existing descriptions we retain the distinction between nominal, adverbial and sentential qualifiers, discourse markers and quotative markers. We then apply this descriptive framework to two British English data sets from opposing registers: written texts from the quality newspaper The Times (COBUILD subcorpus) and spontaneously spoken conversation between teenagers (COLT). The quantification of these analyses reveals strong asymmetries in the relative frequencies of the various type noun uses in the two data sets. While type nouns are used predominantly NP-internally in The Times, adverbial qualifiers and discourse markers predominate in the COLT-data. [ABSTRACT FROM AUTHOR]
Published: 2007

45. The semantic properties of going to: distribution patterns in four subcorpora of the British National Corpus.

Author: Berglund, Ylva and Williams, Christopher
Subjects: *CORPORA, *LINGUISTIC analysis, *BRITISH people, *SEMANTICS, *COMPARATIVE linguistics
Abstract: In this paper the authors analyse how the intentional and predictive uses of the going to construction can be seen to vary between different types of discourse, as found in BNC Baby, four one-million-word subcorpora from the British National Corpus. Selected collocational patterns of the construction are also examined. As expected, results show that the overall frequency of the construction varies considerably between the text categories examined (newspapers, fiction, academic discourse and spoken conversation). Further interesting findings are made when this difference is put in relation to the predictive vs. intentional uses and the outcome of the collocational analyses. It is shown how the choice of main verb used relates to the distribution of intentional vs. predictive uses. Person and number are also taken into consideration as factors influencing the predictive vs. intentional ratios. Differences in semantic distribution patterns are also observed when gonna is used with respect to going to. [ABSTRACT FROM AUTHOR]
Published: 2007

46. Size matters — or thus can meaningful structures be revealed in large corpora.

Author: Granath, Solveig
Subjects: *CORPORA, *LINGUISTIC analysis, *ENGLISH language, *LANGUAGE & languages, *COMMUNICATION
Abstract: After sentence-initial thus, both S-V word order and inverted word order can be found. The standardised, million-word corpora of American and British English reveal four possible ways of ordering the main sentence constituents after initial thus: S-V, V-S, Aux-V-S, and Aux-S-V. However, larger corpora are needed to determine the reason for the variation. The British Guardian/Observer 1998-2002 (approximately 50 million words per year), used in the present investigation, shows some kind of inversion in 10-15% of the sentences with initial thus. A systematic comparison of examples with inverted and non- inverted word order demonstrates that this is neither a case of free variation nor formality of style, but rather a case of word order being used to signal a difference in the function of thus. Accordingly, S-V word order is used after resultative, summative, and appositive thus, whereas inversion is used when thus is a deictic proform. [ABSTRACT FROM AUTHOR]
Published: 2007

47. Inversion in modern written English: syntactic complexity, information status and the creative writer.

Author: Kreyer, Rolf
Subjects: *COMMUNICATION, *LANGUAGE & languages, *CORPORA, *LINGUISTIC analysis, WRITING
Abstract: Full-verb inversion in English has been the subject of a large number of studies in the recent and the less recent past. The present study tries to give a corpus-based account of this phenomenon within a discourse-functional framework. First, I will describe the influence of syntactic complexity and information status. However, I will argue that inversion should not merely be regarded as a means to ensure processability and flow of information. Instead, inversion should be understood as the result of a conscious choice on the part of the writer, who makes deliberate use of this rather rare syntactic phenomenon to serve certain superordinate functions, namely text structuring and what I call the immediate-observer effect, i.e. helping the reader to immerse into the discourse world. It will be shown that the distributions of weight and information status within inverted constructions can be understood as a result of these two superordinate functions. [ABSTRACT FROM AUTHOR]
Published: 2007

48. Corpus development 25 years on: from super-corpus to cyber-corpus.

Author: Renouf, Antoinette
Subjects: *CORPORA, *LINGUISTIC analysis, *LINGUISTICS, *LANGUAGE & languages, *COMPARATIVE grammar
Abstract: By the early 1980s, corpus linguists were still considered maverick and were still pushing at the boundaries of language-processing technology, but a culture was slowly bootstrapping itself into place, as successive research results (e.g. Collins-Cobuild Dictionary) encouraged the sense that empirical data analysis was a sine qua non for linguists, and a terminology of corpus linguistics was emerging that allowed ideas to take form. This paper reviews the evolution of text corpora over the period 1980 to the present day, focussing on three milestones as a means of illustrating changing definitions of 'corpus' as well as some contemporary theoretical and methodological issues. The first milestone is the 20-million-word Birmingham Corpus (1980-1986), the second is the dynamic 'corpus (1990-2004); the third is the 'Web as corpus' (1998-2004). [ABSTRACT FROM AUTHOR]
Published: 2007

49. Corpora and spoken discourse.

Author: Wichmann, Anne
Subjects: *CORPORA, *DISCOURSE, *ORAL communication, *LANGUAGE arts, *SPEECH
Abstract: The focus of this paper is on spoken corpora — corpora of naturally occurring speech data that have been compiled for the use of linguists and discourse analysts, as opposed to speech corpora, as commonly used for applications in speech technology and containing various forms of elicited data. I discuss some of the practical and theoretical issues involved in compiling and analysing such data, especially the problems of prosodic annotation and the automatic analysis of the speech signal. I argue that the primary data, i.e. the sound files, are of crucial importance: sounds are not just an additional resource for the study of prosody but an integral part of the message. [ABSTRACT FROM AUTHOR]
Published: 2007

50. Seeing through multilingual corpora.

Author: Johansson, Stig
Subjects: *MULTILINGUALISM, *LANGUAGE & languages, *CORPORA, *ENGLISH language, *SWEDISH language, *NORWEGIAN language
Abstract: In the last 10-15 years there has been a rapidly growing interest in multilingual corpora. In this paper I comment on the development and give some examples from recent research, drawing in particular on work connected with the English-Norwegian Parallel Corpus and the English-Swedish Parallel Corpus. Notions dealt with include translation paradigms, mutual correspondence, semantic mirrors, zero correspondence, and translation effects. Special attention is paid to the English nouns person and thing in a contrastive perspective. [ABSTRACT FROM AUTHOR]
Published: 2007

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Region

Database

184 results on '"CORPORA"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources