E-Book Overview
Using the Web as Corpus is one of the recent challenges for corpus linguistics. This volume presents a current state-of-the-art discussion of the topic. The articles address practical problems, such as suitable linguistic search tools for accessing the web and the question of register variation, and probe into methods for culling data from the web. The book also offers a wide range of case studies, covering morphology, syntax, lexis, as well as synchronic and diachronic variation in English. These case studies make use of the two approaches to the web in corpus linguistics: web-as-corpus and web-for-corpus-building. They demonstrate that web data can provide useful additional evidence for a broad range of research questions.

Contents:

Marianne HUNDT, Nadja NESSELHAUF and Carolin BIEWER: Corpus linguistics and the web

Accessing the web as corpus:
Anke LUEDELING, Stefan EVERT and Marco BARONI: Using web data for linguistic purposes
William H. FLETCHER: Concordancing the web: promise and problems, tools and techniques
Antoinette RENOUF, Andrew KEHOE and Jayeeta BANERJEE: WebCorp: an integrated system for web text search

Compiling corpora from the internet:
Sebastian HOFFMANN: From webpage to mega-corpus: the CNN transcripts
Claudia CLARIDGE: Constructing a corpus from the web: message boards
Douglas BIBER and Jerry KURJIAN: Towards a taxonomy of web registers and text types: a multidimensional analysis

Critical voices:
Geoffrey LEECH: New resources, or just better old ones? The Holy Grail of representativeness
Graeme KENNEDY: An under-exploited resource: using the BNC for exploring the nature of language learning

Language variation and change:
Anette ROSENBACH: Exploring constructions on the web: a case study
Günter ROHDENBURG: Determinants of grammatical variation in English and the formation / confirmation of linguistic hypotheses by means of internet data
Britta MONDORF: Recalcitrant problems of comparative alternation and new insi
E-Book Content
Corpus Linguistics and the Web

LANGUAGE AND COMPUTERS: STUDIES IN PRACTICAL LINGUISTICS
No 59
edited by Christian Mair, Charles F. Meyer, Nelleke Oostdijk

Corpus Linguistics and the Web
Edited by Marianne Hundt, Nadja Nesselhauf and Carolin Biewer
Amsterdam - New York, NY 2007

Cover design: Pier Post
Online access is included in print subscriptions: see www.rodopi.nl
The paper on which this book is printed meets the requirements of "ISO 9706:1994, Information and documentation - Paper for documents - Requirements for permanence".
ISBN-10: 90-420-2128-4
ISBN-13: 978-90-420-2128-0
©Editions Rodopi B.V., Amsterdam - New York, NY 2007
Printed in The Netherlands

Contents

Corpus linguistics and the web
Marianne Hundt, Nadja Nesselhauf and Carolin Biewer

Accessing the web as corpus

Using web data for linguistic purposes
Anke Lüdeling, Stefan Evert and Marco Baroni

Concordancing the web: promise and problems, tools and techniques
William H. Fletcher

WebCorp: an integrated system for web text search
Antoinette Renouf, Andrew Kehoe and Jayeeta Banerjee

Compiling corpora from the internet

From webpage to mega-corpus: the CNN transcripts
Sebastian Hoffmann

Constructing a corpus from the web: message boards
Claudia Claridge

Towards a taxonomy of web registers and text types: a multidimensional analysis
Douglas Biber and Jerry Kurjian

Critical voices

New resources, or just better old ones? The Holy Grail of representativeness
Geoffrey Leech

An under-exploited resource: using the BNC for exploring the nature of language learning
Graeme Kennedy

Language variation and change

Exploring constructions on the web: a case study
An