E-Book Overview
This volume will be of particular interest to readers who want to expand the applications of corpus linguistics techniques through new tools and approaches. The text includes selected papers from the Fifth North American Symposium, hosted by the Linguistics Department at Montclair State University in Montclair, New Jersey, in May 2004. The symposium papers represented several areas of corpus studies, including language development, syntactic analysis, pragmatics and discourse, language change, register variation, corpus creation and annotation, and practical applications of corpus work, primarily in language teaching but also in medical training and machine translation. A common thread through most of the papers was the use of corpora to study domains longer than the word. Not surprisingly, fully half of the papers deal with the computational tools and linguistic strategies needed to search for and analyze these longer spans of language, while most of the remaining papers examine particular syntactic and rhetorical properties of one or more corpora. Contents: Preface; Analysis Tools and Corpus Annotation: Leslie BARRETT, David F. GREENBERG, and Mark SCHWARTZ: A Syntactic Feature Counting Method for Selecting Machine Translation Training Corpora; Angus B. GRIEVE-SMITH: The Envelope of Variation in Multidimensional Register and Genre Analyses; Paul DEANE and Derrick HIGGINS: Using Singular-Value Decomposition on Local Word Contexts to Derive a Measure of Constructional Similarity; Sebastian VAN DELDEN: Problematic Syntactic Patterns; Mark DAVIES: Towards a Comprehensive Survey of Register-based Variation in Spanish Syntax; Gregory GARRETSON and Mary Catherine O'CONNOR: Between the Humanist and the Modernist: Semi-automated Analysis of Linguistic Corpora; Carson MAYNARD and Sheryl LEICHER: Pragmatic Annotation of an Academic Spoken Corpus for Pedagogical Purposes; María José García VIZCAÍNO: Using Oral Corpora in Contrastive Studies of Linguistic Politeness; Corpu
E-Book Content
Corpus Linguistics Beyond the Word
LANGUAGE AND COMPUTERS: STUDIES IN PRACTICAL LINGUISTICS, No. 60
Series edited by Christian Mair, Charles F. Meyer, and Nelleke Oostdijk
Corpus Linguistics Beyond the Word Corpus Research from Phrase to Discourse
Edited by
Eileen Fitzpatrick
Amsterdam - New York, NY 2007
Cover design: Pier Post
Online access is included in print subscriptions: see www.rodopi.nl
The paper on which this book is printed meets the requirements of "ISO 9706:1994, Information and documentation - Paper for documents - Requirements for permanence".
ISBN-10: 90-420-2135-7
ISBN-13: 978-90-420-2135-8
©Editions Rodopi B.V., Amsterdam - New York, NY 2007
Printed in The Netherlands
Contents
Preface ... iii
Analysis Tools and Corpus Annotation
A Syntactic Feature Counting Method for Selecting Machine Translation Training Corpora. Leslie Barrett, David F. Greenberg, and Mark Schwartz ... 1
The Envelope of Variation in Multidimensional Register and Genre Analyses. Angus B. Grieve-Smith ... 21
Using Singular-Value Decomposition on Local Word Contexts to Derive a Measure of Constructional Similarity. Paul Deane and Derrick Higgins ... 43
Problematic Syntactic Patterns. Sebastian van Delden ... 59
Towards a Comprehensive Survey of Register-based Variation in Spanish Syntax. Mark Davies ... 73
Between the Humanist and the Modernist: Semi-automated Analysis of Linguistic Corpora. Gregory Garretson and Mary Catherine O'Connor ... 87
Pragmatic Annotation of an Academic Spoken Corpus for Pedagogical Purposes. Carson Maynard and Sheryl Leicher ... 107
Using Oral Corpora in Contrastive Studies of Linguistic Politeness. María José García Vizcaíno ... 117
Corpus Applications: Pedagogy and Lingu