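Corpus linguistics
Wikipedia

Corpus linguistics is the study of language based on examples of real language use, that is, corpora. It can be used to analyse the grammar and syntax of a natural language and to study how that language relates to other languages. Corpora were originally compiled by hand; today they are mostly compiled automatically by computer.

Corpus linguists believe that reliable language analysis must rest on field-collected samples, natural language contexts, and minimal experimental interference. Within corpus linguistics, views on the value of corpus annotation diverge, from John Sinclair, who advocated minimal annotation and letting texts "speak for themselves", to the Survey of English Usage team (based at University College London), which encourages more annotation and regards it as the path to a fuller and more rigorous understanding of language.

History

A landmark in modern corpus linguistics was the 1967 publication of Computational Analysis of Present-Day American English by Henry Kucera and W. Nelson Francis. The work was based on an analysis of the Brown Corpus, a carefully compiled corpus of American English of about one million words. Kucera and Francis subjected the corpus to a variety of computational analyses and obtained rich and varied results that combined elements of linguistics, language teaching, psychology, statistics, and sociology. Another key publication was Randolph Quirk's Towards a description of English Usage (1960), in which he introduced the Survey of English Usage project.

Shortly afterwards, the Boston publisher Houghton Mifflin invited Kucera to supply a million-word, three-line citation base for compiling its new American Heritage Dictionary. The American Heritage Dictionary took the innovative step of combining prescriptive elements (how language should be used) with descriptive elements (how language is actually used).

Other publishers followed suit. The British publisher Collins' COBUILD monolingual learner's dictionary, published for non-native speakers learning English, used the Bank of English corpus. The Survey of English Usage corpus was used in A Comprehensive Grammar of the English Language, written by Quirk and colleagues.

The Brown Corpus also gave rise to similar corpora: the LOB Corpus (Lancaster-Oslo-Bergen Corpus, 1960s British English), Kolhapur (Indian English), Wellington (New Zealand English), the Australian Corpus of English (Australian English), the Frown Corpus (early 1990s American English), and the FLOB Corpus (1990s British English). Other corpora include the International Corpus of English and the British National Corpus, a collection of 100 million words of spoken and written material created in the 1990s by publishers, the universities of Oxford and Lancaster, and the British Library. For contemporary American English there is now the American National Corpus, as well as the Corpus of Contemporary American English (1990 onward), which has more than 400 million words and can be accessed online.

The first computerized corpus of transcribed speech was built in 1971 by the Montreal French Project. It contained one million words and inspired Shana Poplack to build the much larger corpus of spoken French in the Ottawa-Hull area.

Besides living languages, corpora have also been compiled for ancient languages. One example is the Andersen-Forbes database of the Hebrew Bible, developed since the 1970s, in which every clause is parsed using graphs representing up to seven levels of syntax and every segment is tagged with seven fields of information. The Quranic Arabic Corpus is an annotated corpus of the Classical Arabic of the Quran. It contains multiple layers of annotation, including morphological segmentation, part-of-speech tagging, and syntactic analysis using dependency grammar.

Methods

Corpus linguistics has produced a number of research methods, all of which try to trace a path from data to theory. Wallis and Nelson first introduced what they called the 3A perspective: Annotation, Abstraction, and Analysis.

Annotation consists of applying a scheme to texts. Annotations may include structural markup, part-of-speech tagging, parsing, and other forms.

Abstraction consists of translating (mapping) terms in the scheme to terms in a theoretically motivated model or dataset. Abstraction typically involves linguist-directed search, but may also include rule learning for parsers.

Analysis consists of statistically probing, manipulating, and generalising from the dataset.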

Analysis may include statistical evaluation, optimisation of rule bases, and knowledge discovery methods.

Most lexical corpora today are part-of-speech-tagged. However, even corpus linguists who work with untagged text will inevitably use some method to isolate the words they are interested in. In such cases, annotation and abstraction are combined in a lexical search.

The advantage of publishing an annotated corpus is that other users can study and experiment on it. Linguists and others with related interests can then put the corpus to work. By sharing data, corpus linguists can treat the corpus as the locus of linguistic debate rather than as a fount of knowledge.

Corpus linguistics
From Wikipedia, the free encyclopedia

Corpus linguistics is the study of language as expressed in samples (corpora) of "real world" text. This method represents a digestive approach to deriving a set of abstract rules that govern a natural language or relate it to another language. Originally done by hand, corpora are now largely derived by an automated process.

Corpus linguistics adherents believe that reliable language analysis best occurs on field-collected samples, in natural contexts and with minimal experimental interference. Within corpus linguistics there are divergent views as to the value of corpus annotation, from John Sinclair, who advocated minimal annotation and allowing texts to "speak for themselves", to others, such as the Survey of English Usage team (based at University College London), who advocate annotation as a path to greater linguistic understanding and rigour.

History

Some of the earliest efforts at grammatical description were based at least in part on corpora of particular religious or cultural significance. For example, Pratisakhya literature described the sound patterns of Sanskrit as found in the Vedas, and Panini's grammar of classical Sanskrit was based at least in part on analysis of that same corpus. Similarly, the early Arabic grammarians paid particular attention to the language of the Quran. In the Western European tradition, scholars prepared concordances to allow detailed study of the language of the Bible and other canonical texts.

A landmark in modern corpus linguistics was the publication by Henry Kucera and W. Nelson Francis of Computational Analysis of Present-Day American English in 1967, a work based on the analysis of the Brown Corpus, a carefully compiled selection of current American English, totalling about a million words drawn from a wide variety of sources. Kucera and Francis subjected it to a variety of computational analyses, from which they compiled a rich and variegated opus, combining elements of linguistics, language teaching, psychology, statistics, and sociology. A further key publication was Randolph Quirk's Towards a description of English Usage (1960), in which he introduced the Survey of English Usage.

Shortly thereafter, Boston publisher Houghton Mifflin approached Kucera to supply a million-word, three-line citation base for its new American Heritage Dictionary, the first dictionary to be compiled using corpus linguistics. The AHD took the innovative step of combining prescriptive elements (how language should be used) with descriptive information (how it actually is used).

Other publishers followed suit. The British publisher Collins' COBUILD monolingual learner's dictionary, designed for users learning English as a foreign language, was compiled using the Bank of English. The Survey of English Usage Corpus was used in the development of one of the most important corpus-based grammars, A Comprehensive Grammar of the English Language (Quirk et al. 1985).

The Brown Corpus has also spawned a number of similarly structured corpora: the LOB Corpus (1960s British English), Kolhapur (Indian English), Wellington (New Zealand English), the Australian Corpus of English (Australian English), the Frown Corpus (early 1990s American English), and the FLOB Corpus (1990s British English). Other corpora represent many languages, varieties and modes, and include the International Corpus of English and the British National Corpus, a 100-million-word collection of a range of spoken and written texts, created in the 1990s by a consortium of publishers, universities (Oxford and Lancaster) and the British Library. For contemporary American English, work has stalled on the American National Corpus, but the Corpus of Contemporary American English (1990-present), with more than 400 million words, is now available through a web interface.

The first computerized corpus of transcribed spoken language was constructed in 1971 by the Montreal French Project. It contained one million words and inspired Shana Poplack's much larger corpus of spoken French in the Ottawa-Hull area.

Besides these corpora of living languages, computerized corpora have also been made of collections of texts in ancient languages. An example is the Andersen-Forbes database of the Hebrew Bible, developed since the 1970s, in which every clause is parsed using graphs representing up to seven levels of syntax, and every segment is tagged with seven fields of information. The Quranic Arabic Corpus is an annotated corpus for the Classical Arabic language of the Quran. This is a recent project with multiple layers of annotation, including morphological segmentation, part-of-speech tagging, and syntactic analysis using dependency grammar.

Methods

Corpus linguistics has generated a number of research methods that attempt to trace a path from data to theory. Wallis and Nelson (2001) first introduced what they called the 3A perspective: Annotation, Abstraction and Analysis.

Annotation consists of the application of a scheme to texts. Annotations may include structural markup, part-of-speech tagging, parsing, and numerous other representations.

Abstraction consists of the translation (mapping) of terms in the scheme to terms in a theoretically motivated model or dataset. Abstraction typically includes linguist-directed search but may include, for example, rule-learning for parsers.

Analysis consists of statistically probing, manipulating and generalising from the dataset. Analysis might include statistical evaluations, optimisation of rule bases or knowledge discovery methods.

Most lexical corpora today are part-of-speech-tagged (POS-tagged). However, even corpus linguists who work with unannotated plain text inevitably apply some method to isolate salient terms. In such situations annotation and abstraction are combined in a lexical search.

The advantage of publishing an annotated corpus is that other users can then perform experiments on the corpus (through corpus managers). Linguists with other interests and differing perspectives than the originators can exploit this work. By sharing data, corpus linguists are able to treat the corpus as a locus of linguistic debate, rather than as an exhaustive fount of knowledge.

Recent studies have suggested that treatment outcome in adolescents with social anxiety disorder can also be assessed by analysing language with corpus-linguistic methods.
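The 3A perspective described under Methods can be illustrated with a small sketch. The Python below is only a toy under stated assumptions: the lexicon `TOY_LEXICON`, the sample text `SAMPLE`, and the function names `annotate`, `abstract`, `analyse` and `kwic` are all hypothetical and do not come from Wallis and Nelson or from any corpus manager. A dictionary lookup stands in for a real part-of-speech tagger, a search for a tag sequence stands in for linguist-directed search, and relative frequency stands in for statistical analysis. The `kwic` function shows the plain-text case, in which a lexical search combines annotation and abstraction in one step.

```python
import re
from collections import Counter

# Hypothetical sample text and toy lexicon; a real study would use a
# published corpus and a trained part-of-speech tagger instead.
SAMPLE = (
    "The linguist tags the corpus. "
    "The linguist searches the plain text. "
    "A linguist searches the annotated corpus."
)
TOY_LEXICON = {
    "the": "DET", "a": "DET",
    "linguist": "NOUN", "corpus": "NOUN", "text": "NOUN",
    "tags": "VERB", "searches": "VERB",
    "plain": "ADJ", "annotated": "ADJ",
}


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def annotate(text):
    """Annotation: apply a scheme (here, toy POS tags) to the text."""
    return [(tok, TOY_LEXICON.get(tok, "UNK")) for tok in tokenize(text)]


def abstract(tagged, pattern):
    """Abstraction: map annotated tokens to a dataset of cases by
    searching for a sequence of tags chosen by the linguist."""
    n = len(pattern)
    cases = []
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        if tuple(tag for _, tag in window) == tuple(pattern):
            cases.append(tuple(tok for tok, _ in window))
    return cases


def analyse(cases):
    """Analysis: generalise from the dataset via relative frequencies."""
    counts = Counter(cases)
    total = sum(counts.values())
    return {case: count / total for case, count in counts.items()}


def kwic(text, keyword, width=2):
    """Lexical search on plain text: each hit with `width` tokens of
    context on either side (keyword in context)."""
    tokens = tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


if __name__ == "__main__":
    tagged = annotate(SAMPLE)
    cases = abstract(tagged, ("NOUN", "VERB"))
    print(analyse(cases))
    print(kwic(SAMPLE, "corpus"))
```

Keeping the three stages as separate functions mirrors the point made above about published annotated corpora: another researcher can reuse the output of `annotate` and supply a different pattern or a different analysis without re-tagging the text.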
