    corpus-introduction--section-1--语料库.ppt

    • Resource ID: 34061040       File size: 471 KB        Pages: 47
    • Format: PPT

CL timetable
  • 27/03 (Wed) 18:30-21:30  E6-224
  • 28/03 (Thu) 13:15-16:40  E6-219
  • 29/03 (Fri) 14:05-17:30  E6-219
  • 03/04 (Wed) 18:30-21:30  E6-224
  • 07/04 (Fri) 14:05-17:30  E6-219
  • 10/04 (Wed) 18:30-21:30  E6-224
  • 11/04 (Thu) 13:15-16:40  E6-219
  • 12/04 (Fri) 14:05-17:30  E6-219

Introducing Corpus Linguistics
Corpus Linguistics
Richard X

Module description
  • Since the 1990s, the corpus methodology has revolutionized nearly all branches of linguistics
      • Corpus analysis can be illuminating in “virtually all branches of linguistics or language learning.” (Leech 1997)
  • One of the strengths of corpus data lies in its empirical and attested nature
      • It pools together the intuitions of a great number of speakers
      • It makes linguistic analysis more objective
  • This module
      • introduces the theoretical and practical issues of using corpora in linguistic studies
      • explores how the corpus-based approach and other methodologies can be combined in linguistic studies

Aims of the module
The module aims to
  • provide an introduction to corpus linguistics;
  • familiarise students with major corpus resources and tools;
  • pass on essential knowledge and skills for building DIY corpora;
  • keep students up to date with the latest developments in corpus research;
  • develop students' ability in corpus-based language studies.

Contents
  1) Introducing corpus linguistics
  2) Corpus design and types of corpora
  3) Data capture and markup
  4) Corpus annotation
  5) Making statistical claims
  6) Corpus analysis (1): concordance and wordlist
  7) Corpus analysis (2): keyword analysis
  8) Corpora in lexicographic and lexical studies
  9) Corpora in grammatical studies
  10) Corpora in diachronic studies
  11) Corpora in language variation research
  12) Corpora in sociolinguistic studies
  13) Corpora in language education
  14) Corpora in literary and stylistic studies
  15) Corpora in critical discourse analysis
  16) Corpora in contrastive and translation studies

Learning outcomes
On successful completion of the module, students will be able to
  • understand the major theoretical frameworks in corpus linguistics and formulate research questions that are amenable to corpus research;
  • think critically about the strengths and weaknesses of the corpus methodology and decide when and how to interface it with other methodologies;
  • get familiar with major corpus resources and tools and develop DIY corpora when necessary;
  • apply the corpus-based approach in their own research.

Teaching/learning strategies
  • With a dual focus on the "why" and the "how to" of corpus-based language studies, this practical module will be delivered through a series of lectures and hands-on lab sessions
  • The module also engages students in extensive reading and interaction with corpus data outside of class

Assessment
  • Option A
      • A 1,000-word essay that critically reviews a corpus exploration tool or a corpus-based study (40%)
      • A 2,500-word project report (60%)
  • Option B
      • One 3,500-word essay based on a research project of your own choice (100%)
  • Deadline: Friday 31 May 2013
  • Submission: a Word copy as email attachment

Reading list
  • Set text
      • McEnery, A., Xiao, R. and Tono, Y. (2006) Corpus-Based Language Studies: An Advanced Resource Book. London & New York: Routledge.
      • Wynne, M. (2005) Developing Linguistic Corpora. Oxford: Oxbow Books. Available online at http://www.ahds.ac.uk/creating/guides/linguistic-corpora
  • Recommended reading
      • See the module syllabus at the course website www.lancs.ac.uk/fass/projects/corpus/ZJU/CL_syllabus.htm (password for unzipping ebooks: lancs)

Outline of this session
  • Lecture: introducing key concepts and debates in corpus linguistics
      • What is and is not a corpus?
      • Why use corpora?
      • Corpora vs. intuitions
      • The corpus methodology
      • A brief history of Corpus Linguistics
      • Nature and applications of corpus-based studies
  • Lab: testing your intuitions + exploring online resources

What is a corpus?
  • The word corpus comes from Latin (“body”) and the plural is corpora
  • A corpus is a body of naturally occurring language, but rarely a random collection of text
  • Corpora “are generally assembled with particular purposes in mind, and are often assembled to be (informally speaking) representative of some language or text type.” (Leech 1992)
  • “A corpus is a collection of (1) machine-readable (2) authentic texts (including transcripts of spoken data) which is (3) sampled to be (4) representative of a particular language or language variety.” (MXT 2006: 5)

What is not a corpus?
  • A list of words is not a corpus
      • Building blocks of language
  • A text archive is not a corpus
      • A random collection of texts
  • A collection of citations is not a corpus
      • A citation is a short quotation which contains a word or phrase that is the reason for its selection
  • A collection of quotations is not a corpus
      • A quotation is a short selection from a text, chosen on internal criteria by human beings
  • A text is not a corpus
      • Intended to be read in different ways
  • The Web is not a corpus
      • Its dimensions are unknown and constantly changing, and it was not designed from a linguistic perspective
(Sinclair 2005)

What is a corpus for?
  • A corpus is made for the study of language in a broad sense
      • To test existing linguistic theory and hypotheses
      • To generate and verify new linguistic hypotheses
      • Beyond linguistics, to provide textual evidence in text-based humanities and social sciences subjects
  • The purpose is reflected in a well-designed corpus

Why use corpora?
  • Even expert speakers have only a partial knowledge of a language
      • A corpus can be more comprehensive and balanced
  • Even expert speakers tend to notice the unusual and think of what is possible
      • A corpus can show us what is common and typical
  • Even expert speakers cannot quantify their knowledge of language
      • A corpus can readily give us accurate statistics

Why use corpora? (continued)
  • Even expert speakers cannot remember everything they know
      • A corpus can store and recall all the information that has been stored in it
  • Even expert speakers cannot make up natural examples
      • A corpus can provide us with a vast number of examples in real communication context
  • Even expert speakers have prejudices and preferences, and every language has cultural connotations and underlying ideology
      • A corpus can give you more objective evidence

Why use corpora? (continued)
  • Even expert speakers are not always available to be consulted
      • A corpus can be made permanently accessible to all
  • Even expert speakers cannot keep up with language change
      • A constantly updated corpus can reflect even recent changes in the language
  • Even expert speakers lack authority: they can be challenged by other expert speakers
      • A corpus can encompass the actual language use of many expert speakers

Intuitions as an alternative
  • Intuitions are always useful in linguistics
      • To invent (grammatical, ungrammatical, or questionable) example sentences for linguistic analysis
      • To make judgments about the acceptability / grammaticality or meaning of an expression
      • To help with categorization

Intuitions as an alternative (continued)
  • Intuitions should be applied with caution
      • They are possibly biased, as they are likely to be influenced by one's dialect or sociolect
      • Introspective data is artificial and may not represent typical language use, as one is consciously monitoring one's language production
      • Introspective data is decontextualized because it exists in the analyst's mind rather than in any real linguistic context
      • Intuitions are not observable and verifiable by everyone, as corpora are
      • Excessive reliance on intuitions blinds the analyst to the realities of language usage, because we tend to notice the unusual but overlook the commonplace
      • There are areas in linguistics where intuitions cannot be used reliably, e.g. language variation, historical linguistics, register and style, first and second language acquisition
      • Human beings have only the vaguest notion of the frequency of a construct or a word

Benefits of corpus data
  • Corpus data is more reliable
      • A corpus pools together the linguistic intuitions of a range of language speakers, which offsets the potential biases in the intuitions of individual speakers
  • Corpus data is more natural
      • It is used in real communication instead of being invented specifically for linguistic analysis
  • Corpus data is contextualized
      • It is attested language use which has already occurred in a real linguistic context
  • Corpus data is quantitative
      • Corpora can provide frequencies and statistics readily
  • Corpus data can reveal differences that intuitions alone cannot perceive
      • E.g. the synonyms totally, absolutely, utterly, completely, entirely

Corpora vs. intuitions
  • Corpora and intuitions are not necessarily antagonistic; they corroborate each other and can be gainfully viewed as complementary
  • Armchair linguists and corpus linguists “need each other. Or better, the two kinds of linguists, wherever possible, should exist in the same body.” (Fillmore 1992)
  • “Neither the corpus linguist of the 1950s, who rejected intuitions, nor the general linguist of the 1960s, who rejected corpus data, was able to achieve the interaction of data coverage and the insight that characterize the many successful corpus analyses of recent years.” (Leech 1991)
  • The key to using corpus data is to find the balance between the use of corpus data and the use of one's intuitions

The corpus methodology
  • It is debatable whether CL is a methodology or a branch of linguistics
      • One view: CL goes well beyond this methodological role and has become an independent discipline
      • The opposing view: in spite of the name, CL is indeed a methodology rather than an independent branch of linguistics in the same sense as phonetics, syntax, semantics or pragmatics
  • These latter areas of linguistics describe, or explain, a certain aspect of language use
  • Corpus linguistics, in contrast, is not restricted to a particular aspect of language; it can be employed to explore almost any area of linguistic research

A brief history of CL
  • The term corpus linguistics first appeared only in the early 1980s, but corpus-based language study has a substantial history
  • The history of CL can be split into two periods: before and after Chomsky

A brief history of CL (continued)
  • Before Chomsky
      • Field linguists and linguists of the structuralist tradition used “shoebox corpora”: shoeboxes filled with paper slips
      • Their methodology was essentially “corpus-based” in the sense that it was empirical and based on observed data
  • The work of early corpus linguistics was underpinned by two fundamental, yet flawed, assumptions
      • The sentences of a natural language are finite.
      • The sentences of a natural language can be collected and enumerated.
  • Most linguists saw the “corpus” as the only source of linguistic evidence in the formation of linguistic theories

A brief history of CL (continued)
  • Chomsky revolution: between 1957 and 1965 Chomsky changed the direction of linguistics from empiricism towards rationalism
      • “Any natural corpus will be skewed. Some sentences won't occur because they are obvious, others because they are false, still others because they are impolite. The corpus, if natural, will be so wildly skewed that the description would be no more than a mere list.” (Chomsky 1962)
      • Our internal knowledge of language in the human brain (competence, I-language) replaces observed data (performance, E-language)
      • Intuitions started to be relied on as evidence
  • Xiao, R. (2008) “Theory-driven corpus research: using corpora to inform aspect theory”. In A. Lüdeling & M. Kytö (eds.) Corpus Linguistics: An International Handbook. Berlin: Mouton de Gruyter

A brief history of CL (continued)
  • Revival of CL
      • Corpus research was continued in a few centres (Brown, Lancaster) in the 1960s-70s
          • The Brown University Standard Corpus of Present-day American English (Brown corpus)
          • Lancaster-Oslo-Bergen Corpus of BrE (LOB)
      • The hardware still imposed some restrictions until the real development started in the 1980s
      • The marriage of corpora with computer technology rekindled interest in the corpus methodology
      • Since then, the number and size of corpora and corpus-based studies have increased dramatically
      • Nowadays, the corpus methodology enjoys widespread popularity, and has opened up or foregrounded many new areas of research

Areas that have used corpora
  • Lexicography
  • Lexical studies
  • Grammatical studies
  • Register/genre analysis
  • Language variation
  • Contrastive analysis
  • Translation studies
  • Language change
  • Language teaching
  • Semantics
  • Pragmatics
  • Stylistics
  • Literary study
  • Sociolinguistics
  • Discourse analysis
  • Forensic linguistics
  • Computational linguistics

Nature of the corpus-based approach
  • It is empirical, analysing the actual patterns of use from natural texts
  • It utilises a large and principled collection of natural texts as the basis for analysis
  • It makes extensive use of computers for analysis, using both automatic and interactive techniques
  • It integrates both quantitative and qualitative analytical techniques
(Biber et al 1998: 4-5)

Why use computers?
  • Development of computer technology has revived CL
  • Machine-readability is a de facto attribute of modern corpora
  • Electronic corpora have advantages unavailable to their “shoebox” ancestors
  • It is the use of computerized corpora, together with computer programs which facilitate linguistic analysis, that distinguishes modern electronic corpora from early drawer-cum-slip corpora

Why use computers? (continued)
  • Computerized corpora can be processed and manipulated rapidly at minimal cost
      • E.g. searching, selecting, sorting and formatting
  • Computers can process machine-readable data accurately and consistently
  • Computers can avoid human bias in an analysis, thus making the result more reliable
  • Machine-readability allows further automatic processing to be performed on the corpus, so that corpus texts can be enriched with various metadata and linguistic analyses
      • Corpus markup and corpus annotation

A question for Deep Thought
“Alright,” said the computer Deep Thought. “The Answer to the Great Question...”
“Yes...!”
“Of Life, the Universe and Everything...” said Deep Thought.
“Yes...!”
“Is...”
“Yes...!...?”
“Forty-two,” said Deep Thought, with infinite majesty and calm.
It was a long time before anyone spoke.
“Forty-two!” yelled someone in the audience. “Is that all you've got to show for seven and a half million years' work?”
“I checked it very thoroughly,” said the computer, “and that quite definitely is the answer. I think the problem, to be quite honest with you, is that you've never actually known what the question is.”
(The Hitchhiker's Guide to the Galaxy by Douglas Adams)
What can we learn from this story?

What corpora cannot do
  • Corpora do not provide negative evidence
      • They cannot tell us what is possible or not possible
      • They can show what is central and typical in language
  • Corpora can yield findings but rarely provide explanations for what is observed
      • Interfacing with other methodologies
  • The use of corpora as a methodology also defines the boundaries of any given study
      • Importance of amenable research questions
  • The findings based on a particular corpus only tell us what is true in that corpus
      • Generalisation vs. representativeness
  • See Unit B2 for pros and cons of corpora

Ask corpora the right questions
  • Corpus linguistics as a methodology is only one of the (many) ways of doing things (“doing linguistics”)
  • The usefulness of corpora depends upon the research question being investigated
  • “They are invaluable for doing what they do, and what they do not do must be done in another wa
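
The searching, sorting and counting operations mentioned under "Why use computers?", and the claim that corpora "can provide frequencies and statistics readily", can be illustrated with a minimal Python sketch. It is not part of the original slides: the four toy sentences, the function names (tokenize, wordlist, concordance) and the simple regular-expression tokenizer are all assumptions made for illustration, and a real study would use a properly sampled corpus and a dedicated concordancer.

```python
import re
from collections import Counter

# Hypothetical toy "corpus" (an assumption for illustration only);
# a real corpus is sampled to be representative of a language variety.
TEXTS = [
    "I totally agree with you.",
    "That is absolutely fine, and I absolutely mean it.",
    "The plan was utterly hopeless.",
    "I am not entirely sure, but it is completely possible.",
]


def tokenize(text):
    """Lowercase a text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def wordlist(texts):
    """Build a frequency list (word -> count) over all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def concordance(texts, node, width=3):
    """Return key-word-in-context triples (left, node, right),
    with up to `width` tokens of context on each side."""
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == node:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append((left, token, right))
    return lines


if __name__ == "__main__":
    freq = wordlist(TEXTS)
    # Frequencies of the near-synonyms mentioned in the slides.
    for word in ["totally", "absolutely", "utterly", "completely", "entirely"]:
        print(word, freq[word])
    # Concordance lines for one node word.
    for left, node, right in concordance(TEXTS, "absolutely"):
        print(f"{left:>20} [{node}] {right}")
```

On a corpus this small the counts say nothing about English; the point is only that a wordlist and a concordance are mechanical operations that a computer performs consistently, which is what the slides contrast with unaided intuition.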
