Basic Methods of Corpus Research (语料库研究基本方法.pptx)

Outline
  1 The nature of corpus linguistics
  2 Some common terms
  3 Basic methods of corpus research

1 The nature of corpus linguistics

Rationalism and empiricism
  Rationalism: I think, therefore I am.
  Empiricism: My mind is a blank slate. Seeing is believing.

The Wax Argument
  Descartes considers a piece of wax; his senses inform him that it has certain characteristics, such as shape, texture, size, color, smell, and so forth. When he brings the wax towards a flame, these characteristics change completely. However, it seems that it is still the same thing: it is still a piece of wax, even though the data of the senses inform him that all of its characteristics are different.
  Therefore, in order to properly grasp the nature of the wax, he cannot use the senses. He must use his mind. Descartes concludes: “And so something which I thought I was seeing with my eyes is in fact grasped solely by the faculty of judgment which is in my mind.”

Empiricism
  Empiricism emphasizes those aspects of scientific knowledge that are closely related to evidence, especially as discovered in experiments. It is a fundamental part of the scientific method that all hypotheses and theories must be tested against observations of the natural world, rather than resting solely on reasoning and intuition.
  Science is considered to be methodologically empirical in nature.
  Corpus linguistics is empirical in nature.

Types of data in language research
  Introspective data: rationalism
  Experimental data: empiricism
  Authentic data: empiricism

  Corpus linguistics advocates authentic data.
  We do not reject other types of data.

Even within the corpus linguistics camp
  Corpus-driven: minimum theory-reliance; exclusive reliance on corpus data for all theories
  Corpus-based: reliance on corpus data for hypothesis-testing
  Corpus-referenced/informed: occasionally resorting to corpus data for illustrations

We firmly oppose any claim that disregards the facts of language
  No introspection can claim credence without verification through real language data (Teubert 2005).

2 Some common terms

Corpus
Corpus linguistics

Token, type, lemma
  The little boy looked at the other boys.

Collocation
  Collocation is defined as a sequence of words which co-occur more often than would be expected by chance.
  a big smoker
  a strong smoker
  a hard smoker
  a heavy smoker
  a furious smoker

  It is quite possible, in fact, to describe a woman as handsome. However, this implies that she is not beautiful at all in the traditional sense of female beauty, but rather that she is mature in age, has large features and a certain strength of character. Similarly, a man could be described as beautiful, but this would usually imply that he had feminine features.

Colligation
  Colligation is defined as a sequence of grammatical categories which co-occur more often than would be expected by chance.

Semantic prosody
  Semantic prosody is instantiated when a word such as CAUSE co-occurs regularly with words that share a given meaning or meanings, and then acquires some of the meaning(s) of those words as a result. This acquired meaning is known as semantic prosody. (Stewart 2010)

3 Basic methods of corpus research

Two approaches
  Corpus-based approach: a hypothesis-testing approach
  Corpus-driven approach: with as “few preconceived ideas” as possible, “keeping the amount of theory-reliance to a minimum in order not to hinder the process of discovering new phenomena” (Römer 2005)
  Both approaches almost always involve a comparison of some kind.

Sizes of corpora in comparison (Rayson 2003)
  Small vs. big
  Equal sizes

Types of comparison
  Across genres
  Across users
  Across different times
  Across (varieties of) language(s)

Corpus comparability

Linguistic features in corpus comparison
  Lexical
  Lexico-grammatical
  Syntactic
  Discoursal

Statistical tests in corpus comparison
  Simple:
    Relationship (correlation, etc.)
    Difference (chi-square, log-likelihood, etc.)
  Complicated: regression analysis, factor analysis, cluster analysis, correspondence analysis

Research workflow (diagram)
  Corpus; research question; research design; software; statistical tests; conclusion?
  Reference corpus: comparison
  Results: words, phrases, collocations, semantic prosody, colligation, sentence patterns, etc.
  Data presentation
  Data analysis, interpretation, and discussion
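
Two ideas from the slides lend themselves to a small worked example: the token/type/lemma distinction and the log-likelihood test for frequency differences between two corpora. The Python sketch below is illustrative only and is not taken from the slides. The function names, the naive regular-expression tokenizer, and the two-entry lemma table are all assumptions made for the example; a real study would use a proper tokenizer and lemmatizer. The log-likelihood function uses the simple two-cell form common in corpus frequency comparison, which is assumed here to be the kind of test the slides have in mind.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (naive, for illustration)."""
    return re.findall(r"[a-z']+", text.lower())


# Hypothetical toy lemma table covering only the example sentence.
TOY_LEMMAS = {"boys": "boy", "looked": "look"}


def count_units(text):
    """Return (number of tokens, number of types, number of lemmas)."""
    tokens = tokenize(text)
    types = set(tokens)                               # distinct word forms
    lemmas = {TOY_LEMMAS.get(t, t) for t in types}    # forms mapped to base forms
    return len(tokens), len(types), len(lemmas)


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-cell log-likelihood (G2) for one word observed in two corpora.

    freq_a, freq_b: occurrences of the word in corpus A and corpus B.
    size_a, size_b: total number of tokens in corpus A and corpus B.
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected frequencies if the word were equally common in both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


if __name__ == "__main__":
    print(count_units("The little boy looked at the other boys."))
    # Invented counts: a word seen 20 times in one 1,000-token corpus
    # and 10 times in another 1,000-token corpus.
    print(round(log_likelihood(20, 1000, 10, 1000), 2))
```

Because the expected frequencies are scaled by corpus size, the same function works for the small-versus-big and the equal-size comparisons listed in the slides. With one degree of freedom, a value above 3.84 is conventionally read as significant at p < 0.05.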