A Survey of Large Language Models

Wayne Xin Zhao, Kun Zhou*, Junyi Li*, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie and Ji-Rong Wen

Abstract: Ever since the Turing Test was proposed in the 1950s, humans have explored how machines can master language intelligence. Language is essentially a complex, intricate system of human expression governed by grammatical rules, and it poses a significant challenge to develop capable artificial intelligence (AI) algorithms for comprehending and grasping a language. As a major approach, language modeling has been widely studied for language understanding and generation over the past two decades, evolving from statistical language models to neural language models. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora, showing strong capabilities in solving various natural language processing (NLP) tasks. Since researchers have found that model scaling leads to improved model capacity, they have further investigated the scaling effect by increasing the parameter scale to even larger sizes. Interestingly, when the parameter scale exceeds a certain level, these enlarged language models not only achieve significant performance improvements but also exhibit special abilities (e.g., in-context learning) that are not present in small-scale language models (e.g., BERT). To distinguish language models of different parameter scales, the research community has coined the term large language models (LLMs) for PLMs of significant size (e.g., containing tens or hundreds of billions of parameters). Recently, research on LLMs has been largely advanced by both academia and industry, and a remarkable milestone is the launch of ChatGPT, a powerful AI chatbot built on LLMs, which has attracted widespread attention from society. The technical evolution of LLMs has been making an important impact on the entire AI community and stands to revolutionize the way we develop and use AI algorithms. Considering this rapid technical progress, this survey reviews the recent advances of LLMs by introducing the background, key findings, and mainstream techniques. In particular, we focus on four major aspects of LLMs, namely pre-training, adaptation tuning, utilization, and capacity evaluation. Furthermore, we summarize the available resources for developing LLMs and discuss remaining issues and future directions. This survey provides an up-to-date review of the literature on LLMs and can serve as a useful resource for both researchers and engineers.
Index Terms: Large Language Models; Emergent Abilities; Adaptation Tuning; Utilization; Alignment; Capacity Evaluation

Version: v11 (major update on June 29, 2023). GitHub link: https:/. K. Zhou and J. Li contribute equally to this work. The authors are mainly with the Gaoling School of Artificial Intelligence and the School of Information, Renmin University of China, Beijing, China; Jian-Yun Nie is with DIRO, Université de Montréal, Canada. Contact e-mail:

1 INTRODUCTION

"The limits of my language mean the limits of my world." (Ludwig Wittgenstein)

Language is a prominent ability of human beings to express and communicate, which develops in early childhood and evolves over a lifetime [1, 2]. Machines, however, cannot naturally grasp the abilities of understanding and communicating in the form of human language unless equipped with powerful artificial intelligence (AI) algorithms. It has been a longstanding research challenge to achieve this goal, i.e., to enable machines to read, write, and communicate like humans [3].

Technically, language modeling (LM) is one of the major approaches to advancing the language intelligence of machines. In general, LM aims to model the generative likelihood of word sequences, so as to predict the probabilities of future (or missing) tokens. Research on LM has received extensive attention in the literature and can be divided into four major development stages:
Statistical language models (SLM). SLMs [4-7] were developed based on statistical learning methods that rose in the 1990s. The basic idea is to build the word prediction model on the Markov assumption, e.g., predicting the next word based on the most recent context. SLMs with a fixed context length n are also called n-gram language models, e.g., bigram and trigram language models. SLMs have been widely applied to enhance task performance in information retrieval (IR) [8, 9] and natural language processing (NLP) [10-12]. However, they often suffer from the curse of dimensionality: it is difficult to accurately estimate high-order language models, since an exponential number of transition probabilities needs to be estimated. Thus, specially designed smoothing strategies such as back-off estimation [13] and Good-Turing estimation [14] have been introduced to alleviate the data sparsity problem.
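To make the Markov assumption concrete, an n-gram model approximates the sequence probability P(w_1, ..., w_T) by a product of conditional probabilities P(w_t | w_{t-n+1}, ..., w_{t-1}) estimated from corpus counts. The following is a minimal, hypothetical sketch of a bigram model in Python; it uses add-one (Laplace) smoothing as a simple stand-in for the back-off and Good-Turing estimators cited above, and the toy corpus is purely illustrative.

```python
from collections import defaultdict

class BigramLM:
    """Toy bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, corpus):
        # corpus: a list of tokenized sentences, e.g., [["the", "cat", "sat"], ...]
        self.context_count = defaultdict(int)   # counts of the previous word
        self.pair_count = defaultdict(int)      # counts of (previous, current) pairs
        self.vocab = set()
        for sentence in corpus:
            tokens = ["<s>"] + sentence + ["</s>"]
            for prev, cur in zip(tokens, tokens[1:]):
                self.context_count[prev] += 1
                self.pair_count[(prev, cur)] += 1
                self.vocab.update((prev, cur))

    def prob(self, prev, cur):
        # P(cur | prev) under the Markov assumption; add-one smoothing keeps
        # unseen transitions from getting zero probability (data sparsity).
        return (self.pair_count[(prev, cur)] + 1) / (self.context_count[prev] + len(self.vocab))


corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
lm = BigramLM(corpus)
print(lm.prob("the", "cat"))   # seen transition: relatively high probability
print(lm.prob("cat", "dog"))   # unseen transition: small but non-zero
```

The exponential blow-up mentioned above is visible here: with vocabulary size V and context length n, on the order of V^n transition probabilities would have to be estimated, which is why higher-order SLMs depend so heavily on smoothing.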
Fig. 1: Trends in the cumulative number of arXiv papers that contain the keyphrases "language model" (since June 2018; panel (a)) and "large language model" (since October 2019; panel (b)). The statistics are calculated by exact-match queries of each keyphrase in paper titles or abstracts, aggregated by month. The two panels use different x-axis ranges because "language models" have been explored since an earlier time. Points corresponding to important landmarks in the research progress of LLMs are labeled. A sharp increase occurs after the release of ChatGPT: the average number of published arXiv papers containing "large language model" in the title or abstract goes from 0.40 per day to 8.58 per day (Figure 1(b)).

Neural language models (NLM). NLMs [15-17] characterize the probability of word sequences by neural networks, e.g., recurrent neural networks (RNNs). As a remarkable contribution, the work in [15] introduced the concept of distributed representations of words and built the word prediction function conditioned on the aggregated context features (i.e., the distributed word vectors). By extending the idea of learning effective features for words or sentences, a general neural network approach was developed to build a unified solution for various NLP tasks [18]. Further, word2vec [19, 20] was proposed as a simplified shallow neural network for learning distributed word representations, which were demonstrated to be very effective across a variety of NLP tasks. These studies initiated the use of language models for representation learning (beyond word sequence modeling), having an important impact on the field of NLP.
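As a concrete illustration of learning distributed word representations in the word2vec style, the sketch below trains skip-gram embeddings with the gensim library. This is a minimal example under assumed settings (the tiny corpus, vector_size, window, and epochs are illustrative choices), not the original word2vec implementation referenced in [19, 20].

```python
from gensim.models import Word2Vec

# Tiny illustrative corpus; in practice word2vec is trained on large-scale text.
sentences = [
    ["language", "models", "predict", "the", "next", "word"],
    ["neural", "networks", "learn", "distributed", "word", "representations"],
    ["word", "vectors", "capture", "semantic", "similarity"],
]

# sg=1 selects the skip-gram objective (a shallow neural network that predicts
# context words from a center word); the hyperparameters are illustrative.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1, epochs=50)

# Each word is mapped to a dense vector (its distributed representation), and
# vector similarity reflects distributional similarity in the corpus.
print(model.wv["word"].shape)                 # (50,)
print(model.wv.most_similar("word", topn=3))  # nearest neighbours in embedding space
```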
Pre-trained language models (PLM). As an early attempt, ELMo [21] was proposed to capture context-aware word representations by first pre-training a bidirectional LSTM (biLSTM) network (instead of learning fixed word representations) and then fine-tuning the biLSTM network according to specific downstream tasks. Further, based on the highly parallelizable Transformer architecture [22] with self-attention mechanisms, BERT [23] was proposed by pre-training bidirectional language models with specially designed pre-training tasks on large-scale unlabeled corpora. These pre-trained context-aware word representations are very effective as general-purpose semantic features, which have largely raised the performance bar of NLP tasks. This study has inspired a large number of follow-up works, establishing the "pre-training and fine-tuning" learning paradigm.
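The pre-training and fine-tuning workflow described for ELMo and BERT can be sketched with the Hugging Face transformers library as follows. This is a minimal sketch under assumed settings: the "bert-base-uncased" checkpoint, the two-class toy batch, and the learning rate are illustrative assumptions, not details taken from the survey.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a pre-trained bidirectional Transformer encoder and attach a
# task-specific classification head for the downstream task.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# A toy labeled batch for a downstream sentence-classification task.
texts = ["the movie was great", "the movie was terrible"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# One fine-tuning step: the pre-trained weights and the new head are updated
# together on the downstream objective, reusing the general-purpose
# context-aware representations learned during pre-training.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)   # returns the cross-entropy loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
print(float(outputs.loss))
```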