# Huawei Noah's Ark Lab Chief Scientist Prof. LIU Qun on ChatGPT Technology (83-page slide deck)
## ChatGPT: A Technical Analysis

LIU Qun (刘群), Huawei Noah's Ark Lab
Online lecture, 2023-02-16

## Contents

- Overview of ChatGPT
- ChatGPT's impressive performance
- The key technologies behind ChatGPT
- Shortcomings of ChatGPT
- Future directions for ChatGPT

## The ChatGPT sensation

- User numbers: 1 million within 5 days, 100 million within 2 months
- Everyone started talking about ChatGPT; it spread at a speed comparable to COVID-19
- Google sounded an internal code-red alert
- Google hurriedly released Bard, but an error during the launch demo wiped about 8% off its market value
- Microsoft invested a further US$10 billion in OpenAI
- Microsoft quickly launched the ChatGPT-powered New Bing and plans to integrate ChatGPT into the Office suite
- Major companies in China and abroad are rushing to follow

## The official ChatGPT blog: introduction

"ChatGPT: Optimizing Language Models for Dialogue" (November 30, 2022):

We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests. ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a prompt and provide a detailed response.

We are excited to introduce ChatGPT to get users' feedback and learn about its strengths and weaknesses. During the research preview, usage of ChatGPT is free.

In the following sample, ChatGPT asks clarifying questions to debug code.

USER: this code is not working like i expect, how do i fix it?

```go
resultWorkerErr := make(chan error)
defer close(resultWorkerErr)
go func() {
	defer cancel()
	resultWorkerErr <- b.resultWorker(ctx)
}()
```

## Language models: definition

A language model assigns a probability $P_{LM}(s)$ to every word sequence $s$, with

$$P_{LM}(s) \geq 0, \qquad \sum_{s} P_{LM}(s) = 1.$$
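This sentence-level definition connects to next-word prediction through the chain rule; spelling out this standard decomposition (not explicit in the extracted slide text) for clarity:

$$P_{LM}(s) = P_{LM}(w_1, w_2, \ldots, w_n) = \prod_{t=1}^{n} P(w_t \mid w_1, \ldots, w_{t-1}),$$

so scoring a whole sentence reduces to repeatedly predicting the next word given the words so far.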
Language Modeling is the task of predicting what word comes next:

"the students opened their ___" (exams? minds? laptops? books?)

More formally: given a sequence of words $x^{(1)}, x^{(2)}, \ldots, x^{(t)}$, compute the probability distribution of the next word $x^{(t+1)}$:

$$P\left(x^{(t+1)} \mid x^{(t)}, \ldots, x^{(1)}\right),$$

where $x^{(t+1)}$ can be any word in the vocabulary. A system that does this is called a Language Model.

(Christopher Manning, Natural Language Processing with Deep Learning, Stanford U. CS224n)
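To make the definition concrete, here is a minimal count-based sketch (my illustration, not from the slides; the toy corpus and function name are invented) that estimates the next-word distribution from bigram statistics:

```python
from collections import Counter, defaultdict

# Toy corpus; real language models are estimated from billions of tokens.
corpus = ("the students opened their books . "
          "the students opened their laptops .").split()

# Count how often each word follows a given previous word.
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def next_word_distribution(prev: str) -> dict:
    """Estimate P(x_{t+1} | x_t) by relative frequency."""
    counts = bigram_counts[prev]
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

print(next_word_distribution("their"))  # {'books': 0.5, 'laptops': 0.5}
```

A neural language model replaces these counts with a learned function (an RNN or a Transformer) that maps the whole preceding context to this distribution, which is the thread the next slides follow.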
## The evolution of language models

- n-gram language models
- Neural network language models
- Recurrent neural network language models
- Transformer language models
- Pre-trained language models (PLMs)
  - BERT: bidirectional masked language model
  - GPT: decoder-only language model
- Large generative pre-trained language models (LLMs)
  - GPT-3
  - ChatGPT

## Pre-trained language models (PLMs)

- Typical examples: ELMo, BERT, GPT
- The pre-training-then-fine-tuning paradigm
- Language representations learned in the pre-training stage are transferred to downstream tasks

## The Transformer model

(Lilian Weng, Generalized Language Models: ULMFiT & OpenAI GPT, blog)

## Self-attention

(Vaswani et al., 2017)

Each token's representation is obtained as a dynamic weighted combination of all words in the input; the weights change as the input changes. (BertViz tool, Vig et al., 2019)
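A minimal numerical sketch of scaled dot-product self-attention (my illustration under the Vaswani et al. formulation; the matrix sizes are arbitrary), showing how each output row is an input-dependent weighted average over all tokens:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention (Vaswani et al., 2017).

    X: (n_tokens, d_model) input vectors. Each output row is a weighted
    average of all value vectors; the weights are recomputed from the
    input itself, so they change whenever the input changes.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # token-token similarity
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))     # 4 tokens with d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```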
## Key technologies behind ChatGPT

- Pre-trained language models (PLMs)
- Large generative pre-trained language models (LLMs)
- Reinforcement learning from human feedback (RLHF)

## Large generative pre-trained language models (LLMs)

| | Pre-trained Language Models (PLMs) | Large Language Models (LLMs) |
| --- | --- | --- |
| Typical models | ELMo, BERT, GPT-2 | GPT-3 |
| Architecture | BiLSTM, Transformer | Transformer |
| Attention | bidirectional or unidirectional | unidirectional |
| Training method | mask & predict | autoregressive generation |
| Strongest task type | understanding | generation |
| Model size | 0.1B-1B parameters | 1B to several hundred billion parameters |
| Downstream usage | fine-tuning | fine-tuning & prompting |
| Emergent abilities | domain transfer from small data | zero/few-shot learning, in-context learning, chain-of-thought |
## GPT-3 overview

GPT-3 (Generative Pre-trained Transformer 3) is an autoregressive language model designed to use deep learning to generate natural language that humans can understand. It was trained and developed by OpenAI, an artificial intelligence company based in San Francisco, with a model design based on the Transformer developed by Google. GPT-3's neural network contains 175 billion parameters, the most of any neural network model at the time of its release. OpenAI published the GPT-3 paper in May 2020 and released a beta of its API to a small number of companies and development teams the following month. On September 22, 2020, Microsoft announced that it had acquired an exclusive license to GPT-3.

## The GPT-3 model family

- ELMo: 93M params, 2-layer biLSTM
- BERT-base: 110M params, 12-layer Transformer
- BERT-large: 340M params, 24-layer Transformer

The language model "scaling wars"! (Mohit Iyyer, slides for CS685 Fall 2020, University of Massachusetts Amherst)

## GPT-3 data sources
| Dataset | Tokens (billion) | Assumptions | Tokens per byte (tokens/bytes) | Ratio | Size (GB) |
| --- | --- | --- | --- | --- | --- |
| Web data | **410B** | | 0.71 | 1:1.9 | *570* |
| WebText2 | **19B** | 25% > WebText | 0.38 | 1:2.6 | *50* |
| Books1 | **12B** | Gutenberg | 0.57 | 1:1.75 | *21* |
| Books2 | **55B** | Bibliotik | 0.54 | 1:1.84 | *101* |
| Wikipedia | **3B** | See RoBERTa | 0.26 | 1:3.8 | *11.4* |
| Total | 499B | | | | 753.4 |

Table: GPT-3 datasets. Disclosed in bold; determined in italics. (Alan D. Thompson, GPT-3.5 + ChatGPT: An illustrated overview, https://lifearchitect.ai/chatgpt/)

[Figure: GPT-3's data sources compared with those of other large language models]
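The Tokens and Size columns are tied together by the tokens-per-byte ratio; a quick check (my own arithmetic, not from the slides) confirms the table is internally consistent:

```python
# Verify tokens ≈ size_in_bytes × tokens_per_byte for each GPT-3 dataset row.
rows = [  # (dataset, tokens in billions, tokens per byte, size in GB)
    ("Web data",  410, 0.71, 570),
    ("WebText2",   19, 0.38,  50),
    ("Books1",     12, 0.57,  21),
    ("Books2",     55, 0.54, 101),
    ("Wikipedia",   3, 0.26, 11.4),
]
for name, tokens_b, tpb, size_gb in rows:
    est = size_gb * tpb  # GB × tokens/byte gives billions of tokens
    print(f"{name:10s} disclosed {tokens_b:5.0f}B  estimated {est:6.1f}B")
```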
## GPT-3 training data volume

Training token counts of large language models (as of the most recent available figures):

- GPT-3 (2020.5): 500B (5000亿) tokens
- Google's PaLM (2022.4): 780B
- DeepMind's Chinchilla: 1400B
- Pangu-α disclosed its training token count: about 40B, less than a tenth of GPT-3's
- No other large Chinese model has disclosed its training token count

For comparison with earlier models:

- ELMo: 1B training tokens
- BERT: 3.3B training tokens
- RoBERTa: 30B training tokens

The language model "scaling wars"! (Mohit Iyyer, slides for CS685 Fall 2020, University of Massachusetts Amherst)
## GPT-3 compute consumption

[Figure: compute used to train language models, log scale]

The language model "scaling wars"! Note the log scale. (Mohit Iyyer, slides for CS685 Fall 2020, University of Massachusetts Amherst)

## Few-shot and zero-shot learning (in-context learning)

(Brown et al., Language Models are Few-Shot Learners, arXiv:2005.14165, 2020)

Given a task description and, optionally, a few demonstrations written directly into the prompt, the model performs the task with no gradient updates.
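A minimal sketch of how in-context learning is used in practice (my illustration; the prompt format and function name are invented, and the sea-otter demonstration follows Brown et al.'s translation example): the "training examples" are simply written into the prompt, and the frozen model predicts the continuation.

```python
def make_prompt(task_description: str, examples: list, query: str) -> str:
    """Build an in-context learning prompt.

    Zero-shot: examples == [] (task description only).
    Few-shot:  a handful of (input, output) demonstrations.
    Either way, the model's weights are never updated.
    """
    parts = [task_description]
    for x, y in examples:            # few-shot demonstrations
        parts.append(f"{x} => {y}")
    parts.append(f"{query} =>")      # the model continues from here
    return "\n".join(parts)

print(make_prompt(
    "Translate English to French:",
    [("sea otter", "loutre de mer"), ("cheese", "fromage")],
    "peppermint",
))
```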
## Chain-of-thought prompting

(Preprint: https://arxiv.org/pdf/2201.11903.pdf)

## Magic word: "Let's think step by step"

**(a) Few-shot**
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: The answer is 11.
Q: A juggler can juggle 16 balls. Half of the balls are golf balls, and half of the golf balls are blue. How many blue golf balls are there?
A: (Output) The answer is 8. ✗

**(b) Few-shot-CoT**
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.
Q: A juggler can juggle 16 balls. Half of the balls are golf balls, and half of the golf balls are blue. How many blue golf balls are there?
A: (Output) The juggler can juggle 16 balls. Half of the balls are golf balls. So there are 16 / 2 = 8 golf balls. Half of the golf balls are blue. So there are 8 / 2 = 4 blue golf balls. The answer is 4. ✓

**(c) Zero-shot**
Q: A juggler can juggle 16 balls. Half of the balls are golf balls, and half of the golf balls are blue. How many blue golf balls are there?
A: The answer (arabic numerals) is
(Output) 8 ✗

**(d) Zero-shot-CoT**
Q: A juggler can juggle 16 balls. Half of the balls are golf balls, and half of the golf balls are blue. How many blue golf balls are there?
A: Let's think step by step.
(Output) There are 16 balls in total. Half of the balls are golf balls. That means that there are 8 golf balls. Half of the golf balls are blue. That means that there are 4 blue golf balls. ✓

Figure 1: Example inputs and outputs of GPT-3 with (a) standard Few-shot (Brown et al., 2020), (b) Few-shot-CoT (Wei et al., 2022), (c) standard Zero-shot, and (d) Zero-shot-CoT. Similar to Few-shot-CoT, Zero-shot-CoT facilitates multi-step reasoning and reaches the correct answer where standard prompting fails. Unlike Few-shot-CoT, which uses step-by-step reasoning examples per task, Zero-shot-CoT does not need any examples and just uses the same prompt "Let's think step by step" across all tasks (arithmetic, symbolic, commonsense, and other logical reasoning tasks).

In contrast to the excellent performance of LLMs on intuitive, single-step "system-1" tasks [Stanovich and West, 2000] with task-specific few-shot or zero-shot prompting [Liu et al., 2021b], even language models at the scale of 100B or more parameters had struggled on "system-2" tasks requiring slow, multi-step reasoning [Rae et al., 2021]. To address this shortcoming, Wei et al. [2022] and Wang et al. [2022] proposed chain-of-thought prompting.
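Zero-shot-CoT is implemented as two prompting stages: first elicit the reasoning, then extract the answer from it. A minimal sketch of that pipeline (assuming a generic completion function `complete`, which is a placeholder here, not a real API):

```python
def complete(prompt: str) -> str:
    """Placeholder for any LLM text-completion call."""
    raise NotImplementedError("plug in an actual language model here")

def zero_shot_cot(question: str) -> str:
    # Stage 1: reasoning extraction, triggered by the magic words.
    trigger = f"Q: {question}\nA: Let's think step by step."
    reasoning = complete(trigger)
    # Stage 2: answer extraction, conditioned on the generated reasoning.
    answer = complete(f"{trigger} {reasoning}\n"
                      "Therefore, the answer (arabic numerals) is")
    return answer.strip()
```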