BloombergGPT: A Large Language Model for Finance

Shijie Wu¹*, Ozan İrsoy¹*, Steven Lu¹, Vadim Dabravolski¹, Mark Dredze¹,², Sebastian Gehrmann¹, Prabhanjan Kambadur¹, David Rosenberg¹, Gideon Mann¹

¹ Bloomberg, New York, NY USA
² Computer Science, Johns Hopkins University, Baltimore, MD USA
* Co-first authors.

arXiv:2303.17564v1 [cs.LG] 30 Mar 2023

Abstract

The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language Models (LLMs) have been shown to be effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in the literature. In this work, we present BloombergGPT, a 50 billion parameter language model that is trained on a wide range of financial data. We construct a 363 billion token dataset based on Bloomberg's extensive data sources, perhaps the largest domain-specific dataset yet, augmented with 345 billion tokens from general-purpose datasets. We validate BloombergGPT on standard LLM benchmarks, open financial benchmarks, and a suite of internal benchmarks that most accurately reflect our intended usage. Our mixed dataset training leads to a model that outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks. Additionally, we explain our modeling choices, training process, and evaluation methodology. As a next step, we plan to release training logs (Chronicles) detailing our experience in training BloombergGPT.

Contents

1 Introduction
  1.1 BloombergGPT
  1.2 Broader Contributions
2 Dataset
  2.1 Financial Datasets (363B tokens, 51.27% of training)
    2.1.1 Web (298B tokens, 42.01% of training)
    2.1.2 News (38B tokens, 5.31% of training)
    2.1.3 Filings (14B tokens, 2.04% of training)
    2.1.4 Press (9B tokens, 1.21% of training)
    2.1.5 Bloomberg (5B tokens, 0.70% of training)
  2.2 Public Datasets (345B tokens, 48.73% of training)
    2.2.1 The Pile (184B tokens, 25.9% of training)
    2.2.2 C4 (138B tokens, 19.48% of training)
    2.2.3 Wikipedia (24B tokens, 3.35% of training)
  2.3 Tokenization
3 Model
  3.1 Architecture
  3.2 Model Scaling
  3.3 Training Configuration
  3.4 Large-scale Optimization
4 Training Run
5 Evaluation
  5.1 Few-shot Methodology
  5.2 Heldout Loss
  5.3 Financial Tasks
    5.3.1 External Financial Tasks
    5.3.2 Internal Task: Sentiment Analysis
    5.3.3 Exploratory Task: NER
  5.4 BIG-bench Hard
  5.5 Knowledge Assessments
  5.6 Reading Comprehension
  5.7 Linguistic Tasks
  5.8 Summary
6 Qualitative Samples
7 Related Work
8 Ethics, Limitations, and Implications
  8.1 Ethical Use
  8.2 Openness
9 Conclusion
A Architecture
  A.0 Notation
  A.1 Full Architecture
  A.2 Self-Attention with ALiBi (SA)
  A.3 LayerNorm (LN)
  A.4 FeedForward Network (FFN)
  A.5 List of All Trainable Parameters
B Details on external financial tasks

1. Introduction

The release of GPT-3 in 2020 (Brown et al., 2020) demonstrated the powerful benefits of training very large auto-regressive language models (LLMs).
GPT-3 had 175 billion parameters, a hundredfold increase over the previous GPT-2 model, and did remarkably well across a wide range of now popular LLM tasks, including reading comprehension, open-ended question answering, and code generation. This performance has been replicated across several other models (Chowdhery et al., 2022; Scao et al., 2022; Zhang et al., 2022a). Furthermore, evidence suggests that large models exhibit emergent behaviors; growth allows them to acquire abilities not present in smaller models (Wei et al., 2022a). A notable example of emergent behavior is the ability to perform tasks via few-shot prompting, where a model can learn a task from just a few examples. This ability improves well above random as we increase the size of language models. Broadly speaking, few-shot prompting dramatically expands the range of tasks supported by models and lowers the barrier to entry for users seeking automation for new language tasks.
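To make the few-shot setup concrete, here is a minimal sketch in Python of how a prompt for a financial sentiment task might be assembled from a handful of labeled demonstrations. The headlines, labels, and prompt template below are illustrative assumptions, not taken from the paper (the paper's actual few-shot methodology is described in Section 5.1); the resulting string would be fed to any autoregressive LLM, which is then expected to continue it with the label for the final headline.

# Minimal sketch of few-shot prompting: the task is specified purely by
# demonstrations placed in the context window; no parameters are updated.
# The headlines and labels below are illustrative, not from the paper.

few_shot_examples = [
    ("Shares of the company surged after it raised full-year guidance.", "positive"),
    ("The bank reported a quarterly loss and suspended its dividend.", "negative"),
    ("The firm will hold its annual shareholder meeting in June.", "neutral"),
]

query = "The retailer cut its profit outlook amid weakening consumer demand."

def build_prompt(examples, query):
    """Concatenate labeled demonstrations followed by the unlabeled query."""
    lines = ["Classify the sentiment of each headline as positive, negative, or neutral.", ""]
    for text, label in examples:
        lines.append(f"Headline: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Headline: {query}")
    lines.append("Sentiment:")  # the model should continue with the label
    return "\n".join(lines)

print(build_prompt(few_shot_examples, query))

Because the demonstrations live entirely in the context window, supporting a new task in this way requires no retraining, which is what lowers the barrier to entry described above.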
After GPT-3, models grew in size to 280 billion (Gopher, Rae et al., 2021), 540 billion (PaLM, Chowdhery et al., 2022), and 1 trillion parameters (Megatron, Korthikanti et al., 2022). Work also explored other important aspects of achieving a high-performing LLM, such as different training objectives (Tay et al., 2022b), multilingual models (Scao et al., 2022), more efficient and smaller models (Black et al., 2022), and finding data- and parameter-efficient training sizes (Hoffmann et al., 2022).

These efforts have almost exclusively focused on general LLMs, trained on datasets that cover a broad range of topics and domains. While these have included some datasets for specialized domains (e.g., code (Chen et al., 2021a) or biomedical articles (Gao et al., 2021)), the focus has been on building LLMs with broad capabilities. Recent efforts training models using only domain-specific data have yielded models that, while much smaller, beat general-purpose LLMs on tasks within those domains, such as science (Taylor et al., 2022) and medicine (Bolton et al., 2023; Luo et al., 2022; Lehman et al., 2023). These findings motivate further development of models focused on specific domains.

Financial Technology (FinTech) is a large and growing area with NLP technologies having an increasingly important role (Xing et al., 2018; Fisher et al., 2016; Dredze et al., 2016). Financial NLP tasks (Shah et al., 2022) include sentiment analysis (Araci, 2019), named entity recognition (Salinas Alvarado et al., 2015), news classification (Sinha and Khandait, 2020), and question answering (Chen et al., 2021b, 2022). While the range of tasks is similar to those found in general NLP benchmarks, the complexity and terminology of the financial domain warrant a domain-specific system. For all of the reasons generative LLMs are attractive in general (few-shot learning, text generation, conversational systems, etc.), it would be valuable to have an LLM focused on the financial domain. While there are masked language models tuned for the financial domain (Araci, 2019), no LLM has been tuned for or evaluated on tasks for this domain.

1.1 BloombergGPT

We train BloombergGPT, a 50 billion parameter language model that supports a wide range of tasks within the financial industry. Rather than building a general-purpose LLM, or a small LLM exclusively on domain-specific data, we take a mixed approach. General models cover many domains, are able to perform at a high level across a wide variety of tasks, and obviate the need for specialization during training time. However, results from existing domain-specific models show that general models cannot replace them. At Bloomberg, we support a very large and diverse set of tasks, well served by a general model, but the vast majority of our applications are within the financial domain, better served by a specific model. For that reason, we set out to build a model that achieves best-in-class results on financial benchmarks, while also maintaining competitive performance on general-purpose LLM benchmarks.

We achieve this goal by constructing the largest domain-specific dataset yet, drawing on existing data creation, collection, and curation resources at Bloomberg. As Bloomberg is primarily a financial data company, our data analysts have collected and curated financial language documents over the span of forty years. We have extensive archives of financial data that cover a range of topics, with careful tracking of data sources and usage rights. We add this data to public datasets to create a large training corpus with over 700 billion tokens. Using a portion of this training corpus, we train a BLOOM-style, 50 billion parameter model designed based on guidelines from Hoffmann et al. (2022) and Le Scao et al. (2022).
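To give a sense of how guidelines like those of Hoffmann et al. (2022) inform such a sizing decision, the rough sketch below applies two commonly cited rules of thumb from that line of work: training compute of approximately 6 * N * D FLOPs for N parameters and D tokens, and a compute-optimal budget of roughly 20 tokens per parameter. These constants and the calculation are illustrative assumptions, not the paper's own scaling analysis, which is covered in Section 3.2 (Model Scaling).

# Back-of-the-envelope Chinchilla-style sizing, assuming the commonly cited
# approximations associated with Hoffmann et al. (2022): training compute
# C ~ 6 * N * D FLOPs, and a compute-optimal budget of roughly 20 tokens per
# parameter. These are rules of thumb for illustration, not the paper's analysis.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute in FLOPs."""
    return 6.0 * n_params * n_tokens

def chinchilla_optimal_tokens(n_params: float) -> float:
    """Rule-of-thumb compute-optimal token budget for an n_params-sized model."""
    return 20.0 * n_params

n_params = 50e9         # 50 billion parameters
corpus_tokens = 700e9   # training corpus of "over 700 billion tokens"

print(f"tokens per parameter available in the corpus: {corpus_tokens / n_params:.1f}")
print(f"rule-of-thumb optimal token budget for 50B params: {chinchilla_optimal_tokens(n_params):.2e}")
print(f"approx. FLOPs for one pass over the corpus: {training_flops(n_params, corpus_tokens):.2e}")

Under these rough approximations, a 50 billion parameter model paired with a corpus of roughly 700 billion tokens lands near the compute-optimal ratio of tokens to parameters; the parameter count and token budget actually used for BloombergGPT are set out in Section 3.2 (Model Scaling).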