
LLMs | ERNIE 3.0 / ERNIE 3.0 Titan: Translation and Commentary on 《ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation》
and 《ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation》

Overview: The ERNIE 3.0 framework is designed for pre-training knowledge-enhanced large models that handle both language understanding and generation tasks. The model was trained on a 4TB corpus and can be used via zero-shot learning, few-shot learning, or fine-tuning; experimental results on a wide range of tasks demonstrate the effectiveness of ERNIE 3.0. On top of this framework, ERNIE 3.0 Titan, a knowledge-enhanced language model with 260 billion parameters, was pre-trained, and validation shows that it achieves new state-of-the-art results. In addition, the authors propose a new method to control the generation results and keep them consistent with the real world, and they devise an online distillation framework, producing distilled models of several sizes in view of the computation overhead of large-scale pre-trained models.

>> Unified NLU + NLG: ERNIE 3.0 is a knowledge-enhanced pre-trained large model released by Baidu, with a parameter scale of 10B. ERNIE adopts a unified pre-training framework that covers both natural language understanding and natural language generation, so the trained model can easily be tailored to understanding and generation tasks through zero-shot learning, few-shot learning, or fine-tuning.

>> ERNIE 3.0 Titan has roughly 1.5x the parameters of GPT-3: ERNIE 3.0 Titan, released by Baidu together with Peng Cheng Laboratory, is the largest single (dense) Chinese model in the world to date. It is a scaled-up upgrade of ERNIE 3.0, reaching 260B parameters, about 50% more than GPT-3 (a quick numerical check follows after this list).

>> Credible and controllable generation = self-supervised adversarial loss + controllable language modeling loss: during pre-training, ERNIE 3.0 Titan additionally uses a self-supervised adversarial loss and a controllable language modeling loss so that it can produce credible and controllable generations.

>> Online distillation framework to reduce computation overhead: to cut the computation cost, ERNIE 3.0 Titan introduces an online distillation framework in which the teacher model teaches the student models while continuing to train itself, making more efficient use of compute. ERNIE 3.0 Titan outperforms state-of-the-art models on 68 NLP datasets.
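
As a quick check on the parameter comparison in the second bullet, the ratio between the two published parameter counts works out as follows:

```python
# Parameter counts quoted in the two paper abstracts.
gpt3_params = 175e9    # GPT-3: 175 billion parameters
titan_params = 260e9   # ERNIE 3.0 Titan: 260 billion parameters

ratio = titan_params / gpt3_params
print(f"{ratio:.2f}x, i.e. about {ratio - 1:.0%} more parameters than GPT-3")
# -> 1.49x, i.e. about 49% more parameters than GPT-3
```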

Contents

Translation and Commentary on 《ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation》

Abstract

Translation and Commentary on 《ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation》

Abstract


Translation and Commentary on 《ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation》

Link

ERNIE 3.0: https://arxiv.org/abs/2107.02137

Date

July 5, 2021

Authors

Baidu research team

Abstract

Pre-trained models have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks. Recent works such as T5 and GPT-3 have shown that scaling up pre-trained language models can improve their generalization abilities. Particularly, the GPT-3 model with 175 billion parameters shows its strong task-agnostic zero-shot/few-shot learning capabilities. Despite their success, these large-scale models are trained on plain texts without introducing knowledge such as linguistic knowledge and world knowledge. In addition, most large-scale models are trained in an auto-regressive way. As a result, this kind of traditional fine-tuning approach demonstrates relatively weak performance when solving downstream language understanding tasks. In order to solve the above problems, we propose a unified framework named ERNIE 3.0 for pre-training large-scale knowledge enhanced models. It fuses auto-regressive network and auto-encoding network, so that the trained model can be easily tailored for both natural language understanding and generation tasks with zero-shot learning, few-shot learning or fine-tuning. We trained the model with 10 billion parameters on a 4TB corpus consisting of plain texts and a large-scale knowledge graph. Empirical results show that the model outperforms the state-of-the-art models on 54 Chinese NLP tasks, and its English version achieves the first place on the SuperGLUE benchmark (July 3, 2021), surpassing the human performance by +0.8% (90.6% vs. 89.8%).

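The core architectural idea in the abstract is the fusion of an auto-encoding (bidirectional) network for understanding with an auto-regressive (left-to-right) network for generation on top of shared representations. The snippet below is a minimal, hypothetical sketch of that idea in PyTorch; the single shared encoder, the two linear heads, the sizes, and the trick of switching modes via an attention mask are illustrative assumptions, not the paper's actual architecture.

```python
import torch.nn as nn

class UnifiedPretrainingSketch(nn.Module):
    """Hypothetical sketch: one shared Transformer backbone feeding two branches,
    an auto-encoding (masked-LM style) branch for understanding tasks and an
    auto-regressive (causal-LM style) branch for generation tasks."""

    def __init__(self, vocab_size=30000, d_model=768, n_layers=6, n_heads=12):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)  # shared representation
        self.nlu_head = nn.Linear(d_model, vocab_size)  # predicts masked tokens (understanding)
        self.nlg_head = nn.Linear(d_model, vocab_size)  # predicts next tokens (generation)

    def forward(self, input_ids, mode="nlu"):
        x = self.embed(input_ids)
        if mode == "nlg":
            # Causal mask: each position attends only to its left context (auto-regressive).
            causal_mask = nn.Transformer.generate_square_subsequent_mask(input_ids.size(1))
            return self.nlg_head(self.backbone(x, mask=causal_mask))
        # No mask: fully bidirectional attention (auto-encoding) for understanding tasks.
        return self.nlu_head(self.backbone(x))
```

In this reading, understanding tasks reuse the bidirectional branch while generation tasks use the causal branch, which is what lets a single pre-trained model be tailored to both families of tasks with zero-shot learning, few-shot learning, or fine-tuning.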

6 Conclusion

We proposed the ERNIE 3.0 framework to pre-train a knowledge-enhanced 10-billion-parameter model on a 4TB corpus including plain texts and a knowledge graph. In order to handle both language understanding and generation tasks with zero-shot learning, few-shot learning and fine-tuning, ERNIE 3.0 designs a unified pre-training framework that integrates both auto-encoding networks and auto-regressive networks. We conducted extensive experiments on various datasets from different task paradigms and fields, and the results demonstrate the effectiveness of ERNIE 3.0 as compared to the previous state-of-the-art pre-trained models.

Translation and Commentary on 《ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation》

Link

ERNIE 3.0 Titan: https://arxiv.org/abs/2112.12731

Date

December 23, 2021

Authors

Baidu research team

Abstract

Pre-trained language models have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks. GPT-3 has shown that scaling up pre-trained language models can further exploit their enormous potential. A unified framework named ERNIE 3.0 was recently proposed for pre-training large-scale knowledge enhanced models and trained a model with 10 billion parameters. ERNIE 3.0 outperformed the state-of-the-art models on various NLP tasks. In order to explore the performance of scaling up ERNIE 3.0, we train a hundred-billion-parameter model called ERNIE 3.0 Titan with up to 260 billion parameters on the PaddlePaddle platform. Furthermore, we design a self-supervised adversarial loss and a controllable language modeling loss to make ERNIE 3.0 Titan generate credible and controllable texts. To reduce the computation overhead and carbon emission, we propose an online distillation framework for ERNIE 3.0 Titan, where the teacher model will teach students and train itself simultaneously. ERNIE 3.0 Titan is the largest Chinese dense pre-trained model so far. Empirical results show that the ERNIE 3.0 Titan outperforms the state-of-the-art models on 68 NLP datasets.

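The abstract names two extra training signals, a self-supervised adversarial loss and a controllable language modeling loss, without defining them here. The sketch below is therefore only one plausible reading, not the paper's objective: the adversarial term is treated as discriminating original text from model-generated text (aimed at credibility), and the controllable term as next-token prediction on inputs prefixed with control attributes (aimed at controllability). The helper names (`credibility_head`, the batch keys) and the loss weights are hypothetical.

```python
import torch.nn.functional as F

def titan_style_loss(model, batch, w_adv=1.0, w_ctrl=1.0):
    """Hypothetical combination of the three objectives mentioned in the abstract."""
    # 1) Ordinary auto-regressive language modeling loss on plain text.
    lm_logits = model(batch["input_ids"], mode="nlg")
    lm_loss = F.cross_entropy(
        lm_logits[:, :-1].reshape(-1, lm_logits.size(-1)),
        batch["input_ids"][:, 1:].reshape(-1),
    )

    # 2) Self-supervised adversarial loss (assumed reading of "credible"): classify
    #    whether each sequence is original text or text generated by the model itself.
    real_fake_logits = model.credibility_head(batch["mixed_input_ids"])  # hypothetical head
    adv_loss = F.binary_cross_entropy_with_logits(
        real_fake_logits, batch["is_original"].float()
    )

    # 3) Controllable LM loss (assumed reading of "controllable"): the same next-token
    #    objective, but the input starts with control attributes (genre, topic, ...),
    #    so generation can later be steered by supplying those attributes.
    ctrl_logits = model(batch["attr_prefixed_ids"], mode="nlg")
    ctrl_loss = F.cross_entropy(
        ctrl_logits[:, :-1].reshape(-1, ctrl_logits.size(-1)),
        batch["attr_prefixed_ids"][:, 1:].reshape(-1),
    )

    return lm_loss + w_adv * adv_loss + w_ctrl * ctrl_loss
```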

7 Conclusion

We pre-train a knowledge-enhanced language model with 260 billion parameters named ERNIE 3.0 Titan based on the ERNIE 3.0 framework. It is the largest Chinese dense pre-trained model as far as we know. We have validated it on 68 datasets, and the results show that ERNIE 3.0 Titan achieves new state-of-the-art results. In addition, we propose a novel method for users to control the generation result and obtain results factually consistent with the real world. We also devise an online distillation framework and produce several distilled models of different sizes, in view of the computation overhead of large-scale pre-trained models. In the next stage, we will continually update ERNIE 3.0 Titan with more data to further explore the limit of the performance of large-scale pre-trained language models. We will also endeavor to explore the potential of knowledge-enhanced large-scale multi-modal models for more and various tasks.
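
The online distillation framework is described only at this high level (the teacher teaches the students while continuing to train itself). A minimal sketch of that pattern, assuming a single student, a plain KL-divergence distillation term, and hypothetical optimizers and batch keys, might look like the following; it is not the paper's actual procedure.

```python
import torch.nn.functional as F

def online_distillation_step(teacher, student, batch, teacher_opt, student_opt, T=2.0):
    """One hypothetical step: the teacher keeps pre-training while the student is
    distilled from the teacher's predictions produced in the same step."""
    # Teacher continues training on its own objective (next-token prediction here).
    teacher_logits = teacher(batch["input_ids"])
    teacher_loss = F.cross_entropy(
        teacher_logits[:, :-1].reshape(-1, teacher_logits.size(-1)),
        batch["input_ids"][:, 1:].reshape(-1),
    )
    teacher_opt.zero_grad()
    teacher_loss.backward()
    teacher_opt.step()

    # Student matches the teacher's (detached) output distribution at temperature T.
    student_logits = student(batch["input_ids"])
    distill_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits.detach() / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)
    student_opt.zero_grad()
    distill_loss.backward()
    student_opt.step()
    return teacher_loss.item(), distill_loss.item()
```

Reusing the teacher's forward pass for distillation within the same training step is one way the "teach students and train itself simultaneously" idea can save compute compared with running a separate teacher-only inference stage after pre-training.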

Tags: Titan, Large, LLMs, ERNIE, Scale