GPT-3 and a Rasa chatbot

In 1829, an event took place that unleashed a technological revolution. At the Rainhill Trials a group of steam locomotives squared off to determine which one could win a series of tests of speed, strength and reliability. The winning machine, Rocket, not only blew away its competition at the trials, it also set the direction for steam locomotive development for the following century.

Stephenson’s Rocket (Shutterstock)

What does all this have to do with GPT-3, the transformer language model that OpenAI made available in a limited beta starting in June 2020? Some reviewers have heralded GPT-3 as the first glimpse of artificial general intelligence, while others are calling it a massive lookup table. I don’t think GPT-3 is AGI, but I do think the approach used in GPT-3 will be transformative. Using massive computing power and a huge training suite, OpenAI has created a model that is genuinely general-purpose. By drawing comparisons to the dawn of the railway age we can put GPT-3 in context and see its impact more clearly.

In this article I will describe a simple test that I did to compare GPT-3’s performance with a Rasa chatbot. This test was certainly no Rainhill Trials, but I think the results do shed some light on the role that massive transformer models like GPT-3 will play in the future. I will argue that GPT-3 isn’t the AI equivalent of Rocket, but it just might play the role of the locomotives designed by Richard Trevithick in the decades prior to the Rainhill Trials. Trevithick’s machines were slow and so heavy that they destroyed their tracks. However, despite their flaws, these locomotives had the right essential ingredients and they paved the way for the world-changing success of Rocket.

photo by author

A direct comparison of GPT-3 and a Rasa chatbot

Access to the beta for GPT-3 is still limited. The standard application process did not work for me, but I followed the advice in this video, and I was granted access a few days later.

Once I had access to GPT-3 I wanted to do a simple test to compare its capabilities with an existing application. Earlier this year I created a chatbot in Rasa to answer general questions about movies. This chatbot took 4 months to develop and was explicitly trained on an extensive movie dataset. For the comparison between GPT-3 and the Rasa chatbot, I picked 7 random questions from the regression test set for the chatbot. I compared the answers that Rasa and GPT-3 provided for these questions to get the results of the test.

Test results for Rasa

To run the test with the Rasa chatbot, I started the trained Rasa model using the “rasa shell” command and interactively entered the questions. Here are the results, with my input in bold and the chatbot’s responses in regular font:

Test responses from the Rasa chatbot

The Rasa chatbot got 6 of 7 questions right:

The answer for the “list comedy vampire movies” question was incorrect. I tried several different variations of the question but the results were the same.
I counted the answer for the cast of The Ten Commandments as technically correct because the list returned by Rasa was indeed made up of cast members from the movie. However, the answer was not as expected since the two most memorable stars, Charlton Heston and Yul Brynner, were omitted.
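
The Rasa test above was run by typing questions into “rasa shell”. For a repeatable run, the same questions could in principle be sent to a running Rasa server over its REST channel instead. The following is only a minimal sketch under assumptions that are not from this article: it assumes the server was started with “rasa run”, that the REST channel is enabled in credentials.yml, and that it listens on the default local port. The helper names build_request and ask_rasa are hypothetical.

```python
import json
import urllib.request

# Assumption: a local Rasa server with the REST channel enabled.
RASA_URL = "http://localhost:5005/webhooks/rest/webhook"


def build_request(question, sender="regression-test"):
    """Build the POST request that carries one user message to Rasa."""
    payload = json.dumps({"sender": sender, "message": question}).encode("utf-8")
    return urllib.request.Request(
        RASA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_rasa(question):
    """Send one question and return the text of each bot reply.

    Requires a running server, so it is defined here but not called.
    """
    with urllib.request.urlopen(build_request(question), timeout=30) as resp:
        replies = json.load(resp)
    return [reply["text"] for reply in replies if "text" in reply]


# Building the request does not touch the network.
request = build_request("list comedy vampire movies")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Looping ask_rasa over the regression questions and saving the replies would make it easier to compare runs after the model is retrained.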

Test results for GPT-3

To run the test with GPT-3, I selected the Q&A preset in the Playground tab of the GPT-3 dashboard and entered the same questions that I had entered in the Rasa command line interface. Here are the results, right out of the box, with my input in bold and GPT-3’s answers in regular font:

Test responses from GPT-3

GPT-3 got 5 of 7 questions completely correct. Of the two remaining test cases:

Soylent Green is arguably funny — “Soylent Green is People!” — but I think that GPT-3 got it wrong by labelling this movie as a comedy.
GPT-3 had a good answer for the “list comedy vampire movies” question, but it repeated a subset of the correct answer several times. Also, Fright Night was by far the best of the comedy vampire sub-genre, so I was disappointed that GPT-3 omitted it from the repetition.

I decided to see what would happen if I provided a few examples to help GPT-3 with the questions it had not answered correctly.

First, for movie genre, I provided a few examples and then asked again for the genre of Soylent Green:

GPT-3 gets the right answer with a bit of prompting

As you can see, with a bit of prompting GPT-3 gets the correct answer for the genre of Soylent Green.
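
To make the idea of “a bit of prompting” concrete, here is a minimal sketch of how a few worked examples can be placed ahead of the real question in a single prompt. It assumes a simple “Q:” / “A:” layout and is not the exact text of the Playground’s Q&A preset. The example movies and the build_few_shot_prompt helper are hypothetical, not the ones used in this test.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], question: str) -> str:
    """Format worked question/answer pairs, then the open question."""
    lines = []
    for example_question, example_answer in examples:
        lines.append(f"Q: {example_question}")
        lines.append(f"A: {example_answer}")
        lines.append("")  # blank line between examples
    lines.append(f"Q: {question}")
    lines.append("A:")  # left open for the model to complete
    return "\n".join(lines)


# Illustrative examples only; not the ones used in the article's test.
examples = [
    ("What genre is Alien?", "Science fiction horror"),
    ("What genre is Airplane!?", "Comedy"),
]
prompt = build_few_shot_prompt(examples, "What genre is Soylent Green?")
print(prompt)
```

The model sees the pattern set by the examples and continues it, which is why the choice of examples matters so much for the answer that comes back.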

I tried a similar approach for the overzealous response to the vampire comedy movie question:

Additional training didn’t help the vampire comedy movie query

More examples of multi-genre responses didn’t help to get a correct answer to this question, and I got similar results with other multi-genre questions:

GPT-3 struggles with answering questions about multi-genre movies

Conclusion

Does this limited test demonstrate that GPT-3 can replace Rasa? The simple answer is “no” for the following reasons:

The use case of a movie trivia application is admittedly simplistic and the number of test cases in this exercise is very small. Such a limited test cannot, by itself, establish whether GPT-3 can replace Rasa.
The Rasa framework is remarkably flexible and well-developed, and it can be trained with specialized and current data. By contrast, GPT-3 was trained on data that was current up to October 2019. You can see the result below where GPT-3 isn’t able to provide the plot for Birds of Prey, which was released in 2020:

When Rasa fails, it is relatively easy to debug the root cause of the problem. GPT-3, by contrast, fails in unexpected ways, and nudging it back to a correct answer with examples can be hit and miss, as you can see from the results of the test described in this article.

While this simple test does not demonstrate that GPT-3 can replace Rasa, it does yield an astonishing result: almost right out of the box, with no more than a couple of examples to correct one answer, GPT-3 matched the performance of a Rasa chatbot that required 4 months of painstaking training and development. With 4 months of effort, the Rasa chatbot can do exactly one thing — answer questions about movies. In addition to matching the performance of this Rasa chatbot, GPT-3 can tackle a massive range of additional problems, including natural language translation and generation of code from English text.
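
The scores behind that comparison can be tallied in a few lines. The sketch below uses only the counts reported in this article: Rasa missed one question, GPT-3 missed two out of the box, and the examples fixed the Soylent Green answer but not the vampire comedy one. The order of the entries in each list is arbitrary, and the accuracy helper is just an illustration.

```python
def accuracy(results):
    """Fraction of questions answered correctly."""
    return sum(results) / len(results)


# True = correct answer, False = incorrect; seven questions per run.
rasa = [True] * 6 + [False]
gpt3_out_of_box = [True] * 5 + [False] * 2
gpt3_with_examples = [True] * 6 + [False]

for name, results in [
    ("Rasa", rasa),
    ("GPT-3, out of the box", gpt3_out_of_box),
    ("GPT-3, with examples", gpt3_with_examples),
]:
    print(f"{name}: {sum(results)}/{len(results)} ({accuracy(results):.0%})")
```

With only seven questions, a single answer moves the score by about 14 percentage points, so the match between the two systems should be read as suggestive rather than conclusive.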

photo by author

Returning to the steam locomotive analogy I introduced at the beginning of this article, I believe that while GPT-3 may not be the AI equivalent of Rocket, it is certainly showing the promise of Trevithick’s locomotives. 190 years ago, at the dawn of the railway age, the people who had grown rich shipping goods on canals had a choice. They could take comfort in the limitations and flaws of the nascent steam locomotives and assume that the age of canals would last forever, or they could embrace the new technology and prepare for the railway revolution. I believe that GPT-3 represents a new direction forward for AI because it is applicable to so many problems “out of the box” with little or no additional work. I also believe that the full potential of massive transformer language models is still to come, so the successors of GPT-3 will have an even bigger impact. The AI version of Rocket is on its way, and the canal barge owners of the AI world need to get ready.

You can find the code for the Rasa movie chatbot described in this article here: https://github.com/ryanmark1867/chatbot_production
