Unexpected results in Spark MapReduce

Problem description


I'm new to Spark and want to understand how MapReduce gets done under the hood to ensure I use it properly. This post provided a great answer, but my results don't seem to follow the logic described. I'm running the Spark Quick Start guide in Scala on the command line. When I do line length addition properly, things come out just fine. Total line length is 1213:

scala> val textFile = sc.textFile("README.md")
scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))
scala> val linesWithSparkLengths = linesWithSpark.map(s => s.length)
scala> linesWithSparkLengths.foreach(println)

Result: 14 78 73 42 68 17 62 45 76 64 54 74 84 29 136 77 77 73 70

scala> val totalLWSparkLength = linesWithSparkLengths.reduce((a,b) => a+b)
totalLWSparkLength: Int = 1213


When I tweak it slightly to use (a-b) instead of (a+b),

scala> val totalLWSparkTest = linesWithSparkLengths.reduce((a,b) => a-b)


I expected -1185, according to the logic in this post:

List(14,78,73,42,68,17,62,45,76,64,54,74,84,29,136,77,77,73,70).reduce( (x,y) => x - y )

Step 1: op( 14, 78 ) will be the first evaluation.
  x is 14 and y is 78. Result of x - y = -64.
Step 2: op( op( 14, 78 ), 73 )
  x is op( 14, 78 ) = -64 and y = 73. Result of x - y = -137.
Step 3: op( op( op( 14, 78 ), 73 ), 42 )
  x is op( op( 14, 78 ), 73 ) = -137 and y is 42. Result is -179.
...
Step 18: op( op( ... , 73 ), 70 ) will be the final evaluation.
  x is -1115 and y is 70. Result of x - y is -1185.
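The trace above can be reproduced with an ordinary Scala collection, since a local List reduces strictly left to right (it is effectively reduceLeft), so there is no partition shuffling to interfere:

```scala
object SequentialReduce {
  def main(args: Array[String]): Unit = {
    val lengths = List(14, 78, 73, 42, 68, 17, 62, 45, 76, 64,
                       54, 74, 84, 29, 136, 77, 77, 73, 70)
    // Folds as (((14 - 78) - 73) - ...) - 70, exactly the trace above
    val result = lengths.reduce((x, y) => x - y)
    println(result)  // -1185
  }
}
```

So the -1185 expectation is correct for a sequential reduce; the surprise comes only from how Spark distributes the work.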


However, something strange happens:

scala> val totalLWSparkTest = linesWithSparkLengths.reduce((a,b) => a-b)
totalLWSparkTest: Int = 151


When I run it again...

scala> val totalLWSparkTest = linesWithSparkLengths.reduce((a,b) => a-b)
totalLWSparkTest: Int = -151


Can anyone tell me why the result is 151 (or -151) instead of -1185?

Answer


It happens because subtraction is neither associative nor commutative. Let's start with associativity:

(- (- (- 14 78) 73) 42)
(- (- -64 73) 42)
(- -137 42)
-179

is not the same as

(- (- 14 78) (- 73 42))
(- -64 (- 73 42))
(- -64 31)
-95


Now it's time for commutativity:

(- (- (- 14 78) 73) 42) ;; From the previous example

is not the same as

(- (- (- 42 73) 78) 14)
(- (- -31 78) 14)
(- -109 14)
-123


Spark first applies reduce on individual partitions and then merges the partial results in arbitrary order. If the function you use doesn't meet one or both criteria, the final result can be non-deterministic.
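That partition-then-merge behavior can be sketched locally with plain Scala collections. The two-way split point below is an assumption for illustration (Spark's actual partitioning of README.md depends on the input file and configuration, which is how it arrived at 151), but it shows why a non-commutative op makes the sign flip between runs:

```scala
object PartitionedReduce {
  def main(args: Array[String]): Unit = {
    val lengths = List(14, 78, 73, 42, 68, 17, 62, 45, 76, 64,
                       54, 74, 84, 29, 136, 77, 77, 73, 70)
    val op = (x: Int, y: Int) => x - y

    // Pretend the RDD has two partitions (the split at index 10
    // is an arbitrary assumption, not Spark's real partitioning).
    val (part1, part2) = lengths.splitAt(10)
    val r1 = part1.reduce(op)  // -511
    val r2 = part2.reduce(op)  // -566

    // Spark merges partial results in whatever order they arrive;
    // with a non-commutative op the merge order changes the answer:
    println(op(r1, r2))  // 55
    println(op(r2, r1))  // -55
  }
}
```

Just as in the question, the same data yields two results that differ only in sign, depending on which partition's partial result arrives first.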
