Mapping my code from OCaml/F# to Scala - some questions

Updated: 2024-10-19 17:28:44

I am learning Scala in my free time - and as a learning exercise, I translated some OCaml code that I wrote about in another StackOverflow question to Scala. Since I am new to Scala, I'd appreciate some advice...

But before asking my questions - here's the original OCaml code:

let visited = Hashtbl.create 200000

let rec walk xx yy =
    let addDigits number =
        let rec sumInner n soFar =
            match n with
            | x when x<10  -> soFar+x
            | x -> sumInner (n/10) (soFar + n mod 10) in
        sumInner number 0 in
    let rec innerWalk (totalSoFar,listOfPointsToVisit) =
        match listOfPointsToVisit with
        | [] -> totalSoFar
        | _ ->
            innerWalk (
                listOfPointsToVisit
                (* remove points that we've already seen *)
                |> List.filter (fun (x,y) ->
                    match Hashtbl.mem visited (x,y) with
                    | true -> false (* remove *)
                    | _    -> (Hashtbl.add visited (x,y) 1 ; true))
                (* increase totalSoFar and add neighbours to list *)
                |> List.fold_left
                    (fun (sum,newlist) (x,y) ->
                        match (addDigits x)+(addDigits y) with
                        | n when n<26 ->
                            (sum+1,(x+1,y)::(x-1,y)::(x,y+1)::(x,y-1)::newlist)
                        | n -> (sum,newlist))
                    (totalSoFar,[])) in
    innerWalk (0,[(xx,yy)])

let _ =
    Printf.printf "Points: %d\n" (walk 1000 1000)
 

...and here's the Scala code I translated it to:

import scala.collection.mutable.HashMap

val visited = new HashMap[(Int,Int), Int]

def addDigits(number:Int) = {
    def sumInner(n:Int, soFar:Int):Int =
      if (n<10)
        soFar+n
      else
        sumInner(n/10, soFar+n%10)
    sumInner(number, 0)
}

def walk(xx:Int, yy:Int) = {
    def innerWalk(totalSoFar:Int, listOfPointsToVisit:List[(Int,Int)]):Int = {
        if (listOfPointsToVisit.isEmpty) totalSoFar
        else {
            val newStep = 
                listOfPointsToVisit.
                // remove points that we've already seen
                filter(tupleCoords => {
                    if (visited.contains(tupleCoords))
                        false
                    else {
                        visited(tupleCoords)=1 
                        true
                    }
                }).
                // increase totalSoFar and add neighbours to list
                foldLeft( (totalSoFar,List[(Int,Int)]()) )( (state,coords) => {
                    val (sum,newlist) = state
                    val (x,y) = coords
                    if (addDigits(x)+addDigits(y) < 26)
                        (sum+1,(x+1,y)::(x-1,y)::(x,y+1)::(x,y-1)::newlist)
                    else
                        (sum,newlist)
                });
            innerWalk(newStep._1, newStep._2)
        }
    }
    innerWalk(0, List((xx,yy)))
}

println("Points: " + walk(1000,1000))
 

The Scala code compiles and works correctly, reporting the proper result.

But...

Unless I missed something, I found no pipe operator in Scala (i.e. the |> of OCaml and F#), so I used the corresponding list methods (filter and foldLeft). In this case the end result is pretty close to the original, but I am wondering - isn't the pipe operator a generally favorable - and more generic - approach for functional-style solutions? Why isn't Scala equipped with it?

In Scala, I had to specifically initialize my folding state (which is a tuple of type (Int, List[(Int,Int)])) with a type-specific empty list. In plain words, List() didn't cut it - I had to explicitly specify List[(Int,Int)](), otherwise I got a... rather difficult error message. I deciphered it based on context - it complained about Nothing - and I realized the only place in this tiny code where a type Nothing could appear was my empty List. Still, the result is uglier, compared to OCaml... Is there anything better I can do?

In the same vein, OCaml was able to pass the fold's resulting tuple as an argument to innerWalk. In Scala, I had to assign to a variable and invoke the tail-recursive call with innerWalk(newStep._1, newStep._2). There appears to be no equivalence between tuples and function arguments - i.e. I can't pass a 2-tuple to a function that takes two arguments - and similarly, I can't destructure a function's tuple arguments into variables directly (I had to explicitly assign state and coords and destructure them inside the folding function's body). Am I missing something?

Overall, I am pleased with the result - I'd say that if we grade the OCaml code of this example at 100%, then Scala is at about 85-90% - it's a bit more verbose than OCaml, but it's much, much closer to OCaml than it is to Java. I am just wondering whether I used Scala to its full potential or whether I missed some constructs that would improve the code (more likely).

Note that I avoided mapping my original OCaml's pattern matching to Scala's, since in this case I think it was overkill - an if expression is much clearer in both places.

Thanks in advance for any help / suggestions.

P.S. A side note - I added timing instructions around the walk call (thus avoiding the startup cost of the JVM) and measured my Scala code - it runs at about 50% of OCaml's speed - which is, funnily enough, exactly the same speed I get out of Mono executing the F# equivalent code (see my original SO question to get the F# code, if you care about this sort of comparison). Since I currently work in enterprise environments, 50% speed is a price I'll gladly pay to write concise ML-like code and still get access to the vastness of the JVM/.NET ecosystems (databases, Excel-file generation, etc). Sorry OCaml, I did try you - but you can't fully "speak" Oracle :-)

EDIT 1: After the kind suggestions from @senia and @lmm, the code is significantly improved. Hoping for more advice from @lmm about how foldMap and Shapeless will additionally help :-)

EDIT 2: I cleared up the code further with flatMap from scalaz - gist is here. Unfortunately, the change also caused a massive 10x slowdown - guessing that the list concatenation done by foldMap is much slower than foldLeft's "add just one new node". Wondering how I can change the code to make the addition fast...

EDIT 3: After another suggestion from @lmm, I switched the scalaz-flatMap version from using List to using immutable.Vector: This helped a lot, bringing the speed from 10x slower back to... only 2x slower (than the original code). So, clean code or 2x speed? Decisions, decisions... :-)

Best answer

Scalaz does provide a |> operator, or you can write one yourself. In general there's a lot less need for it in Scala because objects have methods, as you can see in some of your translation (e.g. somethingThatReturnsList.filter(...), where in OCaml you'd have to write somethingThatReturnsList |> List.filter(...)). So it's not built into the language, but if you need it, it's out there.

foldLeft is a bit general; you might be able to write clearer code using e.g. Scalaz foldMap (in the case of your tuple you might also need shapeless-contrib so that the appropriate typeclass instance is derived). But fundamentally yes, Scala type inference is less reliable than OCaml's, and you will find yourself having to add explicit type annotations (sometimes because of unclear Nothing error messages) - it's the price we pay for allowing traditional-OO extends inheritance.

You can use (innerWalk _).tupled to get a function that takes a tuple. Or you could write your functions to accept tuples and take advantage of argument auto-tupling to call them without explicit tuple syntax. But yeah, there is no generic encoding of multi-argument functions (you can use Shapeless to convert them into that form), I suspect largely because of JVM compatibility. I suspect that if the standard library were written now it would use HLists for everything and there would be an equivalence between ordinary functions and an HList representation, but this would be a very hard change to make in a backwards-compatible way.
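To make the points about the pipe operator, tupling and type annotations concrete, here is a minimal sketch using only the standard library. The names PipeAndTuples, Pipe, scale, scaleTupled and step are made up for illustration and are not from the question or from Scalaz; the hand-rolled |> is just one possible way to write such an operator. (On Scala 2.13 and later, scala.util.chaining also offers a pipe method that covers the same need.)

```scala
object PipeAndTuples {
  // Hypothetical helper: a hand-rolled forward pipe, similar in spirit to OCaml's |>.
  implicit class Pipe[A](private val self: A) {
    def |>[B](f: A => B): B = f(self)
  }

  // Same digit-sum helper as in the question.
  def addDigits(number: Int): Int = {
    @annotation.tailrec
    def sumInner(n: Int, soFar: Int): Int =
      if (n < 10) soFar + n else sumInner(n / 10, soFar + n % 10)
    sumInner(number, 0)
  }

  // An ordinary two-argument method (illustrative only).
  def scale(factor: Int, xs: List[Int]): Int = xs.map(_ * factor).sum

  // .tupled turns (Int, List[Int]) => Int into ((Int, List[Int])) => Int,
  // so a 2-tuple can be passed directly.
  val scaleTupled: ((Int, List[Int])) => Int = (scale _).tupled

  // A pattern-matching lambda destructures both fold arguments in place,
  // and List.empty[(Int, Int)] gives the seed its full type up front,
  // which avoids the List[Nothing] inference mentioned in the question.
  def step(total: Int, points: List[(Int, Int)]): (Int, List[(Int, Int)]) =
    points.foldLeft((total, List.empty[(Int, Int)])) {
      case ((sum, acc), (x, y)) =>
        if (addDigits(x) + addDigits(y) < 26) (sum + 1, (x + 1, y) :: acc)
        else (sum, acc)
    }

  def main(args: Array[String]): Unit = {
    println(List(1, 2, 3) |> (_.map(_ * 2)) |> (_.sum))
    println(scaleTupled((2, List(1, 2, 3))))
    println(step(0, List((1000, 1000), (999, 999))))
  }
}
```

The seed needs an explicit element type because foldLeft fixes its result type from the first argument list; with a bare List() that type becomes List[Nothing], and the error only surfaces later inside the fold function.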

You seem to be using quite a lot of ifs, and there are functions for some of what you're doing, e.g. visited.put(tupleCoords, 1) returns an Option holding the previous value (None if the key was not yet present), so you could use visited.put(tupleCoords, 1).isEmpty as the entire body of your filter call. And as I said, if you're willing to use Scalaz the foldLeft could be rewritten as a clearer foldMap. I suspect the whole recursive loop could be expressed with a named construct, but nothing immediately comes to mind, so maybe not.
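As a rough illustration of the shorter filter body, here is one way the walk could look with the standard library alone. In the standard collections, mutable.HashMap.put returns an Option with the previous value, so put(...).isEmpty is true only on the first visit; mutable.HashSet.add returns a Boolean directly. The names WalkSketch and firstVisit are hypothetical, and replacing the HashMap with a HashSet is an assumption on my part (the stored value 1 is never read in the question's code).

```scala
import scala.annotation.tailrec
import scala.collection.mutable

object WalkSketch {
  def addDigits(number: Int): Int = {
    @tailrec
    def sumInner(n: Int, soFar: Int): Int =
      if (n < 10) soFar + n else sumInner(n / 10, soFar + n % 10)
    sumInner(number, 0)
  }

  // HashMap variant of the filter body: put returns the previous value as an
  // Option, so isEmpty means "this point had not been seen before".
  def firstVisit(visited: mutable.HashMap[(Int, Int), Int], p: (Int, Int)): Boolean =
    visited.put(p, 1).isEmpty

  def walk(xx: Int, yy: Int): Int = {
    // Assumption: a set stands in for the question's HashMap[(Int,Int), Int].
    val visited = mutable.HashSet.empty[(Int, Int)]

    @tailrec
    def innerWalk(totalSoFar: Int, toVisit: List[(Int, Int)]): Int =
      if (toVisit.isEmpty) totalSoFar
      else {
        // add returns true only when the point was not in the set yet,
        // so it filters and records the visit in one call.
        val (total, next) = toVisit
          .filter(p => visited.add(p))
          .foldLeft((totalSoFar, List.empty[(Int, Int)])) {
            case ((sum, acc), (x, y)) =>
              if (addDigits(x) + addDigits(y) < 26)
                (sum + 1, (x + 1, y) :: (x - 1, y) :: (x, y + 1) :: (x, y - 1) :: acc)
              else (sum, acc)
          }
        innerWalk(total, next)
      }

    innerWalk(0, List((xx, yy)))
  }

  def main(args: Array[String]): Unit =
    println("Points: " + walk(1000, 1000))
}
```

Keeping visited local to walk also means repeated calls start from a clean state, which the top-level map in the question does not guarantee.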

Published: 2023-07-23 18:39:00
Link: https://www.elefans.com/category/jswz/34/1235662.html