Initializing logistic regression coefficients when using Spark's dataset-based ML API?


Problem Description



By default, logistic regression training initializes the coefficients to be all-zero. However, I would like to initialize the coefficients myself. This would be useful, for example, if a previous training run crashed after several iterations -- I could simply restart training with the last known set of coefficients.


Is this possible with any of the dataset/dataframe-based APIs, preferably Scala?


Looking at the Spark source code, it seems that there is a method setInitialModel to initialize the model and its coefficients, but it's unfortunately marked as private.


The RDD-based API seems to allow initializing coefficients: one of the overloads of LogisticRegressionWithSGD.run(...) accepts an initialWeights vector. However, I would like to use the dataset-based API instead of the RDD-based API because (1) the former supports elastic net regularization (I couldn't figure out how to do elastic net with the RDD-based logistic regression) and (2) the RDD-based API is in maintenance mode.
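For reference, resuming from saved weights with that RDD-based overload looks roughly like this. This is only a sketch: the helper name and the hyperparameter values are illustrative, and `LogisticRegressionWithSGD` is deprecated as of Spark 2.0 (which is exactly the maintenance-mode concern above).

```scala
import org.apache.spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithSGD}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Restart training from the last known coefficients instead of all zeros.
def resumeTraining(training: RDD[LabeledPoint],
                   lastKnownWeights: Array[Double]): LogisticRegressionModel = {
  val lr = new LogisticRegressionWithSGD()  // deprecated since Spark 2.0, but available
  lr.optimizer
    .setNumIterations(100)  // illustrative values
    .setStepSize(1.0)
  // This overload of run(...) seeds the optimizer with the given vector
  // rather than the all-zero default.
  lr.run(training, Vectors.dense(lastKnownWeights))
}
```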


I could always try using reflection to call that private setInitialModel method, but I would like to avoid this if possible (and maybe that wouldn't even work... I also can't tell if setInitialModel is marked private for a good reason).
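For what it's worth, the reflection route would mechanically look like the sketch below. `Estimator` here is a hypothetical stand-in, not Spark's `LogisticRegression`; it only demonstrates how `getDeclaredMethod` plus `setAccessible(true)` reaches a private setter (and, as noted above, Spark may have good reasons for keeping the real `setInitialModel` private).

```scala
// Hypothetical stand-in with a private setter, mimicking the shape of a
// private setInitialModel. For illustrating the reflection mechanics only.
class Estimator {
  private var initial: Array[Double] = Array(0.0, 0.0, 0.0)
  private def setInitialModel(weights: Array[Double]): this.type = {
    initial = weights
    this
  }
  def initialWeights: Array[Double] = initial
}

object ReflectionDemo {
  def main(args: Array[String]): Unit = {
    val est = new Estimator
    // Look up the private method by name and parameter types,
    // then bypass the access check.
    val m = classOf[Estimator].getDeclaredMethod(
      "setInitialModel", classOf[Array[Double]])
    m.setAccessible(true)
    m.invoke(est, Array(0.5, -1.2, 3.0).asInstanceOf[AnyRef])
    println(est.initialWeights.mkString(","))  // prints 0.5,-1.2,3.0
  }
}
```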

Answer


Feel free to override the method. Yes, you will need to copy that class into your own source tree. That's fine: do not fear.


When you build your project, whether via Maven or sbt, your local copy of the class will "win" and shadow the one in MLlib. Fortunately, the other classes in that same package will not be shadowed.
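Concretely, the copy has to sit at the same fully-qualified package path so it takes precedence on the classpath. A sketch of what that means in a standard sbt/Maven layout (the file path and the visibility change are the only points; the exact original signature and modifier should be checked against your Spark version, and everything else stays a verbatim copy):

```scala
// File: src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
// A verbatim copy of Spark's LogisticRegression.scala, except that the
// private setInitialModel is given public visibility, e.g.:
//
//   private[spark] def setInitialModel(model: LogisticRegressionModel): this.type  // original
//   def setInitialModel(model: LogisticRegressionModel): this.type                 // your copy
package org.apache.spark.ml.classification

// ... rest of the copied file unchanged ...
```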


I have used this approach many times to override Spark classes; your build times should stay short as well.
